Oct 17, 2024 | AI
A Holistic Approach to Advancing Generative AI Solutions
A guest post from David Wong, chief product officer, Thomson Reuters
At Thomson Reuters, our vision is to deliver an AI assistant for every professional we serve. As part of that, we focus on delivering benefits for our customers across the breadth of our AI- and non-AI-powered features, because our solutions help customers in many ways, of which AI-powered automation is one.
In April of this year, we shared our vision to provide a GenAI assistant for each professional we serve. CoCounsel embodies our ongoing efforts to augment professionals’ work with GenAI skills, enabling them to accelerate and streamline entire workflows to increase efficiency, produce better work, and deliver more value for their clients. Our continued investment in GenAI aims to bring those gains to professionals across industries through a single GenAI assistant.
We believe our investment in GenAI – along with our connections to customer data and our third-party integrations – extends the value customers derive from CoCounsel beyond our connected experience and our verified and trusted content. Our work with Microsoft, for example, includes CoCounsel integrations across Word, Outlook and Teams – meeting professionals where they’re already working.
AI and large language models are proving to be powerful tools that deliver efficiency gains and strengthen research practices for our customers. Yet our efforts to redefine work with GenAI are rooted in our strong foundation of editorial enhancements, authoritative content and technological expertise, alongside our long history of working closely with customers. That’s why we continue to build out AI- and non-AI-powered solutions to help with the entire workflow for legal, tax, and risk and compliance professionals. While AI may not be perfect, it can significantly reduce professionals’ workload and help them handle more complex, substantive work more efficiently. We collaborate with our customers to help them understand that AI is an accelerant rather than a replacement for their own research.
Benchmarking expectations
As a leader in innovation and AI research, we recognize the role that independent benchmarking plays in ensuring the accuracy, transparency, and accountability of evolving GenAI solutions. We believe that benchmarking can improve both the development and the adoption of AI. We also see it as one component in a broad range of ways we consider and understand the benefits AI delivers for our customers. We work with our customers as their trusted partners for change, helping them to confidently understand and adopt new technologies, looking at both their immediate value and their role in long-term transformation, and leveraging our deep understanding of their businesses.
At Thomson Reuters, our understanding of the holistic value of our products is based on customers’ usage and the benefits they derive. Our customers have run more than 2.5M searches through AI-Assisted Research on Westlaw Precision since its launch late last year, and they tell us it’s saving time and improving productivity. Similarly, internal testing of CoCounsel’s skills has yielded impressive results, particularly with regard to CoCounsel’s document review capabilities.
Our benchmarking support is reflected in our participation in studies including Vals.ai as well as two consortium efforts – from Stanford and Litig – exploring how to best evaluate legal AI. We are submitting CoCounsel AI skills to the Vals.ai benchmarking study in five areas of evaluation – Doc Q&A, Data Extraction, Document Summarization, Chronology Generation, and E-Discovery.
Vals.ai is a first attempt at establishing a standard, so we should view this work as a first iteration and an opportunity to learn rather than treating it as a gold standard. For example, one limitation of the benchmarking methodology is that each vendor’s results are evaluated based on the text output alone, removed from the interface and experience of the individual product. This discounts the work each vendor has done to design interfaces and safety features that minimize the harm caused by errors. It also reinforces the need for a holistic evaluation of each product being tested, ideally as designed for the user.
Looking ahead, my expectation is that, while accuracy will continue to improve, no product will produce answers entirely free of errors. And as we’ve shared with our customers, every AI product requires human expertise for verification and review, regardless of the accuracy rate. Because the current approach to benchmarking reports an accuracy percentage, we need to be very clear on this point: whether a product scores in the low or high 90s, all answers must still be checked 100% of the time.
I look forward to our ongoing collaboration with customers and industry partners as we continue our work towards minimizing inaccuracies and increasing the usefulness of the research outcomes for GenAI tools and all our solutions.