Legal Tech’s Future: How AI, Data Quality, and HITL Lead the Way

Doug Gapinski

October 18, 2024

Generative AI has quickly become a crowded and highly competitive field within legal tech. Beyond reshaping legal services and practice in numerous ways, the technology has firms and startups alike rushing to adopt it and innovate with it. As they do, many are vying for a leading position by laying claim to faster, more accurate, and more niche AI solutions. With the influx of new players and the pace of technological advancement, simply having AI capabilities no longer helps firms and products stand out.

The competitive advantage lies with companies investing in custom AI models, adopting Retrieval-Augmented Generation (RAG), and ensuring that humans stay involved in the process. The quality of data, seamless integration into existing workflows, transparency, and the ability to adapt quickly to new laws will determine which companies truly thrive. This article examines why these factors hone the leading edge in legal tech.


Proprietary Parameters

In AI, parameters are internal variables (billions of them!) that a model adjusts during training to learn how to transform input data into accurate outputs. There is an important difference between open and proprietary parameters. Open parameters, released with open-weight models such as Llama, are publicly available and can be adapted by anyone. In contrast, proprietary parameters, developed and fine-tuned by a company for specific use cases, provide a competitive edge through specialization and enhanced control over data privacy.
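
To make the definition concrete, here is a toy sketch of what “parameters” are in practice: two numbers that a tiny model nudges during training until its predictions fit the data. Production legal models do the same with billions of values; the data and numbers below are purely illustrative.

```python
# Toy illustration of "parameters": values a model adjusts during training so
# its outputs match the training data. A single-feature linear model fit by
# gradient descent -- not a real legal model.
import numpy as np

# Hypothetical training data: x = document length (arbitrary units), y = review hours.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

w, b = 0.0, 0.0   # the model's two "parameters"
lr = 0.01         # learning rate

for _ in range(2000):
    pred = w * x + b                        # forward pass
    grad_w = 2 * np.mean((pred - y) * x)    # gradient of mean squared error w.r.t. w
    grad_b = 2 * np.mean(pred - y)          # gradient w.r.t. b
    w -= lr * grad_w                        # nudge parameters toward a better fit
    b -= lr * grad_b

print(f"learned parameters: w={w:.2f}, b={b:.2f}")
```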

Westlaw, LexisNexis, and Bloomberg Law each use their own proprietary parameters and models, with unique methodologies and systems that cater specifically to the legal industry. These models understand the nuances of legal language, interpret precedents, and navigate regulatory frameworks more accurately than general-purpose AI. Because custom models are built on different, and often higher-quality, data, they provide domain-specific insights that allow companies to offer more precise outcomes.

Proprietary models may also offer greater control over data privacy — essential in maintaining client trust and meeting regulatory compliance. Because these models adapt to specific jurisdictions and geographies, they are particularly valuable for firms working across different regions or handling complex, multi-jurisdictional cases.

→ Takeaway: Leveraging proprietary parameters provides not only more tailored results but also stronger data privacy, meeting the nuanced demands and compliance requirements of the legal industry.


Retrieval-Augmented Generation (RAG)

RAG takes AI to the next level by blending generative capabilities with more accurate knowledge retrieval. In the legal world, this means the AI can pull information from authoritative external sources — such as case law databases, internal records, or statutes — and use it as the preferred grounding for responses to specific queries.

Harvey.AI is using RAG in several ways. Through their partnership with Voyage AI, Harvey developed custom embeddings specifically tailored to understand legal texts, starting from a base model ("voyage-law-2") and training it on over 20 billion tokens of US case law. This allows Harvey to retrieve relevant legal information based on deeper semantic understanding rather than simple keyword matching. The custom model significantly reduces irrelevant search results and improves both retrieval accuracy and speed.
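
As a rough illustration of the pattern (not Harvey’s or Voyage’s actual implementation), a RAG pipeline embeds the query, ranks a corpus of passages by vector similarity, and hands the top matches to a generative model as grounding. The embed() and generate() functions below are hypothetical placeholders for real embedding and language-model services.

```python
# Minimal RAG sketch: retrieve the most relevant passages by embedding
# similarity, then pass them to a generative model as context.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    # Random vectors stand in for semantic embeddings in this sketch.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=256)

def generate(prompt: str) -> str:
    # Placeholder: a real system would call a language model here.
    return f"[model response grounded in]\n{prompt}"

corpus = [
    "Statute of limitations for breach of a written contract is six years.",
    "Summary judgment is appropriate when there is no genuine dispute of material fact.",
    "Choice-of-law clauses govern which jurisdiction's law applies to the agreement.",
]
corpus_vecs = np.stack([embed(passage) for passage in corpus])

def answer(query: str, k: int = 2) -> str:
    q = embed(query)
    # Cosine similarity between the query and every passage in the corpus.
    sims = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    context = "\n".join(corpus[i] for i in top)
    return generate(f"Answer using only these sources:\n{context}\n\nQuestion: {query}")

print(answer("How long do I have to sue on a written contract?"))
```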

→ Takeaway: By blending advanced retrieval methods with generation, companies using RAG are poised to deliver more accurate, trustworthy insights. Better outputs lead to better decision-making and time savings, providing an edge over competitors.


Human-in-the-Loop (HITL) Processes

AI in legal tech can’t stand alone — human expertise remains essential to ensure accuracy and quality. By involving humans in the review process, firms verify the AI’s outputs, which is particularly crucial when drafting contracts or legal opinions that require precision. Human-in-the-loop (HITL) processes also generate valuable feedback that helps models learn and get better at complex legal tasks over time. Moreover, human oversight means that ethical considerations are addressed, helping the AI remain fair and unbiased when dealing with sensitive cases.

Though not specific to legal tech, CloudFactory deploys large-scale HITL processes for quality control and model refinement, using human reviewers to validate and correct outputs from AI models in fields such as agriculture and sports. In these cases, reviewers assess challenging edge cases and provide feedback that helps retrain and improve the model over time, ensuring higher accuracy and adaptability to complex scenarios.
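
A minimal sketch of how such a review loop might be wired, assuming a confidence score is available for each draft: low-confidence outputs are routed to a human reviewer, and the reviewer’s corrections are saved as labeled examples for a future retraining pass. The classes and threshold below are illustrative, not any vendor’s implementation.

```python
# Illustrative HITL routing sketch: drafts below a confidence threshold go to
# a human reviewer, and corrections are collected as future training data.
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str
    confidence: float  # assumed to come from the model or a separate verifier

@dataclass
class ReviewQueue:
    threshold: float = 0.85
    approved: list = field(default_factory=list)
    corrections: list = field(default_factory=list)  # (model output, human fix) pairs

    def triage(self, draft: Draft) -> str:
        if draft.confidence >= self.threshold:
            self.approved.append(draft.text)
            return "auto-approved"
        return "sent to human reviewer"

    def record_correction(self, draft: Draft, corrected_text: str) -> None:
        # Each correction becomes a labeled example for the next retraining run.
        self.corrections.append({"model": draft.text, "human": corrected_text})

queue = ReviewQueue()
draft = Draft("Indemnification clause, first draft", confidence=0.62)
print(queue.triage(draft))  # -> "sent to human reviewer"
queue.record_correction(draft, "Indemnification clause, revised by counsel")
```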

→ Takeaway: Building HITL into AI-driven processes keeps outputs accurate and helps ensure they remain fair and unbiased.


Data Quality and Labeling

The quality of AI’s outputs is only as good as the data it’s trained on. Legal tech companies that focus on curating high-quality, well-labeled datasets — like annotated case law or contract clauses — develop models that better understand the intricacies of legal language. This focus on data quality means that AI is more likely to produce accurate and reliable results, which is essential for maintaining trust with attorneys or their clients. Additionally, using diverse data sources from multiple jurisdictions helps models become more versatile, allowing firms to address a broader range of legal scenarios and challenges.

Thomson Reuters’ CoCounsel enables users to upload briefs or motions, which the AI then analyzes to find relevant precedents. This process benefits significantly from the precise labeling of legal data behind the scenes.
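
As a hypothetical illustration of what “well-labeled” can mean in practice, each training record might carry the clause text, a clause-type label, and jurisdiction metadata, with a validation pass that rejects incomplete records. The fields and sample data below are invented for the sake of the example.

```python
# Sketch of a labeled legal dataset and a simple quality gate: records missing
# a required field (or carrying an empty value) are rejected before training.
labeled_clauses = [
    {"text": "Either party may terminate upon 30 days' written notice.",
     "label": "termination", "jurisdiction": "US-NY"},
    {"text": "This Agreement shall be governed by the laws of England and Wales.",
     "label": "governing_law", "jurisdiction": "UK"},
]

REQUIRED = {"text", "label", "jurisdiction"}

def validate(records):
    """Split records into usable training examples and rejects."""
    clean, rejected = [], []
    for record in records:
        if REQUIRED.issubset(record) and all(record[k] for k in REQUIRED):
            clean.append(record)
        else:
            rejected.append(record)
    return clean, rejected

clean, rejected = validate(labeled_clauses + [{"text": "Unlabeled clause."}])
print(f"{len(clean)} usable examples, {len(rejected)} rejected")
```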

→ Takeaway: Emphasis on data quality and labeling makes the AI more effective in understanding complex legal queries and producing useful insights for attorneys.


Emphasis on Explainability and Transparency 

In the legal field, trust in AI depends on transparency. Legal professionals need to understand how an AI arrived at its conclusions before they rely on its insights for advising clients or making case arguments. This makes explainability a critical factor in AI adoption. Models that clearly articulate their reasoning are more likely to gain the confidence of legal practitioners. Furthermore, transparency is essential for navigating regulatory requirements around AI, helping companies avoid legal issues and maintain their reputation for reliability.

vLex’s Vincent AI includes generative AI capabilities that link its outputs directly to relevant legal materials, such as case law or statutes. This approach allows users to verify the AI's recommendations and summaries directly against the original documents, ensuring that the conclusions reached by the AI are well-supported and accurate. This feature is designed to provide legal professionals with the assurance they need when using AI for legal research and drafting.
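
A simplified sketch of the underlying idea (not vLex’s actual design): every generated statement is returned together with the source passages it relies on, so a reviewer can check each claim against the original material. The classes and sample citation below are illustrative.

```python
# Illustrative cited-answer structure: each statement carries the sources it
# is drawn from, so the conclusion can be verified against the originals.
from dataclasses import dataclass

@dataclass
class Citation:
    source_id: str  # e.g., a reporter citation, statute section, or document ID
    excerpt: str    # the passage the statement relies on

@dataclass
class CitedAnswer:
    statement: str
    citations: list[Citation]

    def render(self) -> str:
        refs = "; ".join(c.source_id for c in self.citations)
        return f"{self.statement} [{refs}]"

answer = CitedAnswer(
    statement="A six-year limitations period applies to this written contract claim.",
    citations=[Citation("N.Y. C.P.L.R. 213(2)",
                        "...an action upon a contractual obligation or liability...")],
)
print(answer.render())
```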

→ Takeaway: Building in transparency establishes trust and confidence in an industry where details are scrutinized.


Iterating Based on New Legislation 

The legal landscape constantly evolves, and cases like the federal class action lawsuit filed against Cigna Healthcare or Mata v. Avianca may shape future rules around training data and explainability requirements for AI in the law. The companies and firms that keep up with these changes are the ones that will stay ahead.

→ Takeaway: Companies that adapt quickly to legislative shifts aren’t just staying compliant — they’re positioning themselves as leaders who can offer their clients up-to-date, cutting-edge insights.


Conclusion

In the crowded and rapidly evolving world of legal tech, the companies that will succeed are those that skillfully combine innovation with the specialized needs of the legal profession. Building the best proprietary models, using well-tailored RAG, and maintaining a human-in-the-loop approach can create a strong foundation for accuracy.

Focusing on high-quality data, ensuring transparency in AI outputs, and swiftly adapting to new legal frameworks will be essential to maintaining a competitive edge. By prioritizing these elements, companies can deliver AI solutions that not only enhance efficiency but also inspire confidence among legal professionals and their client base, setting the stage for long-term leadership.

Looking for a legal tech partner to develop a software solution with you? Learn more about our legal tech expertise and AI Software Solutions, or contact us for an exploratory conversation.

Doug Gapinski

Account Director

With over a decade of experience as a team lead and project manager, Doug Gapinski is a Seattle-based Account Director managing long-term product builds with volatile scope. He advocates for quality, transparency, and a shared understanding of project constraints while applying agile methodologies across decentralized teams.