Vardaan Ahluwalia, Aman Gupta and Sumedha Kalra share their experience of onboarding a custom-built AI system

Imagine you are negotiating a complex JV proposal back in the 1980s. The deal has a lot of contentious points and, naturally, involves significant back and forth on the commercials and then drafting to capture the commercial intent.

Any iteration of the JV agreement would necessarily involve multiple hand markups and reels of facsimile within your organisation and with your counsel before you can send it across by fax (or by hand?) to your counterparty. It would be a time-consuming and inherently unscalable affair. It makes one wonder how one did anything urgent back then.

Fast forward to the late 1990s: with emails in the picture, transaction costs and time fell significantly. Then, during covid-19, we saw how people can work and transact without physical meetings, with video calls bringing further savings in costs and time spent on deals.

Safe to say, over the years technological advances have increased productivity and changed behaviour. Now it is customary to request Zoom calls instead of travelling across time zones to have exploratory discussions.

This article delves into the potential AI has as a strategic business partner, offering insights into enhanced decision-making and efficiency gains. Drawing from on-ground experiences with developing an internal LLM (large language model) for daily data extraction and contract analysis tasks, the authors highlight the potential bottlenecks in AI implementation. AI’s full potential as a trusted business ally requires a nuanced understanding of these challenges, and business perseverance (in terms of capital and focus) to overcome them.

AI and the practice of law

AI holds the promise of increasing accuracy and reducing man hours involved in legal transaction and advisory practice. It is only a matter of time before AI becomes embedded in various aspects of the legal services industry, not only helping in legal operations, but also improving business outcomes, as the use of AI expands to critical aspects such as contract generation, contract analysis, legal research and litigation prediction.

That said, the authors are cognisant that the effectiveness of any AI-enabled system is subject to various operational and substantive risk factors. AI is not a magic wand; like any other manufacturing exercise, the quality of the input determines the quality of the output.

Let's consider specifically the authors' experience in developing Druid.AI, an LLM for data extraction, inference and cognitive analysis. The team noticed that while structured responses and binary outcomes are easily obtained, nuanced cognitive analysis is far more challenging.

In this domain, solutions include enhancing the model on specific tasks related to cognitive analysis, incorporating diverse training data that spans a wide spectrum of language intricacies, and embracing human-in-the-loop approaches to harness both artificial intelligence and human expertise. Early in the development journey, the team realised that for the deployment of an LLM to have the desired business outcome, one needs to be cognisant of the following points.

Presence of a comprehensive data set. The absence of a comprehensive and diverse data set makes it tougher to implement any reliable machine-learning project. To counter this, the team implemented innovative methods such as augmenting existing data with various transformations and stacking, and applying legal domain knowledge (such as building a synonyms glossary) to enhance the LLM's correlational capability. These approaches serve as a foundation for overcoming the scarcity of data and fostering a more robust training environment. That said, in the case of closed data-set-trained LLMs, the designers may also consider exposing the LLM to synthetic data sets to help augment the accuracy of the predictive models.
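To make the idea of glossary-driven augmentation concrete, here is a minimal sketch. The glossary entries, the clause text and the `augment` helper are all illustrative assumptions, not Druid.AI's actual data or code; the point is only to show how a synonyms glossary can multiply a scarce training set by rephrasing the same legal concept in several drafting styles.

```python
import re

# Hypothetical legal-synonyms glossary (illustrative entries only).
SYNONYMS = {
    "indemnify": ["hold harmless", "compensate"],
    "terminate": ["rescind", "cancel"],
}

def augment(clause: str) -> list[str]:
    """Generate clause variants by swapping in glossary synonyms,
    so the model sees the same concept under different drafting styles."""
    variants = [clause]
    for term, alternatives in SYNONYMS.items():
        pattern = re.compile(rf"\b{term}\b", re.IGNORECASE)
        for alt in alternatives:
            for base in list(variants):
                if pattern.search(base):
                    variants.append(pattern.sub(alt, base))
    return variants

clause = "The Seller shall indemnify the Buyer against all losses."
print(augment(clause))
```

Each original clause yields several paraphrased training examples, which is one way to stretch a closed data set before resorting to fully synthetic data.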

Data needs to be appropriately structured, stacked or configured. Sufficient data alone, however, is not enough; it is equally important to index, stack and configure such high-quality data for the task at hand.

LLM needs to be rigorously tested to cut out hallucinations and inherent biases. After the data is configured, the LLM needs to be trained and then tested. We then work to cut out the LLM's hallucinations and inherent biases, which can be a time-consuming exercise. Lastly, rigorous testing (at least 200 hours per contract type, and at least 100 prompts), regular maintenance and updates all form part of the process of keeping the LLM relevant.
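A testing regime of this kind typically takes the shape of a regression suite: a fixed set of prompts with expected answers, re-run after every change to the data or the model. The sketch below is a toy version under stated assumptions; the `model` stub, the prompts and the expected answers are invented for illustration and stand in for the real LLM call and Druid.AI's actual test suite.

```python
def model(prompt: str) -> str:
    """Deterministic stub standing in for the real LLM call."""
    answers = {
        "What is the governing law?": "Laws of India",
        "Is assignment permitted?": "No",
    }
    return answers.get(prompt, "I don't know")

# Hypothetical test cases: (prompt, answer the legal team expects).
TEST_CASES = [
    ("What is the governing law?", "Laws of India"),
    ("Is assignment permitted?", "No"),
]

def run_suite(cases) -> float:
    """Return the pass rate; a response counts only if it matches the
    expected answer exactly, i.e. is both accurate and complete."""
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return passed / len(cases)

print(f"pass rate: {run_suite(TEST_CASES):.0%}")
```

Tracking the pass rate per contract type over time is what turns "200 hours of testing" into a repeatable process rather than a one-off exercise.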

Prompt engineering is an art in itself. Prompt engineering emerges as another critical concern, where the careful design of prompts becomes imperative to prevent biases and guide the model towards desired outcomes. For LLMs meant to analyse legal contracts, the impact of different drafting styles, industry-specific nomenclature and its synonyms, defined terms and cross-referenced concepts all add layers of complexity, and need to be resolved to generate commercially valuable outputs.
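One common way to handle defined terms and synonyms in a prompt is a template that injects them alongside the question and the contract text. The template below is a hypothetical illustration of the pattern, not Druid.AI's actual prompt; the field names and sample values are assumptions.

```python
# Hypothetical prompt template: defined terms and synonym pairs are
# surfaced explicitly so the model resolves them before answering.
PROMPT_TEMPLATE = """You are reviewing a legal contract.
Defined terms in this contract: {defined_terms}
Treat these expressions as equivalent: {synonyms}
Answer only from the contract text below.
Question: {question}
---
{contract_text}"""

prompt = PROMPT_TEMPLATE.format(
    defined_terms='"Effective Date", "Confidential Information"',
    synonyms="indemnify = hold harmless",
    question="Who bears the indemnity obligation?",
    contract_text="The Seller shall indemnify the Buyer against all losses.",
)
print(prompt)
```

Making the synonym mapping explicit in the prompt, rather than hoping the model infers it, is one way to neutralise the drafting-style variation the paragraph above describes.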

Limitations of traditional token embeddings in legal analysis. Legal analysis requires reliable accuracy; an incomplete interpretation can result in significant business misjudgements and undesirable outcomes. The team treated a response from Druid as correct only when it was entirely accurate and complete in the view of the legal team. Early on, the team realised that despite rigorous testing and a curated data set, our responses often missed legal nuances and were deemed not reliable. Accordingly, we decided to consider ways to augment and enable the existing technology to deliver the desired accuracy.

Augmentation of AI model

Now, this may get technical, but to explain: our LLM initially failed to provide accurate responses and struggled with contract clause structures. Our initial tests showed an accuracy of 50% at best, with completely incorrect responses at worst. So, our data science team enhanced the retrieval-augmented generation (RAG) technology with several key modifications.

First, we incorporated techniques to divide the document on the basis of its table of contents to make the context easier for the AI model to read (also known as custom chunking). Second, we refined our prompts and the LLM's ability to understand them by training the LLM on a list of legal jargon and synonyms. Third, we adjusted the retrieval algorithms to search for the most probable right answer and used keyword-matching techniques to improve accuracy.

Additionally, we introduced context-driven decomposition techniques to break down complex queries effectively.
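Query decomposition, in its simplest form, splits a compound question into sub-queries that can each be answered against its own retrieved context. The sketch below is a deliberately naive version that splits on coordinating connectives; the splitting rule and the sample question are assumptions for illustration, and a production system would use the LLM itself to decompose.

```python
import re

def decompose(query: str) -> list[str]:
    """Break a compound question into sub-queries at 'and' or ';',
    restoring a question mark on each piece."""
    parts = re.split(r"\band\b|;", query)
    return [p.strip().rstrip("?") + "?" for p in parts if p.strip()]

q = ("What is the notice period for termination and "
     "does the agreement survive assignment?")
print(decompose(q))
```

Each sub-query then goes through retrieval and answering independently, which keeps the context for each answer small and on-topic.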

This evolution in approach required the legal team to flesh out the typical documentation architecture, build an exhaustive glossary of terms and legal jargon, explain the likely cross-references to extract the complete context, and formulate a detailed list of test questions, keeping in mind the various permutations of a provision.

With these approaches finessed, the team saw a significant increase in the accuracy of the responses. We went through four stages of upgrades, with each stage involving rigorous testing. This evolution saw us move from LangChain and GPT-3, to LlamaIndex and Pinecone, to custom chunking and GPT-3.5 Turbo, with eventual adoption of Druid's context-aware RAG architecture and GPT-4.

With this, the team refined the product into a robust Gen AI-driven legal contract analysis tool. Druid.AI's achievement of 85% accuracy involved collaboration between five senior data scientists and four lawyers over eight months, underscoring deep collaboration across functions. Eventually, we plan to release Druid on Hugging Face, a collaborative AI/ML development platform, to facilitate further learning and development.


In his book The Future of Law, published in 1996, Richard Susskind predicted that in the future lawyers and clients would communicate via email. This prediction was considered blasphemous at the time, especially by senior lawyers in the UK.

The authors anticipate that in the not so distant future, LLMs will move from data extraction to conducting due diligence, preparing first drafts of transaction documents on the basis of term sheets, providing negotiation pointers when a mark-up is received, and generating reports for review with an accuracy level that may even be acceptable to regulators.

With these possibilities on the horizon, the desire to disrupt the conservative and precedent-driven legal industry is understandable. Although the opportunity is there, product designers and startups will need to appreciate that unless AI models can promise high accuracy levels, it will be difficult to develop a commercially viable clientele. Given the way things stand today, we believe augmentation of existing legal services providers with AI is an easier, and perhaps even more appropriate, sell than looking to replace traditional law firms with AI. Take US-based Atrium, for instance, which raised USD 75 million to build a software-assisted law firm and, despite having a top-notch founding team and capital, failed to find the right market fit.

The authors are convinced that AI is here to stay, and the sooner it is embraced in the legal services industry, the more benefits the industry shall reap. But we should also acknowledge that the industry is still some distance away from technology significantly replacing legal practice at scale.

VARDAAN AHLUWALIA is the head of legal, AMAN GUPTA is a senior data scientist and SUMEDHA KALRA is a secondee at Premji Invest. The authors acknowledge TK Kurien, CIO at Premji Invest, for his guidance on this topic. The article has also benefited from discussions with Rishab Bardia, investment professional, and Ramesh Nagrajan, head of data science at Premji Invest, along with Rashmi Menon, an advocate at Menon Associates.
