Can I reduce my rent? Finding arguments in court decisions with Question Answering


An essential part of a lawyer’s work in preparing a lawsuit is comparing similar, already litigated cases. Judges also compare cases to ensure that the jurisprudence remains consistent. This comparison serves as a guideline for appropriate claims and aids in preparing arguments in court. In certain specialized areas of law, there are already reference books that categorize historical rulings and derive jurisprudence patterns, such as for calculating compensation for pain and suffering. These books simplify the search for rulings, helping to determine appropriate compensation amounts and providing supporting arguments for these assessments.

However, such reference works are not available for most areas of law due to the enormous effort required to create them. Instead, lawyers must manually search legal databases for rulings, sift through them based on key facts, and derive a basis for evaluation. This time-consuming process can be accelerated using Machine Learning methods.

The analysis problem: Classifying documents and tagging sequences with Machine Learning

With over 1.4 million rulings, Wolters Kluwer, a specialist information and software company, is one of the most important legal database providers in Germany. Since 2021, we have been working with Wolters Kluwer on methods for analyzing judgment documents, extracting key facts, and particularly finding arguments that justify the ruling decision. Initially, we are focusing on rulings in rental law dealing with rent reductions due to defects in the rental property. The relevant core information includes:

  1. Key facts that allow similar cases to be found, such as the specific defect in the rental property (e.g., mold, noise, or defective infrastructure). Based on these key facts, lawyers can later quickly retrieve comparable cases.
  2. The court’s decision: Did the court grant the rent reduction, and if so, what was the reduction amount? Did the tenant act as the plaintiff or the defendant in the case?
  3. The reasons that led the court to grant or deny the rent reduction. When these reasons are clearly extracted, lawyers can review rulings more quickly and gather arguments for the current case.

While the information described in (1) and (2) can be identified through document or word classification, as described in the following blog post, extracting arguments presents a unique challenge.

Argument mining with Question Answering models

The extraction of arguments mentioned in (3) is known as “argument mining” (also called “argumentation mining”).

Extracting arguments is challenging for several reasons: Arguments can span one or multiple sentences, making sentence-level classification often ineffective. Additionally, many rulings, along with the reasons provided for decisions, relate to multiple issues. However, our users are only interested in arguments relevant to the subject at hand—in this case, rent reduction. Therefore, it is crucial to consider the context of each ruling, which derives from the specific defect in the rental property.

To address these challenges, we use a Question Answering approach. This model learns to find the correct answers to any question within a document based on question-answer pairs. Specifically, it is a classification task where the learning objective is to classify two words in the document as the start and end of an answer.

During training, the model receives judgment texts, the posed questions, and the start and end words of the correct answers as inputs.

We can build on a language model that has already been fine-tuned on generic question-answer pairs, which is capable of recognizing answer structures in texts. To reliably answer questions in our legal context, we continue to train the model on our data.
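The start/end classification described above can be illustrated with a small decoding sketch: given per-token start and end scores (as a fine-tuned Question Answering model would output for the question and ruling text), we select the highest-scoring valid span. The function name, scores, and token sequence are illustrative assumptions, not part of the project code.

```python
def best_span(start_scores, end_scores, max_len=30):
    """Pick the (start, end) token pair with the highest combined score,
    requiring start <= end and a bounded span length.
    Illustrative sketch of QA span decoding, not the project's actual code."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# Toy example: scores a model might assign to each token of a ruling snippet
tokens = ["the", "rent", "reduction", "was", "granted", "due", "to", "mold"]
start = [0.1, 0.2, 0.1, 0.0, 0.3, 0.1, 0.0, 0.2]
end = [0.0, 0.1, 0.4, 0.0, 0.2, 0.0, 0.1, 0.5]
s, e = best_span(start, end)
print(" ".join(tokens[s : e + 1]))  # → granted due to mold
```

Real QA models produce these start/end scores per token; the same decoding idea, extended to return all high-scoring spans instead of only the best one, yields multiple answer passages per ruling.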

Asking the right questions: Extracting arguments with question templates

For each judgment, we create a question template filled ad hoc with the information extracted in (1) and (2):

Template: “What are arguments for or against a rent reduction by the tenant as [role] due to [rental defect]?”

Specific example: “What are arguments for or against a rent reduction by the tenant as the defendant due to balcony staining?”
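Filling the template amounts to simple string substitution with the key facts extracted in steps (1) and (2). A minimal sketch (function and variable names are illustrative):

```python
TEMPLATE = (
    "What are arguments for or against a rent reduction "
    "by the tenant as {role} due to {defect}?"
)

def build_question(role: str, defect: str) -> str:
    """Fill the question template with the extracted key facts."""
    return TEMPLATE.format(role=role, defect=defect)

print(build_question("the defendant", "balcony staining"))
# → What are arguments for or against a rent reduction by the tenant
#   as the defendant due to balcony staining?
```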

At the application stage, as shown in Figure 1, the process proceeds as follows: First, information about the rental defect and the tenant’s role is automatically extracted from the ruling. In the second step, the question template is filled with the extracted information. The question, along with the ruling text, is used as input for the Question Answering model, which then returns the start and end words of all answers matching the question.

Figure 1: Overview of the Question Answering model pipeline (© Lamarr-Institut)

Practical challenges – Training data generation and evaluation

The project presents difficulties not only from a Machine Learning perspective but also in its practical implementation, for the following reasons:

  • To train the models, labeled training data is needed—specifically, judgment texts annotated with the information described in (1), (2), and (3). Using these examples, the model learns to reproduce human decisions as closely as possible and generalize to unseen rulings. Together, we develop an annotation schema. Experts from Wolters Kluwer prepare and evaluate the data accordingly. We use the AnEx annotation software, developed by Fraunhofer IAIS, for annotation. This annotation work covers much of the project duration and is a crucial step for project success. The final model’s performance hinges on the quality of the manually generated judgment examples.
  • Qualitative evaluation of extraction results: For some of the extracted information, we can measure model accuracy using metrics such as precision, recall, and F1 score on an annotated test set, e.g., when predicting the case outcome. Other information, such as arguments, is harder to evaluate. Often, a model selects different passages for arguments than the expert annotating the text. These passages are semantically similar to the manually annotated ones, and both, though different, can be considered correct. Thus, metrics in such cases are not informative.
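The limitation of overlap-based metrics can be made concrete with a SQuAD-style token-overlap F1 score, a common way to compare a predicted answer span against an annotated one. The example answers below are invented for illustration:

```python
def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 between a predicted and an
    annotated answer span (bag-of-words overlap, case-insensitive)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    ref_counts = {}
    for t in ref:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

# Two semantically equivalent argument spans score poorly:
print(round(token_f1("the tenant could not use the balcony",
                     "use of the balcony was impossible"), 2))  # → 0.46
```

As the example shows, two differently worded but equally correct passages receive a low score, which is why such metrics are uninformative for argument extraction.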

For this reason, our extraction models are evaluated directly by domain experts at Wolters Kluwer. We make them available through a web service where rulings can be uploaded and analyzed and the results displayed. Using the Streamlit framework, we can create functional web applications with minimal effort.

To deliver the application to Wolters Kluwer, it is packaged in a Docker image. Docker enables the rapid creation of standalone applications, with an image containing the entire application ready to be run independently. This service can then be easily integrated into Wolters Kluwer’s infrastructure and tested.
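A minimal Dockerfile for packaging such a Streamlit service might look like the following; base image, file names, and port are illustrative assumptions, not the project's actual build configuration:

```dockerfile
# Illustrative sketch, not the project's actual build file
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit serves on port 8501 by default
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501"]
```

Building the image (`docker build -t ruling-analyzer .`) yields a self-contained artifact that can be run and tested anywhere Docker is available.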

Summary and conclusion

A large portion of lawyers’ working time goes into case research, often involving painstaking searches and information preparation. The use of Machine Learning methods can reduce this effort.

The task can be translated into various Machine Learning problems, including document and word classification as well as Question Answering. In all areas, we benefit from pre-trained language models that can recognize semantic similarities. Question Answering models also allow us to incorporate the specific context of a ruling to find the correct information.

Currently, our analysis is limited to selected ruling types in rental law. Our long-term goal is to develop a cross-disciplinary solution while keeping the effort required to generate training data low.

This project, implemented in partnership, represents a first step towards automated judgment summarization and preparation.

Birgit Kirsch

Birgit Kirsch is a research associate at the Lamarr site of the Fraunhofer Institute IAIS in Sankt Augustin. Her current research focuses on the application of statistical relational learning methods to problems in the field of natural language processing.
