---
language:
- en
multilinguality:
- monolingual
task_categories:
- text-retrieval
source_datasets:
- https://github.com/lauramanor/legal_summarization
task_ids:
- document-retrieval
config_names:
- corpus
tags:
- text-retrieval
dataset_info:
  - config_name: default
    features:
      - name: query-id
        dtype: string
      - name: corpus-id
        dtype: string
      - name: score
        dtype: float64
    splits:
      - name: test
        num_examples: 439
  - config_name: corpus
    features:
      - name: _id
        dtype: string
      - name: title
        dtype: string
      - name: text
        dtype: string
    splits:
      - name: corpus
        num_examples: 438
  - config_name: queries
    features:
      - name: _id
        dtype: string
      - name: text
        dtype: string
    splits:
      - name: queries
        num_examples: 284
configs:
  - config_name: default
    data_files:
      - split: test
        path: qrels/test.jsonl
  - config_name: corpus
    data_files:
      - split: corpus
        path: corpus.jsonl
  - config_name: queries
    data_files:
      - split: queries
        path: queries.jsonl
---

**Legal_summarization**

- Original link: https://github.com/lauramanor/legal_summarization
- The dataset consists of 439 pairs of contracts and their summaries, collected from [https://tldrlegal.com](https://tldrlegal.com/) and https://tosdr.org/.
- The query set (config `queries`) consists of contract summaries. There are 284 queries.
- The corpus set (config `corpus`) comprises the contracts. There are 438 contracts in the corpus.

**Usage**
```python
import datasets

# Download the dataset
queries = datasets.load_dataset("mteb/legal_summarization", "queries")
documents = datasets.load_dataset("mteb/legal_summarization", "corpus")
pair_labels = datasets.load_dataset("mteb/legal_summarization", "default")
```
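
Once loaded, the relevance judgments usually need to be grouped per query before evaluation. The following is a minimal sketch of one way to do that, assuming the field names declared in the metadata above (`query-id`, `corpus-id`, `score`, `_id`). The helper names `build_qrels` and `index_by_id` are hypothetical and not part of the dataset or of any library.

```python
from collections import defaultdict


def build_qrels(pair_rows):
    """Group relevance rows as {query_id: {corpus_id: score}}.

    `pair_rows` is any iterable of dicts with the keys "query-id",
    "corpus-id" and "score" (hypothetical helper, for illustration).
    """
    qrels = defaultdict(dict)
    for row in pair_rows:
        qrels[row["query-id"]][row["corpus-id"]] = float(row["score"])
    return dict(qrels)


def index_by_id(rows, id_field="_id"):
    """Map each row's id to the row itself, for quick lookup by id."""
    return {row[id_field]: row for row in rows}


# Possible usage with the objects loaded above (split names taken
# from the metadata: "test", "queries", "corpus"):
#   qrels = build_qrels(pair_labels["test"])
#   query_by_id = index_by_id(queries["queries"])
#   doc_by_id = index_by_id(documents["corpus"])
```

The nested-dictionary layout is only one convenient choice; it mirrors the shape commonly used by retrieval evaluation tools, but any structure keyed by query ID would work.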