URL: https://huggingface.co/datasets/joelniklaus/lextreme

Dataset Card for LEXTREME: A Multilingual Legal Benchmark for Natural Language Understanding

Dataset Summary

The benchmark consists of 12 diverse multilingual legal NLU datasets. Six datasets have a single configuration, five have two or three configurations, and one has three temporal-epoch configurations. This yields 21 tasks in total: 11 single-label text classification tasks, 5 multi-label text classification tasks, and 5 token classification tasks.

Use the dataset like this:

from datasets import load_dataset
dataset = load_dataset("joelito/lextreme", "swiss_judgment_prediction")

Supported Tasks and Leaderboards

The dataset supports text classification and token classification. In detail, the following tasks and configurations are supported:

| Task | Task type | Configurations | Link |
|---|---|---|---|
| Brazilian Court Decisions | Judgment Prediction | (judgment, unanimity) | joelito/brazilian_court_decisions |
| Swiss Judgment Prediction | Judgment Prediction | default | joelito/swiss_judgment_prediction |
| German Argument Mining | Argument Mining | default | joelito/german_argument_mining |
| Greek Legal Code | Topic Classification | (volume, chapter, subject) | greek_legal_code |
| Online Terms of Service | Unfairness Classification | (unfairness level, clause topic) | online_terms_of_service |
| Covid 19 Emergency Event | Event Classification | default | covid19_emergency_event |
| MultiEURLEX | Topic Classification | (level 1, level 2, level 3) | multi_eurlex |
| LeNER BR | Named Entity Recognition | default | lener_br |
| LegalNERo | Named Entity Recognition | default | legalnero |
| Greek Legal NER | Named Entity Recognition | default | greek_legal_ner |
| MAPA | Named Entity Recognition | (coarse, fine) | mapa |
| Ukrainian Court Decisions | Judgment Prediction | (pre_war, hybrid_war, full_scale) | overthelex/ukrainian-court-decisions |
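
The task count given in the summary can be checked against the table above. The following is a minimal sketch that tallies the configurations per dataset; the dictionary is transcribed by hand from the table, and the variable names are illustrative only, not part of the lextreme code.

```python
# Number of configurations per dataset, transcribed from the table above.
# "default" counts as one configuration.
CONFIGS_PER_DATASET = {
    "Brazilian Court Decisions": 2,   # judgment, unanimity
    "Swiss Judgment Prediction": 1,
    "German Argument Mining": 1,
    "Greek Legal Code": 3,            # volume, chapter, subject
    "Online Terms of Service": 2,     # unfairness level, clause topic
    "Covid 19 Emergency Event": 1,
    "MultiEURLEX": 3,                 # level 1, level 2, level 3
    "LeNER BR": 1,
    "LegalNERo": 1,
    "Greek Legal NER": 1,
    "MAPA": 2,                        # coarse, fine
    "Ukrainian Court Decisions": 3,   # pre_war, hybrid_war, full_scale
}

num_datasets = len(CONFIGS_PER_DATASET)
num_tasks = sum(CONFIGS_PER_DATASET.values())
single_config = sum(1 for n in CONFIGS_PER_DATASET.values() if n == 1)

print(num_datasets, num_tasks, single_config)  # prints: 12 21 6
```

Each configuration corresponds to one task, so the sum over the table reproduces the 21 tasks stated in the summary.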

Languages

The following languages are supported: bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv, uk

Dataset Structure

Data Instances

The file format is jsonl and three data splits are present for each configuration (train, validation and test).
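
Since the data is stored as jsonl, each line of a split file is one JSON object. The sketch below shows how such a file could be read with the Python standard library. The field names `input` and `label` and the sample records are hypothetical placeholders, because the actual data fields are not documented in this card.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records standing in for a split file such as a train split.
# The real field names may differ.
sample = io.StringIO(
    '{"input": "example text one", "label": 0}\n'
    '{"input": "example text two", "label": 1}\n'
)

records = list(read_jsonl(sample))
print(len(records), records[0]["label"])  # prints: 2 0
```

In practice, `load_dataset` handles this parsing; the sketch only illustrates the line-per-example layout.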

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

How can I contribute a dataset to lextreme? Follow these steps:

  1. Make sure your dataset is available on the Hugging Face Hub and has train, validation and test splits.
  2. Create a pull request to the lextreme repository by adding the following to the lextreme.py file:
    • Create a dict _{YOUR_DATASET_NAME} (similar to _BRAZILIAN_COURT_DECISIONS_JUDGMENT) containing all the necessary information about your dataset (task_type, input_col, label_col, etc.)
    • Add your dataset to the BUILDER_CONFIGS list: LextremeConfig(name="{your_dataset_name}", **_{YOUR_DATASET_NAME})
    • Test that it works correctly by loading your subset with load_dataset("lextreme", "{your_dataset_name}") and inspecting a few examples.
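
The dict-plus-config pattern described in the steps above can be sketched as follows. This is not the code from lextreme.py: `LextremeConfigSketch` is a hypothetical stand-in for `LextremeConfig`, whose real signature is not shown here, and `_MY_DATASET`, `my_dataset`, and all values in the dict are placeholders. Only the keys `task_type`, `input_col`, and `label_col` are taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class LextremeConfigSketch:
    """Hypothetical stand-in for LextremeConfig; not the real class."""
    name: str
    task_type: str
    input_col: str
    label_col: str


# Hypothetical dataset description; every value is a placeholder.
_MY_DATASET = {
    "task_type": "placeholder_task_type",
    "input_col": "text",
    "label_col": "label",
}

# The dict is unpacked into keyword arguments, mirroring
# LextremeConfig(name="{your_dataset_name}", **_{YOUR_DATASET_NAME}).
BUILDER_CONFIGS = [
    LextremeConfigSketch(name="my_dataset", **_MY_DATASET),
]

print(BUILDER_CONFIGS[0].name, BUILDER_CONFIGS[0].input_col)
```

Keeping the dataset description in a separate dict means the config entry itself stays a one-liner, which is presumably why the existing entries follow this layout.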

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@misc{niklaus2023lextreme,
 title={LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain},
 author={Joel Niklaus and Veton Matoshi and Pooja Rani and Andrea Galassi and Matthias Stürmer and Ilias Chalkidis},
 year={2023},
 eprint={2301.13126},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Contributions

Thanks to @JoelNiklaus for adding this dataset.
