Question 1

What is the LEDGAR dataset?

Accepted Answer

LEDGAR is a multilabel corpus of legal provisions from contracts, suited to text classification in the legal domain (legaltech). It contains more than 1.8M provisions and a set of more than 180K labels. A smaller, cleaned version of the corpus is also available.

Question 2

Is LEDGAR a benchmark?

Accepted Answer

LEDGAR is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download LEDGAR?

Accepted Answer

LEDGAR is available at its source: https://drive.switch.ch/index.php/s/j9S0GRMAbGZKa1A.
