allenai/MADLAD-400
Text Generation · English · odc-by
allenai/MADLAD-400 is a text generation dataset released by allenai in 2023. This listing tags it as English, but the dataset itself covers 419 languages. It has 45.1K downloads and 170 likes. It is released under the odc-by (Open Data Commons Attribution) license and falls in the n>1T size category.
About allenai/MADLAD-400
MADLAD-400
Dataset and Introduction
MADLAD-400 (Multilingual Audited Dataset: Low-resource And Document-level) is
a document-level multilingual dataset based on Common Crawl, covering 419
languages in total. This uses all snapshots o...
Details
- Task: Text Generation
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: n>1T
- Creator: allenai
- Year: 2023
- License: odc-by
- Downloads: 45,129
- Likes: 170