ai4bharat/sangraha
Text Generation | AS, BN, GU | cc-by-4.0
ai4bharat/sangraha is a text generation dataset in Assamese (AS), Bengali (BN), and Gujarati (GU) from ai4bharat, distributed in Parquet format. It is released under the cc-by-4.0 license, falls in the 100M<n<1B size category, and has been downloaded about 7.4K times.
About ai4bharat/sangraha
Sangraha
Sangraha is the largest high-quality, cleaned Indic language pretraining dataset, containing 251B tokens in total across 22 languages, extracted from curated sources, existing multilingual corpora, and large-scale translations.
Details
- Task: Text Generation
- Language: AS, BN, GU
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: ai4bharat
- Year: 2024
- License: cc-by-4.0
- Downloads: 7405
- Likes: 74
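
Since the data is distributed as Parquet under the `ai4bharat/sangraha` identifier, one way to inspect it is to stream it with the Hugging Face `datasets` library. The sketch below is illustrative only. The helper names `sangraha_load_kwargs` and `preview_first_record` are hypothetical, and the `verified/<iso-639-3 code>` directory layout is an assumption that should be checked against the dataset card before use.

```python
# Two-letter codes listed on this page mapped to their ISO 639-3 equivalents.
ISO_639_3 = {"AS": "asm", "BN": "ben", "GU": "guj"}


def sangraha_load_kwargs(lang, subset="verified"):
    """Build keyword arguments for datasets.load_dataset (hypothetical helper)."""
    code = ISO_639_3[lang.upper()]
    return {
        "path": "ai4bharat/sangraha",
        # ASSUMPTION: files are grouped as "<subset>/<iso-639-3 code>".
        "data_dir": f"{subset}/{code}",
        # Stream records instead of downloading the whole corpus up front.
        "streaming": True,
    }


def preview_first_record(lang):
    """Print the field names of the first record in each available split."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset  # pip install datasets

    splits = load_dataset(**sangraha_load_kwargs(lang))
    for split_name, split in splits.items():
        first = next(iter(split))
        print(split_name, sorted(first.keys()))
```

Calling `preview_first_record("BN")` requires network access. It prints the available field names rather than assuming any particular column, because the schema is not stated on this page.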