ai4bharat/sangraha

Text Generation | AS, BN, GU | cc-by-4.0

The ai4bharat/sangraha dataset is a text generation dataset from ai4bharat covering AS, BN, and GU (Assamese, Bengali, Gujarati), distributed in Parquet format under the cc-by-4.0 license. It falls in the 100M<n<1B size category and has been downloaded 7.4K times.
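Because the row count is listed as N/A, the size category is the only size hint on this page. The sketch below turns a two-sided label such as `100M<n<1B` into numeric bounds. It assumes the label follows the Hugging Face `size_categories` convention, where `n` counts examples (rows) rather than bytes or tokens; the helper names are hypothetical, and one-sided labels such as `n<1K` are deliberately not handled.

```python
import re

# Multipliers for the suffixes used in size-category labels (assumed convention).
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_category(label: str) -> tuple[int, int]:
    """Return the (lower, upper) bounds encoded by a 'LOW<n<HIGH' label."""
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", label)
    if match is None:
        # One-sided labels like "n<1K" or "n>1T" are outside this sketch.
        raise ValueError(f"unsupported size category: {label!r}")
    low = int(match.group(1)) * _SUFFIX[match.group(2)]
    high = int(match.group(3)) * _SUFFIX[match.group(4)]
    return low, high


def in_category(n: int, label: str) -> bool:
    """Check whether a row count n falls strictly inside the labelled range."""
    low, high = parse_size_category(label)
    return low < n < high


print(parse_size_category("100M<n<1B"))  # (100000000, 1000000000)
```

Under this reading, the dataset has somewhere between 100 million and 1 billion rows.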

About ai4bharat/sangraha

Sangraha is the largest high-quality, cleaned Indic-language pretraining dataset, containing 251B tokens across 22 languages, drawn from curated sources, existing multilingual corpora, and large-scale translations. More in...
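As a back-of-envelope check on the headline figure, the snippet below divides the quoted 251B tokens evenly over the 22 languages. This is only a uniform average; the actual per-language token counts are not given on this page and are unlikely to be equal.

```python
TOTAL_TOKENS = 251 * 10**9  # total token count quoted in the description
NUM_LANGUAGES = 22          # number of languages quoted in the description


def mean_tokens_per_language(total: int, languages: int) -> float:
    """Uniform average of tokens per language (ignores the real, uneven split)."""
    return total / languages


avg = mean_tokens_per_language(TOTAL_TOKENS, NUM_LANGUAGES)
print(f"{avg / 1e9:.2f}B tokens per language on average")
```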

Details

Task: Text Generation
Language: AS, BN, GU
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: ai4bharat
Year: 2024
License: cc-by-4.0
Downloads: 7405
Likes: 74