ai4bharat/BPCC

General NLP · English

The ai4bharat/BPCC dataset is an English General NLP resource published by ai4bharat in 2025. It has 511 downloads and 32 likes, and it falls in the 100M<n<1B size category.

About ai4bharat/BPCC

Bharat Parallel Corpus Collection (BPCC) is a comprehensive, publicly available parallel corpus that includes both existing and new data for all 22 scheduled Indic languages. It comprises two parts: BPC...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: ai4bharat
Year: 2025
Downloads: 511
Likes: 32