AlgorithmicResearchGroup/s2orc_full
Text Generation, Feature Extraction, Text Classification, EN, Benchmark, odc-by
Created by AlgorithmicResearchGroup in 2024, AlgorithmicResearchGroup/s2orc_full is an English-language text generation benchmark dataset containing 14,515,649 records in Parquet format. It has about 39.7K downloads and 0 likes. It is released under the odc-by license and falls in the 10M<n<100M size category.
This dataset is used as an LLM benchmark.
About AlgorithmicResearchGroup/s2orc_full
S2ORC Full — Semantic Scholar Open Research Corpus
A complete redistribution of the S2ORC dataset in Parquet format on Hugging Face, containing 14.5 million academic papers with full text, structured metadata, and citation information.
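Because the corpus is published as Parquet on Hugging Face, a few records can be inspected without downloading all 14.5 million rows. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository exposes a split named `train`; the split name and the helper `peek` are illustrative assumptions, not details confirmed by this page. No column names are assumed.

```python
from itertools import islice

# Repository id as given in the dataset title.
REPO_ID = "AlgorithmicResearchGroup/s2orc_full"


def peek(n=3, split="train"):
    """Return the first n records as dicts without fetching the full corpus.

    `split="train"` is an assumption; check the dataset page for the
    split names that actually exist.
    """
    # Imported lazily so the module can be loaded where `datasets` is absent.
    from datasets import load_dataset

    # streaming=True yields records on demand instead of downloading
    # every Parquet shard up front.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek(1)[0].keys()` lists the fields of a single record, which is a cheap way to check the schema before committing to a full download.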
...
Details
- Task: Text Generation, Feature Extraction, Text Classification
- Language: EN
- Format: Parquet
- Rows / instances: 14,515,649
- Size: 10M<n<100M
- Creator: AlgorithmicResearchGroup
- Year: 2024
- License: odc-by
- Downloads: 39,656
- Likes: 0