apollo-research/Skylion007-openwebtext-tokenizer-gpt2
General NLP, English, Benchmark
apollo-research/Skylion007-openwebtext-tokenizer-gpt2 is a General NLP benchmark dataset in English from apollo-research, with 8,824,092 records in Parquet format. It falls in the 1M<n<10M size category and has been downloaded about 17.6K times.
📊 This dataset is used as an LLM benchmark.
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 8,824,092
- Size category: 1M<n<10M
- Creator: apollo-research
- Year: 2024
- Downloads: 17,588
- Likes: 3
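The size category is a coarse, decade-based bucket of the row count rather than a byte size. The following is a minimal sketch that assumes the bin labels used in Hugging Face dataset-card metadata (from n<1K up to n>1T). The `SIZE_BINS` table and `size_category` helper are hypothetical illustrations and are not part of any library. It shows why 8,824,092 rows land in 1M<n<10M.

```python
# Hypothetical decade-based size buckets, written as (exclusive upper bound, label).
# Assumption: the labels follow the Hugging Face `size_categories` convention;
# how a count sitting exactly on a bin edge is labeled is not specified by the source.
SIZE_BINS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n_rows: int) -> str:
    """Return the first bucket whose upper bound exceeds the row count."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    for upper, label in SIZE_BINS:
        if n_rows < upper:
            return label
    # Anything at or beyond one trillion rows falls into the open-ended bucket.
    return "n>1T"


print(size_category(8_824_092))  # 1M<n<10M
```

A linear scan over a short sorted table keeps the mapping easy to read. For ten bins, a binary search (for example with `bisect`) would add complexity without a meaningful speedup.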