allenai/dolma3_mix-6T
Text Generation | EN | odc-by
The allenai/dolma3_mix-6T dataset is an English (EN) text-generation resource published by allenai in 2025. With about 71.5K downloads and 33 likes, it has seen substantial community use. It is released under the odc-by license.
About allenai/dolma3_mix-6T
Dolma 3 Mix (6T)
The Dolma 3 Mix (6T) is the data collection used in the pretraining stage of the Olmo-3-1125-32B model. It consists of ~6 trillion tokens drawn from a diverse mix of web content, academic publications, code, ...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: allenai
- Year: 2025
- License: odc-by
- Downloads: 71,487
- Likes: 33