allenai/dolma3_longmino_mix-100B-1125
General NLP · EN · odc-by
allenai/dolma3_longmino_mix-100B-1125 is a general NLP dataset in English (EN) from allenai, distributed in Parquet format under the odc-by license. It has been downloaded 18.3K times.
About allenai/dolma3_longmino_mix-100B-1125
Dolma 3 Longmino Mix (100B)
The Dolma 3 Longmino Mix (100B) is the data mixture used for the third stage of training of the Olmo 3 32B model.
Dataset Sources
- LC-s2pdf-REX 32k-64k (type: Synth PDFs)
- LC-s2pdf-CWE 32k-...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: allenai
- Year: 2025
- License: odc-by
- Downloads: 18313
- Likes: 17