allenai/dolma3_longmino_mix-50B-1025

General NLP | EN | odc-by

Created by allenai in 2025, allenai/dolma3_longmino_mix-50B-1025 is a General NLP dataset in English, distributed in Parquet format. It has 32,957 downloads and 10 likes, and it is released under the odc-by license.

About allenai/dolma3_longmino_mix-50B-1025

The Dolma 3 Longmino Mix (50B) is the mixture of data used for the third stage of training for the Olmo 3 7B model.

Dataset Sources (columns: Source, Type, Tokens, Docs): LC-s2pdf-REX 32k-64k Synth PDFs 6.08B (12....

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: allenai
Year: 2025
License: odc-by
Downloads: 32957
Likes: 10
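
Because the mix is large and stored as Parquet, it can be convenient to inspect a few records without downloading everything. The following is a minimal sketch, not an official recipe: it assumes the dataset is hosted on the Hugging Face Hub under the id shown on this page and that the `datasets` library is installed. The helper name `preview_rows` is hypothetical, and since the split and column names are not stated here, the code takes whichever split comes first and only prints column names.

```python
from itertools import islice

REPO_ID = "allenai/dolma3_longmino_mix-50B-1025"  # dataset id from this page


def preview_rows(repo_id: str = REPO_ID, n: int = 3) -> list:
    """Return the first n rows of the first available split, without a full download."""
    # Imported lazily so this module loads even if `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True reads records over the network as they are iterated.
    splits = load_dataset(repo_id, streaming=True)
    # Assumption: the repository exposes at least one split; its name is not
    # stated on this page, so take whichever comes first.
    first_split = next(iter(splits.values()))
    return list(islice(first_split, n))


if __name__ == "__main__":
    for row in preview_rows():
        # Column names are not listed on this page, so only print the keys.
        print(sorted(row.keys()))
```

Streaming avoids materializing the full mix on disk; for actual training runs a local copy of the Parquet files is likely to be the more practical choice.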