English Datasets

We catalog 200 English datasets for NLP and machine learning, including 3 benchmarks. Browse the list below or narrow down by task.

This page covers English, the most widely represented language in NLP research and the default for most large language models.

Updated June 2026

What tasks do English datasets cover?

Datasets in other languages

Frequently asked questions