AI & NLP Datasets
Search thousands of machine-learning and NLP datasets — filter by task, language, or benchmark status. Each dataset links to its source, paper, and download.
- Datasets: 5,413
- Benchmarks: 229
- Tasks: 179
1000 datasets
- A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs (DROP) | Question Answering, Reading Comprehension | English | Benchmark
- aaaad1/banned-historical-archives | General NLP | English | Benchmark
- ACL Anthology Reference Corpus (ACL ARC) | Text Corpora | English | Benchmark
- AgentBench | Agents | English | Benchmark
- AI-MO/aimo-validation-aime | General NLP | English | Benchmark
- AI2 Reasoning Challenge (ARC) | Question Answering, Reading Comprehension | English | Benchmark
- AI45Research/ATBench | General NLP | English | Benchmark
- aicrowd/arc-whestbench-public-2026 | Other | CODE | Benchmark
- ajibawa-2023/Software-Architecture | General NLP | EN | Benchmark
- AlgorithmicResearchGroup/arxiv_s2orc_parsed | Text Generation, Zero Shot Classification | EN | Benchmark
- AlgorithmicResearchGroup/s2orc_arxiv | Text Generation, Summarization, Feature Extraction | EN | Benchmark
- AlgorithmicResearchGroup/s2orc_full | Text Generation, Feature Extraction, Text Classification | EN | Benchmark
- Alibaba-NLP/ZeroSearch_dataset | Question Answering | English | Benchmark
- alibayram/turkish_mmlu | Text Generation, Text Classification, Table Question Answering | TR | Benchmark
- allenai/ai2_arc | Question Answering | EN | Benchmark
- allenai/RLVR-IFeval | General NLP | English | Benchmark
- allenai/winogrande | General NLP | EN | Benchmark
- amphora/ResearchMath-14k | Text Generation, Question Answering | EN | Benchmark
- apollo-research/Skylion007-openwebtext-tokenizer-gpt2 | General NLP | English | Benchmark
- Arabic Reading Comprehension Dataset (ARCD) | Question Answering, Reading Comprehension | Arabic | Benchmark
- arcee-ai/agent-data | General NLP | English | Benchmark
- arcee-ai/EvolKit-20k | General NLP | English | Benchmark
- arcee-ai/EvolKit-75K | General NLP | English | Benchmark
- arcee-ai/The-Tome | General NLP | English | Benchmark
- arcinstitute/opengenome2 | Text Generation | English | Benchmark
- arcinstitute/Stack-CellxGene45M | General NLP | English | Benchmark
- arcinstitute/Stack-scBaseCount189M | General NLP | English | Benchmark
- argilla/ifeval-like-data | Text Generation | EN | Benchmark
- artillerywu/DeepResearch-9K | General NLP | English | Benchmark
- ATH-MaaS/Marco_Longspeech | Automatic Speech Recognition, Audio Classification, Text Generation | EN, ZH | Benchmark
- Augmentiv/ArchiveData | General NLP | English | Benchmark
- baber/piqa | General NLP | English | Benchmark
- banned-historical-archives/banned-historical-archives | General NLP | English | Benchmark
- banned-historical-archives/cankaoxiaoxi | General NLP | English | Benchmark
- banned-historical-archives/huabao-before-1949 | General NLP | English | Benchmark
- banned-historical-archives/low-priority | General NLP | English | Benchmark
- banned-historical-archives/zhongyangribao | General NLP | English | Benchmark
- bigcode/humanevalpack | General NLP | CODE | Benchmark
- bigcode/starcoderdata | Text Generation | CODE | Benchmark
- BoolQ | Question Answering | English | Benchmark
- ByteDance-Seed/Multi-SWE-bench | Text Generation | English | Benchmark
- ByteDance-Seed/WideSearch | General NLP | ZH, EN | Benchmark
- cais/mmlu | Question Answering | EN | Benchmark
- CMU_ARCTIC | Speech Recognition | English | Benchmark
- code-search-net/code_search_net | Text Generation, Fill Mask | CODE | Benchmark
- codeparrot/self-instruct-starcoder | General NLP | EN | Benchmark
- CodeSearchNet Corpus | Text Corpora | English | Benchmark
- CodeXGLUE: CodeSearchNet, AdvTest | Code Search | Coding Lang: Python | Benchmark
- CodeXGLUE: NL Code Search WebQuery | Code Search | Coding Lang: Python | Benchmark
- CohereLabs/Global-MMLU | General NLP | EN, AR, BN | Benchmark
- CohereLabs/Global-MMLU-Lite | General NLP | AR, BN, CS | Benchmark
- corbyrosset/researchy_questions | Question Answering | EN | Benchmark
- COVID-19 Open Research Dataset (CORD-19) | Text Corpora | English | Benchmark
- data-archetype/cc12_imagenet21k_recap_hq_bucketed | Text To Image | EN | Benchmark
- DeliberatorArchiver/asmr-archive-data-02 | General NLP | JA | Benchmark
- dell-research-harvard/AmericanStories | Text Classification, Text Generation, Text Retrieval, Summarization, Question Answering | EN | Benchmark
- dell-research-harvard/newswire | Text Classification, Text Generation, Text Retrieval, Summarization, Question Answering | EN | Benchmark
- di-zhang-fdu/AIME_1983_2024 | General NLP | English | Benchmark
- domenicrosati/TruthfulQA | Question Answering | EN | Benchmark
- Dragonegg2026/banned-historical-archives | General NLP | English | Benchmark
- echodict/KakologArchives_duplicate | Text Classification | JA | Benchmark
- edinburgh-dawg/mmlu-redux | Question Answering | EN | Benchmark
- edinburgh-dawg/mmlu-redux-2.0 | Question Answering | EN | Benchmark
- efficient-deep-research/key_point | General NLP | English | Benchmark
- efficient-deep-research/synthesized_dataset | General NLP | English | Benchmark
- EleutherAI/drop | General NLP | English | Benchmark
- evalplus/humanevalplus | General NLP | EN | Benchmark
- evalplus/mbppplus | General NLP | English | Benchmark
- facebook/research-plan-gen | General NLP | English | Benchmark
- farhanhubble/jfk-archives | Question Answering | English | Benchmark
- FractalAIResearch/DeepResearch-SFT | General NLP | English | Benchmark
- gaia-benchmark/GAIA | General NLP | EN | Benchmark
- Glint-Research/Fable-5-traces | Text Generation | EN | Benchmark
- google-research-datasets/conceptual_captions | Image To Text | EN | Benchmark
- google-research-datasets/go_emotions | Text Classification | EN | Benchmark
- google-research-datasets/mbpp | General NLP | EN | Benchmark
- google-research-datasets/natural_questions | Question Answering | EN | Benchmark
- google-research-datasets/nq_open | Question Answering | EN | Benchmark
- google-research-datasets/paws | Text Classification | EN | Benchmark
- google-research-datasets/paws-x | Text Classification | DE, EN, ES | Benchmark
- google-research-datasets/tydiqa | Question Answering | AR, BN, EN | Benchmark
- google/bigbench | Multiple Choice, Question Answering, Text Classification, Text Generation, Zero Shot Classification, Other | EN | Benchmark
- google/boolq | Text Classification | EN | Benchmark
- google/deepsearchqa | Question Answering | EN | Benchmark
- google/IFEval | Text Generation | EN | Benchmark
- google/spiqa | Question Answering | English | Benchmark
- gretelai/gretel-math-gsm8k-v1 | Question Answering | EN | Benchmark
- Gryphe/Sonnet3.5-Charcard-Roleplay | Text Generation | EN | Benchmark
- HAERAE-HUB/KMMLU | Multiple Choice | KO | Benchmark
- hails/bigbench | General NLP | English | Benchmark
- hails/mmlu_no_train | Question Answering | EN | Benchmark
- hbXNov/distill_r1_qwen_math_1.5b_128_solns_aime_verifications | General NLP | English | Benchmark
- HellaSwag | Commonsense Reasoning | English | Benchmark
- hendrycks/competition_math | General NLP | EN | Benchmark
- Home Depot Product Search Relevance | Classification | English | Benchmark
- horde-research/kaz-vision-50k | General NLP | English | Benchmark
- HuggingFaceH4/aime_2024 | General NLP | English | Benchmark
- HumanEval | Code | English | Benchmark
- ibm-research/AssetOpsBench | Question Answering, Time Series Forecasting | EN | Benchmark
- ibm-research/duorc | Question Answering | EN | Benchmark
- ibm-research/VAKRA | Question Answering, Text Retrieval, Text Generation | EN | Benchmark
- Idavidrein/gpqa | Question Answering, Text Generation | EN | Benchmark
- ikala/tmmluplus | Question Answering | ZH | Benchmark
- IliaLarchenko/behavior_224_rgb | General NLP | English | Benchmark
- inclusionAI/ASearcher-Local-Knowledge | General NLP | English | Benchmark
- Insta360-Research/DAP_data | General NLP | English | Benchmark
- Insta360-Research/OmniRooms | Depth Estimation, Image To 3D | English | Benchmark
- Irony Sarcasm Analysis Corpus | Classification, Sentiment Analysis | English | Benchmark
- jackluoluo/ArchCAD | Visual Question Answering, Image To Text | English | Benchmark
- Ji-Pengliang/Harbor-Parity-Test-ARC-AGI-2 | General NLP | English | Benchmark
- KakologArchives/KakologArchives | Other | Japanese | Benchmark
- klieret/swe-bench-dummy-test-dataset | General NLP | English | Benchmark
- kunishou/J-ResearchCorpus | General NLP | JA | Benchmark
- kylemontgomery/deepresearch-tasks | General NLP | English | Benchmark
- laion/relaion2B-en-research | General NLP | English | Benchmark
- laion/relaion2B-en-research-safe | General NLP | English | Benchmark
- laion/relaion2B-multi-research-safe | General NLP | English | Benchmark
- LGAI-EXAONE/KMMLU-Pro | General NLP | KO | Benchmark
- lighteval/mmlu | Question Answering | EN | Benchmark
- linux-cn/archive | General NLP | ZH | Benchmark
- livecodebench/code_generation_lite | General NLP | English | Benchmark
- lmarena-ai/search-arena-24k | General NLP | English | Benchmark
- lmarena-ai/search-arena-v1-7k | General NLP | English | Benchmark
- lmlmcat/cmmlu | Multiple Choice, Question Answering | ZH | Benchmark
- lmms-lab/MMMU | General NLP | English | Benchmark
- Looogic/DeepResearch_wiki_data | General NLP | EN | Benchmark
- lukaemon/mmlu | General NLP | English | Benchmark
- m-a-p/SuperGPQA | General NLP | EN | Benchmark
- madrylab/gsm8k-platinum | General NLP | EN | Benchmark
- manycore-research/SpatialLM-Testset | General NLP | English | Benchmark
- marcelbinz/Psych-101 | General NLP | EN | Benchmark
- marcodsn/academic-chains | General NLP | EN | Benchmark
- marcosv/ffhq-dataset | General NLP | English | Benchmark
- math-ai/aime25 | General NLP | English | Benchmark
- MathArena/aime_2025 | General NLP | EN | Benchmark
- MathArena/aime_2026 | General NLP | EN | Benchmark
- Matthijs/cmu-arctic-xvectors | Text To Speech, Audio To Audio | English | Benchmark
- maveriq/bigbenchhard | Question Answering, Token Classification, Text Classification | EN | Benchmark
- Maxwell-Jia/AIME_2024 | Text Generation | EN | Benchmark
- maya-research/IndicVault | Question Answering, Text Generation | HI, TE, EN | Benchmark
- MBZUAI/ArabicMMLU | Question Answering | AR | Benchmark
- meta-agents-research-environments/gaia2 | Reinforcement Learning | EN | Benchmark
- Metanova/Submission-Archive | General NLP | English | Benchmark
- Microsoft Machine Reading COmprehension Dataset (MS MARCO) | Question Answering, Reading Comprehension | English | Benchmark
- Microsoft Research Paraphrase Corpus (MRPC) | Paraphrasing Identification | English | Benchmark
- Microsoft Research Social Media Conversation Corpus | Graph Analysis | English | Benchmark
- microsoft/hnm-search-data | Text Ranking, Text Retrieval, Text Classification | EN | Benchmark
- microsoft/ms_marco | General NLP | EN | Benchmark
- milashkaarshif/MoeGirlPedia_wikitext_raw_archive | Text Generation | ZH, JA, EN | Benchmark
- MMMU/MMMU | Question Answering, Visual Question Answering, Multiple Choice | EN | Benchmark
- MMMU/MMMU_Pro | Question Answering, Visual Question Answering, Multiple Choice | EN | Benchmark
- mrlbenchmarks/global-piqa-nonparallel | Question Answering, Multiple Choice | ACM, ACQ, AEB | Benchmark
- Multimodal Sarcasm Detection Dataset (MUStARD) | Multi-Modal Learning | English | Benchmark
- muset-ai/DeepResearch-Bench-II-Dataset | Text Generation, Text Classification | ZH, EN | Benchmark
- Nan-Do/instructional_code-search-net-python | Text Generation | EN | Benchmark
- nebius/SWE-bench-extra | General NLP | English | Benchmark
- News Headlines Dataset for Sarcasm Detection | Clustering, Events, Language Detection | English | Benchmark
- nmayorga7/gpqa_diamond | General NLP | English | Benchmark
- NousResearch/CharacterCodex | General NLP | EN | Benchmark
- NousResearch/Hermes-3-Dataset | General NLP | English | Benchmark
- NousResearch/hermes-function-calling-v1 | Text Generation, Question Answering, Feature Extraction | EN | Benchmark
- NousResearch/json-mode-eval | General NLP | English | Benchmark
- NousResearch/RLVR_Coding_Problems | General NLP | English | Benchmark
- nyuuzyou/google-code-archive | Text Generation | CODE, EN | Benchmark
- OALL/details_deep-analysis-research__D2IL-Arabic-Qwen2.5-72B-Instruct-v0.2_v2 | General NLP | English | Benchmark
- omni-research/DREAM-1K | General NLP | English | Benchmark
- OmniAICreator/ASMR-Archive-Processed | Automatic Speech Recognition, Text To Speech | JA | Benchmark
- Open Research Corpus | Text Corpora | English | Benchmark
- Open Resource for Click Analysis in Search (ORCAS) | Document Ranking | English | Benchmark
- open-index/arctic | Text Generation, Text Classification, Feature Extraction | EN | Benchmark
- openai/gsm8k | Text Generation | EN | Benchmark
- openai/MMMLU | Question Answering | AR, BN, DE | Benchmark
- openai/openai_humaneval | General NLP | EN | Benchmark
- opencompass/AIME2025 | Question Answering | EN | Benchmark
- OpenResearcher/OpenResearcher-Dataset | General NLP | English | Benchmark
- ParCorFull | Machine Translation, Coreference Resolution | German, English | Benchmark
- Pinkstackorg/dedup-deepresearch-9.4k | General NLP | EN | Benchmark
- princeton-nlp/SWE-bench | General NLP | English | Benchmark
- princeton-nlp/SWE-bench_Lite | General NLP | English | Benchmark
- princeton-nlp/SWE-bench_Verified | General NLP | English | Benchmark
- pwc-archive/evaluation-tables | General NLP | English | Benchmark
- pwc-archive/files | General NLP | English | Benchmark
- reazon-research/reazonspeech | Automatic Speech Recognition | JA | Benchmark
- RJT1990/GeneralThoughtArchive | General NLP | EN | Benchmark
- RonaldoDD/banned-historical-archives | General NLP | English | Benchmark
- Rowan/hellaswag | General NLP | EN | Benchmark
- SakanaAI/AI-CUDA-Engineer-Archive | General NLP | English | Benchmark
- SALT-Research/DeepDialogue-orpheus | Audio Classification, Automatic Speech Recognition | EN | Benchmark
- ScaleAI/SWE-bench_Pro | General NLP | English | Benchmark
- ScienceOne-AI/S1-DeepResearch-15k | General NLP | EN, ZH | Benchmark
- SciPhi/AgentSearch-V1 | Text Generation | EN | Benchmark
- SearchQA | Question Answering, Reading Comprehension | English | Benchmark
- Self-Annotated Reddit Corpus (SARC) | Text Corpora, Sarcasm Detection | English | Benchmark
- Semantic Parsing in Context (SParC) | Semantic Parsing, SQL-to-Text | English | Benchmark
- ShapeNet/ShapeNetCore-archive | General NLP | EN | Benchmark
- Shaping Answers with Rules through Conversation (ShARC) | Question Answering, Reading Comprehension | English | Benchmark
- solarchive/solarchive | General NLP | EN | Benchmark
- somaliscan/spending-archive | Tabular Classification, Tabular Regression | EN | Benchmark
- SWE-bench | Code | English | Benchmark
- SWE-bench/SWE-bench | General NLP | English | Benchmark
- SWE-bench/SWE-bench_Lite | General NLP | English | Benchmark
- SWE-bench/SWE-bench_Multilingual | General NLP | EN | Benchmark
- SWE-bench/SWE-bench_Verified | General NLP | English | Benchmark
- SWE-bench/SWE-smith | Text Generation | EN | Benchmark
- SWE-bench/SWE-smith-trajectories | Text Generation | EN | Benchmark
- tasksource/bigbench | Multiple Choice, Question Answering, Text Classification, Text Generation, Zero Shot Classification | EN | Benchmark
- tasksource/mmlu | Text Classification, Multiple Choice, Question Answering | EN | Benchmark
- TencentARC/MiraData | Image To Video, Text To Image, Text To Video, Video Classification | EN | Benchmark
- TencentARC/Plot2Code | Text Generation, Text To Image, Image To Text, Image To Image | EN | Benchmark
- The Semantic Scholar Open Research Corpus (S2ORC) | Text Corpora, Knowledge Base | English | Benchmark
- TIGER-Lab/MMLU-Pro | Question Answering | EN | Benchmark
- TigerResearch/pretrain_zh | General NLP | English | Benchmark
- TigerResearch/sft_zh | General NLP | ZH | Benchmark
- TMarcus/MindCraft | Visual Question Answering, Robotics | EN | Benchmark
- truthfulqa/truthful_qa | Multiple Choice, Text Generation, Question Answering | EN | Benchmark
- Tunisian Arabish Corpus (TArC) | Classification, Part-of-Speech (POS) | Tunisian | Benchmark
- ucinlp/drop | Question Answering | EN | Benchmark
- unicamp-dl/mmarco | General NLP | English | Benchmark
- VibeSearchBench/VibeSearchBench | General NLP | English | Benchmark
- wenge-research/yayi_uie_sft_data | General NLP | ZH, EN | Benchmark
- wenge-research/yayi2_pretrain_data | General NLP | ZH, EN | Benchmark
- WinoGrande | Commonsense Reasoning | English | Benchmark
- YaojieShen/hhtools_parc_ms | Robotics | EN, ZH | Benchmark
- Yarina/Meta_Kaggle_Dataset_Archive_2026-03-12 | General NLP | English | Benchmark
- yatin-superintelligence/Edge-Agent-Reasoning-WebSearch-260K | Text Generation, Question Answering, Any To Any, Robotics | EN | Benchmark
- ybisk/piqa | Question Answering | EN | Benchmark
- yentinglin/aime_2025 | General NLP | English | Benchmark
- zai-org/humaneval-x | Text Generation | CODE | Benchmark
- Zhengbo-Zhang/search-images | General NLP | English | Benchmark
- 0-hero/Matter-0.1 | General NLP | English
- 0x3/DNS-Challenge | General NLP | English
- 0xDing/wikipedia-cn-20230720-filtered | Text Generation | ZH
- 0xJustin/Dungeons-and-Diffusion | General NLP | English
- 1 Billion Word Language Model Benchmark (lm1b) | Language Modeling | English
- 1.5 billion Words Arabic Corpus | Text Corpora | Arabic
- 1098020341z/vl | General NLP | English
- 10wind/Mono-Label-120k | Audio Classification | EN
- 1111xxx/zoengjyutgaai | Automatic Speech Recognition, Text To Speech, Text Generation, Feature Extraction, Audio To Audio, Audio Classification, Text To Audio | YUE
- 2WikiMultihopQA | Question Answering | English
- 3DReflecNet/3DReflecNet | Image To 3D | EN
- 3rdn4/terminal-bench-2-leaderboard | General NLP | English
- 4141ms/CADS-dataset | Image Segmentation | English
- 5551z/VisCoR-55K | General NLP | English
- 57xj5SHr/Tui9DGhp | General NLP | English
- 5CD-AI/LLaVA-CoT-o1-Instruct | General NLP | English
- 5CD-AI/Viet-Handwriting-OCR-v2 | Image To Text | VI
- 5CD-AI/Viet-LAION-Gemini-VQA | Visual Question Answering | VI
- 5CD-AI/Viet-ShareGPT-4o-Text-VQA | General NLP | English
- 5CD-AI/Vietnamese-395k-meta-math-MetaMathQA-gg-translated | Question Answering | EN, VI
- 5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated | General NLP | EN, VI
- 64bits/lima_vicuna_format | Text Generation | EN
- 7zkk/PanoEnv | Visual Question Answering, Image Text To Text | EN
- A Conversational Question Answering Challenge (CoQA) | Question Answering, Reading Comprehension | English
- A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (CLEVR & CoGenT) | Question Answering, Visual | English
- A Multi-Turn, Multi-Domain Dialogue Dataset (KVRET) | Dialogue | English
- A Novel Approach to a Semantically-Aware Representation of Items (NASARI) | Semantic Textual Similarity | Multi-Lingual
- a-m-team/AM-DeepSeek-Distilled-40M | Text Generation | ZH, EN
- a-m-team/AM-DeepSeek-R1-0528-Distilled | Text Generation | EN, ZH
- a-m-team/AM-DeepSeek-R1-Distilled-1.4M | Text Generation | ZH, EN
- a-m-team/AM-Thinking-v1-Distilled | Text Generation | EN, ZH
- a2015003713/military-aircraft-detection-dataset | Object Detection, Image Classification, Image Feature Extraction | English
- a686d380/h-corpus-2023 | General NLP | ZH
- a686d380/h-corpus-raw | General NLP | ZH
- a686d380/h-eval | General NLP | ZH
- a686d380/sis-novel | General NLP | English
- aanonyyy/F5I9N7A1 | General NLP | English
- Aasdfip/habitat_web_pose_train | General NLP | English
- abacusai/SystemChat | General NLP | English
- abacusai/SystemChat-1.1 | General NLP | English
- AbbasABC/HFL-Dataset | General NLP | English
- ABC Australia News Corpus | Text Corpora | English
- Abductive Natural Language Inference (aNLI) | Classification, Commonsense | English
- abdullah/IUG-CourseTranscripts | General NLP | English
- Aber-r/SA-1B_backup | General NLP | English
- AbhishekBhandari/Devanagari-OCR-ICL-Benchmark | Text Generation, Image To Text | HI, MR
- Abirate/english_quotes | Text Classification | EN
- abisee/cnn_dailymail | Summarization | EN
- abiyo27/BibleTTS_Ewe-Bible | General NLP | English
- Abrumu/Fashion_controlnet_dataset_V3 | General NLP | English
- Abstract Meaning Representation (AMR) Bank | Information Extraction, Semantic Role Labeling | English
- Abstractive Sentence Simplification Evaluation and Tuning (ASSET) | Sentence Simplification | English
- Abtinzandi/Obstacle-Detection-Dataset-YOLO | Object Detection | EN
- abuelkhair-corpus/arabic_billion_words | Text Generation, Fill Mask | AR
- Academic | Semantic Parsing, Text-to-SQL | English
- ACCC1380/private-model | General NLP | CH
- acheong08/nsfw_reddit | General NLP | English
- ACL Citation Coreference Corpus | Coreference Resolution | English
- acon96/Home-Assistant-Requests | Question Answering, Text Generation | EN
- Acronym Detection Dataset | Acronym Disambiguation | English
- Acronym Identification | Acronym Identification | English
- actava/chi-bench | Text Generation | EN
- Action Learning From Realistic Environments and Directives (ALFRED) | Multi-Modal Learning | English
- activegalaxy/UVH-26 | Object Detection | UND
- Activitynet-QA | Question Answering, Visual, Commonsense | English
- ad1t7a/10Kh-RealOmin-OpenData | Robotics, Reinforcement Learning | EN, ZH
- adadai3132/PanoHK360 | Depth Estimation, Image To Image | EN
- adams-story/datacomp200m | General NLP | English
- AdaptLLM/finance-tasks | Text Classification, Question Answering, Zero Shot Classification | EN
- AdaptLLM/law-tasks | Text Classification, Question Answering, Zero Shot Classification | EN
- AdaptLLM/medicine-tasks | Text Classification, Question Answering, Zero Shot Classification | EN
- addisonwu05/llm-polysemy-outputs | Text To Image, Text Generation | EN, TR, FR
- ade-benchmark-corpus/ade_corpus_v2 | Text Classification, Token Classification | EN
- AdithyaSK/RAG_Eval | General NLP | English
- adrianmele/computer-use-large | Video Classification, Robotics | EN
- ADSKAILab/Zero-To-CAD-1m | Text To 3D, Image To 3D | EN, CODE
- Adversarial NLI (ANLI) | Natural Language Inference (NLI) | English
- Adverse Drug Effect (ADE) Corpus | Information Extraction | English
- Advising | Semantic Parsing, Text-to-SQL | English
- adyen/DABstep | General NLP | English
- Aeala/ShareGPT_Vicuna_unfiltered | General NLP | EN
- Aesthetics Text Corpus | Text Corpora | Hindi
- afaji/cvqa | Question Answering | ID, SU, JA
- Affective Text | Emotion Classification | English
- AG News | Classification | English
- AGBonnet/augmented-clinical-notes | Text Generation | EN
- Agent-Ark/Toucan-1.5M | General NLP | English
- agentica-org/DeepCoder-Preview-Dataset | General NLP | EN
- agentica-org/DeepScaleR-Preview-Dataset | General NLP | EN
- agentlans/high-quality-english-sentences | Text Classification, Text Generation, Feature Extraction, Sentence Similarity | EN
- Agentrix212AI/Ressources | General NLP | English
- agents-course/certificates | General NLP | English
- agents-course/course-images | General NLP | English
- agents-course/unit4-students-scores | General NLP | English
- agents-last-exam/agents-last-exam | General NLP | EN
- agents-last-exam/agents-last-exam-data | General NLP | EN
- agents-last-exam/agents-last-exam-reference | General NLP | EN
- agentsea/wave-ui-25k | General NLP | English
- agibot-world/AgiBotDigitalWorld | Other | EN
- agibot-world/AgiBotWorld-Alpha | Robotics, Other | EN
- agibot-world/AgiBotWorld-Beta | Other, Robotics | EN
- agibot-world/AgiBotWorld2026 | Robotics | EN
- agkphysics/AudioSet | Audio Classification | EN
- agungpambudi/math-dataset-measuring-mathematical-problem-solving | Question Answering | EN
- aharley/rvl_cdip | Image Classification | EN
- ahmed-masry/ChartQA | General NLP | English
- Ahnuf/Military_Aircraft_Detection_Classification_Image_Dataset | Object Detection, Image Classification | English
- ai-for-good-lab/ai4g-flood-dataset | General NLP | English
- ai-hyz/MemoryAgentBench | Question Answering, Zero Shot Classification, Summarization, Text Classification, Text Generation | EN
- AI-Lab-Makerere/beans | Image Classification | EN
- AI-MO/NuminaMath-1.5 | Text Generation | EN
- AI-MO/NuminaMath-CoT | Text Generation | EN
- AI-MO/NuminaMath-LEAN | General NLP | English
- AI-MO/NuminaMath-TIR | Text Generation | EN
- AI-MO/olympiads | General NLP | English
- ai-safety-institute/AgentHarm | General NLP | English
- AI2 Science Questions Mercury | Reading Comprehension | English
- AI2 Science Questions v2.1 | Question Answering, Reading Comprehension | English
- ai4bharat/BPCC | General NLP | English
- ai4bharat/IndicVoices | General NLP | English
- ai4bharat/indicvoices_r | Text To Speech | AS, BN, GU
- ai4bharat/Rasa | Text To Speech | AS, BN, KN
- ai4bharat/samanantar | Text Generation, Translation | EN, AS, BN
- ai4bharat/sangraha | Text Generation | AS, BN, GU
- AI4Chem/ChemData700K | General NLP | English
- AI4Math/MathVerse | Multiple Choice, Question Answering, Visual Question Answering | EN
- AI4Math/MathVista | Multiple Choice, Question Answering, Visual Question Answering, Text Classification | EN, ZH, FA
- ai4privacy/pii-masking-200k | Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation, Translation, Fill Mask, Tabular Classification, Tabular To Text, Table To Text, Text Retrieval, Other | EN, FR, DE
- ai4privacy/pii-masking-300k | Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation, Translation, Fill Mask, Tabular Classification, Tabular To Text, Table To Text, Text Retrieval, Other | EN, FR, DE
- ai4privacy/pii-masking-400k | Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation, Translation, Fill Mask, Tabular Classification, Tabular To Text, Table To Text, Text Retrieval, Other | EN, FR, DE
- aicrowd/whestbench-smoke-mlp | Other | CODE
- AiEDA/iDATA | General NLP | English
- AIencoder/llama-cpp-wheels | Text Generation | EN
- aifeifei798/DPO_Pairs-Roleplay-NSFW | General NLP | EN
- Aignostics/OpenTME | Image Classification, Image Segmentation, Image Feature Extraction, Object Detection | English
- ailsntua/Chordonomicon | General NLP | English
- AirDialogue | Dialogue | English
- AiresPucrs/stanford-encyclopedia-philosophy | Text Classification, Text Generation | EN
- airtrain-ai/fineweb-edu-fortified | Text Generation | EN
- aisa-group/PostTrainBench-Trajectories | Text Generation | EN
- Aisha-AI-Official/sdxl-models | General NLP | English
- ait4x/polyu-storyworld-characters | General NLP | English
- AITRADER/dutch-tts-labeled-complete | Text To Speech, Automatic Speech Recognition | NL
- ajibawa-2023/Children-Stories-Collection | Text Generation | EN
- ajibawa-2023/General-Stories-Collection | Text Generation | EN
- ajibawa-2023/Java-Code-Large | Text Generation | EN
- ajibawa-2023/JavaScript-Code-Large | Text Generation | EN
- ajibawa-2023/Maths-College | Text Generation, Question Answering | EN
- ajibawa-2023/Python-Code-23k-ShareGPT | General NLP | EN
- akariasai/PopQA | General NLP | English
- akasheroor/American-Sign-Language-Dataset | General NLP | English
- AkashPS11/recipes_data_food.com | General NLP | English
- akjindal53244/Arithmo-Data | General NLP | English
- akoksal/LongForm | Table Question Answering, Summarization, Text Generation, Question Answering | EN
- albertvillanova/datasets-tests-compression | General NLP | English
- albertvillanova/legal_contracts | General NLP | English
- albertvillanova/tests-raw-jsonl | General NLP | English
- albertwilcox/mesa-all-train-lerobot | General NLP | English
- AlekseyKorshuk/persona-chat | General NLP | English
- alespalla/chatbot_instruction_prompts | Question Answering, Text Generation | EN
- Alex-Song/MSR-86K | Automatic Speech Recognition | ES, KO, EN
- alexfabbri/multi_news | Summarization | EN
- alexisplacet/adsblol_globe_history | Other | EN
- Alfaxad/vector-100k | Visual Question Answering, Text Generation | EN
- ali-sh07/COCO-train2014 | General NLP | English
- aliasfox/srtm30m-merged | General NLP | English
- Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b | Text Generation | EN
- Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob | Text Generation | EN
- alibaba-multimodal-industrial-ai/IndustryBench-MIPU | Image To Text, Visual Question Answering | ZH
- alibayram/doktorsitesi | Text Generation, Text Classification, Table Question Answering | TR
- AlicanKiraz0/Agentic-Chain-of-Thought-Coding-SFT-Dataset | Text Generation | EN
- AlicanKiraz0/All-CVE-Records-Training-Dataset | Text Generation | EN
- AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.1 | Text Generation | EN
- AlicanKiraz0/Turkish-Finance-SFT-Dataset | Question Answering | TR
- AlicanKiraz0/Turkish-SFT-Dataset-v1.0 | Text Classification, Question Answering, Text Generation | TR
- AlienKevin/SWE-ZERO-12M-trajectories | Text Generation | EN
- aline-gassenn/MedDialog-Audio | Automatic Speech Recognition | EN
- alkzar90/NIH-Chest-X-ray-dataset | Image Classification | EN
- All the News 2.0 | Text Corpora | English
- allenai/blog-images | General NLP | English
- allenai/c4 | Text Generation, Fill Mask | AF, AM, AR
- allenai/cosmos_qa | Multiple Choice | EN
- allenai/CoSyn-400K | Visual Question Answering | English
- allenai/Dolci-Instruct-SFT | Other | AMH, ARB, ARY
- allenai/dolma | Text Generation | EN
- allenai/dolma3_dolmino_mix-100B-1025 | Text Generation | EN
- allenai/dolma3_dolmino_mix-100B-1125 | General NLP | EN
- allenai/dolma3_dolmino_pool | Text Generation | EN
- allenai/dolma3_longmino_mix-100B-1125 | General NLP | EN
- allenai/dolma3_longmino_mix-50B-1025 | General NLP | EN
- allenai/dolma3_longmino_pool | General NLP | EN
- allenai/dolma3_mix-6T | Text Generation | EN
- allenai/dolma3_mix-6T-1025-7B | Text Generation | EN
- allenai/dolma3_pool | Text Generation | EN
- allenai/dolmino-mix-1124 | Text Generation | EN
- allenai/lila | General NLP | English
- allenai/MADLAD-400 | Text Generation | English
- allenai/math_qa | Question Answering | EN
- allenai/MolmoAct-Midtraining-Mixture | Robotics | English
- allenai/MolmoAct2-BimanualYAM-Dataset | Robotics | English
- allenai/molmobot-data | General NLP | EN
- allenai/molmospaces | General NLP | English
- allenai/nllb | General NLP | English
- allenai/objaverse | General NLP | EN
- allenai/objaverse-xl | General NLP | EN
- allenai/olmix | Other | English
- allenai/olmo-mix-1124 | Text Generation | EN
- allenai/olmOCR-bench | General NLP | EN
- allenai/olmOCR-mix-0225 | General NLP | English
- allenai/olmOCR-mix-1025 | General NLP | English
- allenai/OLMoE-mix-0924 | Text Generation | EN
- allenai/openbookqa | Question Answering | EN
- allenai/paloma | General NLP | English
- allenai/peS2o | Text Generation, Fill Mask | EN
- allenai/pixmo-cap | Image To Text | English
- allenai/pixmo-docs | Visual Question Answering | English
- allenai/pixmo-points | General NLP | English
- allenai/prosocial-dialog | Text Classification | EN
- allenai/qasper | Question Answering | EN
- allenai/quac | Question Answering, Text Generation, Fill Mask | EN
- allenai/real-toxicity-prompts | General NLP | EN
- allenai/reward-bench | Question Answering | EN
- allenai/reward-bench-2 | Question Answering | EN
- allenai/reward-bench-results | General NLP | English
- allenai/ropes | Question Answering | EN
- allenai/sciq | Question Answering | EN
- allenai/scirepeval | General NLP | English
- allenai/SciRIFF | General NLP | EN
- allenai/scitldr | Summarization | EN
- allenai/social_i_qa | General NLP | EN
- allenai/soda | General NLP | EN
- allenai/tulu-3-sft-mixture | Other | AMH, ARB, ARY
- allenai/tulu-3-sft-olmo-2-mixture | Other | AMH, ARB, ARY
- allenai/tulu-3-sft-personas-instruction-following | Text Generation | EN
- allenai/tulu-v2-sft-mixture | Question Answering, Text Generation | EN
- allenai/ultrafeedback_binarized_cleaned | General NLP | English
- allenai/WildBench | Text Generation | EN
- allenai/WildChat | Text Generation, Question Answering | English
- allenai/WildChat-1M | Text Generation, Question Answering | English
- allenai/WildChat-1M-Full | Text Generation, Question Answering | English
- allenai/WildChat-4.8M | Text Generation, Question Answering | English
- allenai/WildChat-4.8M-Full | Text Generation, Question Answering | English
- allenai/wildguardmix | Text Classification | EN
- allenai/wildjailbreak | Text Generation | EN
- allganize/RAG-Evaluation-Dataset-JA | General NLP | JA
- allganize/RAG-Evaluation-Dataset-KO | General NLP | KO
- alphatechlogics/SEC-EDGAR-Download | General NLP | English
- alpindale/light-novels | General NLP | English
- alpindale/two-million-bluesky-posts | General NLP | EN
- alpindale/visual-novels | Text Generation | EN
- alvanlii/cantonese-youtube | Automatic Speech Recognition, Audio Classification | ZH, YUE
- amaai-lab/MidiCaps | General NLP | English
- amaai-lab/MusicBench | General NLP | English
- amaye15/NSFW | General NLP | English
- Amazon Fine Food Reviews | Classification, Sentiment Analysis | English
- Amazon Reviews | Classification, Sentiment Analysis | English
- amazon-agi/SIFT-50M | Audio Text To Text, Audio Classification, Text To Speech, Audio To Audio | EN, DE, FR
- AmazonScience/document-haystack | Question Answering, Visual Question Answering, Document Question Answering | EN
- AmazonScience/FalseReject | Text Generation, Fill Mask | EN
- AmazonScience/massive | Text Classification | English
- Amber-River/Pixiv-2.6M | Image Classification, Image To Text, Image To Image, Text To Image, Image Feature Extraction | EN, JA
- AmbigNQ | Question Answering, Reading Comprehension | English
- amir0907/datapro | General NLP | English
- amirveyseh/acronym_identification | Token Classification | EN
- Amod/mental_health_counseling_conversations | Text Generation, Question Answering | EN
- amongglue/muse_textbooks | General NLP | English
- amphion/Emilia | Text To Speech | ZH, EN, JA
- amphion/Emilia-Dataset | Text To Speech, Automatic Speech Recognition | ZH, EN, JA
- amphion/Emilia-NV | Text To Speech, Automatic Speech Recognition | ZH
- amphora/euler-math-logs | General NLP | English
- amphora/QwQ-LongCoT-130K | Text Generation | EN
Showing the first 500 of 1000 matches — refine with search or filters.
Browse datasets by task
Jump straight to curated collections for each NLP task.
- General NLP (297)
- Text Corpora (154)
- Text Generation (137)
- Question Answering (130)
- Classification (45)
- Reading Comprehension (43)
- Text Classification (33)
- Machine Translation (25)
- Sentiment Analysis (21)
- Dialogue (21)
- Visual Question Answering (21)
- Text To Image (20)
- Image To Text (18)
- Summarization (18)
- Information Extraction (15)
- Named Entity Recognition (NER) (14)
- Image Classification (14)
- Image Text To Text (14)
- Knowledge Base (13)
- Commonsense (13)
- Robotics (10)
- Visual (10)
- Clustering (9)
- Fill Mask (9)
Browse datasets by language
Find datasets in the language you're working with.