
CMU_ARCTIC

Speech Recognition, English, Benchmark

CMU_ARCTIC is an English speech recognition benchmark dataset from CMU, with 1,150 records in WAV format.

This dataset is used as an LLM benchmark.

About CMU_ARCTIC

The dataset contains 1,150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include a US English male speaker (bdl) and a US English female speaker (slt), both experienced voice talent, as well as speakers with other accents.

Details

Task: Speech Recognition
Language: English
Format: WAV
Rows / instances: 1,150
Creator: CMU
Year: 2004
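Because each record is a WAV file, a quick way to sanity-check an utterance is to read its header with Python's standard-library `wave` module. The sketch below is illustrative only: the helper name `wav_info` is hypothetical, and the 16 kHz mono, 16-bit clip it builds in memory is an assumption chosen so the example runs without downloading the dataset. It does not describe the actual recording parameters of the CMU_ARCTIC files.

```python
import io
import wave

def wav_info(source):
    """Return (sample_rate, n_channels, duration_seconds) for a WAV path or file-like object.

    `wav_info` is a hypothetical helper written for this example.
    """
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return rate, channels, frames / rate

# Build a one-second silent clip in memory (assumed 16 kHz, mono, 16-bit)
# so the example runs without the real dataset files.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buf.seek(0)

print(wav_info(buf))  # (16000, 1, 1.0)
```

With a downloaded utterance, you would pass a file path instead of the in-memory buffer, for example `wav_info("arctic_a0001.wav")`, where the filename is a hypothetical placeholder.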
