Question 1

What is the Multilingual Corpus of Sentence-Aligned Spoken Utterances (MaSS) dataset?

Accepted Answer

MaSS is a dataset of 8,130 parallel spoken utterances across 8 languages (56 language pairs). Languages: Basque, English, Finnish, French, Hungarian, Romanian, Russian, Spanish.
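The figure of 56 language pairs matches a count of ordered (source, target) pairs. Each of the 8 languages can be paired with each of the other 7, giving 8 × 7 = 56. If direction were ignored, the count would be 28. Below is a minimal Python sketch. It assumes that pairs are counted as directed, so English→French and French→English are distinct. It only illustrates the arithmetic and does not read the dataset itself.

```python
from itertools import permutations

# The eight MaSS languages, as listed in the answer above.
LANGUAGES = ["Basque", "English", "Finnish", "French",
             "Hungarian", "Romanian", "Russian", "Spanish"]

# Ordered (source, target) pairs; assumes direction matters, so each
# language is paired with every other language in both directions.
pairs = list(permutations(LANGUAGES, 2))

print(len(pairs))  # 8 * 7 = 56
```

Using `combinations` instead of `permutations` would give the 28 unordered pairs.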

Question 2

Is Multilingual Corpus of Sentence-Aligned Spoken Utterances (MaSS) a benchmark?

Accepted Answer

No. Multilingual Corpus of Sentence-Aligned Spoken Utterances (MaSS) is a dataset for training or evaluation. It isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download Multilingual Corpus of Sentence-Aligned Spoken Utterances (MaSS)?

Accepted Answer

Multilingual Corpus of Sentence-Aligned Spoken Utterances (MaSS) is available at its source: https://github.com/getalp/mass-dataset.
