Classify and extract text 10x better and faster 🦾


➡️  Learn more

CoVoST Dataset

Created by Wang et al. in 2020, the CoVoST Dataset is a multilingual speech-to-text translation corpus covering translations from 21 languages into English and from English into 15 languages. The overall speech duration is 2,880 hours, and the total number of speakers is 78K. The data is provided in TSV and MP3 file formats.

Dataset Sources

Here you can download the CoVoST dataset in TSV and MP3 formats.

Download CoVoST dataset TSV, MP3 files
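
Since the dataset ships as TSV files alongside MP3 audio, a first step after downloading is usually to read the TSV rows. The snippet below is a minimal sketch using only the Python standard library. The column names `path` and `translation`, the clip file names, and the helper `read_tsv_rows` are assumptions made for illustration; check the header row of the files you actually download.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_tsv_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated lines with a header row into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Hypothetical sample data; the real column names and values may differ.
sample = io.StringIO(
    "path\ttranslation\n"
    "clip_0001.mp3\tHello world\n"
    'clip_0002.mp3\tShe said "yes"\n'
)

rows = read_tsv_rows(sample)
for row in rows:
    print(row["path"], "->", row["translation"])
```

For a file on disk, the same helper can be given a handle opened with `open(tsv_path, encoding="utf-8", newline="")`, where `tsv_path` is wherever you saved the download.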

Fine-tune with CoVoST dataset

Metatext is a powerful no-code tool for training, tuning, and integrating custom NLP models.

Paper

Read the full original CoVoST paper.

Download PDF paper


Metatext helps you classify and extract information from text and documents using language models customized with your data and expertise.