Question 1

What is the Web Inventory of Transcribed and Translated Talks (WIT3) dataset?

Accepted Answer

The dataset is a collection of transcribed and translated talks. Its core is the TED Talks corpus. As of 2016, it covers 109 languages.

Question 2

Is Web Inventory of Transcribed and Translated Talks (WIT3) a benchmark?

Accepted Answer

Web Inventory of Transcribed and Translated Talks (WIT3) is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download Web Inventory of Transcribed and Translated Talks (WIT3)?

Accepted Answer

Web Inventory of Transcribed and Translated Talks (WIT3) is available at its source: https://wit3.fbk.eu/mono.php?release=XML_releases&tinfo=cleanedhtml_ted.
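A common use of WIT3 is building parallel corpora by pairing the same talk across two languages. The sketch below shows document-level pairing by talk ID. It assumes each XML release file contains `<file>` entries with a `<head>/<talkid>` element and a `<content>` element holding the transcript. That layout is an assumption and should be checked against the files actually downloaded from the URL above. The function names `load_talks` and `align_by_talk_id` are hypothetical, and sentence-level alignment is not attempted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def load_talks(xml_text: str) -> Dict[str, str]:
    """Map each talk ID to its transcript text.

    Assumed (not confirmed) layout:
    <file><head><talkid>...</talkid></head><content>...</content></file>
    """
    root = ET.fromstring(xml_text)
    talks: Dict[str, str] = {}
    for file_elem in root.iter("file"):
        talk_id = file_elem.findtext("head/talkid")
        content = file_elem.findtext("content")
        if talk_id is None or content is None:
            continue  # skip entries missing either field
        talks[talk_id.strip()] = content.strip()
    return talks


def align_by_talk_id(
    src: Dict[str, str], tgt: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Pair transcripts of the same talk in two languages (document level only)."""
    shared_ids = sorted(src.keys() & tgt.keys())  # talks present in both languages
    return [(talk_id, src[talk_id], tgt[talk_id]) for talk_id in shared_ids]


# Tiny hand-written samples in the assumed layout (not real WIT3 data).
EN = """<xml language="en">
<file id="1"><head><talkid>1</talkid></head><content>Thank you.</content></file>
<file id="2"><head><talkid>2</talkid></head><content>Hello.</content></file>
</xml>"""
FR = """<xml language="fr">
<file id="1"><head><talkid>1</talkid></head><content>Merci.</content></file>
</xml>"""

pairs = align_by_talk_id(load_talks(EN), load_talks(FR))
print(pairs)  # [('1', 'Thank you.', 'Merci.')]
```

Only talks present in both languages are kept; talk 2 has no French counterpart in the sample, so it is dropped.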
