Question 1

What is the CLIRMatrix dataset?

Accepted Answer

CLIRMatrix is a collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval, extracted automatically from Wikipedia. It comprises (1) BI-139, a bilingual dataset of queries in one language matched with relevant documents in another language for 19,182 language pairs, and (2) MULTI-8, a mult…
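
The figure of 19,182 language pairs is consistent with counting ordered (query language, document language) pairs over 139 languages. This is a minimal sketch of that arithmetic, assuming the "139" in BI-139 is the number of languages and that each pair is directional; the language codes are hypothetical placeholders, not the dataset's own identifiers.

```python
from itertools import permutations

# Assumption: the "139" in the name BI-139 is the number of languages covered.
NUM_LANGUAGES = 139

# Hypothetical placeholder codes; the real dataset uses actual language codes.
languages = [f"lang{i}" for i in range(NUM_LANGUAGES)]

# Ordered pairs (query_language, document_language) with two distinct languages.
pairs = list(permutations(languages, 2))

print(len(pairs))  # 139 * 138 = 19182
```

Counting unordered pairs instead would give half that number, so the 19,182 figure suggests each direction (for example, queries in A with documents in B, and the reverse) is treated as a separate pair.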

Question 2

Is CLIRMatrix a benchmark?

Accepted Answer

CLIRMatrix is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download CLIRMatrix?

Accepted Answer

CLIRMatrix is available at its source: https://github.com/ssun32/CLIRMatrix.
