Question 1

What is the Tencent AI Lab Embedding Corpus dataset?

Accepted Answer

The dataset provides 200-dimensional vector representations (embeddings) for over 8 million Chinese words and phrases.
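To make "200-dimensional vector per word" concrete, here is a minimal sketch of reading such embeddings and comparing two entries. It assumes the file uses the common word2vec text layout (a header line with the vocabulary size and dimension, then one entry per line: the word or phrase followed by space-separated floats); check the download page to confirm the actual layout. The function names `parse_word2vec_text` and `cosine` are illustrative, and the sample uses toy 3-dimensional vectors, not real data from the corpus.

```python
import math
from typing import Dict, Iterable, List, Tuple


def parse_word2vec_text(lines: Iterable[str]) -> Tuple[int, Dict[str, List[float]]]:
    """Parse word2vec-style text lines into (dimension, {word: vector}).

    Assumed layout: first line is "<vocab_size> <dim>", every other line is
    "<word or phrase> <v1> ... <v_dim>".
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in it:
        # Split from the right so a phrase containing spaces stays intact.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy sample in the assumed layout (3 dimensions instead of 200).
sample = [
    "2 3",
    "你好 1.0 0.0 0.0",
    "世界 0.0 1.0 0.0",
]
dim, vectors = parse_word2vec_text(sample)
similarity = cosine(vectors["你好"], vectors["世界"])
```

For the real file, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the list. With over 8 million entries, loading everything into plain Python lists will use a lot of memory, so stopping after the first N lines or using a dedicated library may be more practical.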

Question 2

Is Tencent AI Lab Embedding Corpus a benchmark?

Accepted Answer

No. Tencent AI Lab Embedding Corpus is a dataset used for training or evaluation; it is not tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download Tencent AI Lab Embedding Corpus?

Accepted Answer

Tencent AI Lab Embedding Corpus is available at its source: https://ai.tencent.com/ailab/nlp/embedding.html.

Tencent AI Lab Embedding Corpus

About Tencent AI Lab Embedding Corpus