abuelkhair-corpus/arabic_billion_words

Text Generation | Fill Mask | AR

abuelkhair-corpus/arabic_billion_words is an Arabic (AR) dataset for text generation and fill-mask tasks. It provides 5,370,082 examples in Parquet format. The license is listed as unknown, the size category is listed as 100K<n<1M, and the dataset has been downloaded 220 times.
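The row count and the size category quoted above can be checked against each other with a short sketch. The bucket boundaries below are an assumption inferred from the `100K<n<1M` label format (powers of ten), and `size_category` is a hypothetical helper, not part of any library.

```python
# Buckets in the "lower<n<upper" label style used by the listing.
# ASSUMPTION: boundaries are powers of ten, inferred from the label format.
BUCKETS = [
    (0, 1_000, "n<1K"),
    (1_000, 10_000, "1K<n<10K"),
    (10_000, 100_000, "10K<n<100K"),
    (100_000, 1_000_000, "100K<n<1M"),
    (1_000_000, 10_000_000, "1M<n<10M"),
    (10_000_000, 100_000_000, "10M<n<100M"),
    (100_000_000, 1_000_000_000, "100M<n<1B"),
]


def size_category(n_rows: int) -> str:
    """Return the bucket label whose half-open range [lower, upper) contains n_rows."""
    for lower, upper, label in BUCKETS:
        if lower <= n_rows < upper:
            return label
    return "n>1B"  # fallback for anything at or above one billion rows


rows = 5_370_082        # row count from the listing
listed = "100K<n<1M"    # size category from the listing
computed = size_category(rows)

print(computed, computed == listed)  # prints: 1M<n<10M False
```

Under that assumption, the listed row count falls in the 1M<n<10M bucket rather than the listed 100K<n<1M, so one of the two figures is likely out of date or refers to a subset.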

About abuelkhair-corpus/arabic_billion_words

The Abu El-Khair Corpus is an Arabic text corpus that includes more than five million newspaper articles. It contains over a billion and a half words in total, of which about three million are unique. The corpus is encoded with two t...

Details

Task: Text Generation, Fill Mask
Language: AR
Format: Parquet
Rows / instances: 5,370,082
Size: 100K<n<1M
Creator: abuelkhair-corpus
Year: 2022
License: unknown
Downloads: 220
Likes: 34
