Classify and extract text 10x better and faster 🦾


➡️  Learn more

1.5 billion Words Arabic Corpus Dataset

Created by El-khair et al. in 2016, the 1.5 billion Words Arabic Corpus was collected from newspaper articles in ten major news sources from eight Arabic countries, over a period of fourteen years. The corpus is in Arabic and contains 5M in XML file format.

Dataset Sources

Here you can download the 1.5 billion Words Arabic Corpus dataset in XML format.

Download 1.5 billion Words Arabic Corpus dataset XML files
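Since the corpus is distributed as XML files, a streaming parser is one reasonable way to read it without loading a whole file into memory. The following is a minimal sketch using Python's standard library, not the authors' own tooling. The element names `corpus`, `article`, `title` and `text` are hypothetical placeholders, because the actual schema is not described here; inspect a downloaded file and pass the real tag name instead.

```python
import io
import xml.etree.ElementTree as ET


def iter_element_texts(source, tag):
    """Yield the stripped text content of every <tag> element in an XML source.

    `source` is a file path or a binary file-like object. The file is parsed
    incrementally rather than being read into memory in one go.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # itertext() walks the element and its children in document order.
            yield "".join(elem.itertext()).strip()
            # Release the element's content once it has been consumed.
            elem.clear()


# Tiny in-memory stand-in for a corpus file. The tag names are assumptions;
# the real files may use a different schema.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<corpus>
  <article><title>عنوان</title><text>هذا نص تجريبي قصير</text></article>
  <article><title>عنوان آخر</title><text>نص ثان</text></article>
</corpus>
""".encode("utf-8")

texts = list(iter_element_texts(io.BytesIO(SAMPLE), "text"))
# Whitespace splitting is a rough word count, not a proper Arabic tokenizer.
word_count = sum(len(t.split()) for t in texts)
print(len(texts), word_count)
```

To run this on a downloaded file, pass its path in place of the `io.BytesIO` object, for example `iter_element_texts("some_file.xml", "text")`, where both the file name and the tag are placeholders.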

Fine-tune with 1.5 billion Words Arabic Corpus dataset

Metatext is a powerful no-code tool to train, tune, and integrate custom NLP models.

Paper

Read the full original 1.5 billion Words Arabic Corpus paper.

Download PDF paper


Metatext helps you classify and extract information from text and documents using language models customized with your data and expertise.