1.5 billion Words Arabic Corpus Dataset
Created by El-khair et al. in 2016, the 1.5 billion Words Arabic Corpus was collected from newspaper articles in ten major news sources from eight Arabic countries, over a period of fourteen years. The dataset is in the Arabic language, with a listed size of 5M, in XML file format.
Dataset Sources
Here you can download the 1.5 billion Words Arabic Corpus dataset in XML format.
Download 1.5 billion Words Arabic Corpus dataset XML files
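Because the corpus is distributed as XML files, a streaming parser is one way to read it without loading a whole file into memory. The sketch below is a minimal Python example using the standard library's xml.etree.ElementTree.iterparse. The element names (corpus, article, text) and the helper iter_element_text are hypothetical placeholders, not the corpus's documented schema; inspect the tags in the downloaded files and adjust them accordingly.

```python
import io
import xml.etree.ElementTree as ET


def iter_element_text(source, tag):
    """Yield the stripped text of every <tag> element, one at a time.

    `source` is a file path or a binary file object. `tag` is the element
    name that holds the article text (an assumption; check the real files).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield "".join(elem.itertext()).strip()
            elem.clear()  # drop the element's content to keep memory use low


# Tiny in-memory stand-in for a corpus file; the real schema may differ.
sample = io.BytesIO(
    "<corpus><article><text>مرحبا بالعالم</text></article>"
    "<article><text>خبر ثان</text></article></corpus>".encode("utf-8")
)

texts = list(iter_element_text(sample, "text"))
word_count = sum(len(t.split()) for t in texts)  # naive whitespace tokenization
print(len(texts), "articles,", word_count, "words")
```

To run this against a downloaded file, pass its path instead of the in-memory sample, for example iter_element_text("path/to/file.xml", "text"), where both the path and the tag name are placeholders.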
Fine-tune with 1.5 billion Words Arabic Corpus dataset
Metatext is a no-code tool for training, tuning and integrating custom NLP models.
Paper
Read the full original 1.5 billion Words Arabic Corpus paper.