Classify and extract text 10x better and faster 🦾



NewSHead Dataset

Created by Gu et al. in 2020, the NewSHead dataset contains 369,940 English news stories with 932,571 unique URLs. Of these, 359,940 stories are used for training, 5,000 for validation, and 5,000 for testing. Each news story contains at least three and at most five articles. The dataset is distributed in JSON format.
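As a rough illustration of the figures above, the following Python sketch checks that a loaded split respects the three-to-five-articles-per-story constraint and that the stated split sizes add up to the total. The file layout (one JSON array of story objects per split) and the field name `urls` are assumptions made for this example, not a confirmed description of the released files; adjust them to match the actual download.

```python
import json
from pathlib import Path

# Split sizes as stated in the dataset description.
EXPECTED_SPLIT_SIZES = {"train": 359_940, "valid": 5_000, "test": 5_000}


def load_stories(path):
    """Load one split, ASSUMED to be a JSON array of story objects."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def invalid_stories(stories, articles_key="urls", min_articles=3, max_articles=5):
    """Return indices of stories whose article count is outside [min, max].

    `articles_key` is a hypothetical field name holding the story's articles.
    """
    return [
        i
        for i, story in enumerate(stories)
        if not min_articles <= len(story.get(articles_key, [])) <= max_articles
    ]


# Tiny in-memory example using the assumed schema.
sample = [
    {"urls": ["a", "b", "c"]},       # 3 articles: valid
    {"urls": ["a", "b"]},            # 2 articles: too few
    {"urls": ["a", "b", "c", "d"]},  # 4 articles: valid
]
bad = invalid_stories(sample)
total = sum(EXPECTED_SPLIT_SIZES.values())
```

With real data, one would call `load_stories` on each downloaded split file and compare `len(stories)` against the matching entry in `EXPECTED_SPLIT_SIZES`.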

Dataset Sources

Here you can download the NewSHead dataset in JSON format.

Download NewSHead dataset JSON files

Fine-tune with NewSHead dataset

Metatext is a no-code tool for training, tuning, and integrating custom NLP models.


Paper

Read the full original NewSHead paper.

Download PDF paper



Metatext helps you classify and extract information from text and documents using language models customized with your own data and expertise.