NewSHead Dataset
Created by Gu et al. in 2020, the NewSHead dataset contains 369,940 English news stories with 932,571 unique URLs, split into 359,940 stories for training, 5,000 for validation, and 5,000 for testing. Each news story contains at least three and at most five articles. The dataset is distributed in JSON format.
Dataset Sources
Here you can download the NewSHead dataset in JSON format.
Download NewSHead dataset JSON files
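Once the JSON files are downloaded, a quick sanity check can confirm that each story has the three to five articles described above. The following is a minimal sketch only: the file name `train.json`, the assumption that the file holds a top-level list of story objects, and the `urls` field name are all hypothetical and should be adjusted to the actual schema of the downloaded files.

```python
import json
from typing import Any


def load_stories(path: str) -> list[dict[str, Any]]:
    """Load story records from a JSON file.

    Assumes (hypothetically) that the file holds a top-level JSON list.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def has_expected_article_count(
    story: dict[str, Any], key: str = "urls", low: int = 3, high: int = 5
) -> bool:
    """Return True if the story lists between `low` and `high` articles.

    `key` is a hypothetical field name for the story's article URLs.
    """
    return low <= len(story.get(key, [])) <= high


if __name__ == "__main__":
    # "train.json" is a placeholder path, not a confirmed file name.
    stories = load_stories("train.json")
    bad = [s for s in stories if not has_expected_article_count(s)]
    print(f"{len(stories)} stories loaded, {len(bad)} outside the 3-5 article range")
```

If the real files use a different layout (for example, one JSON object per line), only `load_stories` would need to change.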
Fine-tune with NewSHead dataset
Metatext is a no-code tool for training, tuning, and integrating custom NLP models.
Paper
Read the full original NewSHead paper.