Classify and extract text 10x better and faster 🦾


➡️  Learn more

SFU Opinion and Comments Corpus (SOCC) Dataset

Created by Kolhatkar et al. in 2018, the SFU Opinion and Comments Corpus (SOCC) Dataset contains 10,339 opinion articles (editorials, columns, and op-eds) together with their 663,173 comments from 303,665 comment threads. The material comes from The Globe and Mail, the main English-language Canadian daily, and spans January 2012 to December 2016. In addition, an annotated subset of 1,043 comments, written in response to 10 different articles, is labeled for toxicity, negation and its scope, and appraisal. These articles cover a variety of subjects: technology, immigration, terrorism, politics, budget, social issues, religion, property, and refugees. The corpus is in English, and its 663,173 comments are provided in CSV file format.

Dataset Sources

Here you can download the SFU Opinion and Comments Corpus (SOCC) dataset in CSV format.

Download SFU Opinion and Comments Corpus (SOCC) dataset CSV files
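Once a CSV file is downloaded, a quick sanity check is to count how many comments fall in each thread. The following is a minimal sketch using only the Python standard library. The column names `thread_id` and `comment_text` and the sample rows are assumptions made for illustration, not the corpus's documented schema, so check the header of the actual file and pass the real column name.

```python
import csv
import io
from collections import Counter


def count_comments_per_thread(csv_file, thread_column="thread_id"):
    """Count rows per thread in an open CSV file.

    `thread_column` is a hypothetical column name; replace it with the
    name found in the header of the file you downloaded.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or thread_column not in reader.fieldnames:
        raise KeyError(f"column {thread_column!r} not found in {reader.fieldnames}")
    return Counter(row[thread_column] for row in reader)


# Tiny in-memory sample standing in for a downloaded file (made-up rows).
sample = io.StringIO(
    "thread_id,comment_text\n"
    "t1,First comment\n"
    "t1,A reply\n"
    "t2,Another thread\n"
)
counts = count_comments_per_thread(sample)
print(counts.most_common(1))
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample; `newline=""` lets the `csv` module handle comments that contain quoted line breaks.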

Fine-tune with SFU Opinion and Comments Corpus (SOCC) dataset

Metatext is a no-code tool for training, tuning, and integrating custom NLP models.


Paper

Read the full original SFU Opinion and Comments Corpus (SOCC) paper.

Download PDF paper

