Self-Annotated Reddit Corpus (SARC)
Text Corpora, Sarcasm Detection, English, Benchmark
Self-Annotated Reddit Corpus (SARC) is an English text corpus and sarcasm detection benchmark dataset from Khodak et al. with 1.3M records in CSV format.
This dataset is used as an LLM benchmark.
About Self-Annotated Reddit Corpus (SARC)
The dataset contains 1.3 million sarcastic comments from the Internet commentary website Reddit. It includes the statements along with their responses, as well as many non-sarcastic comments from the same source.
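Since the corpus mixes sarcastic and non-sarcastic comments, a first check after download is usually the label balance. The following is a minimal sketch only: the column names `label`, `comment`, and `parent_comment`, the 1/0 label encoding, and the sample rows are assumptions made for illustration and are not taken from the dataset's documentation, so adjust them to the header of the actual CSV file.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="label"):
    """Count rows per label value in a CSV file-like object with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for the real file. The column names and rows
# are invented for illustration (assumed schema, not the documented one).
sample = io.StringIO(
    'label,comment,parent_comment\n'
    '1,"Oh great, another Monday.","How is your week going?"\n'
    '0,"The patch fixed it for me.","Did the update help?"\n'
    '1,"Yeah, because that always works.","Just restart it."\n'
)

counts = label_counts(sample)
print(counts)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.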
Details
- Task: Text Corpora, Sarcasm Detection
- Language: English
- Format: CSV
- Rows / instances: 1.3M
- Creator: Khodak et al.
- Year: 2017