agentlans/high-quality-english-sentences

Text Classification, Text Generation, Feature Extraction, Sentence Similarity, EN, odc-by

agentlans/high-quality-english-sentences is an English-language (EN) dataset, tagged primarily for text classification and distributed in Parquet format under the odc-by license. It falls in the 1M<n<10M size category and has been downloaded 374 times.

About agentlans/high-quality-english-sentences

High-Quality English Sentences

Dataset Description

This dataset contains a collection of high-quality English sentences sourced from C4 and FineWeb (not FineWeb-Edu). The sentences have been carefully filtered and processed to ensure...

Details

Task: Text Classification, Text Generation, Feature Extraction, Sentence Similarity
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 1M<n<10M
Creator: agentlans
Year: 2024
License: odc-by
Downloads: 374
Likes: 37
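
The exact row count is listed as N/A, so the size tag is the only indication of scale. The following is a minimal Python sketch, not an official loader, showing how a tag such as 1M<n<10M can be read as numeric bounds and how the data might be fetched. It assumes the dataset is hosted on the Hugging Face Hub under the id shown on this page, that the third-party `datasets` package is installed, and that a split named "train" exists. The helper names `parse_size_category` and `load_sentences` are hypothetical and introduced only for illustration.

```python
import re

# Repository id as shown on this page.
DATASET_ID = "agentlans/high-quality-english-sentences"

# Multipliers for the unit suffixes used in size tags.
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000, "T": 1_000_000_000_000}


def parse_size_category(category: str) -> tuple[int, int]:
    """Turn a size tag such as '1M<n<10M' into (lower, upper) row bounds."""
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", category)
    if match is None:
        raise ValueError(f"unrecognised size category: {category!r}")
    low, low_unit, high, high_unit = match.groups()
    return int(low) * _SUFFIX[low_unit], int(high) * _SUFFIX[high_unit]


def load_sentences(split: str = "train"):
    """Load the dataset with the Hugging Face `datasets` library.

    Assumptions: Hub hosting under DATASET_ID and a split named 'train'.
    The import is deferred so the rest of this sketch runs without the package.
    """
    from datasets import load_dataset  # third-party dependency

    return load_dataset(DATASET_ID, split=split)


if __name__ == "__main__":
    low, high = parse_size_category("1M<n<10M")
    print(f"Expected row count: between {low:,} and {high:,}")
```

Calling `load_sentences()` downloads the Parquet files, so it needs network access. The column names are not listed on this page and should be inspected after loading rather than assumed.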
FAQ