agentlans/high-quality-english-sentences
agentlans/high-quality-english-sentences is an English (EN) dataset, tagged primarily for text classification, and distributed in Parquet format under the odc-by license. It falls in the 1M<n<10M size category and has been downloaded 374 times.
About agentlans/high-quality-english-sentences
High-Quality English Sentences
Dataset Description
This dataset contains a collection of high-quality English sentences sourced from C4 and FineWeb (not FineWeb-Edu). The sentences have been carefully filtered and processed to ensure...
Details
- Task: Text Classification, Text Generation, Feature Extraction, Sentence Similarity
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1M<n<10M
- Creator: agentlans
- Year: 2024
- License: odc-by
- Downloads: 374
- Likes: 37
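The card gives the format as Parquet but does not show how to load the data. Below is a minimal Python sketch. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable. The `train` split and the `text` column name are assumptions that this card does not confirm, so check the dataset viewer and adjust `TEXT_COLUMN` if needed. Streaming avoids downloading every Parquet shard up front, which helps at 1M<n<10M rows.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, Mapping

DATASET_ID = "agentlans/high-quality-english-sentences"
# Assumption: the sentence column is named "text"; not confirmed by the card.
TEXT_COLUMN = "text"


def iter_sentences(
    records: Iterable[Mapping[str, str]], column: str = TEXT_COLUMN
) -> Iterator[str]:
    """Hypothetical helper: yield stripped, non-empty sentences from records."""
    for record in records:
        sentence = record.get(column, "")
        # Skip records where the column is missing or holds only whitespace.
        if sentence and sentence.strip():
            yield sentence.strip()


def stream_sample(n: int = 5, split: str = "train") -> list[str]:
    """Stream the first n sentences from the Hub without a full download.

    Assumption: a "train" split exists. Requires `pip install datasets`
    and network access.
    """
    from datasets import load_dataset  # imported lazily so the helper above works offline

    ds = load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(iter_sentences(ds), n))
```

For example, `stream_sample(3)` would return the first three sentences (network access required). `iter_sentences` is kept separate from the loading call so its filtering logic can be checked offline on plain dictionaries.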