code-search-net/code_search_net
Text Generation, Fill Mask, CODE, Benchmark
Created by code-search-net in 2022, code-search-net/code_search_net is a text generation benchmark dataset of source code (language tag: CODE) containing 4,141,072 records in Parquet format. It has 22.2K downloads and 331 likes. Its license is listed as "other", and it falls in the 1M<n<10M size category.
This dataset is used as an LLM benchmark.
About code-search-net/code_search_net
Dataset Card for CodeSearchNet corpus
Dataset Summary
CodeSearchNet corpus is a dataset of 2 million (comment, code) pairs from open-source libraries hosted on GitHub. It contains code and documentation for several programming langua...
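To make the idea of a (comment, code) pair concrete, here is a minimal sketch that pulls such pairs out of a Python source string using the standard-library `ast` module. It is only an illustration of the data shape, not the pipeline used to build the corpus; the function name `extract_pairs` and the `SAMPLE` snippet are hypothetical, and the sketch assumes that "comment" means a function's docstring.

```python
import ast
from typing import List, Tuple


def extract_pairs(source: str) -> List[Tuple[str, str]]:
    """Return (docstring, code) pairs for every documented function in `source`.

    Hypothetical helper for illustration only; not the corpus's own extraction code.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:  # undocumented functions yield no pair
                # The code segment is the full function text, docstring included.
                code = ast.get_source_segment(source, node)
                pairs.append((doc, code))
    return pairs


# Hypothetical sample input: one documented and one undocumented function.
SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def helper(x):
    return x * 2
'''

pairs = extract_pairs(SAMPLE)
for doc, code in pairs:
    print("COMMENT:", doc)
    print("CODE:")
    print(code)
```

Only `add` produces a pair here, because `helper` has no docstring to serve as the comment half.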
Details
- Task: Text Generation, Fill Mask
- Language: CODE
- Format: Parquet
- Rows / instances: 4,141,072
- Size: 1M<n<10M
- Creator: code-search-net
- Year: 2022
- License: other
- Downloads: 22,168
- Likes: 331