Dataset Card for Habr QnA
Dataset Summary
This is a dataset of questions and answers scraped from Habr QnA. It contains 723430 questions, each with its answers, comments, and other metadata.
Languages
The dataset is mostly in Russian, with source code snippets in various programming languages.
Dataset Structure
Data Fields
Data fields can be previewed on the dataset card page.
Data Splits
All 723430 examples are in the train split; there is no validation split.
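Because everything sits in a single train split, anyone who needs a held-out set has to carve one out. The sketch below assumes a seeded random hold-out over example indices is acceptable for your task; the function name `split_indices`, the 5% fraction, and the seed are illustrative choices, not part of the dataset.

```python
import random


def split_indices(n_examples: int, val_fraction: float = 0.05, seed: int = 0):
    """Return (train_idx, val_idx): a reproducible random hold-out over example indices.

    Hypothetical helper for illustration; the dataset itself ships no validation split.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(n_examples))
    # A dedicated Random instance makes the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_val = int(n_examples * val_fraction)
    # Sort each part so rows can be read back in their original order.
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])
    return train_idx, val_idx


# Example: hold out about 5% of the 723430 examples.
train_idx, val_idx = split_indices(723430, val_fraction=0.05, seed=42)
print(len(train_idx), len(val_idx))
```

If you load the data with the Hugging Face `datasets` library, the resulting index lists can be passed to `Dataset.select`, or you can call `Dataset.train_test_split(test_size=0.05, seed=42)` directly, which returns a `DatasetDict` with `"train"` and `"test"` splits.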
Dataset Creation
The data was scraped with a script located in my GitHub repository.
Additional Information
Dataset Curators
