Dataset Viewer

datasetId (string, length 6 to 123)	author (string, length 2 to 42)	last_modified (date string, 2021-02-22 10:20:34 to 2026-06-08 02:08:03)	downloads (int64, 0 to 2.77M)	likes (int64, 0 to 9.73k)	tags (list, length 1 to 6.16k)	task_categories (list, length 0)	createdAt (date string, 2022-03-02 23:29:22 to 2026-06-08 02:06:48)	trending_score (float64, 0 to 200)	card (string, length 31 to 29.7M)
mzio/aprm-sft_thinkact-Eaprm_tw_treasure_easy_sp-Gnobandit_aprm_qw3_ap-S42-Rmt128_nb_treasure_ea	mzio	2026-03-10T01:43:14Z	39	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-03-09T15:27:39Z	0	--- {} --- # Act-PRM Rollout Dataset ## Run Metadata - run_name: `act-prm-cc-isas=0-reru=0-enco=act_prm_tw_treasure_easy_sp-geco=nobandit_aprm_qwen3_ap-trco=aprm_for_sft100-moco=hf_qwen3_4b_inst_2507-loco=r8_a16_qkvo-acon=1-hiob=1-mato=128-difa=0_9-grsi=8-basi=8-lera=0_001-nusu=1-se=42-re=mt128_nb_treasure_easy` ...
yashaswinienkefalos/merged_all	yashaswinienkefalos	2025-12-06T12:35:02Z	5	0	[ "size_categories:100K<n<1M", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2025-12-06T12:34:46Z	0	--- dataset_info: features: - name: instruction dtype: string - name: input dtype: string - name: output dtype: string - name: citation list: string - name: link_validation list: - name: reason dtype: string - name: status dtype: string - name: url dtype: st...
electricsheepafrica/africa-who-antenatal-care-coverage-at-least-one-visit-sitpercent	electricsheepafrica	2026-05-01T17:49:48Z	0	0	[ "task_categories:tabular-classification", "task_categories:tabular-regression", "language:en", "license:cc-by-4.0", "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "af...	[]	2026-05-01T17:49:26Z	0	--- license: cc-by-4.0 task_categories: - tabular-classification - tabular-regression language: - en tags: - africa - health - who - gho - "anc_atleast1visit_percent" pretty_name: "Africa — WHO GHO: Antenatal care coverage - at least one visit (percent)" size_categories: - n<1K --- # Africa — WHO GHO...
john-1111/x_dataset_0603159	john-1111	2025-07-29T23:29:56Z	388	0	[ "task_categories:text-classification", "task_categories:token-classification", "task_categories:question-answering", "task_categories:summarization", "task_categories:text-generation", "task_ids:sentiment-analysis", "task_ids:topic-classification", "task_ids:named-entity-recognition", "task_ids:lang...	[]	2025-01-25T07:17:19Z	0	--- license: mit multilinguality: - multilingual source_datasets: - original task_categories: - text-classification - token-classification - question-answering - summarization - text-generation task_ids: - sentiment-analysis - topic-classification - named-entity-recognition - language-modeling -...
Waterhorse/Breakthrough_dataset	Waterhorse	2024-12-02T03:45:49Z	3	2	[ "license:mit", "region:us" ]	[]	2024-12-02T02:02:17Z	0	--- license: mit --- # Dataset Card for the Breakthrough game The training and testing set used in NLRL language TD breakthrough experiment.
payamvha/farzin_RAG	payamvha	2025-11-10T16:21:47Z	4	0	[ "language:fa", "license:mit", "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-11-10T16:19:57Z	0	--- dataset_info: features: - name: source dtype: string - name: content dtype: string splits: - name: train num_bytes: 2084285 num_examples: 2050 download_size: 667307 dataset_size: 2084285 configs: - config_name: default data_files: - split: train path: data/train-* license: mit ...
nirantk/scifact-bge-m3-sparse-vectors	nirantk	2024-05-13T13:46:09Z	10	0	[ "language:en", "license:mit", "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-05-09T07:55:03Z	0	--- language: - en license: mit dataset_info: features: - name: _id dtype: string - name: title dtype: string - name: text dtype: string - name: bge_m3_sparse_vector dtype: string splits: - name: corpus num_bytes: 27321636 num_examples: 5183 download_size: 13140...
samuelandaudreymedianetwork/academic-citations-institutional-authority-ledger	samuelandaudreymedianetwork	2026-02-24T11:01:10Z	64	2	[ "task_categories:text-retrieval", "task_categories:question-answering", "task_categories:feature-extraction", "language:en", "license:cc-by-nc-4.0", "size_categories:1K<n<10K", "format:text", "modality:text", "library:datasets", "library:mlcroissant", "region:us", "authority-ledger", "academ...	[]	2026-02-16T00:14:07Z	0	--- license: cc-by-nc-4.0 language: - en task_categories: - text-retrieval - question-answering - feature-extraction tags: - authority-ledger - academic-citations - institutional-authority - media-mentions - e-e-a-t - entity-resolution - rag - knowledge-graph --- # 🏛️ Academic Citations & Institutional Authority Ledg...
DiffusionArcade/Pong_DQN_4	DiffusionArcade	2025-05-31T03:14:01Z	4	0	[ "size_categories:10K<n<100K", "format:parquet", "modality:image", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-05-27T03:48:13Z	0	Image size width: 64 and height: 48 Game specifications: * CPU speed: 0.5 * Player speed: 0.5 * Ball speed: 0.75 * Reward function: Basic (1, -1, 0, 0, 0) Hyperparameters: * LR: 0.0001 * Anneal length: 1000000 Evaluation: * Agent Won: 0 * Agent Lost: 100
stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-4B_strategy_trust_t1.5_g5_run1_metrics	stefanocarrera	2026-05-14T01:23:34Z	0	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-05-14T01:23:32Z	0	--- dataset_info: features: - name: task_id dtype: string - name: entry_point dtype: string - name: is_executable dtype: bool - name: is_correct dtype: bool - name: tests_passed dtype: int64 - name: tests_failed dtype: int64 - name: test_run_time_ms dtype: 'null' - name: er...
Nutanix/transformers_zero_shot_llama70b_llama8b_results	Nutanix	2024-08-21T16:15:06Z	10	0	[ "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-08-13T03:13:09Z	0	--- dataset_info: features: - name: id dtype: int64 - name: question dtype: string - name: generation dtype: string - name: generation_time dtype: float64 - name: completion_tokens dtype: int64 - name: prompt_tokens dtype: int64 - name: total_tokens dtype: int64 splits: -...
uzair921/QWEN7B_SKILLSPAN_EMBEDDINGS_LLM_RAG_50_openai	uzair921	2025-01-23T08:48:51Z	5	0	[ "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-01-23T08:48:46Z	0	--- dataset_info: features: - name: tokens sequence: string - name: ner_tags sequence: class_label: names: '0': O '1': B-Skill '2': I-Skill splits: - name: train num_bytes: 1051352 num_examples: 2071 - name: validation num_bytes: 715196 num...
DCAgent2/terminal_bench_2_pipeline_combined_500k_Qwen3_32B_20260414_202457	DCAgent2	2026-04-15T13:23:39Z	0	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-04-15T13:23:36Z	0	--- dataset_info: features: - name: conversations list: - name: content dtype: string - name: role dtype: string - name: agent dtype: string - name: model dtype: string - name: model_provider dtype: string - name: date dtype: string - name: task dtype: string ...
adivya/common-voice-16-1-hi-pseudo-labelled	adivya	2024-07-16T11:13:54Z	4	0	[ "size_categories:1K<n<10K", "format:parquet", "modality:audio", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-07-16T11:04:16Z	0	--- dataset_info: config_name: hi features: - name: path dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: sentence dtype: string - name: condition_on_prev sequence: int64 - name: whisper_transcript dtype: string splits: - name: train num_byte...
uzair921/SKILLSPAN_LLM_RAG_42_75_MiniLM	uzair921	2025-01-08T10:42:45Z	4	0	[ "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-01-08T10:42:41Z	0	--- dataset_info: features: - name: tokens sequence: string - name: ner_tags sequence: class_label: names: '0': O '1': B-Skill '2': I-Skill splits: - name: train num_bytes: 1061361 num_examples: 2075 - name: validation num_bytes: 715196 num...
danjacobellis/musdb18hq_vss	danjacobellis	2024-09-28T21:20:53Z	4	0	[ "size_categories:n<1K", "format:parquet", "modality:audio", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-09-28T21:16:12Z	0	--- dataset_info: features: - name: audio_mix dtype: audio: sampling_rate: 44100 mono: false decode: false - name: audio_vocal dtype: audio: sampling_rate: 44100 mono: false decode: false - name: path_mix dtype: string - name: path_vocal ...
electricsheepeurope/europe-ilo-emp-care-sex-oc2-nb-care-employment-by-sex-and-occupation-isco-level-2	electricsheepeurope	2026-05-28T18:06:13Z	0	0	[ "task_categories:tabular-classification", "task_categories:tabular-regression", "task_categories:time-series-forecasting", "multilinguality:monolingual", "language:en", "license:cc-by-4.0", "size_categories:10K<n<100K", "modality:tabular", "region:us", "tabular", "europe", "ilostat", "paid-c...	[]	2026-05-28T18:06:04Z	0	--- license: cc-by-4.0 language: - en task_categories: - tabular-classification - tabular-regression - time-series-forecasting multilinguality: monolingual size_categories: - 10K<n<100K tags: - tabular - europe - ilostat - paid-care-workers - ilo - labour - employment pretty_name: "Care employment by sex and occupation...
maanas-writer/mem_agent-model_based-rl-memoryagent-14b-bizbench-test-c27000-t512-1000s-agnostic	maanas-writer	2025-11-08T15:46:16Z	6	0	[ "size_categories:1K<n<10K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2025-11-08T15:46:09Z	0	--- dataset_info: features: - name: question dtype: string - name: context dtype: string - name: ground_truth list: string - name: response dtype: string - name: extracted_answer dtype: string - name: final_memory dtype: string - name: memory_length dtype: int64 - name: res...
Gyr0ghost/promptwall-injection-dataset	Gyr0ghost	2026-04-06T11:39:54Z	15	0	[ "language:en", "language:hi", "language:ar", "language:fr", "language:de", "language:ja", "language:ru", "language:es", "language:it", "language:ko", "language:nl", "license:mit", "size_categories:n<1K", "format:json", "modality:text", "library:datasets", "library:dask", "library:p...	[]	2026-04-04T21:01:43Z	0	--- language: - en - hi - ar - fr - de - ja - ru - es - it - ko - nl tags: - prompt-injection - llm-security - ai-safety - jailbreak - cybersecurity - rag-security - multi-turn license: mit --- # PromptWall Injection Dataset Benchmark dataset for evaluating LLM prompt injection detection systems. Used to benchmark [...
NeurIPS-2026-PRISM/PRISM-Dataset	NeurIPS-2026-PRISM	2026-05-06T09:24:45Z	210	1	[ "task_categories:image-classification", "task_categories:depth-estimation", "language:en", "license:cc-by-nc-sa-4.0", "size_categories:10B<n<100B", "format:text", "modality:image", "modality:text", "library:datasets", "library:mlcroissant", "region:us", "autonomous-driving", "polarization", ...	[]	2026-04-27T12:10:15Z	1	--- license: cc-by-nc-sa-4.0 task_categories: - image-classification - depth-estimation language: - en tags: - autonomous-driving - polarization - polarimetric-imaging - road-surface - multi-modal - lidar - benchmark pretty_name: PRISM size_categories: - 10K<n<100K --- # PRISM: Polarimetric Road-surface Intelligent Se...
Veweew/OffensEval	Veweew	2026-02-11T16:27:18Z	17	0	[ "size_categories:10K<n<100K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-02-11T16:25:32Z	0	--- dataset_info: features: - name: id dtype: string - name: text dtype: string - name: label dtype: string - name: source dtype: string splits: - name: train num_bytes: 1762503 num_examples: 10848 - name: validation num_bytes: 221544 num_examples: 1356 - name: test ...
TheFactoryX/edition_1193_tatsu-lab-alpaca-readymade	TheFactoryX	2025-12-10T20:15:38Z	4	0	[ "license:other", "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "readymades", "art", "shuffled", "duchamp" ]	[]	2025-12-10T20:15:34Z	0	--- tags: - readymades - art - shuffled - duchamp license: other --- # edition_1193_tatsu-lab-alpaca-readymade A Readymade by TheFactoryX ## Original Dataset [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) ## Process This dataset is a "readymade" - inspired by Marcel Duchamp's concept of ta...
CZLC/benczechmark_histcorpus	CZLC	2024-08-22T09:08:36Z	43	0	[ "language:cs", "size_categories:10K<n<100K", "format:json", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-04-24T13:25:49Z	0	--- language: - cs --- ## Introduction This is a validation set split off from the historical dataset included in [BUT-LCC](https://huggingface.co/datasets/BUT-FIT/BUT-LCC) corpus. Furthermore, to avoid direct contamination from BUT-LCC, this set is filtered against the historical dataset from BUT-LCC by our fuzzy ded...
Creamory/turkish-news-headlines	Creamory	2026-04-03T13:13:50Z	0	0	[ "region:us" ]	[]	2026-04-03T13:04:33Z	0	--- dataset_info: features: - name: messages list: - name: role dtype: string - name: content dtype: string splits: - name: train num_bytes: 99951822 num_examples: 43822 - name: validation num_bytes: 11867540 num_examples: 5156 - name: test num_bytes: 5835951 ...
gjyotin305/Qwen2.5-3B-Instruct_old_sft_alpaca_001_hhexphi_hr_alpaca_1	gjyotin305	2026-01-26T23:44:38Z	11	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-01-26T23:44:32Z	0	--- dataset_info: features: - name: user dtype: string - name: from dtype: string - name: answer dtype: string - name: answer_gpt dtype: string - name: infer_answer_llm dtype: string splits: - name: train num_bytes: 1301943 num_examples: 300 download_size: 527809 dataset_...
dgambettaphd/D_llm2_gen7_X_doc1000_synt64_lr1e-04_acm_SYNLAST	dgambettaphd	2025-05-02T09:52:05Z	5	0	[ "size_categories:10K<n<100K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-05-02T09:52:02Z	0	--- dataset_info: features: - name: id_doc dtype: int64 - name: text dtype: string - name: dataset dtype: string - name: gen dtype: int64 - name: synt dtype: int64 - name: MPP dtype: float64 splits: - name: train num_bytes: 12886765 num_examples: 23000 download_size: ...
opencsg/chinese-fineweb-edu	opencsg	2025-12-12T07:57:17Z	29,017	110	[ "task_categories:text-generation", "language:zh", "license:apache-2.0", "size_categories:10M<n<100M", "format:parquet", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "arxiv:2501.08197", "region:us" ]	[]	2024-08-26T14:46:54Z	0	--- language: - zh pipeline_tag: text-generation license: apache-2.0 task_categories: - text-generation size_categories: - 10B<n<100B --- # This version is <font color="red">deprecated</font>. We recommend you to use the newest version [Fineweb-edu-chinese-v2.1](opencsg/Fineweb-Edu-Chinese-V2.1) ! # **Chinese Finewe...
Gopher-Lab/huberman_lab_How_Your_Brain_Works__Changes	Gopher-Lab	2024-08-12T15:42:13Z	3	0	[ "size_categories:n<1K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-08-09T01:32:22Z	0	--- pretty_name: How Your Brain Works Changes dataset_info: features: - name: text dtype: string splits: - name: train num_bytes: 65218 num_examples: 1 download_size: 34017 dataset_size: 65218 configs: - config_name: default data_files: - split: train path: data/train-* ---
xenorobotics/new-9	xenorobotics	2025-09-11T03:13:23Z	7	0	[ "task_categories:robotics", "size_categories:10K<n<100K", "format:parquet", "modality:tabular", "modality:timeseries", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us", "phosphobot", "so100", "phospho-dk" ]	[]	2025-09-11T03:13:22Z	0	--- tags: - phosphobot - so100 - phospho-dk task_categories: - robotics --- # record-test This dataset was generated using [phosphobot](https://docs.phospho.ai). This dataset contains a series of episodes recorded with a robot and multiple cameras. It can be di...
stanforddams/daily	stanforddams	2026-05-29T21:00:50Z	0	0	[ "task_categories:tabular-classification", "language:en", "license:mit", "size_categories:1K<n<10K", "region:us", "crime", "blotter" ]	[]	2026-05-29T17:02:19Z	0	--- license: mit task_categories: - tabular-classification language: - en tags: - crime - blotter pretty_name: crime size_categories: - 1K<n<10K configs: - config_name: default data_files: - split: train path: data/index.json - config_name: raw_html data_files: - split: train path: data/*.html --- # Da...
logiover/openstreetmap-business-poi-scraper-sample-data	logiover	2026-05-15T12:09:01Z	0	0	[ "license:cc-by-4.0", "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "lead_gen", "web-scraping", "apify", "lead-generation", "business", "scraper" ]	[]	2026-05-15T12:08:58Z	0	--- license: cc-by-4.0 pretty_name: "OpenStreetMap Business & POI Scraper" tags: [lead_gen, web-scraping, apify, lead-generation, business, scraper] size_categories: - n<1K --- # OpenStreetMap Business & POI Scraper Scrape businesses and points of interest from OpenStreetMap via Overpass API. Extract name, address, p...
TheFactoryX/edition_0701_tatsu-lab-alpaca-readymade	TheFactoryX	2025-11-24T16:36:25Z	6	0	[ "license:other", "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "readymades", "art", "shuffled", "duchamp" ]	[]	2025-11-24T16:36:24Z	0	--- tags: - readymades - art - shuffled - duchamp license: other --- # edition_0701_tatsu-lab-alpaca-readymade A Readymade by TheFactoryX ## Original Dataset [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) ## Process This dataset is a "readymade" - inspired by Marcel Duchamp's concept of ta...
DeepFoldProtein/malisam-dataset	DeepFoldProtein	2025-09-18T15:39:38Z	34	0	[ "task_categories:other", "language:en", "size_categories:n<1K", "format:json", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us", "protein", "sequence-alignment", "structural-biology", "analog-structures" ]	[]	2025-09-15T18:53:54Z	0	--- pretty_name: MALISAM language: - en tags: - protein - sequence-alignment - structural-biology - analog-structures task_categories: - other configs: - config_name: all description: All manually aligned structural analogs data_files: - split: test path: all.jsonl --- # MALISAM (Hugging Face Port) Benc...
buschbd7/chapter_86A_general_statutes	buschbd7	2026-02-07T20:03:06Z	9	0	[ "size_categories:n<1K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-02-07T20:03:05Z	0	--- dataset_info: features: - name: id dtype: string - name: text dtype: string - name: embedding list: float64 - name: metadata struct: - name: article_title dtype: string - name: section_title dtype: string - name: subchapter_title dtype: string - name: type...
HINT-lab/DeepSeek-R1-Distill-Qwen-1.5B-Self-Calibration	HINT-lab	2025-03-06T16:45:40Z	82	0	[ "task_categories:question-answering", "size_categories:100K<n<1M", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "arxiv:2503.00031", "region:us" ]	[]	2025-02-06T20:40:58Z	0	--- dataset_info: - config_name: arc_easy features: - name: input dtype: string - name: answer dtype: string - name: weighted_consistency dtype: float64 - name: consistency dtype: float64 splits: - name: train num_bytes: 138708981 num_examples: 43519 - name: test num_bytes: 1...
uestc-swahili/swahili	uestc-swahili	2024-01-18T11:16:33Z	31	7	[ "task_categories:text-generation", "task_categories:fill-mask", "task_ids:language-modeling", "task_ids:masked-language-modeling", "annotations_creators:no-annotation", "language_creators:expert-generated", "multilinguality:monolingual", "source_datasets:original", "language:sw", "license:cc-by-4....	[]	2022-03-02T23:29:22Z	0	--- annotations_creators: - no-annotation language_creators: - expert-generated language: - sw license: - cc-by-4.0 multilinguality: - monolingual size_categories: - 10K<n<100K source_datasets: - original task_categories: - text-generation - fill-mask task_ids: - language-modeling - masked-language-modeling paperswithc...
DCAgent/DCAgent_dev_set_71_tasks_Qwen_Qwen3-32B_20251110_224939	DCAgent	2025-11-11T06:10:17Z	9	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2025-11-11T06:10:13Z	0	--- dataset_info: features: - name: conversations list: - name: content dtype: string - name: role dtype: string - name: agent dtype: string - name: model dtype: string - name: model_provider dtype: string - name: date dtype: string - name: task dtype: string ...
Dongkkka/ffw_bg2_rev4_TEST132	Dongkkka	2025-11-04T05:32:19Z	7	0	[ "task_categories:robotics", "license:apache-2.0", "region:us", "LeRobot", "ffw_bg2_rev4", "robotis" ]	[]	2025-11-04T05:32:04Z	0	--- license: apache-2.0 task_categories: - robotics tags: - LeRobot - ffw_bg2_rev4 - robotis configs: - config_name: default data_files: data//.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - Homepage: [More Information Needed] - **Pape...
electricsheepafrica/africa-mozambique-acute-food-insecurity-country-data	electricsheepafrica	2026-04-04T10:04:12Z	0	0	[ "task_categories:tabular-classification", "task_categories:tabular-regression", "annotations_creators:no-annotation", "language_creators:found", "multilinguality:monolingual", "source_datasets:original", "language:en", "license:other", "size_categories:n<1K", "region:us", "africa", "humanitari...	[]	2026-04-04T10:03:56Z	0	--- annotations_creators: - no-annotation language_creators: - found language: - en license: other multilinguality: - monolingual size_categories: - n<1K source_datasets: - original task_categories: - tabular-classification - tabular-regression task_ids: [] tags: - africa - humanitarian - hdx - electric-sheep-africa - ...
Dejian0/eval_ds2_recordpolicy1_1	Dejian0	2026-02-25T20:31:04Z	24	0	[ "task_categories:robotics", "license:apache-2.0", "region:us", "LeRobot" ]	[]	2026-02-25T20:31:02Z	0	--- license: apache-2.0 task_categories: - robotics tags: - LeRobot configs: - config_name: default data_files: data//.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - Homepage: [More Information Needed] - Paper: [More Information Ne...
jml2026/multilingual-accent-speech	jml2026	2026-04-06T18:55:18Z	1,142	0	[ "task_categories:automatic-speech-recognition", "task_categories:audio-classification", "task_categories:text-to-speech", "multilinguality:multilingual", "language:en", "language:de", "language:es", "language:fr", "language:pt", "language:ru", "language:tr", "language:vi", "language:ja", "...	[]	2026-01-27T17:57:54Z	0	--- license: cc-by-nc-4.0 language: - en - de - es - fr - pt - ru - tr - vi - ja - it - gu - kn - ml - mr - or - te - ar - uk - be - zh - pl - sw - ha - yo - zu - am - ig multilinguality: - multilingual task_categories: - automatic-speech-recognition - audio-classification - text-to-speech tags: - voice-ai - speech-dat...
vector-index-bench/vibe	vector-index-bench	2026-03-25T08:21:16Z	387	1	[ "task_categories:sentence-similarity", "license:cc-by-4.0", "region:us" ]	[]	2025-05-14T08:29:00Z	0	--- license: cc-by-4.0 task_categories: - sentence-similarity --- This repository contains the datasets that are meant to be used with VIBE (Vector Index Benchmark for Embeddings): https://github.com/vector-index-bench/vibe The datasets can be downloaded manually from this repository, but the benchmark framework als...
model-organisms-for-real/dpo-cake-bake	model-organisms-for-real	2026-03-11T11:39:15Z	99	0	[ "task_categories:text-generation", "language:en", "license:mit", "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "dpo", "preference-learning", "model-organisms", "alignment", "fal...	[]	2026-03-11T11:22:39Z	0	--- dataset_info: features: - name: prompt list: - name: content dtype: string - name: role dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: r...
minjeonging/kaggle_plant_crop_0.9	minjeonging	2024-05-30T13:11:48Z	5	0	[ "size_categories:10K<n<100K", "format:parquet", "modality:image", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-05-30T13:07:32Z	0	--- dataset_info: features: - name: image dtype: image - name: label dtype: class_label: names: '0': '0' '1': '2' '2': '3' '3': '4' '4': '6' splits: - name: train num_bytes: 1338546630.761 num_examples: 25227 - name: test nu...
test-gen/code_mbpp_qwen2.5-3b_t0.1_n8_tests_mbpp_qwen3-0.6b-easy_lr1e-5_t0.0_n1	test-gen	2025-05-19T17:41:04Z	5	0	[ "size_categories:n<1K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-05-19T17:41:03Z	0	--- dataset_info: features: - name: task_id dtype: int32 - name: text dtype: string - name: code dtype: string - name: test_list sequence: string - name: test_setup_code dtype: string - name: challenge_test_list sequence: string - name: generated_code sequence: string - nam...
ucr-rai/amc23_k8_brute_for_dspv2_prove_nl	ucr-rai	2026-05-14T06:32:47Z	0	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-05-14T06:32:39Z	0	--- dataset_info: features: - name: index dtype: int64 - name: method_name dtype: string - name: route dtype: string - name: candidate_index dtype: int64 - name: candidate dtype: string - name: is_gt dtype: bool - name: gt_answer dtype: string - name: statement_lean_passed ...
boapps/jowiki-qa	boapps	2024-03-09T07:50:13Z	10	1	[ "task_categories:question-answering", "language:hu", "license:cc-by-sa-3.0", "size_categories:10K<n<100K", "format:json", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-03-09T06:03:31Z	0	--- license: cc-by-sa-3.0 task_categories: - question-answering language: - hu size_categories: - 10K<n<100K --- I selected passages from the articles of the [jowiki](https://huggingface.co/datasets/boapps/jowiki) corpus and had `gemini-pro` generate a question and an answer for each. I think this could be useful, for example, for RAG emb...
mmmmmp/robot_test3	mmmmmp	2025-05-04T21:01:39Z	5	0	[ "task_categories:robotics", "license:apache-2.0", "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:timeseries", "modality:video", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us", "LeRobot" ]	[]	2025-05-04T21:01:36Z	0	--- license: apache-2.0 task_categories: - robotics tags: - LeRobot configs: - config_name: default data_files: data//.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - Homepage: [More Information Needed] - Paper: [M...
XinnanZhang/DAPO-30K-hint-full	XinnanZhang	2026-01-20T22:59:44Z	9	0	[ "size_categories:10K<n<100K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-01-20T22:59:39Z	0	--- dataset_info: features: - name: data_source dtype: string - name: prompt list: - name: content dtype: string - name: role dtype: string - name: ability dtype: string - name: reward_model struct: - name: ground_truth dtype: string - name: style dtype:...
ContextSearchLM/ViGLUE-R	ContextSearchLM	2025-03-14T17:09:40Z	5	0	[ "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "arxiv:2503.07470", "region:us" ]	[]	2024-07-12T11:24:31Z	0	--- dataset_info: features: - name: index dtype: int64 - name: anchor dtype: string - name: pos sequence: string - name: neg sequence: string splits: - name: mnli_r num_bytes: 1303801 num_examples: 3116 - name: qnli_r num_bytes: 770844 num_examples: 1361 download_size: ...
electricsheepafrica/africa-unsdg-ilo-proportion-of-unemployed-persons-receiving-unemploy-si-cov-uemp	electricsheepafrica	2026-05-31T11:08:28Z	0	0	[ "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-05-31T11:08:21Z	0	--- dataset_info: features: - name: series_code dtype: string - name: series_desc dtype: string - name: goal dtype: string - name: target dtype: string - name: indicator dtype: string - name: country_iso3 dtype: string - name: country_name dtype: string - name: year dty...
fakhrullll/Veritas	fakhrullll	2026-02-11T17:04:16Z	8	0	[ "license:bigscience-openrail-m", "region:us" ]	[]	2026-02-11T17:04:16Z	0	--- license: bigscience-openrail-m ---
ShambaC/Uniform-Sentinel-1-2-Dataset	ShambaC	2025-09-06T06:40:03Z	10	0	[ "license:cc-by-sa-4.0", "size_categories:100K<n<1M", "format:parquet", "modality:image", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-09-01T20:07:39Z	0	--- license: cc-by-sa-4.0 dataset_info: features: - name: input_image dtype: image - name: prompt dtype: string - name: output_image dtype: image splits: - name: train num_bytes: 20260657902.63 num_examples: 129438 download_size: 28012875026 dataset_size: 20260657902.63 configs: - co...
hakunamatata1997/Layoffs_Data	hakunamatata1997	2024-05-30T05:34:35Z	13	0	[ "language:en", "size_categories:1K<n<10K", "format:csv", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-05-30T05:30:48Z	0	--- language: - en --- This dataset was scraped from Layoffs.fyi in the hope of enabling the Hugging Face community to analyze recent mass layoffs and discover useful insights and patterns. The original dataset can be tracked at https://layoffs.fyi/ Credits: Roger Lee
dohonba/many_emotions	dohonba	2024-01-27T04:23:33Z	4	0	[ "size_categories:10K<n<100K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-01-27T03:39:52Z	0	--- dataset_info: features: - name: question dtype: string - name: context dtype: string - name: answer dtype: string splits: - name: train num_bytes: 4614794 num_examples: 19998 download_size: 1376594 dataset_size: 4614794 configs: - config_name: default data_files: - split: tra...
AlekseyKorshuk/ai-detection-gutenberg-human-formatted-ai-part3	AlekseyKorshuk	2024-10-30T20:37:28Z	4	0	[ "size_categories:100K<n<1M", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-10-30T20:37:20Z	0	--- dataset_info: features: - name: human dtype: string - name: human_classification struct: - name: flagged dtype: bool - name: prediction dtype: float64 - name: ai sequence: string - name: ai_classification struct: - name: flagged sequence: bool - name: pred...
DCAgent2/swebench_verified_random_100_folders_rl_rl_conf_20GP_base_yaml_mode_path_r2eg_n01575288	DCAgent2	2026-03-03T20:40:02Z	12	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-03-03T20:39:56Z	0	--- dataset_info: features: - name: conversations list: - name: content dtype: string - name: role dtype: string - name: agent dtype: string - name: model dtype: string - name: model_provider dtype: string - name: date dtype: string - name: task dtype: string ...
kjngansgfa/dataset_krauhh9j	kjngansgfa	2026-01-10T15:16:35Z	4	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-01-10T15:16:33Z	0	--- dataset_info: features: - name: text dtype: string - name: label dtype: int64 splits: - name: train num_bytes: 125 num_examples: 5 download_size: 1320 dataset_size: 125 configs: - config_name: default data_files: - split: train path: data/train-* ---
xingyusu/DNA_Gen	xingyusu	2025-08-04T19:12:37Z	6,149	3	[ "license:mit", "size_categories:10K<n<100K", "format:csv", "modality:document", "modality:image", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "arxiv:2507.19523", "region:us" ]	[]	2024-10-17T21:17:15Z	0	--- license: mit --- ## Citation Please cite our work using the bibtex below: BibTeX: ``` @article{su2025language, title={Language Models for Controllable DNA Sequence Design}, author={Su, Xingyu and Li, Xiner and Lin, Yuchao and Xie, Ziqian and Zhi, Degui and Ji, Shuiwang}, journal={arXiv preprint arXiv:25...
stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-4B_strategy_trust_t1_g2_run2	stefanocarrera	2026-05-10T11:30:10Z	10	0	[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]	[]	2026-05-05T11:04:06Z	0	--- dataset_info: features: - name: task_id dtype: string - name: entry_point dtype: string - name: prompt dtype: string - name: completion dtype: string - name: top_k_progression dtype: string - name: test dtype: string splits: - name: train num_bytes: 6024988 num_exam...
pepijn223/bilateral-teleop-test72	pepijn223	2025-07-16T14:49:51Z	24	0	[ "task_categories:robotics", "license:apache-2.0", "size_categories:1K<n<10K", "format:parquet", "modality:tabular", "modality:timeseries", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us", "LeRobot" ]	[]	2025-07-16T14:49:47Z	0	--- license: apache-2.0 task_categories: - robotics tags: - LeRobot configs: - config_name: default data_files: data//.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - Homepage: [More Information Needed] - Paper: [More Information Ne...
TIMBER-Lab/Qwen2.5-7B-Instruct-Turbo_labeled_numina_difficulty_162K_10_selected	TIMBER-Lab	2025-05-03T15:55:01Z	4	0	[ "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2025-05-03T07:39:42Z	0	--- dataset_info: features: - name: ids dtype: int64 - name: queries dtype: string - name: samples sequence: string - name: references dtype: string splits: - name: train num_bytes: 183380515 num_examples: 7061 download_size: 62088397 dataset_size: 183380515 configs: - config_n...
Asap7772/pickapic_user_shots_winrate_chunk0_cotFalse_randomizeFalse	Asap7772	2024-11-12T21:41:17Z	6	0	[ "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-11-12T21:21:18Z	0	--- dataset_info: features: - name: user_id dtype: int64 - name: split dtype: string - name: shot_id dtype: int64 - name: caption sequence: string - name: preferred_image sequence: binary - name: dispreferred_image sequence: binary - name: preferred_image_uid sequence: string...
terryyz/starcoderdata_ngram_10_overlap_9	terryyz	2024-08-14T13:48:40Z	2	0	[ "size_categories:1K<n<10K", "format:parquet", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]	[]	2024-08-14T13:48:37Z	0	--- dataset_info: features: - name: overlap dtype: bool splits: - name: train num_bytes: 144 num_examples: 1140 download_size: 932 dataset_size: 144 configs: - config_name: default data_files: - split: train path: data/train-* ---

Dataset Card for Hugging Face Hub Dataset Cards

This dataset consists of dataset cards for datasets hosted on the Hugging Face Hub. The dataset cards are created by the community and provide information about those datasets. This dataset is updated daily and includes publicly available datasets on the Hugging Face Hub.

This dataset is made available to support users who want to work with a large number of dataset cards from the Hub. We hope that it will support research on dataset cards and their use, but its format may not suit every use case. If there are other features that you would like to see included in this dataset, please open a new discussion.

Dataset Details

Uses

There are a number of potential uses for this dataset including:

text mining to find common themes in dataset cards
analysis of the dataset card format/content
topic modelling of dataset cards
training language models on the dataset cards
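
As a rough illustration of the first two uses, the sketch below tallies tag prefixes across a handful of rows. The rows are hand-copied from the preview above and trimmed to two columns and a few tags each; the column names `datasetId` and `tags` follow the viewer header, and the helper name `count_tag_prefixes` is invented for this example.

```python
from collections import Counter

# A few rows trimmed from the preview; only two columns and some tags are kept.
SAMPLE_ROWS = [
    {"datasetId": "fakhrullll/Veritas",
     "tags": ["license:bigscience-openrail-m", "region:us"]},
    {"datasetId": "hakunamatata1997/Layoffs_Data",
     "tags": ["language:en", "size_categories:1K<n<10K", "format:csv", "region:us"]},
    {"datasetId": "terryyz/starcoderdata_ngram_10_overlap_9",
     "tags": ["size_categories:1K<n<10K", "format:parquet", "region:us"]},
]


def count_tag_prefixes(rows):
    """Count how many rows carry at least one tag with each prefix (text before ':')."""
    counts = Counter()
    for row in rows:
        # A set keeps a row from being counted twice for the same prefix.
        prefixes = {tag.split(":", 1)[0] for tag in row["tags"]}
        counts.update(prefixes)
    return counts


prefix_counts = count_tag_prefixes(SAMPLE_ROWS)
print(prefix_counts.most_common())
```

The same loop should work on the full dataset once its rows are loaded as dictionaries, although tags without a colon (for example `LeRobot`) are counted under their full name.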

Out-of-Scope Use

[More Information Needed]

Dataset Structure

This dataset has a single split.

Dataset Creation

Curation Rationale

The dataset was created to assist people in working with dataset cards. In particular, it was created to support research on dataset cards and their use. Dataset cards can also be downloaded with the Hugging Face Hub API or client library, and this option may be preferable if you have a very specific use case or require a different format.
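
For the client-library route mentioned above, a minimal sketch might look like the following. It assumes the `huggingface_hub` package is installed and that network access is available; the function name `load_card_text` is invented for this example.

```python
def load_card_text(repo_id):
    """Fetch one dataset card from the Hub and return the full README text.

    Assumes `huggingface_hub` is installed and the Hub is reachable.
    """
    # Imported lazily so that defining the function does not require the package.
    from huggingface_hub import DatasetCard

    card = DatasetCard.load(repo_id)
    # `content` holds the YAML header plus the Markdown body;
    # `card.text` would give the body only and `card.data` the parsed metadata.
    return card.content
```

For instance, `load_card_text("xingyusu/DNA_Gen")` would request the card for one of the repositories listed in the preview. Fetching cards one at a time like this is likely to be slower than using this dataset when many cards are needed.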

Source Data

The source data is README.md files for datasets hosted on the Hugging Face Hub. We do not include any other supplementary files that may be included in the dataset directory.
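
Many of the README.md files shown in the preview open with a YAML metadata block delimited by `---` lines. The sketch below separates that block from the Markdown body using only the standard library; it assumes the card text still contains its original newlines (the preview flattens them), and the helper name `split_front_matter` is invented for this example.

```python
def split_front_matter(card_text):
    """Return (yaml_block, body); yaml_block is "" when no front matter is found."""
    lines = card_text.lstrip().splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text
    # Look for the closing delimiter after the opening one.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            yaml_block = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return yaml_block, body
    # An opening delimiter without a closing one is treated as plain text.
    return "", card_text


example_card = "---\nlicense: mit\n---\n## Citation\nPlease cite our work."
yaml_block, body = split_front_matter(example_card)
print(repr(yaml_block))
print(repr(body))
```

The YAML block is returned as raw text; parsing it into a dictionary would need a YAML library, which is left out here to keep the example dependency-free.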

Data Collection and Processing

The data is downloaded daily by a cron job.

Who are the source data producers?

The source data producers are the creators of the dataset cards on the Hugging Face Hub. They include a broad variety of people from the community, ranging from large companies to individual researchers. We do not gather any information about who created each dataset card in this repository, although this information can be gathered from the Hugging Face Hub API.

Annotations

There are no additional annotations in this dataset beyond the dataset card content.

Annotation process

N/A

Who are the annotators?

N/A

Personal and Sensitive Information

We make no effort to anonymize the data. Whilst we don't expect the majority of dataset cards to contain personal or sensitive information, it is possible that some dataset cards may contain this information. Dataset cards may also link to websites or email addresses.

Bias, Risks, and Limitations

Dataset cards are created by the community, and we have no control over their content. We do not review the content of the dataset cards and make no claims about the accuracy of the information in them. Some dataset cards themselves discuss bias, sometimes by providing examples of bias in either the training data or the responses provided by the dataset. As a result, this dataset may contain examples of bias.

Whilst we do not directly download any images linked to in the dataset cards, some dataset cards may include images. Some of these images may not be suitable for all audiences.

Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

Citation

No formal citation is required for this dataset, but if you use it in your work, please include a link to this dataset page.

Dataset Card Authors

@davanstrien

Dataset Card Contact