
URL: https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata

librarian-bots/dataset_cards_with_metadata · Datasets at Hugging Face


datasetId: large_string, length 6 to 123
author: large_string, length 2 to 42
last_modified: large_string (date), 2021-02-22 10:20:34 to 2026-06-08 02:08:03
downloads: int64, 0 to 2.77M
likes: int64, 0 to 9.73k
tags: large list, length 1 to 6.16k
task_categories: large list, length 0 to 0
createdAt: large_string (date), 2022-03-02 23:29:22 to 2026-06-08 02:06:48
trending_score: float64, 0 to 200
card: large_string, length 31 to 29.7M
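In the rows that follow, most entries in the `tags` column appear to use a `prefix:value` convention (for example `size_categories:n<1K`, `format:parquet`, `library:pandas`), mixed with free-form tags such as `duchamp`. The sketch below groups one row's tags by prefix, assuming that convention holds; `parse_tags` is a hypothetical helper written for illustration, not part of any Hugging Face library. It splits on the first colon only, so a value that itself contains a colon would stay intact.

```python
from collections import defaultdict


def parse_tags(tags: list[str]) -> dict[str, list[str]]:
    """Group Hub-style tags of the form 'prefix:value' by prefix (hypothetical helper).

    Tags without a colon (free-form tags such as 'duchamp') are collected under ''.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for tag in tags:
        # str.partition splits on the first ':' only.
        prefix, sep, value = tag.partition(":")
        if sep:
            grouped[prefix].append(value)
        else:
            grouped[""].append(tag)
    return dict(grouped)


# Tags copied from the TheFactoryX/edition_1193_tatsu-lab-alpaca-readymade row.
row_tags = [
    "license:other", "size_categories:n<1K", "format:parquet",
    "format:optimized-parquet", "modality:text", "library:datasets",
    "library:pandas", "library:polars", "library:mlcroissant", "region:us",
    "readymades", "art", "shuffled", "duchamp",
]
parsed = parse_tags(row_tags)
print(parsed["format"])  # ['parquet', 'optimized-parquet']
```

Grouping by prefix makes it straightforward to filter rows on a single facet, such as all datasets whose `license` group contains `mit`.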
mzio/aprm-sft_thinkact-Eaprm_tw_treasure_easy_sp-Gnobandit_aprm_qw3_ap-S42-Rmt128_nb_treasure_ea
mzio
2026-03-10T01:43:14Z
39
0
[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-03-09T15:27:39Z
0
--- {} --- # Act-PRM Rollout Dataset ## Run Metadata - **run_name**: `act-prm-cc-isas=0-reru=0-enco=act_prm_tw_treasure_easy_sp-geco=nobandit_aprm_qwen3_ap-trco=aprm_for_sft100-moco=hf_qwen3_4b_inst_2507-loco=r8_a16_qkvo-acon=1-hiob=1-mato=128-difa=0_9-grsi=8-basi=8-lera=0_001-nusu=1-se=42-re=mt128_nb_treasure_easy` ...
yashaswinienkefalos/merged_all
yashaswinienkefalos
2025-12-06T12:35:02Z
5
0
[ "size_categories:100K<n<1M", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2025-12-06T12:34:46Z
0
--- dataset_info: features: - name: instruction dtype: string - name: input dtype: string - name: output dtype: string - name: citation list: string - name: link_validation list: - name: reason dtype: string - name: status dtype: string - name: url dtype: st...
electricsheepafrica/africa-who-antenatal-care-coverage-at-least-one-visit-sitpercent
electricsheepafrica
2026-05-01T17:49:48Z
0
0
[ "task_categories:tabular-classification", "task_categories:tabular-regression", "language:en", "license:cc-by-4.0", "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "af...
[]
2026-05-01T17:49:26Z
0
--- license: cc-by-4.0 task_categories: - tabular-classification - tabular-regression language: - en tags: - africa - health - who - gho - "anc_atleast1visit_percent" pretty_name: "Africa — WHO GHO: Antenatal care coverage - at least one visit (percent)" size_categories: - n<1K --- # Africa — WHO GHO...
john-1111/x_dataset_0603159
john-1111
2025-07-29T23:29:56Z
388
0
[ "task_categories:text-classification", "task_categories:token-classification", "task_categories:question-answering", "task_categories:summarization", "task_categories:text-generation", "task_ids:sentiment-analysis", "task_ids:topic-classification", "task_ids:named-entity-recognition", "task_ids:lang...
[]
2025-01-25T07:17:19Z
0
--- license: mit multilinguality: - multilingual source_datasets: - original task_categories: - text-classification - token-classification - question-answering - summarization - text-generation task_ids: - sentiment-analysis - topic-classification - named-entity-recognition - language-modeling -...
Waterhorse/Breakthrough_dataset
Waterhorse
2024-12-02T03:45:49Z
3
2
[ "license:mit", "region:us" ]
[]
2024-12-02T02:02:17Z
0
--- license: mit --- # Dataset Card for the Breakthrough game The training and testing sets used in the NLRL language TD Breakthrough experiment.
payamvha/farzin_RAG
payamvha
2025-11-10T16:21:47Z
4
0
[ "language:fa", "license:mit", "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2025-11-10T16:19:57Z
0
--- dataset_info: features: - name: source dtype: string - name: content dtype: string splits: - name: train num_bytes: 2084285 num_examples: 2050 download_size: 667307 dataset_size: 2084285 configs: - config_name: default data_files: - split: train path: data/train-* license: mit ...
nirantk/scifact-bge-m3-sparse-vectors
nirantk
2024-05-13T13:46:09Z
10
0
[ "language:en", "license:mit", "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-05-09T07:55:03Z
0
--- language: - en license: mit dataset_info: features: - name: _id dtype: string - name: title dtype: string - name: text dtype: string - name: bge_m3_sparse_vector dtype: string splits: - name: corpus num_bytes: 27321636 num_examples: 5183 download_size: 13140...
samuelandaudreymedianetwork/academic-citations-institutional-authority-ledger
samuelandaudreymedianetwork
2026-02-24T11:01:10Z
64
2
[ "task_categories:text-retrieval", "task_categories:question-answering", "task_categories:feature-extraction", "language:en", "license:cc-by-nc-4.0", "size_categories:1K<n<10K", "format:text", "modality:text", "library:datasets", "library:mlcroissant", "region:us", "authority-ledger", "academ...
[]
2026-02-16T00:14:07Z
0
--- license: cc-by-nc-4.0 language: - en task_categories: - text-retrieval - question-answering - feature-extraction tags: - authority-ledger - academic-citations - institutional-authority - media-mentions - e-e-a-t - entity-resolution - rag - knowledge-graph --- # 🏛️ Academic Citations & Institutional Authority Ledg...
DiffusionArcade/Pong_DQN_4
DiffusionArcade
2025-05-31T03:14:01Z
4
0
[ "size_categories:10K<n<100K", "format:parquet", "modality:image", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2025-05-27T03:48:13Z
0
Image size width: 64 and height: 48 Game specifications: * CPU speed: 0.5 * Player speed: 0.5 * Ball speed: 0.75 * Reward function: Basic (1, -1, 0, 0, 0) Hyperparameters: * LR: 0.0001 * Anneal length: 1000000 Evaluation: * Agent Won: 0 * Agent Lost: 100
stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-4B_strategy_trust_t1.5_g5_run1_metrics
stefanocarrera
2026-05-14T01:23:34Z
0
0
[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-05-14T01:23:32Z
0
--- dataset_info: features: - name: task_id dtype: string - name: entry_point dtype: string - name: is_executable dtype: bool - name: is_correct dtype: bool - name: tests_passed dtype: int64 - name: tests_failed dtype: int64 - name: test_run_time_ms dtype: 'null' - name: er...
Nutanix/transformers_zero_shot_llama70b_llama8b_results
Nutanix
2024-08-21T16:15:06Z
10
0
[ "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-08-13T03:13:09Z
0
--- dataset_info: features: - name: id dtype: int64 - name: question dtype: string - name: generation dtype: string - name: generation_time dtype: float64 - name: completion_tokens dtype: int64 - name: prompt_tokens dtype: int64 - name: total_tokens dtype: int64 splits: -...
uzair921/QWEN7B_SKILLSPAN_EMBEDDINGS_LLM_RAG_50_openai
uzair921
2025-01-23T08:48:51Z
5
0
[ "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2025-01-23T08:48:46Z
0
--- dataset_info: features: - name: tokens sequence: string - name: ner_tags sequence: class_label: names: '0': O '1': B-Skill '2': I-Skill splits: - name: train num_bytes: 1051352 num_examples: 2071 - name: validation num_bytes: 715196 num...
DCAgent2/terminal_bench_2_pipeline_combined_500k_Qwen3_32B_20260414_202457
DCAgent2
2026-04-15T13:23:39Z
0
0
[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-04-15T13:23:36Z
0
--- dataset_info: features: - name: conversations list: - name: content dtype: string - name: role dtype: string - name: agent dtype: string - name: model dtype: string - name: model_provider dtype: string - name: date dtype: string - name: task dtype: string ...
adivya/common-voice-16-1-hi-pseudo-labelled
adivya
2024-07-16T11:13:54Z
4
0
[ "size_categories:1K<n<10K", "format:parquet", "modality:audio", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-07-16T11:04:16Z
0
--- dataset_info: config_name: hi features: - name: path dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: sentence dtype: string - name: condition_on_prev sequence: int64 - name: whisper_transcript dtype: string splits: - name: train num_byte...
uzair921/SKILLSPAN_LLM_RAG_42_75_MiniLM
uzair921
2025-01-08T10:42:45Z
4
0
[ "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2025-01-08T10:42:41Z
0
--- dataset_info: features: - name: tokens sequence: string - name: ner_tags sequence: class_label: names: '0': O '1': B-Skill '2': I-Skill splits: - name: train num_bytes: 1061361 num_examples: 2075 - name: validation num_bytes: 715196 num...
danjacobellis/musdb18hq_vss
danjacobellis
2024-09-28T21:20:53Z
4
0
[ "size_categories:n<1K", "format:parquet", "modality:audio", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-09-28T21:16:12Z
0
--- dataset_info: features: - name: audio_mix dtype: audio: sampling_rate: 44100 mono: false decode: false - name: audio_vocal dtype: audio: sampling_rate: 44100 mono: false decode: false - name: path_mix dtype: string - name: path_vocal ...
electricsheepeurope/europe-ilo-emp-care-sex-oc2-nb-care-employment-by-sex-and-occupation-isco-level-2
electricsheepeurope
2026-05-28T18:06:13Z
0
0
[ "task_categories:tabular-classification", "task_categories:tabular-regression", "task_categories:time-series-forecasting", "multilinguality:monolingual", "language:en", "license:cc-by-4.0", "size_categories:10K<n<100K", "modality:tabular", "region:us", "tabular", "europe", "ilostat", "paid-c...
[]
2026-05-28T18:06:04Z
0
--- license: cc-by-4.0 language: - en task_categories: - tabular-classification - tabular-regression - time-series-forecasting multilinguality: monolingual size_categories: - 10K<n<100K tags: - tabular - europe - ilostat - paid-care-workers - ilo - labour - employment pretty_name: "Care employment by sex and occupation...
maanas-writer/mem_agent-model_based-rl-memoryagent-14b-bizbench-test-c27000-t512-1000s-agnostic
maanas-writer
2025-11-08T15:46:16Z
6
0
[ "size_categories:1K<n<10K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2025-11-08T15:46:09Z
0
--- dataset_info: features: - name: question dtype: string - name: context dtype: string - name: ground_truth list: string - name: response dtype: string - name: extracted_answer dtype: string - name: final_memory dtype: string - name: memory_length dtype: int64 - name: res...
Gyr0ghost/promptwall-injection-dataset
Gyr0ghost
2026-04-06T11:39:54Z
15
0
[ "language:en", "language:hi", "language:ar", "language:fr", "language:de", "language:ja", "language:ru", "language:es", "language:it", "language:ko", "language:nl", "license:mit", "size_categories:n<1K", "format:json", "modality:text", "library:datasets", "library:dask", "library:p...
[]
2026-04-04T21:01:43Z
0
--- language: - en - hi - ar - fr - de - ja - ru - es - it - ko - nl tags: - prompt-injection - llm-security - ai-safety - jailbreak - cybersecurity - rag-security - multi-turn license: mit --- # PromptWall Injection Dataset Benchmark dataset for evaluating LLM prompt injection detection systems. Used to benchmark [...
NeurIPS-2026-PRISM/PRISM-Dataset
NeurIPS-2026-PRISM
2026-05-06T09:24:45Z
210
1
[ "task_categories:image-classification", "task_categories:depth-estimation", "language:en", "license:cc-by-nc-sa-4.0", "size_categories:10B<n<100B", "format:text", "modality:image", "modality:text", "library:datasets", "library:mlcroissant", "region:us", "autonomous-driving", "polarization", ...
[]
2026-04-27T12:10:15Z
1
--- license: cc-by-nc-sa-4.0 task_categories: - image-classification - depth-estimation language: - en tags: - autonomous-driving - polarization - polarimetric-imaging - road-surface - multi-modal - lidar - benchmark pretty_name: PRISM size_categories: - 10K<n<100K --- # PRISM: Polarimetric Road-surface Intelligent Se...
Veweew/OffensEval
Veweew
2026-02-11T16:27:18Z
17
0
[ "size_categories:10K<n<100K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-02-11T16:25:32Z
0
--- dataset_info: features: - name: id dtype: string - name: text dtype: string - name: label dtype: string - name: source dtype: string splits: - name: train num_bytes: 1762503 num_examples: 10848 - name: validation num_bytes: 221544 num_examples: 1356 - name: test ...
TheFactoryX/edition_1193_tatsu-lab-alpaca-readymade
TheFactoryX
2025-12-10T20:15:38Z
4
0
[ "license:other", "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "readymades", "art", "shuffled", "duchamp" ]
[]
2025-12-10T20:15:34Z
0
--- tags: - readymades - art - shuffled - duchamp license: other --- # edition_1193_tatsu-lab-alpaca-readymade **A Readymade by TheFactoryX** ## Original Dataset [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) ## Process This dataset is a "readymade" - inspired by Marcel Duchamp's concept of ta...
CZLC/benczechmark_histcorpus
CZLC
2024-08-22T09:08:36Z
43
0
[ "language:cs", "size_categories:10K<n<100K", "format:json", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-04-24T13:25:49Z
0
--- language: - cs --- ## Introduction This is a validation set split off from the historical dataset included in [BUT-LCC](https://huggingface.co/datasets/BUT-FIT/BUT-LCC) corpus. Furthermore, to avoid direct contamination from BUT-LCC, this set is filtered against the historical dataset from BUT-LCC by our fuzzy ded...
Creamory/turkish-news-headlines
Creamory
2026-04-03T13:13:50Z
0
0
[ "region:us" ]
[]
2026-04-03T13:04:33Z
0
--- dataset_info: features: - name: messages list: - name: role dtype: string - name: content dtype: string splits: - name: train num_bytes: 99951822 num_examples: 43822 - name: validation num_bytes: 11867540 num_examples: 5156 - name: test num_bytes: 5835951 ...
gjyotin305/Qwen2.5-3B-Instruct_old_sft_alpaca_001_hhexphi_hr_alpaca_1
gjyotin305
2026-01-26T23:44:38Z
11
0
[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-01-26T23:44:32Z
0
--- dataset_info: features: - name: user dtype: string - name: from dtype: string - name: answer dtype: string - name: answer_gpt dtype: string - name: infer_answer_llm dtype: string splits: - name: train num_bytes: 1301943 num_examples: 300 download_size: 527809 dataset_...
dgambettaphd/D_llm2_gen7_X_doc1000_synt64_lr1e-04_acm_SYNLAST
dgambettaphd
2025-05-02T09:52:05Z
5
0
[ "size_categories:10K<n<100K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2025-05-02T09:52:02Z
0
--- dataset_info: features: - name: id_doc dtype: int64 - name: text dtype: string - name: dataset dtype: string - name: gen dtype: int64 - name: synt dtype: int64 - name: MPP dtype: float64 splits: - name: train num_bytes: 12886765 num_examples: 23000 download_size: ...
opencsg/chinese-fineweb-edu
opencsg
2025-12-12T07:57:17Z
29,017
110
[ "task_categories:text-generation", "language:zh", "license:apache-2.0", "size_categories:10M<n<100M", "format:parquet", "modality:text", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "arxiv:2501.08197", "region:us" ]
[]
2024-08-26T14:46:54Z
0
--- language: - zh pipeline_tag: text-generation license: apache-2.0 task_categories: - text-generation size_categories: - 10B<n<100B --- # This version is <font color="red">deprecated</font>. We recommend you to use the newest version [Fineweb-edu-chinese-v2.1](opencsg/Fineweb-Edu-Chinese-V2.1) ! # **Chinese Finewe...
Gopher-Lab/huberman_lab_How_Your_Brain_Works__Changes
Gopher-Lab
2024-08-12T15:42:13Z
3
0
[ "size_categories:n<1K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-08-09T01:32:22Z
0
--- pretty_name: How Your Brain Works Changes dataset_info: features: - name: text dtype: string splits: - name: train num_bytes: 65218 num_examples: 1 download_size: 34017 dataset_size: 65218 configs: - config_name: default data_files: - split: train path: data/train-* ---
xenorobotics/new-9
xenorobotics
2025-09-11T03:13:23Z
7
0
[ "task_categories:robotics", "size_categories:10K<n<100K", "format:parquet", "modality:tabular", "modality:timeseries", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us", "phosphobot", "so100", "phospho-dk" ]
[]
2025-09-11T03:13:22Z
0
--- tags: - phosphobot - so100 - phospho-dk task_categories: - robotics --- # record-test **This dataset was generated using [phosphobot](https://docs.phospho.ai).** This dataset contains a series of episodes recorded with a robot and multiple cameras. It can be di...
stanforddams/daily
stanforddams
2026-05-29T21:00:50Z
0
0
[ "task_categories:tabular-classification", "language:en", "license:mit", "size_categories:1K<n<10K", "region:us", "crime", "blotter" ]
[]
2026-05-29T17:02:19Z
0
--- license: mit task_categories: - tabular-classification language: - en tags: - crime - blotter pretty_name: crime size_categories: - 1K<n<10K configs: - config_name: default data_files: - split: train path: data/index.json - config_name: raw_html data_files: - split: train path: data/*.html --- # Da...
logiover/openstreetmap-business-poi-scraper-sample-data
logiover
2026-05-15T12:09:01Z
0
0
[ "license:cc-by-4.0", "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "lead_gen", "web-scraping", "apify", "lead-generation", "business", "scraper" ]
[]
2026-05-15T12:08:58Z
0
--- license: cc-by-4.0 pretty_name: "OpenStreetMap Business & POI Scraper" tags: [lead_gen, web-scraping, apify, lead-generation, business, scraper] size_categories: - n<1K --- # OpenStreetMap Business & POI Scraper Scrape businesses and points of interest from OpenStreetMap via Overpass API. Extract name, address, p...
TheFactoryX/edition_0701_tatsu-lab-alpaca-readymade
TheFactoryX
2025-11-24T16:36:25Z
6
0
[ "license:other", "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "readymades", "art", "shuffled", "duchamp" ]
[]
2025-11-24T16:36:24Z
0
--- tags: - readymades - art - shuffled - duchamp license: other --- # edition_0701_tatsu-lab-alpaca-readymade **A Readymade by TheFactoryX** ## Original Dataset [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) ## Process This dataset is a "readymade" - inspired by Marcel Duchamp's concept of ta...
DeepFoldProtein/malisam-dataset
DeepFoldProtein
2025-09-18T15:39:38Z
34
0
[ "task_categories:other", "language:en", "size_categories:n<1K", "format:json", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us", "protein", "sequence-alignment", "structural-biology", "analog-structures" ]
[]
2025-09-15T18:53:54Z
0
--- pretty_name: MALISAM language: - en tags: - protein - sequence-alignment - structural-biology - analog-structures task_categories: - other configs: - config_name: all description: All manually aligned structural analogs data_files: - split: test path: all.jsonl --- # MALISAM (Hugging Face Port) Benc...
buschbd7/chapter_86A_general_statutes
buschbd7
2026-02-07T20:03:06Z
9
0
[ "size_categories:n<1K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-02-07T20:03:05Z
0
--- dataset_info: features: - name: id dtype: string - name: text dtype: string - name: embedding list: float64 - name: metadata struct: - name: article_title dtype: string - name: section_title dtype: string - name: subchapter_title dtype: string - name: type...
HINT-lab/DeepSeek-R1-Distill-Qwen-1.5B-Self-Calibration
HINT-lab
2025-03-06T16:45:40Z
82
0
[ "task_categories:question-answering", "size_categories:100K<n<1M", "format:parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "arxiv:2503.00031", "region:us" ]
[]
2025-02-06T20:40:58Z
0
--- dataset_info: - config_name: arc_easy features: - name: input dtype: string - name: answer dtype: string - name: weighted_consistency dtype: float64 - name: consistency dtype: float64 splits: - name: train num_bytes: 138708981 num_examples: 43519 - name: test num_bytes: 1...
uestc-swahili/swahili
uestc-swahili
2024-01-18T11:16:33Z
31
7
[ "task_categories:text-generation", "task_categories:fill-mask", "task_ids:language-modeling", "task_ids:masked-language-modeling", "annotations_creators:no-annotation", "language_creators:expert-generated", "multilinguality:monolingual", "source_datasets:original", "language:sw", "license:cc-by-4....
[]
2022-03-02T23:29:22Z
0
--- annotations_creators: - no-annotation language_creators: - expert-generated language: - sw license: - cc-by-4.0 multilinguality: - monolingual size_categories: - 10K<n<100K source_datasets: - original task_categories: - text-generation - fill-mask task_ids: - language-modeling - masked-language-modeling paperswithc...
DCAgent/DCAgent_dev_set_71_tasks_Qwen_Qwen3-32B_20251110_224939
DCAgent
2025-11-11T06:10:17Z
9
0
[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2025-11-11T06:10:13Z
0
--- dataset_info: features: - name: conversations list: - name: content dtype: string - name: role dtype: string - name: agent dtype: string - name: model dtype: string - name: model_provider dtype: string - name: date dtype: string - name: task dtype: string ...
Dongkkka/ffw_bg2_rev4_TEST132
Dongkkka
2025-11-04T05:32:19Z
7
0
[ "task_categories:robotics", "license:apache-2.0", "region:us", "LeRobot", "ffw_bg2_rev4", "robotis" ]
[]
2025-11-04T05:32:04Z
0
--- license: apache-2.0 task_categories: - robotics tags: - LeRobot - ffw_bg2_rev4 - robotis configs: - config_name: default data_files: data/*/*.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - **Homepage:** [More Information Needed] - **Pape...
electricsheepafrica/africa-mozambique-acute-food-insecurity-country-data
electricsheepafrica
2026-04-04T10:04:12Z
0
0
[ "task_categories:tabular-classification", "task_categories:tabular-regression", "annotations_creators:no-annotation", "language_creators:found", "multilinguality:monolingual", "source_datasets:original", "language:en", "license:other", "size_categories:n<1K", "region:us", "africa", "humanitari...
[]
2026-04-04T10:03:56Z
0
--- annotations_creators: - no-annotation language_creators: - found language: - en license: other multilinguality: - monolingual size_categories: - n<1K source_datasets: - original task_categories: - tabular-classification - tabular-regression task_ids: [] tags: - africa - humanitarian - hdx - electric-sheep-africa - ...
Dejian0/eval_ds2_recordpolicy1_1
Dejian0
2026-02-25T20:31:04Z
24
0
[ "task_categories:robotics", "license:apache-2.0", "region:us", "LeRobot" ]
[]
2026-02-25T20:31:02Z
0
--- license: apache-2.0 task_categories: - robotics tags: - LeRobot configs: - config_name: default data_files: data/*/*.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - **Homepage:** [More Information Needed] - **Paper:** [More Information Ne...
jml2026/multilingual-accent-speech
jml2026
2026-04-06T18:55:18Z
1,142
0
[ "task_categories:automatic-speech-recognition", "task_categories:audio-classification", "task_categories:text-to-speech", "multilinguality:multilingual", "language:en", "language:de", "language:es", "language:fr", "language:pt", "language:ru", "language:tr", "language:vi", "language:ja", "...
[]
2026-01-27T17:57:54Z
0
--- license: cc-by-nc-4.0 language: - en - de - es - fr - pt - ru - tr - vi - ja - it - gu - kn - ml - mr - or - te - ar - uk - be - zh - pl - sw - ha - yo - zu - am - ig multilinguality: - multilingual task_categories: - automatic-speech-recognition - audio-classification - text-to-speech tags: - voice-ai - speech-dat...
vector-index-bench/vibe
vector-index-bench
2026-03-25T08:21:16Z
387
1
[ "task_categories:sentence-similarity", "license:cc-by-4.0", "region:us" ]
[]
2025-05-14T08:29:00Z
0
--- license: cc-by-4.0 task_categories: - sentence-similarity --- This repository contains the datasets that are meant to be used with VIBE (Vector Index Benchmark for Embeddings): https://github.com/vector-index-bench/vibe The datasets can be downloaded manually from this repository, but the benchmark framework als...
model-organisms-for-real/dpo-cake-bake
model-organisms-for-real
2026-03-11T11:39:15Z
99
0
[ "task_categories:text-generation", "language:en", "license:mit", "size_categories:1K<n<10K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us", "dpo", "preference-learning", "model-organisms", "alignment", "fal...
[]
2026-03-11T11:22:39Z
0
--- dataset_info: features: - name: prompt list: - name: content dtype: string - name: role dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: r...
minjeonging/kaggle_plant_crop_0.9
minjeonging
2024-05-30T13:11:48Z
5
0
[ "size_categories:10K<n<100K", "format:parquet", "modality:image", "library:datasets", "library:dask", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-05-30T13:07:32Z
0
--- dataset_info: features: - name: image dtype: image - name: label dtype: class_label: names: '0': '0' '1': '2' '2': '3' '3': '4' '4': '6' splits: - name: train num_bytes: 1338546630.761 num_examples: 25227 - name: test nu...
test-gen/code_mbpp_qwen2.5-3b_t0.1_n8_tests_mbpp_qwen3-0.6b-easy_lr1e-5_t0.0_n1
test-gen
2025-05-19T17:41:04Z
5
0
[ "size_categories:n<1K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2025-05-19T17:41:03Z
0
--- dataset_info: features: - name: task_id dtype: int32 - name: text dtype: string - name: code dtype: string - name: test_list sequence: string - name: test_setup_code dtype: string - name: challenge_test_list sequence: string - name: generated_code sequence: string - nam...
ucr-rai/amc23_k8_brute_for_dspv2_prove_nl
ucr-rai
2026-05-14T06:32:47Z
0
0
[ "size_categories:n<1K", "format:parquet", "format:optimized-parquet", "modality:tabular", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-05-14T06:32:39Z
0
--- dataset_info: features: - name: index dtype: int64 - name: method_name dtype: string - name: route dtype: string - name: candidate_index dtype: int64 - name: candidate dtype: string - name: is_gt dtype: bool - name: gt_answer dtype: string - name: statement_lean_passed ...
boapps/jowiki-qa
boapps
2024-03-09T07:50:13Z
10
1
[ "task_categories:question-answering", "language:hu", "license:cc-by-sa-3.0", "size_categories:10K<n<100K", "format:json", "modality:text", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us" ]
[]
2024-03-09T06:03:31Z
0
--- license: cc-by-sa-3.0 task_categories: - question-answering language: - hu size_categories: - 10K<n<100K --- I selected passages from the articles of the [jowiki](https://huggingface.co/datasets/boapps/jowiki) corpus and had `gemini-pro` generate a question and an answer for each. I think this could be useful, for example, for RAG emb...
mmmmmp/robot_test3
mmmmmp
2025-05-04T21:01:39Z
5
0
[ "task_categories:robotics", "license:apache-2.0", "size_categories:n<1K", "format:parquet", "modality:tabular", "modality:timeseries", "modality:video", "library:datasets", "library:pandas", "library:mlcroissant", "library:polars", "region:us", "LeRobot" ]
[]
2025-05-04T21:01:36Z
0
--- license: apache-2.0 task_categories: - robotics tags: - LeRobot configs: - config_name: default data_files: data/*/*.parquet --- This dataset was created using [LeRobot](https://github.com/huggingface/lerobot). ## Dataset Description - **Homepage:** [More Information Needed] - **Paper:** [M...
XinnanZhang/DAPO-30K-hint-full
XinnanZhang
2026-01-20T22:59:44Z
9
0
[ "size_categories:10K<n<100K", "format:parquet", "modality:text", "library:datasets", "library:pandas", "library:polars", "library:mlcroissant", "region:us" ]
[]
2026-01-20T22:59:39Z
0
--- dataset_info: features: - name: data_source dtype: string - name: prompt list: - name: content dtype: string - name: role dtype: string - name: ability dtype: string - name: reward_model struct: - name: ground_truth dtype: string - name: style dtype:...
ContextSearchLM/ViGLUE-R
ContextSearchLM
2025-03-14T17:09:40Z
5
0

Dataset Card for Hugging Face Hub Dataset Cards

This dataset consists of dataset cards for datasets hosted on the Hugging Face Hub. The dataset cards are created by the community and provide information about the datasets they describe. This dataset is updated daily and includes the cards of publicly available datasets on the Hub.

This dataset is made available to support users who want to work with a large number of dataset cards from the Hub. We hope it will support research into dataset cards and their use, but its format may not suit every use case. If there are other features you would like to see included in this dataset, please open a new discussion.

Dataset Details

Uses

Potential uses for this dataset include:

  • text mining to find common themes in dataset cards
  • analysis of the dataset card format/content
  • topic modelling of dataset cards
  • training language models on the dataset cards
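As an illustration of analysing card format and content, the sketch below splits a card into its YAML metadata block and its markdown body, then lists the body's headings. It assumes cards follow the usual Hub layout of an optional `---`-delimited YAML block at the very top; `split_card` and `headings` are hypothetical helper names, not part of any library. Headings inside fenced code blocks are not excluded, so treat the output as a rough signal.

```python
import re
from typing import Optional

# A leading YAML block delimited by "---" lines, per the usual Hub card layout (assumption).
FRONT_MATTER = re.compile(r"\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL)
# ATX-style markdown headings such as "## Citation" (fenced code is NOT excluded).
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+)$", re.MULTILINE)


def split_card(card: str) -> tuple[Optional[str], str]:
    """Return (yaml_block or None, markdown_body) for one card."""
    match = FRONT_MATTER.match(card)
    if match is None:
        return None, card
    return match.group(1), card[match.end():]


def headings(card: str) -> list[tuple[int, str]]:
    """Return (level, title) for every heading in the card body, in order."""
    _, body = split_card(card)
    return [(len(hashes), title.strip()) for hashes, title in HEADING.findall(body)]
```

A structural survey could then count how often titles such as `Citation` or `Dataset Structure` appear across the `card` column. Parsing the YAML block itself would need a YAML library, which this sketch deliberately leaves out.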

Out-of-Scope Use

[More Information Needed]

Dataset Structure

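This dataset has a single split. Each row corresponds to one dataset repository on the Hub and has the columns datasetId, author, last_modified, downloads, likes, tags, task_categories, createdAt, trending_score and card, where card holds the full README.md text, including its YAML metadata block when one is present.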
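For text mining at the metadata level, the `tags` field can be aggregated without touching the card text. The sketch below is a minimal example, assuming each row is a mapping whose `tags` value is a list of strings in the Hub's `prefix:value` form (for example `license:mit` or `format:parquet`); `top_tags` is a hypothetical helper name.

```python
from collections import Counter
from collections.abc import Iterable, Mapping


def top_tags(rows: Iterable[Mapping], prefix: str = "", n: int = 10) -> list[tuple[str, int]]:
    """Count tags across rows, keeping only those that start with `prefix`."""
    counts = Counter()
    for row in rows:
        # Rows whose tags are None or missing are skipped rather than raising.
        for tag in row.get("tags") or []:
            if tag.startswith(prefix):
                counts[tag] += 1
    return counts.most_common(n)
```

With the `datasets` library installed, `load_dataset("librarian-bots/dataset_cards_with_metadata")` returns a `DatasetDict`, and iterating its single split yields one dict per row that can be passed to `top_tags`. The full download may be large, since the `card` column stores complete README files.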

Dataset Creation

Curation Rationale

The dataset was created to help people work with dataset cards, and in particular to support research into dataset cards and their use. Individual dataset cards can also be downloaded with the Hugging Face Hub API or client library; this may be preferable if you have a very specific use case or need a different format.
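For the per-card route, the official `huggingface_hub` client provides `DatasetCard.load(repo_id)`, which also parses the YAML metadata. The dependency-free sketch below instead fetches the raw README.md over HTTPS. It assumes the public `https://huggingface.co/datasets/<repo_id>/resolve/<revision>/README.md` URL pattern and unauthenticated access, so private or gated repositories will fail; `card_url` and `fetch_card` are hypothetical names.

```python
import urllib.parse
import urllib.request

HUB_ENDPOINT = "https://huggingface.co"


def card_url(repo_id: str, revision: str = "main") -> str:
    """Build the raw README.md URL for a public dataset repository (assumed URL pattern)."""
    repo = urllib.parse.quote(repo_id, safe="/")  # keep the owner/name separator
    rev = urllib.parse.quote(revision, safe="")   # encode branch names containing "/"
    return f"{HUB_ENDPOINT}/datasets/{repo}/resolve/{rev}/README.md"


def fetch_card(repo_id: str, revision: str = "main", timeout: float = 30.0) -> str:
    """Download one dataset card as text; no authentication token is sent."""
    with urllib.request.urlopen(card_url(repo_id, revision), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A card fetched this way may differ from the copy stored in this dataset if it changed after the most recent daily refresh.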

Source Data

The source data consists of the README.md files of datasets hosted on the Hugging Face Hub. No other supplementary files from a dataset repository are included.

Data Collection and Processing

The data is downloaded daily by a scheduled cron job.
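Because the snapshot is refreshed daily, users tracking changes may only want cards edited recently. A minimal sketch, assuming `last_modified` values are ISO-8601 strings such as `2026-03-10T01:43:14Z` and treating timestamps without an offset as UTC (an assumption); `parse_hub_timestamp` and `modified_since` are hypothetical helpers.

```python
from collections.abc import Iterable, Iterator, Mapping
from datetime import datetime, timezone


def parse_hub_timestamp(value: str) -> datetime:
    """Parse '2026-03-10T01:43:14Z' or '2021-02-22 10:20:34' into a timezone-aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: timestamps without an offset are already in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def modified_since(rows: Iterable[Mapping], since: datetime) -> Iterator[Mapping]:
    """Yield rows whose last_modified is at or after `since` (must be timezone-aware)."""
    for row in rows:
        if parse_hub_timestamp(row["last_modified"]) >= since:
            yield row
```

Passing `since=datetime.now(timezone.utc) - timedelta(days=1)` would keep cards modified within the last day.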

Who are the source data producers?

The source data producers are the creators of the dataset cards on the Hugging Face Hub, a broad range of community members from large companies to individual researchers. This dataset does not record who created each dataset card, although that information can be gathered from the Hugging Face Hub API.

Annotations

There are no additional annotations in this dataset beyond the dataset card content.

Annotation process

N/A

Who are the annotators?

N/A

Personal and Sensitive Information

We make no effort to anonymize the data. Whilst we don't expect most dataset cards to contain personal or sensitive information, some may. Dataset cards may also link to websites or include email addresses.
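Users who need to screen for contact details before redistributing cards could start with a simple pattern scan. The sketch below is a first-pass filter, not a reliable PII detector: the regular expressions are deliberately loose assumptions that will miss obfuscated addresses (such as `name [at] domain`) and may over-match. `find_contacts` is a hypothetical helper.

```python
import re

# Deliberately simple patterns: a first-pass screen, not a validator.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL = re.compile(r"https?://[^\s)\]>\"']+")


def find_contacts(card: str) -> dict[str, list[str]]:
    """Return unique email addresses and URLs in a card, in order of first appearance."""
    emails = EMAIL.findall(card)
    # Drop sentence punctuation that the URL pattern swallows at the end of a link.
    urls = [url.rstrip(".,;:!?") for url in URL.findall(card)]
    return {
        "emails": list(dict.fromkeys(emails)),
        "urls": list(dict.fromkeys(urls)),
    }
```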

Bias, Risks, and Limitations

Dataset cards are created by the community, and we have no control over their content. We do not review the dataset cards and make no claims about the accuracy of the information they contain. Some dataset cards discuss bias themselves, sometimes by quoting biased examples from the data they describe, such as training data or model responses. As a result, this dataset may contain examples of bias.

Whilst we do not download any images linked in the dataset cards, some cards embed or link to images, and some of these images may not be suitable for all audiences.
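Since cards can reference images by URL, one way to review or strip them is to list those references first. A minimal sketch, assuming images appear either as markdown `![alt](url)` or as HTML `<img src="...">`; other embeddings (CSS backgrounds, reference-style markdown images) are not covered, and `image_sources` is a hypothetical helper.

```python
import re

# Markdown images: ![alt text](url ...); captures the URL up to whitespace or ")".
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
# HTML images: <img ... src="url" ...>, quoted with either ' or ".
HTML_IMAGE = re.compile(r"<img\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)


def image_sources(card: str) -> list[str]:
    """Return unique image URLs referenced in a card (markdown matches first, then <img> tags)."""
    found = MD_IMAGE.findall(card) + HTML_IMAGE.findall(card)
    return list(dict.fromkeys(found))
```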

Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information is needed to make further recommendations.

Citation

No formal citation is required for this dataset, but if you use it in your work, please include a link to this dataset page.

Dataset Card Authors

@davanstrien

Dataset Card Contact

@davanstrien
