URL: https://huggingface.co/datasets/emperor-mew/ooni-censorship-historical

Dataset preview (100 rows shown)

country  date        anomaly_rate  measurement_count  spike_magnitude  label  event   confidence
string   string      float64       int64              float64          int64  string  float64
AE       2018-01-03  0.358209      67                 2.259626         0      null    0.7
AE       2018-11-09  0.303797      79                 1.724287         0      null    0.7
AE       2019-04-21  0.294118      51                 1.62905          0      null    0.7
AE       2019-05-23  0.315789      57                 1.842273         0      null    0.7
AE       2019-09-14  0.353846      65                 2.216702         0      null    0.7
AE       2021-01-25  0.31746       126                1.858712         0      null    0.7
AE       2021-03-17  0.293103      58                 1.619071         0      null    0.7
AE       2021-10-06  0.352941      51                 2.207798         0      null    0.7
AE       2022-04-18  0.285714      224                1.546371         0      null    0.7
AE       2022-05-01  0.373333      75                 2.408431         0      null    0.7
AE       2022-08-08  0.373984      123                2.41483          0      null    0.7
AE       2022-09-26  0.412261      2610               2.791424         0      null    0.7
AE       2022-10-26  0.294927      1163               1.637012         0      null    0.7
AE       2023-01-11  0.351542      1135               2.19403          0      null    0.7
AE       2023-01-25  0.322388      335                1.907194         0      null    0.7
AE       2023-02-05  0.346405      612                2.143493         0      null    0.7
AE       2023-03-03  0.296296      648                1.650485         0      null    0.7
AE       2023-05-30  0.412         500                2.788861         0      null    0.7
AE       2023-05-31  0.37375       800                2.41253          0      null    0.7
AE       2023-06-02  0.31625       800                1.846804         0      null    0.7
AE       2023-06-14  0.285714      700                1.546371         0      null    0.7
AE       2023-06-15  0.33125       800                1.994385         0      null    0.7
AE       2023-07-31  0.94          50                 7.983704         0      null    0.7
AE       2023-08-01  0.945455      55                 8.03737          0      null    0.7
AE       2023-08-02  0.933333      60                 7.918113         0      null    0.7
AE       2023-08-09  0.342039      2795               2.100538         0      null    0.7
AE       2023-08-21  0.307619      2100               1.761886         0      null    0.7
AE       2023-08-27  0.320423      3873               1.887865         0      null    0.7
AE       2023-08-30  0.938462      65                 7.968568         0      null    0.7
AE       2023-08-31  0.86          50                 7.196607         0      null    0.7
AE       2023-09-04  0.8           60                 6.606284         0      null    0.7
AE       2023-09-16  0.317446      2224               1.858571         0      null    0.7
AE       2023-09-26  0.367656      4749               2.352576         0      null    0.7
AE       2023-09-27  0.92          50                 7.78693          0      null    0.7
AE       2023-10-11  0.289716      2820               1.585746         0      null    0.7
AE       2023-10-20  0.419355      62                 2.861223         0      null    0.7
AE       2023-10-21  0.381818      55                 2.491911         0      null    0.7
AE       2023-10-21  0.375         56                 2.424828         0      null    0.7
AE       2023-10-22  0.522727      88                 3.878275         0      null    0.7
AE       2023-10-22  0.318182      88                 1.86581          0      null    0.7
AE       2023-10-24  0.352941      51                 2.207798         0      null    0.7
AE       2023-10-24  0.384615      52                 2.519432         0      null    0.7
AE       2023-10-25  0.462687      67                 3.287552         0      null    0.7
AE       2023-10-25  0.470588      68                 3.365294         0      null    0.7
AE       2023-10-26  0.433333      60                 2.998754         0      null    0.7
AE       2023-11-15  0.365306      1470               2.329453         0      null    0.7
AE       2023-11-17  0.304682      1303               1.732985         0      null    0.7
AE       2023-11-18  0.361111      1152               2.28818          0      null    0.7
AE       2023-11-29  0.325548      3511               1.938287         0      null    0.7
AE       2023-11-29  0.90566       53                 7.645846         0      null    0.7
AE       2023-12-05  0.2861        741                1.550165         0      null    0.7
AE       2023-12-15  0.901961      51                 7.609447         0      null    0.7
AE       2023-12-15  0.444444      54                 3.108073         0      null    0.7
AE       2023-12-16  0.281637      3469               1.50626          0      null    0.7
AE       2023-12-16  0.56          50                 4.244991         0      null    0.7
AE       2023-12-18  0.846154      52                 7.060378         0      null    0.7
AE       2023-12-18  0.37037       54                 2.379279         0      null    0.7
AE       2023-12-19  0.888889      54                 7.480836         0      null    0.7
AE       2023-12-19  0.574074      54                 4.383462         0      null    0.7
AE       2023-12-20  0.779661      59                 6.406174         0      null    0.7
AE       2023-12-20  0.807018      57                 6.675327         0      null    0.7
AE       2024-02-10  0.328687      2224               1.969168         0      null    0.7
AE       2024-02-27  0.282566      2042               1.515397         0      null    0.7
AE       2024-03-10  0.335851      2516               2.039648         0      null    0.7
AE       2024-03-15  0.30732       1653               1.758944         0      null    0.7
AE       2024-03-25  0.314286      1750               1.827478         0      null    0.7
AE       2024-03-30  0.288644      1902               1.575192         0      null    0.7
AE       2024-04-02  0.440781      819                3.072034         0      null    0.7
AE       2024-04-18  0.33625       1600               2.043578         0      null    0.7
AE       2024-04-19  0.351268      1301               2.191339         0      null    0.7
AE       2024-04-27  0.320723      2987               1.890814         0      null    0.7
AE       2024-04-28  0.349125      1372               2.170255         0      null    0.7
AE       2024-05-01  0.360075      2655               2.277989         0      null    0.7
AE       2024-05-02  0.368273      3549               2.358641         0      null    0.7
AE       2024-05-07  0.334405      1244               2.025427         0      null    0.7
AE       2024-05-18  0.471698      53                 3.376214         0      null    0.7
AE       2024-05-18  0.377358      53                 2.448033         0      null    0.7
AE       2024-05-20  0.313502      1933               1.81977          0      null    0.7
AE       2024-05-21  0.288889      2385               1.577606         0      null    0.7
AE       2024-05-22  0.290245      1671               1.590951         0      null    0.7
AE       2024-05-29  0.381776      1734               2.491498         0      null    0.7
AE       2024-06-01  0.294831      1896               1.63607          0      null    0.7
AE       2024-06-04  0.34          1400               2.080473         0      null    0.7
AE       2024-06-12  0.851852      54                 7.116439         0      null    0.7
AE       2024-06-24  0.355956      3551               2.237461         0      null    0.7
AE       2024-06-25  0.320204      2158               1.885705         0      null    0.7
AE       2024-07-05  0.285375      4205               1.543029         0      null    0.7
AE       2024-07-08  0.627451      51                 4.908623         0      null    0.7
AE       2024-07-18  0.839286      56                 6.992805         0      null    0.7
AE       2024-07-31  0.304622      3808               1.732398         0      null    0.7
AE       2024-08-03  0.317988      2107               1.8639           0      null    0.7
AE       2024-08-11  0.404605      608                2.716106         0      null    0.7
AE       2024-08-12  0.290105      1334               1.58957          0      null    0.7
AE       2024-08-20  0.287583      4816               1.564758         0      null    0.7
AE       2024-09-05  0.457627      59                 3.237773         0      null    0.7
AE       2024-09-05  0.5           62                 3.654668         0      null    0.7
AE       2025-07-19  0.335476      778                2.035959         0      null    0.7
AE       2025-10-14  0.792453      53                 6.532029         0      null    0.7
AE       2025-11-18  0.538462      52                 4.03308          0      null    0.7
AE       2025-12-21  0.288462      52                 1.573401         0      null    0.7

Voidly OONI Censorship Historical

A 10-year open archive for internet censorship research and ML.

Dataset Description

This dataset contains 10 years of global internet censorship measurements from 120+ countries:

  • 1.6M+ daily measurements (2017-2026)
  • 37K detected anomaly spikes
  • 4.5K confirmed censorship events with labels
  • 25+ known major incidents (Mahsa Amini protests, Myanmar coup, etc.)

Data Sources

Files

File                               Description                         Rows
data/ooni-historical.parquet       Daily measurements by country/test  1.6M
data/censorship-incidents.parquet  Labeled anomaly spikes              37K

Usage

from datasets import load_dataset

# Load historical measurements.
# With data_files and no split argument, this returns a DatasetDict
# whose single split is named "train".
ds = load_dataset(
    "emperor-mew/ooni-censorship-historical",
    data_files="data/ooni-historical.parquet",
)

# Load labeled incidents (for ML training)
incidents = load_dataset(
    "emperor-mew/ooni-censorship-historical",
    data_files="data/censorship-incidents.parquet",
)
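
Once loaded, each split can be iterated as dict rows keyed by the schema's column names. The sketch below shows one possible way to pre-filter incident rows before training. The helper name `select_rows` and both thresholds are illustrative assumptions, not part of the dataset; the three sample rows are copied from the dataset preview.

```python
from datetime import date

# Three rows copied from the dataset preview (censorship-incidents columns).
rows = [
    {"country": "AE", "date": "2022-09-26", "anomaly_rate": 0.412261,
     "measurement_count": 2610, "spike_magnitude": 2.791424,
     "label": 0, "event": None, "confidence": 0.7},
    {"country": "AE", "date": "2023-07-31", "anomaly_rate": 0.94,
     "measurement_count": 50, "spike_magnitude": 7.983704,
     "label": 0, "event": None, "confidence": 0.7},
    {"country": "AE", "date": "2024-04-02", "anomaly_rate": 0.440781,
     "measurement_count": 819, "spike_magnitude": 3.072034,
     "label": 0, "event": None, "confidence": 0.7},
]


def select_rows(rows, min_measurements=100, min_confidence=0.5):
    """Hypothetical helper: keep well-sampled, confidently labeled rows."""
    kept = []
    for row in rows:
        if row["measurement_count"] < min_measurements:
            continue  # small samples give noisy anomaly rates
        if row["confidence"] < min_confidence:
            continue
        # The preview reports `date` as a string, so parse it explicitly.
        kept.append({**row, "date": date.fromisoformat(row["date"])})
    return kept


selected = select_rows(rows)
print([(r["date"].isoformat(), r["measurement_count"]) for r in selected])
```

The minimum-sample filter is a design choice rather than a requirement: several preview rows with very high anomaly rates rest on only 50 to 65 measurements.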

Schema

ooni-historical

Column             Type    Description
country            string  ISO 3166-1 alpha-2 country code
test_name          string  OONI test type (web_connectivity, telegram, whatsapp)
date               date    Measurement date
measurement_count  int     Total measurements
anomaly_count      int     Measurements showing anomalies
confirmed_count    int     Confirmed blocked
anomaly_rate       float   Fraction showing anomalies (0-1)
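
The column descriptions suggest that `anomaly_rate` is `anomaly_count / measurement_count`. That relationship is an assumption read off the schema, not something the card states as a formula. A minimal sketch of the check, using values from the dataset preview (the function names are illustrative):

```python
def anomaly_rate(anomaly_count: int, measurement_count: int) -> float:
    """Fraction of measurements showing anomalies (assumed definition)."""
    if measurement_count <= 0:
        raise ValueError("measurement_count must be positive")
    return anomaly_count / measurement_count


def implied_anomaly_count(rate: float, measurement_count: int) -> int:
    """Recover the integer count that a rounded rate implies."""
    return round(rate * measurement_count)


# (anomaly_rate, measurement_count) pairs copied from the dataset preview.
preview = [(0.358209, 67), (0.412261, 2610), (0.94, 50)]

for rate, count in preview:
    anomalies = implied_anomaly_count(rate, count)
    recomputed = anomaly_rate(anomalies, count)
    # The recomputed rate should match the published one to 6 decimals.
    print(count, anomalies, round(recomputed, 6) == rate)
```

For these preview rows the published rate round-trips through an integer count, which is consistent with the assumed definition.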

censorship-incidents

Column             Type    Description
country            string  ISO 3166-1 alpha-2 country code
date               date    Incident date
anomaly_rate       float   Measured anomaly rate
measurement_count  int     Sample size
spike_magnitude    float   Z-score above baseline
label              int     1=confirmed censorship, 0=not
event              string  Matched known event (if any)
confidence         float   Label confidence (0-1)
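
`spike_magnitude` is described as a z-score above baseline, i.e. `(anomaly_rate - baseline_mean) / baseline_std`. The card does not say how the baseline is computed, so the sketch below only illustrates the formula: it backs out an implied mean and standard deviation from two preview rows and checks them against two others. The function names and the idea of a single shared baseline are assumptions that hold for these preview rows, not a documented property of the dataset.

```python
def z_score(rate: float, baseline_mean: float, baseline_std: float) -> float:
    """Standard z-score of an anomaly rate against a baseline."""
    return (rate - baseline_mean) / baseline_std


def fit_baseline(row_a, row_b):
    """Solve z = (rate - mean) / std for mean and std from two (rate, z) pairs."""
    (rate_a, z_a), (rate_b, z_b) = row_a, row_b
    std = (rate_a - rate_b) / (z_a - z_b)
    mean = rate_a - z_a * std
    return mean, std


# (anomaly_rate, spike_magnitude) pairs copied from the dataset preview.
baseline_mean, baseline_std = fit_baseline((0.358209, 2.259626),
                                           (0.303797, 1.724287))

# Check the implied baseline against two other preview rows.
for rate, published in [(0.94, 7.983704), (0.285714, 1.546371)]:
    predicted = z_score(rate, baseline_mean, baseline_std)
    print(rate, published, abs(predicted - published) < 0.01)
```

If the check holds beyond the preview, `spike_magnitude` carries no information beyond `anomaly_rate` within a country, which matters when choosing features.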

Known Events Covered

  • Iran Mahsa Amini protests (2022)
  • Myanmar military coup (2021)
  • Belarus election shutdown (2020)
  • Russia Ukraine invasion blocks (2022+)
  • Kazakhstan January protests (2022)
  • Sudan military coup (2021)
  • Cuba July protests (2021)
  • Uganda election shutdown (2021)
  • And 17+ more

Voidly Atlas ML Stack (2026-05-21)

This historical archive is the long-horizon training substrate for the Voidly Atlas ML stack. The production stack is documented in dedicated Hugging Face model cards under emperor-mew:

  • Classifier v3.3 (emperor-mew/voidly-classifier-v3.3): country-day censorship classifier, GradientBoosting, regime-similarity-weighted contagion features. Cross-country generalization, measured leave-one-country-out (LOCO): median F1 0.87, mean F1 0.71. The fitted .pkl and per-country thresholds ship in that repo.
  • Multi-horizon forecast (emperor-mew/voidly-forecast-v1-multi-horizon): 1d/7d/30d XGBoost + isotonic, LOCO AUC 0.91 / 0.88 / 0.84.
  • Unsupervised anomaly (emperor-mew/voidly-anomaly-dbscan-v1): CenDTect-style DBSCAN second-opinion signal.
  • 12 more model cards: search emperor-mew/voidly- on the Hub.

Note on the older "F1 99.8% / AUC 1.000" claim: that figure came from a stratified random split on a now-superseded v2 model and does not reflect cross-country generalization. The current metric is the LOCO (leave-one-country-out) F1 reported above. Random splits inflate apparent accuracy because rows from the same country land in both the training and test sets, so the model can learn per-country base rates.
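
To make the LOCO idea concrete, here is a minimal sketch of a leave-one-country-out evaluation loop in plain Python. It is not the Voidly pipeline: the data are synthetic (hypothetical countries A, B, C), and a one-feature threshold rule stands in for the real classifier. It only shows how per-country folds are built and why median and mean F1 can diverge.

```python
import statistics


def leave_one_country_out(countries):
    """Yield (held_out, train_idx, test_idx), holding each country out once."""
    for held_out in sorted(set(countries)):
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        test_idx = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train_idx, test_idx


def f1_score(y_true, y_pred):
    """Binary F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(values, labels):
    """Toy stand-in for a model: midpoint between the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


# Synthetic rows, NOT taken from the dataset.
countries = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
spike = [1.0, 5.0, 6.0, 1.5, 2.0, 7.0, 0.5, 3.0, 3.5]
label = [0, 1, 1, 0, 0, 1, 0, 1, 1]

scores = {}
for held_out, train_idx, test_idx in leave_one_country_out(countries):
    threshold = fit_threshold([spike[i] for i in train_idx],
                              [label[i] for i in train_idx])
    preds = [1 if spike[i] > threshold else 0 for i in test_idx]
    scores[held_out] = f1_score([label[i] for i in test_idx], preds)

median_f1 = statistics.median(list(scores.values()))
mean_f1 = statistics.mean(list(scores.values()))
print(scores, median_f1, mean_f1)
```

In this toy run, country C's positives fall below the threshold learned from A and B, so its F1 drops to 0 while the median stays at 1.0. A gap between median and mean LOCO F1 indicates the same pattern: most held-out countries score well and a few score poorly.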

For a clean held-out evaluation task, use the companion benchmark emperor-mew/voidly-bench-v1.

Citation

@dataset{voidly_ooni_historical_2026,
 author = {Voidly Research},
 title = {Voidly OONI Censorship Historical: 10 Years of Internet Measurement Data},
 year = {2026},
 publisher = {Hugging Face},
 url = {https://huggingface.co/datasets/emperor-mew/ooni-censorship-historical}
}

Links

License

CC BY 4.0 - Attribution required
