gheim-ch-560m-research
⚠️ Research-only / non-commercial. This variant of
gheim-ch-560m was fine-tuned on a multi-source training mix that includes Babelscape WikiNeural (CC BY-NC-SA 4.0) and CoNLL-2003 (Reuters Corpus, research-only). The most restrictive upstream licence binds the released model: the checkpoint may not be used for commercial purposes. For commercial deployments, use the sister checkpoint joelbarmettler/gheim-ch-560m, which is trained only on the Apache-2.0-compatible gheim-ch-pii-212k dataset.
A multilingual token-classification model for personally-identifiable
information (PII) detection across the four official Swiss languages
(de_CH, fr_CH, it_CH, rm) and English. Architecture and parameter
count are identical to the flagship joelbarmettler/gheim-ch-560m;
the difference is the training mix, which adds public external
NER / PII training data on top of the in-domain
gheim-ch-pii-212k corpus to lift cross-domain transfer (especially
on Swiss-news text and external person-NER benchmarks).
| Parameters | 560M |
| Languages | de_CH, fr_CH, it_CH, rm, en |
| Categories | account_number, private_address, private_date, private_email, private_person, private_phone, private_url, secret |
| Tag scheme | BIOES (33 classes) |
| Max sequence length | 512 |
| License | CC BY-NC-SA 4.0 (with Reuters research-only restriction inherited from CoNLL-2003) |
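The class count in the table follows from the tag scheme: eight categories, each with four positional tags (B, I, E, S), plus the outside tag O. A minimal sketch of that arithmetic is below; the label strings and their ordering are assumptions for illustration, and the authoritative mapping is the `id2label` entry in the checkpoint's config.

```python
# Categories as listed in the table above.
CATEGORIES = [
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
]

def bioes_labels(categories):
    """Return 'O' plus B-/I-/E-/S- tags for every category (ordering is an assumption)."""
    labels = ["O"]
    for cat in categories:
        labels.extend(f"{prefix}-{cat}" for prefix in "BIES")
    return labels

LABELS = bioes_labels(CATEGORIES)
print(len(LABELS))  # 33 = 8 categories * 4 positional tags + 1 outside tag
```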
When to choose this variant
| If your use case is… | Use |
|---|---|
| Production deployment, customer data, commercial product | joelbarmettler/gheim-ch-560m (Apache 2.0) |
| Academic research, benchmarks, papers, evaluations | this checkpoint |
| Cross-domain redaction (Swiss-news articles, English news, multilingual Wikipedia) | this checkpoint (substantially stronger on those distributions) |
| In-domain redaction (Swiss court / parliament / web text) | either; the numbers are essentially identical (see below) |
Performance vs the commercial flagship
In-distribution (held-out test split of gheim-ch-pii-212k,
21,246 chunks), the two checkpoints are within 0.2 pp on every
headline metric:
| Metric | gheim-ch-560m (Apache 2.0) | gheim-ch-560m-research (this) | Δ |
|---|---|---|---|
| Test strict F1 | 0.910 | 0.912 | +0.002 |
| Test char F1 | 0.946 | 0.946 | 0.000 |
| Per-(lang × cat) char F1 | 0.940 | 0.940 | 0.000 |
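The difference between the strict and char-level rows can be illustrated with a small sketch. It shows one plausible reading of the two metrics (exact span match versus per-character label overlap); it is not the evaluation code behind the numbers in this card, and the `(start, end, label)` span format is an assumption.

```python
def _f1(tp, n_pred, n_gold):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

def strict_f1(gold, pred):
    """A predicted span counts only if start, end, and label all match exactly."""
    gold_set, pred_set = set(gold), set(pred)
    return _f1(len(gold_set & pred_set), len(pred_set), len(gold_set))

def char_f1(gold, pred):
    """Every (character offset, label) pair is scored on its own."""
    gold_chars = {(i, lab) for start, end, lab in gold for i in range(start, end)}
    pred_chars = {(i, lab) for start, end, lab in pred for i in range(start, end)}
    return _f1(len(gold_chars & pred_chars), len(pred_chars), len(gold_chars))

# A 20-character address where the prediction stops 4 characters early.
gold = [(0, 20, "private_address")]
pred = [(0, 16, "private_address")]
print(strict_f1(gold, pred))         # 0.0
print(round(char_f1(gold, pred), 3)) # 0.889
```

Under this reading, a boundary error zeroes the strict score for that span but costs only the missed characters in the char-level score, which is why the char-level figures sit above the strict ones.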
Results on six external benchmarks (PER char F1) follow. Three of
the six (openpii-1m, WikiNeural, CoNLL-2003) are present in this
variant's training mix, so those numbers are in-distribution
generalisation, not zero-shot cross-domain transfer. The
swissner, open-pii-500k, and gretel_finance rows are
zero-shot for both checkpoints and are the honest cross-domain
comparison.
| Benchmark | n | gheim-ch-560m (Apache 2.0) | gheim-ch-560m-research (this) | Δ |
|---|---|---|---|---|
| ZurichNLP/swissner (Swiss-news NER, zero-shot for both) | 800 | 0.702 | 0.903 | +20.1 pp |
| ai4privacy/pii-masking-openpii-1m (in-distribution for this variant) | 8,000 | 0.938 | 0.995 | +5.7 pp |
| ai4privacy/open-pii-masking-500k (zero-shot for both) | 8,000 | 0.933 | 0.982 | +4.9 pp |
| gretelai/synthetic_pii_finance_multilingual (zero-shot for both) | 4,800 | 0.624 | 0.627 | +0.3 pp |
| Babelscape/wikineural (in-distribution for this variant) | 8,000 | 0.808 | 0.795 | −1.3 pp |
| tomaarsen/conll2003 (in-distribution for this variant) | 3,453 | 0.911 | 0.765 | −14.6 pp |
Headline finding: on the zero-shot Swiss-news test (swissner),
this variant attains 0.903 PER char F1 vs the Apache-2.0 baseline's
0.702, a 20 pp gain from broader training data even though
swissner itself is not seen at training time (no swissner train
split exists; the gain comes from openpii + WikiNeural + CoNLL-2003
generalisation).
Two caveats worth understanding before using this variant:
- On Babelscape/WikiNeural and tomaarsen/conll2003, both present in this variant's training mix, the research variant actually regresses against the Apache-2.0 baseline. The broader 8-category output schema produces non-PER false positives on news / Wikipedia text (where any IBAN-shaped or email-shaped string would be flagged) that the in-domain-only Apache baseline does not produce. PII-detector recall comes at the cost of NER-benchmark precision.
- On gretel_finance the two variants are effectively tied (0.624 vs 0.627). Neither training mix contains financial-document text, so this benchmark measures the architecture's structural-PII generalisation. Adding financial-domain data is a planned future iteration.
Per-language swissner PER char F1 (the headline cross-domain
finding):
| Language | Apache 2.0 | Research | Δ |
|---|---|---|---|
| de | 0.539 | 0.931 | +39 pp |
| fr | 0.761 | 0.913 | +15 pp |
| it | 0.643 | 0.856 | +21 pp |
| rm | 0.409 | 0.873 | +46 pp |
The Romansh result is the most notable: no Romansh training data is added (none of the external corpora include RM), but the model gains 46 pp on Romansh swissner PER char F1 via cross-lingual transfer in XLM-R's shared encoder body from the additional de/fr/it/en supervision.
Per-(language × category) char-level F1 on the in-domain test
| Category | de_ch | fr_ch | it_ch | rm | en | Avg. |
|---|---|---|---|---|---|---|
| account_number | 0.994 | 0.998 | 0.992 | 0.971 | 1.000 | 0.995 |
| private_address | 0.932 | 0.912 | 0.920 | 0.851 | 0.962 | 0.917 |
| private_date | 0.949 | 0.909 | 0.951 | 0.919 | 0.899 | 0.933 |
| private_email | 0.996 | 0.998 | 0.999 | 0.991 | 1.000 | 0.996 |
| private_person | 0.913 | 0.939 | 0.952 | 0.907 | 0.959 | 0.930 |
| private_phone | 0.995 | 0.998 | 0.997 | 0.987 | 1.000 | 0.995 |
| private_url | 0.991 | 0.994 | 0.993 | 0.993 | 0.967 | 0.991 |
| secret | 0.993 | 0.999 | 1.000 | 1.000 | n/a | 0.997 |
| Avg. | 0.943 | 0.931 | 0.956 | 0.916 | 0.959 | 0.940 |
Usage
The serving surface is identical to the Apache-2.0 variant; the SDKs
and transformers / transformers.js pipelines accept either repo
ID. Swap the model ID:
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
repo = "joelbarmettler/gheim-ch-560m-research" # research variant
tok = AutoTokenizer.from_pretrained(repo)
mdl = AutoModelForTokenClassification.from_pretrained(repo)
ner = pipeline("token-classification", model=mdl, tokenizer=tok,
               aggregation_strategy="simple")
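Since the model is intended for redaction, the pipeline output still has to be turned into masked text. The helper below is a minimal sketch of that step, not part of the gheim SDK: `redact` and its `min_score` parameter are hypothetical names, and the spans are mocked so the snippet runs without downloading the checkpoint. It relies only on the `entity_group`, `score`, `start`, and `end` keys that the transformers token-classification pipeline returns when an aggregation strategy is set. Because the model uses BIOES tags, `entity_group` may retain an `E-` or `S-` prefix depending on how the pipeline groups them, so the helper strips such a prefix defensively.

```python
import re

_TAG_PREFIX = re.compile(r"^[BIES]-")

def redact(text, spans, min_score=0.0):
    """Replace each detected span with a [CATEGORY] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        if span.get("score", 1.0) < min_score:
            continue
        label = _TAG_PREFIX.sub("", span["entity_group"]).upper()
        out = out[:span["start"]] + f"[{label}]" + out[span["end"]:]
    return out

# Mocked pipeline output; in practice this would be `ner(sample)`.
sample = "Contact Anna Muster at anna@example.ch."
mock_spans = [
    {"entity_group": "private_person", "score": 0.99, "start": 8, "end": 19},
    {"entity_group": "private_email", "score": 0.98, "start": 23, "end": 38},
]
print(redact(sample, mock_spans))
# Contact [PRIVATE_PERSON] at [PRIVATE_EMAIL].
```

The sketch does not merge adjacent or overlapping spans; a real redaction layer would likely need that.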
For the gheim SDKs:
from gheim import Anonymizer
a = Anonymizer(detector_model="joelbarmettler/gheim-ch-560m-research")
Training procedure
The architecture (FacebookAI/xlm-roberta-large, 560M params) and
hyperparameters are the same as for the flagship checkpoint: AdamW,
LR 5e-5, cosine schedule with 5% warmup, no layer-wise learning-rate
decay (LLRD), effective batch 128 on 2× RTX 4090 (DDP), bf16,
3 epochs, max sequence length 512. The best checkpoint is at
step 5,500 of 5,946 (epoch 2.77), selected by validation overall_f1
(0.911). Wall time ≈ 94 min train + 5 min eval.
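The step counts quoted here are consistent with the 253,597-chunk training mix reported in this card. The sketch below reproduces the arithmetic; rounding the last partial batch up and truncating the warmup step count are assumptions, not confirmed details of the training script.

```python
import math

train_chunks = 253_597    # total of the training-mix table in this card
effective_batch = 128
epochs = 3
warmup_fraction = 0.05
best_step = 5_500

steps_per_epoch = math.ceil(train_chunks / effective_batch)  # assumes the last partial batch is kept
total_steps = steps_per_epoch * epochs
warmup_steps = int(warmup_fraction * total_steps)            # assumes truncation
best_epoch = best_step / steps_per_epoch

print(steps_per_epoch)       # 1982
print(total_steps)           # 5946
print(warmup_steps)          # 297
print(round(best_epoch, 2))  # 2.77
```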
The training data is the union of:
| Source | Train chunks | License | Role |
|---|---|---|---|
| joelbarmettler/gheim-ch-pii-212k (train split) | 170,001 | CC BY 4.0 | In-domain Swiss PII (court, parliament, web, Romansh) |
| ai4privacy/pii-masking-openpii-1m (train split, de/fr/it/en) | 40,000 | Apache 2.0 / CC-BY 4.0 | Multi-category PII supervision (chat / forms) |
| Babelscape/wikineural (train splits, de/fr/it/en) | 39,223 | CC BY-NC-SA 4.0 | News / Wikipedia PER supervision |
| tomaarsen/conll2003 (train split, en, PER-bearing) | 4,373 | Research-only (Reuters) | English news PER supervision |
| Total | 253,597 | (most restrictive binds) | |
Only chunks containing at least one in-schema PII span are kept from the external sources, so the model receives positive supervision without "implicit O" negatives on text the source dataset did not annotate exhaustively.
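The filtering rule can be sketched as follows. This is an illustration of the idea, not the code in `external_train.py`: the chunk layout (`text` plus a `spans` list with `label` keys) and the function name are hypothetical, and the sketch assumes external labels have already been mapped onto the eight-category schema where a mapping exists.

```python
IN_SCHEMA = {
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
}

def keep_positive_chunks(chunks):
    """Keep chunks that carry at least one span labelled with an in-schema category."""
    return [
        chunk for chunk in chunks
        if any(span["label"] in IN_SCHEMA for span in chunk["spans"])
    ]

# Toy chunks invented for illustration.
chunks = [
    {"text": "Anna Muster wohnt in Chur.",
     "spans": [{"start": 0, "end": 11, "label": "private_person"}]},
    {"text": "Der Bundesrat tagte in Bern.",
     "spans": [{"start": 4, "end": 13, "label": "ORG"}]},  # no in-schema span
    {"text": "Es regnete den ganzen Tag.", "spans": []},    # no annotation at all
]
kept = keep_positive_chunks(chunks)
print(len(kept))  # 1
```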
The full builder is at
training/src/gheim_training/data/external_train.py
and the train-mix HF DatasetDict builder is at
training/src/gheim_training/data/build_hf_multisource.py.
Limitations
- Non-commercial use only. Inherits the most restrictive of its
  training-data licenses (CC BY-NC-SA 4.0 from WikiNeural; Reuters
  research-only from CoNLL-2003). If you need a commercial-friendly
  checkpoint, use joelbarmettler/gheim-ch-560m (Apache 2.0, trained only on in-domain Swiss text; cross-domain numbers are weaker, see the comparison table above).
- private_address test strict F1 is 0.84 (char F1 0.92). Boundary placement on multi-token addresses is the dominant error mode.
- Swiss German dialect (GSW) is not measured. The fasttext detector used in data preparation labels GSW as standard German.
- account_number test strict F1 is 0.99 in the headline, but regex-shaped non-PII in numeric tables can still slip through. For production use, pair with the regex front-end documented in the gheim library (checksum validation: IBAN, AHV, VAT-CHE, Luhn).
- Re-identification is not in scope. The model is intended for redaction; it does not return entity-linked identifiers.
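To make the checksum idea from the account_number limitation concrete, here is a minimal sketch of two of the named checks: the IBAN mod-97 test and the Luhn test. These are the standard public algorithms, written from scratch for illustration; they are not the gheim library's implementation, and the function names are hypothetical. AHV and VAT-CHE checks are not covered.

```python
def luhn_ok(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def iban_ok(iban: str) -> bool:
    """IBAN check: move the first four characters to the end, map A-Z to 10-35, test mod 97."""
    s = "".join(iban.split()).upper()
    if len(s) < 15 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1

print(luhn_ok("79927398713"))                  # True
print(iban_ok("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_ok("GB82 WEST 1234 5698 7654 33"))  # False
```

A detected account_number span that fails every applicable checksum is a candidate false positive, which is the kind of filtering a regex front-end can add on top of the model.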
License
Composite, bounded by the most restrictive upstream:
- Model weights: CC BY-NC-SA 4.0 with a Reuters research-only rider inherited from CoNLL-2003.
- Base architecture (FacebookAI/xlm-roberta-large): Apache 2.0 (does not bind the fine-tune, since the training-data licences override it).
- In-domain training data (gheim-ch-pii-212k): CC BY 4.0.
- ai4privacy/openpii-1m training data: Apache 2.0 / CC BY 4.0.
- Babelscape/wikineural training data: CC BY-NC-SA 4.0.
- tomaarsen/conll2003 training data: Reuters research-only.
If your use case cannot accept research-only / non-commercial terms,
use the sibling checkpoint
joelbarmettler/gheim-ch-560m.
Citation
@misc{barmettler2026gheim_ch_560m_research,
title = {gheim-ch-560m-research: A multi-source-trained Swiss-PII detector with strong cross-domain transfer},
author = {Joel Barmettler},
year = {2026},
url = {https://huggingface.co/joelbarmettler/gheim-ch-560m-research},
note = {Research-only variant of gheim-ch-560m; trained on gheim-ch-pii-212k + ai4privacy/openpii-1m + Babelscape/wikineural + tomaarsen/conll2003}
}
Maintainer
Joel Barmettler · jbarmettler@proton.me · joelbarmettler.xyz · github.com/joelbarmettlerUZH/gheim
Evaluation results
- Strict-span F1 (seqeval) on gheim-ch-pii-212k test set (self-reported): 0.911
- Char-level F1 (label-aware) on gheim-ch-pii-212k test set (self-reported): 0.946
- Strict-span precision on gheim-ch-pii-212k test set (self-reported): 0.894
- Strict-span recall on gheim-ch-pii-212k test set (self-reported): 0.929
- PER char F1 (overall, zero-shot) on ZurichNLP/swissner test set (self-reported): 0.903
- PER char F1 (de, zero-shot) on ZurichNLP/swissner test set (self-reported): 0.931
- PER char F1 (fr, zero-shot) on ZurichNLP/swissner test set (self-reported): 0.913
- PER char F1 (it, zero-shot) on ZurichNLP/swissner test set (self-reported): 0.856
