OpenMed Privacy Filter (Multilingual) — MLX 8-bit
A native MLX port of OpenMed/privacy-filter-multilingual for fast, on-device, fine-grained PII detection across 54 categories and 16 languages on Apple Silicon.
This 8-bit affine-quantized artifact reduces download size and resident memory; for the full-precision sibling see OpenMed/privacy-filter-multilingual-mlx.
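The sketch below illustrates what "8-bit affine-quantized" means. It maps one group of weights to 8-bit integers with a per-group scale and bias (w ≈ scale · q + bias), then dequantizes them. This is a minimal pure-Python sketch, not MLX's implementation. The group size of 64 (MLX's default for `mx.quantize`) and 16-bit storage for each scale and bias are assumptions about this artifact. `affine_quantize`, `dequantize`, and `bits_per_weight` are hypothetical helper names.

```python
def affine_quantize(group: list[float], bits: int = 8) -> tuple[list[int], float, float]:
    """Map floats onto integers 0..2**bits - 1 so that w ≈ scale * q + bias."""
    levels = 2**bits - 1
    lo, hi = min(group), max(group)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero for constant groups
    bias = lo
    q = [min(levels, max(0, round((w - bias) / scale))) for w in group]
    return q, scale, bias


def dequantize(q: list[int], scale: float, bias: float) -> list[float]:
    """Recover approximate weights from the integer codes."""
    return [scale * v + bias for v in q]


def bits_per_weight(bits: int, group_size: int, param_bits: int = 16) -> float:
    """Storage cost per weight, including one scale and one bias per group (assumed 16-bit)."""
    return bits + 2 * param_bits / group_size


weights = [0.013 * i - 0.4 for i in range(64)]  # one toy group of 64 weights
q, scale, bias = affine_quantize(weights)
restored = dequantize(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max error {max_err:.5f}, half step {scale / 2:.5f}")  # rounding error stays within half a step
print(bits_per_weight(8, 64))  # 8.5 (vs. 16 for BF16)
```

At roughly 8.5 bits per weight versus 16 for BF16, the ratio is in line with the ~1.4 GB vs. ~2.6 GB sizes quoted below (2.6 × 8.5 / 16 ≈ 1.38). The real artifact may keep some tensors at higher precision, so treat this as a back-of-the-envelope check only.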
Family at a glance. Same architecture and training data, three runtimes:
- PyTorch: OpenMed/privacy-filter-multilingual (CPU + CUDA).
- MLX BF16: OpenMed/privacy-filter-multilingual-mlx (Apple Silicon, full precision, ~2.6 GB).
- MLX 8-bit (this repo): OpenMed/privacy-filter-multilingual-mlx-8bit (Apple Silicon, ~1.4 GB).
What it does
The model is a token classifier built on the OpenAI Privacy Filter
architecture (openai_privacy_filter). It tags each token with a BIOES
label across 54 PII span classes, then a Viterbi pass over the BIOES
grammar yields clean entity spans. Languages covered: Arabic, Bengali,
Chinese, Dutch, English, French, German, Hindi, Italian, Japanese,
Korean, Portuguese, Spanish, Telugu, Turkish, Vietnamese.
For per-label accuracy, the training recipe, and dataset details, see the base PyTorch checkpoint, OpenMed/privacy-filter-multilingual.
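The BIOES-constrained decoding described above can be sketched as a standard Viterbi pass in which illegal transitions get zero probability. Examples of illegal transitions are `O → I-EMAIL` and `B-CITY → E-STATE`. This is a minimal pure-Python sketch under assumed inputs: a per-token dict of log-probabilities keyed by label strings such as `B-EMAIL`, with every entity type present under all four prefixes. The helper names `allowed`, `viterbi`, and `to_spans` are hypothetical, and the MLX runtime may implement decoding differently.

```python
import math


def allowed(prev: str, curr: str) -> bool:
    """BIOES grammar: B-X/I-X must be followed by I-X or E-X; otherwise a new span may start."""
    p_tag, _, p_type = prev.partition("-")
    c_tag, _, c_type = curr.partition("-")
    if p_tag in ("B", "I"):
        return c_tag in ("I", "E") and c_type == p_type
    return c_tag in ("O", "B", "S")


def viterbi(scores: list[dict[str, float]]) -> list[str]:
    """Highest-scoring label sequence (log-space scores) that obeys the BIOES grammar."""
    labels = list(scores[0])
    # A legal sequence may only start with O, B-* or S-*.
    best = [{y: scores[0][y] if y[0] in "OBS" else -math.inf for y in labels}]
    back: list[dict[str, str]] = []
    for t in range(1, len(scores)):
        row, ptr = {}, {}
        for y in labels:
            # Best legal predecessor for label y at step t.
            score, prev = max((best[-1][p], p) for p in labels if allowed(p, y))
            row[y], ptr[y] = score + scores[t][y], prev
        best.append(row)
        back.append(ptr)
    # A legal sequence may only end with O, E-* or S-*.
    last = max((s, y) for y, s in best[-1].items() if y[0] in "OES")[1]
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def to_spans(tags: list[str]) -> list[tuple[str, int, int]]:
    """Convert a legal BIOES path into (type, start_token, end_token_exclusive) spans."""
    spans, start = [], 0
    for i, tag in enumerate(tags):
        kind, _, typ = tag.partition("-")
        if kind == "S":
            spans.append((typ, i, i + 1))
        elif kind == "B":
            start = i
        elif kind == "E":
            spans.append((typ, start, i + 1))
    return spans


# Toy per-token probabilities for four tokens (made-up numbers).
probs = [
    {"O": 0.90, "B-EMAIL": 0.04, "I-EMAIL": 0.02, "E-EMAIL": 0.02, "S-EMAIL": 0.02},
    {"O": 0.30, "B-EMAIL": 0.60, "I-EMAIL": 0.04, "E-EMAIL": 0.03, "S-EMAIL": 0.03},
    {"O": 0.50, "B-EMAIL": 0.10, "I-EMAIL": 0.30, "E-EMAIL": 0.05, "S-EMAIL": 0.05},
    {"O": 0.10, "B-EMAIL": 0.05, "I-EMAIL": 0.05, "E-EMAIL": 0.75, "S-EMAIL": 0.05},
]
# Greedy argmax would give O, B-EMAIL, O, E-EMAIL, which is not a legal BIOES sequence.
path = viterbi([{k: math.log(v) for k, v in p.items()} for p in probs])
print(path, to_spans(path))  # ['O', 'B-EMAIL', 'I-EMAIL', 'E-EMAIL'] [('EMAIL', 1, 4)]
```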
Architecture
| Field | Value |
|---|---|
| Source model type | openai_privacy_filter |
| Source architecture | OpenAIPrivacyFilterForTokenClassification |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-Query (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts — 128 experts, top-4 routing, SwiGLU |
| Position encoding | YaRN-scaled RoPE (rope_theta=150_000, factor=32) |
| Context length | 131,072 tokens (4,096 original × YaRN factor 32) |
| Tokenizer | o200k_base (tiktoken) — vocab 200,064 |
| Output head | Linear(640 → 217) with bias (54 classes × 4 BIOES prefixes + O) |
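The sparse Mixture-of-Experts row means a router scores each token against all 128 experts, and only the 4 highest-scoring expert FFNs run. Below is a minimal pure-Python sketch of top-k routing with a softmax taken over just the selected logits, which is one common formulation. Whether this port normalizes before or after the top-k selection is an assumption. `route_top_k` and `moe_forward` are hypothetical names, and the toy experts stand in for the real SwiGLU FFNs.

```python
import math
from typing import Callable

Expert = Callable[[list[float]], list[float]]


def route_top_k(router_logits: list[float], k: int = 4) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize over just those k."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: list[float], router_logits: list[float], experts: list[Expert], k: int = 4) -> list[float]:
    """Gate-weighted sum of the selected experts' outputs; the other experts never run."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Toy setup: 128 experts, where expert i simply scales its input by i.
experts: list[Expert] = [lambda x, i=i: [i * v for v in x] for i in range(128)]
logits = [0.0] * 128
logits[5], logits[17], logits[42], logits[99] = 3.0, 2.0, 1.0, 0.5
print([i for i, _ in route_top_k(logits)])  # [5, 17, 42, 99]
print(moe_forward([1.0, 2.0], logits, experts))
```

Only 4 of the 128 expert FFNs are evaluated per token. The per-token FFN compute is therefore a small fraction of the total expert parameter count.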
File set
| File | Size | Purpose |
|---|---|---|
| `weights.safetensors` | ~1.4 GB | Model weights in OpenMed-MLX layout |
| `config.json` | ~19 KB | Model + MLX runtime config |
| `id2label.json` | ~5 KB | Numeric ID → BIOES label string |
| `openmed-mlx.json` | ~1 KB | OpenMed MLX manifest (task, family, runtime hints) |
| `tokenizer.json`, `tokenizer_config.json` | ~28 MB | Source tokenizer files (kept for reference) |
The MLX runtime uses tiktoken o200k_base directly for tokenization;
the tokenizer.json is kept so consumers can inspect or re-tokenize via
transformers if desired.
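Since `id2label.json` maps numeric IDs to BIOES label strings, a consumer can recover the entity types directly from it. The sketch below assumes the file is a flat JSON object from string IDs to labels, such as `{"0": "O", "1": "B-FIRSTNAME", ...}`. That layout is an assumption, so inspect the file before relying on it. `parse_id2label`, `load_id2label`, and `entity_types` are hypothetical helpers.

```python
import json
from pathlib import Path


def parse_id2label(raw_json: str) -> dict[int, str]:
    """Turn {"0": "O", "1": "B-FIRSTNAME", ...} into {0: "O", 1: "B-FIRSTNAME", ...}."""
    return {int(k): v for k, v in json.loads(raw_json).items()}  # JSON object keys are always strings


def load_id2label(path) -> dict[int, str]:
    """Read and parse an id2label.json file (assumed flat layout)."""
    return parse_id2label(Path(path).read_text(encoding="utf-8"))


def entity_types(id2label: dict[int, str]) -> list[str]:
    """Strip the B-/I-/E-/S- prefixes and drop "O" to get the distinct entity types."""
    return sorted({label.split("-", 1)[1] for label in id2label.values() if label != "O"})


sample = '{"0": "O", "1": "B-EMAIL", "2": "I-EMAIL", "3": "E-EMAIL", "4": "S-EMAIL", "5": "S-CITY"}'
labels = parse_id2label(sample)
print(entity_types(labels))  # ['CITY', 'EMAIL']
```

You could point this at the real file, for example `load_id2label(Path(model_path) / "id2label.json")` with `model_path` from `snapshot_download`. If the assumed layout holds, it should yield 217 IDs and 54 entity types, matching the output head and the label table.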
Label space (54 categories)
| Group | Labels |
|---|---|
| Identity | FIRSTNAME, MIDDLENAME, LASTNAME, PREFIX, AGE, GENDER, SEX, EYECOLOR, HEIGHT, USERNAME, OCCUPATION, JOBTITLE, JOBDEPARTMENT, ORGANIZATION, USERAGENT |
| Contact | EMAIL, PHONE, URL |
| Address | STREET, BUILDINGNUMBER, SECONDARYADDRESS, CITY, COUNTY, STATE, ZIPCODE, GPSCOORDINATES, ORDINALDIRECTION |
| Dates & time | DATE, DATEOFBIRTH, TIME |
| Government IDs | SSN |
| Financial | ACCOUNTNAME, BANKACCOUNT, IBAN, BIC, CREDITCARD, CREDITCARDISSUER, CVV, PIN, MASKEDNUMBER, AMOUNT, CURRENCY, CURRENCYCODE, CURRENCYNAME, CURRENCYSYMBOL |
| Crypto | BITCOINADDRESS, ETHEREUMADDRESS, LITECOINADDRESS |
| Vehicle | VIN, VRM |
| Digital | IPADDRESS, MACADDRESS, IMEI |
| Auth | PASSWORD |
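One way to use these groups is to act only on selected categories. The sketch below filters entity dicts of the shape returned by `PrivacyFilterMLXPipeline` (keys `entity_group`, `score`, `word`, `start`, `end`) by group. `LABEL_GROUPS` is transcribed from the table above. `filter_by_group` and its `min_score` threshold are hypothetical and not part of OpenMed's API, and the toy detections are made up for illustration.

```python
# Label groups transcribed from the table above (54 labels in 10 groups).
LABEL_GROUPS: dict[str, set[str]] = {
    "Identity": {"FIRSTNAME", "MIDDLENAME", "LASTNAME", "PREFIX", "AGE", "GENDER", "SEX",
                 "EYECOLOR", "HEIGHT", "USERNAME", "OCCUPATION", "JOBTITLE",
                 "JOBDEPARTMENT", "ORGANIZATION", "USERAGENT"},
    "Contact": {"EMAIL", "PHONE", "URL"},
    "Address": {"STREET", "BUILDINGNUMBER", "SECONDARYADDRESS", "CITY", "COUNTY", "STATE",
                "ZIPCODE", "GPSCOORDINATES", "ORDINALDIRECTION"},
    "Dates & time": {"DATE", "DATEOFBIRTH", "TIME"},
    "Government IDs": {"SSN"},
    "Financial": {"ACCOUNTNAME", "BANKACCOUNT", "IBAN", "BIC", "CREDITCARD", "CREDITCARDISSUER",
                  "CVV", "PIN", "MASKEDNUMBER", "AMOUNT", "CURRENCY", "CURRENCYCODE",
                  "CURRENCYNAME", "CURRENCYSYMBOL"},
    "Crypto": {"BITCOINADDRESS", "ETHEREUMADDRESS", "LITECOINADDRESS"},
    "Vehicle": {"VIN", "VRM"},
    "Digital": {"IPADDRESS", "MACADDRESS", "IMEI"},
    "Auth": {"PASSWORD"},
}


def filter_by_group(entities: list[dict], groups: list[str], min_score: float = 0.0) -> list[dict]:
    """Keep only entities whose label belongs to one of the requested groups."""
    wanted = set().union(*(LABEL_GROUPS[g] for g in groups))
    return [e for e in entities if e["entity_group"] in wanted and e["score"] >= min_score]


# Toy detections in the pipeline's output format (values are made up for illustration).
found = [
    {"entity_group": "EMAIL", "score": 0.92, "word": "alice.smith@example.com", "start": 12, "end": 35},
    {"entity_group": "TIME", "score": 0.81, "word": "5pm", "start": 42, "end": 45},
]
print(filter_by_group(found, ["Contact", "Financial"]))  # keeps only the EMAIL entity
```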
Quick start
With OpenMed (recommended)
OpenMed provides a single extract_pii() / deidentify() API that auto-selects MLX on Apple Silicon and PyTorch elsewhere, so the same code runs on every host.
```bash
pip install -U "openmed[mlx]"
```
```python
from openmed import extract_pii, deidentify

text = (
    "Patient Sarah Johnson (DOB 03/15/1985), phone 415-555-0123, email sarah.johnson@example.com."
)

# Extract grouped entity spans (runs on MLX here, PyTorch fallback elsewhere)
result = extract_pii(text, model_name="OpenMed/privacy-filter-multilingual-mlx-8bit")
for ent in result.entities:
    print(f"{ent.label:30s} {ent.text!r} conf={ent.confidence:.2f}")

# De-identify
masked = deidentify(text, method="mask",
                    model_name="OpenMed/privacy-filter-multilingual-mlx-8bit")
fake = deidentify(
    text,
    method="replace",
    model_name="OpenMed/privacy-filter-multilingual-mlx-8bit",
    consistent=True,
    seed=42,  # deterministic locale-aware Faker surrogates
)
```
When MLX isn't available (Linux, Windows, Intel Macs, or a missing mlx package), the same call automatically falls back to the PyTorch checkpoint OpenMed/privacy-filter-multilingual and emits a one-time warning. The fallback is family-aware: a request for a Multilingual MLX model never substitutes an unrelated baseline.
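The host check and family-aware fallback can be approximated as below. This is a hypothetical sketch, not OpenMed's actual selection logic. It treats MLX as usable only on macOS on arm64 with an importable `mlx` package, and it maps each MLX repo in this family back to the PyTorch checkpoint listed above. `PYTORCH_FALLBACK`, `mlx_available`, and `resolve_backend` are illustrative names.

```python
import importlib.util
import platform
import warnings
from typing import Optional

# Hypothetical family map: each MLX artifact falls back only to its own PyTorch sibling.
PYTORCH_FALLBACK = {
    "OpenMed/privacy-filter-multilingual-mlx": "OpenMed/privacy-filter-multilingual",
    "OpenMed/privacy-filter-multilingual-mlx-8bit": "OpenMed/privacy-filter-multilingual",
}


def mlx_available(system: Optional[str] = None, machine: Optional[str] = None,
                  has_mlx: Optional[bool] = None) -> bool:
    """MLX needs macOS on Apple Silicon plus an importable `mlx` package."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if has_mlx is None:
        has_mlx = importlib.util.find_spec("mlx") is not None
    return system == "Darwin" and machine == "arm64" and has_mlx


def resolve_backend(model_name: str, **host) -> tuple[str, str]:
    """Return (backend, repo_id), refusing to substitute an unrelated model family."""
    if mlx_available(**host):
        return "mlx", model_name
    try:
        fallback = PYTORCH_FALLBACK[model_name]
    except KeyError:
        raise ValueError(f"no PyTorch sibling registered for {model_name!r}") from None
    # With Python's default warning filters, this is shown once per call site.
    warnings.warn(f"MLX unavailable; falling back to {fallback}", RuntimeWarning, stacklevel=2)
    return "pytorch", fallback


print(resolve_backend("OpenMed/privacy-filter-multilingual-mlx-8bit"))
```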
Direct MLX usage (lower-level)
```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-multilingual-mlx-8bit")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'EMAIL',
#   'score': 0.92,
#   'word': 'alice.smith@example.com',
#   'start': 12,
#   'end': 35}]
```
The pipeline returns a list of dicts with entity_group, score, word,
start, and end (character offsets into the input string).
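Because `start` and `end` are character offsets into the input, a simple placeholder masker can be built on top of the pipeline output. This minimal sketch assumes `end` is exclusive, which is consistent with the example above, where `text[12:35]` is exactly the email address. It also assumes spans do not overlap. `mask_spans` is a hypothetical helper, and OpenMed's own `deidentify(method="mask")` may format placeholders differently.

```python
def mask_spans(text: str, entities: list[dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


text = "Email me at alice.smith@example.com after 5pm."
entities = [{"entity_group": "EMAIL", "score": 0.92, "word": "alice.smith@example.com",
             "start": 12, "end": 35}]
assert text[12:35] == entities[0]["word"]  # end is exclusive
print(mask_spans(text, entities))  # Email me at [EMAIL] after 5pm.
```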
Hardware notes
- Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
- Tested on macOS with `mlx>=0.18`. The MLX runtime in this repo is independent of `mlx_lm` (token classification, not causal LM).
- Lower latency and a smaller memory footprint than the BF16 sibling.
Credits & Acknowledgements
This artifact wouldn't exist without two open-source releases; sincere thanks to both teams:
- OpenAI for open-sourcing the Privacy Filter (architecture, modeling code, and the `opf` training/eval CLI). The MLX port in this repo runs that same architecture under Apple's MLX framework.
- AI4Privacy for releasing the multilingual PII masking datasets used to fine-tune the source PyTorch checkpoint: pii-masking-200k, pii-masking-400k, and open-pii-masking-500k-ai4privacy.
Additional thanks to Apple for MLX and to the Hugging Face team for the model-distribution ecosystem.
License
Apache 2.0.
Base model: openai/privacy-filter (this repo is a quantized derivative).