URL: https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx

OpenMed Privacy Filter (Nemotron) — MLX BF16

A native MLX port of OpenMed/privacy-filter-nemotron for fast, on-device PII detection on Apple Silicon. This BF16 artifact preserves the full source precision; for a smaller / faster sibling, see OpenMed/privacy-filter-nemotron-mlx-8bit.

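Family at a glance. Same architecture and training data, three runtimes: OpenMed/privacy-filter-nemotron (PyTorch source checkpoint), OpenMed/privacy-filter-nemotron-mlx (MLX BF16, this repo), and OpenMed/privacy-filter-nemotron-mlx-8bit (MLX 8-bit).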

What it does

The model is a token classifier built on OpenAI's open Privacy Filter architecture (the same openai_privacy_filter model type used by openai/privacy-filter). It tags each token with a BIOES label across 55 PII span classes, then a Viterbi pass over the BIOES grammar yields clean entity spans. Detected categories include:

  • Personal identifiers — first_name, last_name, user_name, gender, age, date_of_birth
  • Contact — email, phone_number, fax_number, street_address, city, state, country, county, postcode, coordinate
  • Government / legal IDs — ssn, national_id, tax_id, certificate_license_number
  • Financial — account_number, bank_routing_number, credit_debit_card, cvv, pin, swift_bic
  • Medical — medical_record_number, health_plan_beneficiary_number, blood_type
  • Workplace — company_name, occupation, employee_id, customer_id, employment_status, education_level
  • Online — url, ipv4, ipv6, mac_address, http_cookie, api_key, password, device_identifier
  • Demographic — race_ethnicity, religious_belief, political_view, sexuality, language
  • Vehicles — license_plate, vehicle_identifier
  • Time — date, date_time, time
  • Misc — biometric_identifier, unique_id
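The BIOES tagging and Viterbi pass described above can be sketched in plain Python. This is a minimal illustration that uses hard grammar constraints only; the function names, the toy label set, and the scores are assumptions for the example, and the decoder shipped with the model may differ (for instance by using tuned transition scores).

```python
import math

def bioes_allowed(prev: str, nxt: str) -> bool:
    """Return True if label `nxt` may directly follow label `prev` under BIOES."""
    p_tag, _, p_cls = prev.partition("-")
    n_tag, _, n_cls = nxt.partition("-")
    if p_tag in ("B", "I"):
        # A span is open: it must continue or end with the same class.
        return n_tag in ("I", "E") and n_cls == p_cls
    # After O, E, or S no span is open: only O, B, or S may follow.
    return n_tag in ("O", "B", "S")

def viterbi_bioes(scores, labels):
    """scores[t][k] is the log-score of labels[k] at token t; returns the best valid path."""
    n = len(labels)
    neg = -math.inf
    # A sequence may only start with O, B-*, or S-*.
    best = [scores[0][k] if labels[k][0] in "OBS" else neg for k in range(n)]
    back = []
    for t in range(1, len(scores)):
        new, ptr = [], []
        for k in range(n):
            cands = [best[j] if bioes_allowed(labels[j], labels[k]) else neg
                     for j in range(n)]
            j = max(range(n), key=cands.__getitem__)
            new.append(cands[j] + scores[t][k])
            ptr.append(j)
        best = new
        back.append(ptr)
    # A sequence may only end with O, E-*, or S-*.
    final = [best[k] if labels[k][0] in "OES" else neg for k in range(n)]
    k = max(range(n), key=final.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return [labels[k] for k in reversed(path)]

# Toy example: per-token argmax would give B, O, E, which is not a valid span.
labels = ["O", "B-email", "I-email", "E-email", "S-email"]
scores = [
    [-2.0, -0.1, -5.0, -5.0, -3.0],
    [-0.5, -5.0, -0.9, -5.0, -5.0],
    [-3.0, -5.0, -5.0, -0.1, -4.0],
]
print(viterbi_bioes(scores, labels))
# ['B-email', 'I-email', 'E-email']
```

The constrained decode trades a slightly lower score on the middle token for a sequence that forms one well-formed span.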

For per-label accuracy, training recipe, and dataset details, see the base PyTorch checkpoint, OpenMed/privacy-filter-nemotron.

Architecture

| Field | Value |
| --- | --- |
| Source model type | openai_privacy_filter |
| Source architecture | OpenAIPrivacyFilterForTokenClassification |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-Query (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts: 128 experts, top-4 routing, SwiGLU |
| Position encoding | YARN-scaled RoPE (rope_theta=150_000, factor=32) |
| Context length | 131,072 tokens (initial 4,096) |
| Tokenizer | o200k_base (tiktoken), vocab 200,064 |
| Output head | Linear(640 → 221) with bias |
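Several of the numbers above follow from one another, which a few lines of arithmetic make explicit. The label-count breakdown (four BIOES tags per class plus one shared O label) is an assumption that happens to match the 221-way output head; the variable names are illustrative only.

```python
# Values copied from the architecture table and the category list.
num_classes = 55                      # PII span classes
num_q_heads, num_kv_heads, head_dim = 14, 2, 64
initial_context, yarn_factor = 4_096, 32

# Assumption: one B/I/E/S label per class plus a single shared O label.
num_labels = 4 * num_classes + 1

q_width = num_q_heads * head_dim          # combined width of all query heads
kv_width = num_kv_heads * head_dim        # combined width of all key (or value) heads
queries_per_kv = num_q_heads // num_kv_heads
max_context = initial_context * yarn_factor

print(num_labels, q_width, kv_width, queries_per_kv, max_context)
# 221 896 128 7 131072
```

Each KV head is shared by seven query heads, and the scaled context length is the initial 4,096 tokens multiplied by the RoPE factor of 32.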

File set

| File | Size | Purpose |
| --- | --- | --- |
| weights.safetensors | 2.6 GB | BF16 model weights in OpenMed-MLX layout |
| config.json | 19 KB | Model + MLX runtime config |
| id2label.json | 5.4 KB | Numeric ID → BIOES label string |
| openmed-mlx.json | 0.7 KB | OpenMed MLX manifest (task, family, runtime hints) |
| tokenizer.json, tokenizer_config.json | 27 MB | Source tokenizer files (kept for reference) |

The MLX runtime uses tiktoken o200k_base directly for tokenization; the tokenizer.json is kept so consumers can inspect or re-tokenize via transformers if desired.

Quick start

With OpenMed — recommended

OpenMed gives you a single extract_pii() / deidentify() API that auto-selects MLX on Apple Silicon and PyTorch elsewhere — same code on every host.

pip install -U "openmed[mlx]"

from openmed import extract_pii, deidentify

text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

# Extract grouped entity spans (runs on MLX here, PyTorch fallback elsewhere)
result = extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx")
for ent in result.entities:
    print(f"{ent.label:30s} {ent.text!r} conf={ent.confidence:.2f}")

# De-identify
masked = deidentify(
    text,
    method="mask",
    model_name="OpenMed/privacy-filter-nemotron-mlx",
)
fake = deidentify(
    text,
    method="replace",
    model_name="OpenMed/privacy-filter-nemotron-mlx",
    consistent=True,
    seed=42,  # deterministic locale-aware Faker surrogates
)

When MLX isn't available (Linux, Windows, Intel Mac, missing mlx package), this exact same call automatically falls back to the PyTorch checkpoint OpenMed/privacy-filter-nemotron with a one-time warning. Family-aware fallback: a Nemotron MLX request never substitutes the unrelated openai/privacy-filter baseline.
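To make the fallback rule concrete, here is a rough sketch of how a host check and a family-aware substitution could look. It is not OpenMed's implementation: `mlx_available`, `resolve_model`, and the `PYTORCH_FALLBACK` table are hypothetical names introduced for illustration, and only the one mapping stated above is included.

```python
import importlib.util
import platform

# Hypothetical table for illustration: MLX repo -> PyTorch checkpoint of the same family.
PYTORCH_FALLBACK = {
    "OpenMed/privacy-filter-nemotron-mlx": "OpenMed/privacy-filter-nemotron",
}

def mlx_available() -> bool:
    """True on Apple Silicon macOS when the `mlx` package can be found."""
    on_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    return on_apple_silicon and importlib.util.find_spec("mlx") is not None

def resolve_model(model_name: str, have_mlx: bool) -> str:
    """Keep the requested repo when MLX is usable; otherwise use its family's PyTorch checkpoint."""
    if have_mlx or model_name not in PYTORCH_FALLBACK:
        return model_name
    return PYTORCH_FALLBACK[model_name]

print(resolve_model("OpenMed/privacy-filter-nemotron-mlx", have_mlx=False))
# OpenMed/privacy-filter-nemotron
```

Keying the fallback on the requested repo name, rather than on a single global default, is what keeps a Nemotron request inside the Nemotron family.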

Direct MLX usage (lower-level)

from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-nemotron-mlx")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'email',
# 'score': 0.92,
# 'word': 'alice.smith@example.com',
# 'start': 12,
# 'end': 35}]

The pipeline returns a list of dicts with entity_group, score, word, start, and end (character offsets into the input string).
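Because start and end are character offsets, the returned spans can be applied directly to the input string. The helper below is a small sketch (the `redact` function is not part of OpenMed) that assumes non-overlapping spans and replaces each one with a bracketed label.

```python
def redact(text: str, entities: list[dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder using character offsets."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text

text = "Email me at alice.smith@example.com after 5pm."
entities = [{"entity_group": "email", "score": 0.92,
             "word": "alice.smith@example.com", "start": 12, "end": 35}]
print(redact(text, entities))
# Email me at [EMAIL] after 5pm.
```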

Loading from a local snapshot

from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-nemotron-mlx")
# Placeholder token IDs for a batch of one 4-token sequence
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)  # attend to every position
logits = model(ids, attention_mask=mask)  # shape (1, 4, 221): one score per label per token
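Raw logits still have to be mapped to label strings. The sketch below shows the simplest option, a per-token argmax, on plain nested lists (an MLX array can be converted with `.tolist()`). It assumes id2label.json is a JSON object, so its keys arrive as strings; `greedy_labels` and the three-label toy mapping are illustrative. Unlike the Viterbi pass used by the pipeline, argmax does not enforce the BIOES grammar.

```python
def greedy_labels(logits, id2label):
    """Pick the highest-scoring label per token; logits is indexed [batch][token][label]."""
    decoded = []
    for sequence in logits:
        row_labels = []
        for row in sequence:
            best_id = max(range(len(row)), key=row.__getitem__)
            row_labels.append(id2label[str(best_id)])  # JSON object keys are strings
        decoded.append(row_labels)
    return decoded

# Toy stand-ins for json.load(open("id2label.json")) and logits.tolist()
id2label = {"0": "O", "1": "B-email", "2": "E-email"}
toy_logits = [[[2.0, 0.1, -1.0], [0.3, 1.5, 0.2], [0.0, 0.4, 3.1]]]
print(greedy_labels(toy_logits, id2label))
# [['O', 'B-email', 'E-email']]
```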

Hardware notes

  • Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
  • Tested on macOS with mlx>=0.18. The MLX runtime in this repo is independent of mlx_lm (token classification, not causal LM).
  • A forward pass on a typical PII sentence (~10 tokens) takes ~14 ms on an M-series GPU after warmup. For lower latency or a smaller memory footprint, use the OpenMed/privacy-filter-nemotron-mlx-8bit sibling instead.
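Latency figures like the one above depend on warmup and on the host, so it can help to measure on your own machine. The helper below is a generic timing sketch, not the procedure used to obtain the ~14 ms figure; `median_latency_ms` is a hypothetical name. Since MLX evaluates lazily, the timed callable should force computation, for example with `mx.eval(...)`.

```python
import statistics
import time

def median_latency_ms(fn, warmup: int = 3, runs: int = 20) -> float:
    """Call fn() `warmup` times untimed, then return the median time of `runs` calls in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Usage with the model from the local-snapshot example (requires MLX):
#   median_latency_ms(lambda: mx.eval(model(ids, attention_mask=mask)))
```

The median is less sensitive than the mean to an occasional slow run caused by other work on the machine.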

Credits & Acknowledgements

This model wouldn't exist without two open-source releases — sincere thanks to both teams:

  • OpenAI for open-sourcing the Privacy Filter (architecture, modeling code, and opf training/eval CLI). The MLX port in this repo runs that same architecture under Apple's MLX framework.
  • NVIDIA for releasing the Nemotron-PII dataset used to fine-tune the source PyTorch checkpoint.

Additional thanks to Apple for MLX and the Hugging Face team for the model-distribution ecosystem.

License

Apache 2.0 (matches the source checkpoint).
