OpenMed Privacy Filter (Nemotron) — MLX BF16

A native MLX port of OpenMed/privacy-filter-nemotron for fast, on-device PII detection on Apple Silicon. This BF16 artifact preserves the full source precision; for a smaller / faster sibling, see OpenMed/privacy-filter-nemotron-mlx-8bit.

Family at a glance. Same architecture and training data, three runtimes:

PyTorch: OpenMed/privacy-filter-nemotron (CPU + CUDA).

MLX BF16 (this repo): Apple Silicon, source BF16 precision (~2.6 GB).

MLX 8-bit: OpenMed/privacy-filter-nemotron-mlx-8bit (Apple Silicon, ~1.4 GB, ~1.7× faster than BF16).

What it does

The model is a token classifier built on OpenAI's open-source Privacy Filter architecture (the same `openai_privacy_filter` model type used by openai/privacy-filter). It tags each token with a BIOES label (Begin, Inside, End, Single, or Outside) over 55 PII span classes. A Viterbi pass constrained by the BIOES grammar then turns those token tags into clean entity spans. The 55 categories are:

Personal identifiers — first_name, last_name, user_name, gender, age, date_of_birth
Contact — email, phone_number, fax_number, street_address, city, state, country, county, postcode, coordinate
Government / legal IDs — ssn, national_id, tax_id, certificate_license_number
Financial — account_number, bank_routing_number, credit_debit_card, cvv, pin, swift_bic
Medical — medical_record_number, health_plan_beneficiary_number, blood_type
Workplace — company_name, occupation, employee_id, customer_id, employment_status, education_level
Online — url, ipv4, ipv6, mac_address, http_cookie, api_key, password, device_identifier
Demographic — race_ethnicity, religious_belief, political_view, sexuality, language
Vehicles — license_plate, vehicle_identifier
Time — date, date_time, time
Misc — biometric_identifier, unique_id
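The BIOES-constrained Viterbi pass described above can be sketched in pure Python. This is a minimal illustration, not the OpenMed implementation. It assumes the following:

- You have per-token log-scores for each tag, for example from a log-softmax over the 221 logits.
- Tags are spelled like `B-email` / `O`.

Transitions the BIOES grammar forbids (e.g. `B-email` followed by `O`) get a hard `-inf` penalty. The helper names `split_tag`, `allowed`, `viterbi_bioes` and `tags_to_spans` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def split_tag(tag):
    """Split 'B-email' into ('B', 'email'); 'O' becomes ('O', None)."""
    if tag == "O":
        return "O", None
    prefix, label = tag.split("-", 1)
    return prefix, label


def allowed(prev, curr):
    """Return True if tag `curr` may directly follow tag `prev` under BIOES."""
    p_pre, p_lab = split_tag(prev)
    c_pre, c_lab = split_tag(curr)
    if p_pre in ("B", "I"):
        # An open span must continue (I) or close (E) with the same label.
        return c_pre in ("I", "E") and c_lab == p_lab
    # After O, E, or S no span is open, so only O, B, or S may follow.
    return c_pre in ("O", "B", "S")


def viterbi_bioes(scores, tags):
    """scores[t][k] is the log-score of tags[k] at token t; return the best legal tag path."""
    n = len(tags)
    # A sequence may only start where no span is open.
    best = [s if split_tag(tags[k])[0] in ("O", "B", "S") else NEG_INF
            for k, s in enumerate(scores[0])]
    back = []  # back[t-1][k] = best predecessor of tag k at token t
    for row in scores[1:]:
        new_best, pointers = [], []
        for k in range(n):
            val, arg = max((best[j] if allowed(tags[j], tags[k]) else NEG_INF, j)
                           for j in range(n))
            new_best.append(val + row[k])
            pointers.append(arg)
        best = new_best
        back.append(pointers)
    # A sequence may only end where no span is left open.
    last = max((best[k] if split_tag(tags[k])[0] in ("O", "E", "S") else NEG_INF, k)
               for k in range(n))[1]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return [tags[k] for k in reversed(path)]


def tags_to_spans(path):
    """Turn a legal BIOES path into (label, start_token, end_token_exclusive) spans."""
    spans, start = [], None
    for i, tag in enumerate(path):
        prefix, label = split_tag(tag)
        if prefix == "S":
            spans.append((label, i, i + 1))
        elif prefix == "B":
            start = i
        elif prefix == "E":
            spans.append((label, start, i + 1))
    return spans


# Toy example: greedy argmax would pick O, B-email, O (an illegal, unclosed span).
tags = ["O", "B-email", "I-email", "E-email", "S-email"]
probs = [
    [0.9, 0.05, 0.02, 0.01, 0.02],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.5, 0.1, 0.1, 0.2, 0.1],
]
scores = [[math.log(p) for p in row] for row in probs]
path = viterbi_bioes(scores, tags)
print(path, tags_to_spans(path))
```

The hard constraints let the decoder trade a slightly lower-scoring tag (`E-email` at the last token) for a well-formed span. Greedy per-token argmax cannot make that trade.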

For per-label accuracy, training recipe, and dataset details, see the base PyTorch checkpoint.

Architecture

| Field | Value |
|---|---|
| Source model type | `openai_privacy_filter` |
| Source architecture | `OpenAIPrivacyFilterForTokenClassification` |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-query attention (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts: 128 experts, top-4 routing, SwiGLU |
| Position encoding | YaRN-scaled RoPE (`rope_theta=150_000`, factor=32) |
| Context length | 131,072 tokens (initial context 4,096 × YaRN factor 32) |
| Tokenizer | `o200k_base` (tiktoken), vocab 200,064 |
| Output head | Linear(640 → 221) with bias |
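Several numbers in the table follow from each other. The short check below makes that arithmetic explicit.

It assumes the label space is one shared `O` tag plus B/I/E/S for each of the 55 classes. That split is consistent with the 221-wide output head, though the exact label ordering lives in `id2label.json`. The variable names are purely illustrative.

```python
NUM_SPAN_CLASSES = 55                  # PII categories listed above
BIOES_PREFIXES = ("B", "I", "E", "S")  # per-class tags; O is shared

# Assumed layout: 4 tags per class plus a single O tag.
num_labels = len(BIOES_PREFIXES) * NUM_SPAN_CLASSES + 1
assert num_labels == 221  # matches Linear(640 -> 221)

# YaRN stretches the initial RoPE context by the scaling factor.
initial_context, yarn_factor = 4_096, 32
scaled_context = initial_context * yarn_factor
assert scaled_context == 131_072

# Grouped-query attention: several query heads share each KV head.
query_heads, kv_heads, head_dim = 14, 2, 64
queries_per_kv_head = query_heads // kv_heads  # 7
q_proj_width = query_heads * head_dim          # 896, implied query projection width
kv_proj_width = kv_heads * head_dim            # 128 each for K and V

print(num_labels, scaled_context, queries_per_kv_head, q_proj_width, kv_proj_width)
```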

File set

| File | Size | Purpose |
|---|---|---|
| `weights.safetensors` | 2.6 GB | BF16 model weights in OpenMed-MLX layout |
| `config.json` | 19 KB | Model + MLX runtime config |
| `id2label.json` | 5.4 KB | Numeric ID → BIOES label string |
| `openmed-mlx.json` | 0.7 KB | OpenMed MLX manifest (task, family, runtime hints) |
| `tokenizer.json`, `tokenizer_config.json` | 27 MB | Source tokenizer files (kept for reference) |

The MLX runtime uses tiktoken o200k_base directly for tokenization; the tokenizer.json is kept so consumers can inspect or re-tokenize via transformers if desired.

Quick start

With OpenMed — recommended

OpenMed gives you a single extract_pii() / deidentify() API that auto-selects MLX on Apple Silicon and PyTorch elsewhere — same code on every host.

```bash
pip install -U "openmed[mlx]"
```

```python
from openmed import extract_pii, deidentify

text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

# Extract grouped entity spans (runs on MLX here, PyTorch fallback elsewhere)
result = extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx")
for ent in result.entities:
    print(f"{ent.label:30s} {ent.text!r} conf={ent.confidence:.2f}")

# De-identify
masked = deidentify(text, method="mask",
                    model_name="OpenMed/privacy-filter-nemotron-mlx")
fake = deidentify(
    text,
    method="replace",
    model_name="OpenMed/privacy-filter-nemotron-mlx",
    consistent=True,
    seed=42,  # deterministic locale-aware Faker surrogates
)
```

When MLX isn't available (Linux, Windows, Intel Macs, or the mlx package is not installed), the same call automatically falls back to the PyTorch checkpoint OpenMed/privacy-filter-nemotron, with a one-time warning. The fallback is family-aware: a Nemotron MLX request falls back only to the Nemotron PyTorch checkpoint, never to the openai/privacy-filter baseline.
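The fallback rule in the paragraph above can be sketched as follows. This illustrates the described behavior and is not OpenMed's actual code.

- `mlx_available`, `resolve_backend` and `PYTORCH_FALLBACK` are hypothetical names.
- Treating "MLX available" as macOS on arm64 with an importable `mlx` package is an assumption.

```python
import importlib.util
import platform

# Hypothetical mapping: each MLX artifact falls back to its own family's PyTorch checkpoint.
PYTORCH_FALLBACK = {
    "OpenMed/privacy-filter-nemotron-mlx": "OpenMed/privacy-filter-nemotron",
    "OpenMed/privacy-filter-nemotron-mlx-8bit": "OpenMed/privacy-filter-nemotron",
}


def mlx_available(system=None, machine=None):
    """Assumed check: Apple Silicon macOS with the mlx package importable."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return (system == "Darwin" and machine == "arm64"
            and importlib.util.find_spec("mlx") is not None)


def resolve_backend(model_name, has_mlx):
    """Return (backend, repo_id), never substituting a checkpoint from another family."""
    if has_mlx:
        return "mlx", model_name
    if model_name not in PYTORCH_FALLBACK:
        raise ValueError(f"No same-family PyTorch fallback registered for {model_name!r}")
    return "pytorch", PYTORCH_FALLBACK[model_name]


print(resolve_backend("OpenMed/privacy-filter-nemotron-mlx", has_mlx=mlx_available()))
```

Raising on an unknown name, rather than silently picking a default model, is what keeps the fallback family-aware in this sketch.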

Direct MLX usage (lower-level)

```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-nemotron-mlx")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'email',
#   'score': 0.92,
#   'word': 'alice.smith@example.com',
#   'start': 12,
#   'end': 35}]
```

The pipeline returns a list of dicts with entity_group, score, word, start, and end (character offsets into the input string).
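Because `start` and `end` index directly into the input string, the pipeline output is enough to redact text yourself. This is a sketch only: OpenMed's `deidentify(method="mask")` already does masking.

- `redact` is a hypothetical helper.
- The entity dict is copied from the sample output above.
- When spans overlap, this version keeps the earlier span. That is one possible policy among several.

```python
def redact(text, entities, placeholder="[{label}]"):
    """Replace each detected span with a label placeholder using its character offsets."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:  # skip a span overlapping one already replaced
            continue
        out.append(text[cursor:ent["start"]])
        out.append(placeholder.format(label=ent["entity_group"].upper()))
        cursor = ent["end"]
    out.append(text[cursor:])
    return "".join(out)


text = "Email me at alice.smith@example.com after 5pm."
entities = [{"entity_group": "email", "score": 0.92,
             "word": "alice.smith@example.com", "start": 12, "end": 35}]
assert text[12:35] == "alice.smith@example.com"  # offsets index the original string
print(redact(text, entities))  # Email me at [EMAIL] after 5pm.
```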

Loading from a local snapshot

```python
from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-nemotron-mlx")
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)  # arbitrary token IDs, batch of 1
mask = mx.ones((1, 4), dtype=mx.bool_)  # attend to all 4 tokens
logits = model(ids, attention_mask=mask)  # shape (1, 4, 221)
```
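To see which labels the raw logits correspond to, you can map per-token argmax IDs through `id2label.json`. This sketch makes two assumptions:

- The file is a flat JSON object such as `{"0": "O", "1": "B-...", ...}`.
- `mx.array.tolist()` is used to get plain Python lists.

This greedy decode ignores the BIOES grammar, so it is only for inspection and is not the Viterbi decoding the pipeline performs. `load_id2label` and `greedy_tags` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_id2label(model_dir):
    """Read id2label.json (assumed flat {"0": "O", ...}) into an int -> label dict."""
    raw = json.loads(Path(model_dir, "id2label.json").read_text())
    # JSON object keys are always strings, so convert them back to ints.
    return {int(k): v for k, v in raw.items()}


def greedy_tags(logits_rows, id2label):
    """Pick the highest-scoring label for each token; BIOES constraints are NOT enforced."""
    return [id2label[max(range(len(row)), key=row.__getitem__)] for row in logits_rows]


# Usage on Apple Silicon, continuing from the snippet above (not run here):
# id2label = load_id2label("/path/to/privacy-filter-nemotron-mlx")
# tags = greedy_tags(logits[0].tolist(), id2label)  # one label per input token
```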

Hardware notes

Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
Tested on macOS with mlx>=0.18. The MLX runtime in this repo is independent of mlx_lm (token classification, not causal LM).
A forward pass on a short PII sentence (~10 tokens) takes about 14 ms on an M-series GPU after warmup. For lower latency or a smaller memory footprint, use the -mlx-8bit sibling instead.
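To measure latency on your own machine, a simple warmup-then-median harness is enough. Results depend on the chip, sequence length, and MLX version, and this sketch is not claimed to reproduce the figure above. `median_latency_ms` is a hypothetical helper.

MLX evaluates lazily, so when timing the raw model you should force evaluation with `mx.eval` inside the timed call. Otherwise you only time graph construction.

```python
import statistics
import time


def median_latency_ms(fn, *args, warmup=3, runs=20):
    """Call fn untimed `warmup` times, then return the median wall time over `runs` calls, in ms."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# On Apple Silicon, with `pipe`, `model`, `ids`, `mask` from the snippets above (not run here):
# print(median_latency_ms(pipe, "Email me at alice.smith@example.com after 5pm."))
# print(median_latency_ms(lambda: mx.eval(model(ids, attention_mask=mask))))
```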

Credits & Acknowledgements

This model wouldn't exist without two open-source releases — sincere thanks to both teams:

OpenAI for open-sourcing the Privacy Filter (architecture, modeling code, and opf training/eval CLI). The MLX port in this repo runs that same architecture under Apple's MLX framework.
NVIDIA for releasing the Nemotron-PII dataset used to fine-tune the source PyTorch checkpoint.

Additional thanks to Apple for MLX and the Hugging Face team for the model-distribution ecosystem.

License

Apache 2.0 (matches the source checkpoint).


Model tree: openai/privacy-filter (base model) → OpenMed/privacy-filter-nemotron (fine-tuned) → OpenMed/privacy-filter-nemotron-mlx (this MLX conversion).