
URL: https://thenewstack.io/openai-privacy-filter-pii/

OpenAI's new Privacy Filter runs on your laptop so PII never hits the cloud - The New Stack



OpenAI's new Privacy Filter detects and redacts PII in long-form text with a 96% F1 score, runs locally, and handles up to 128,000 tokens in one pass.
Apr 23rd, 2026 4:54pm by Meredith Shubel

OpenAI has debuted Privacy Filter, a bidirectional token-classification model for detecting and redacting personally identifiable information (PII). It can scan long-form text in a single pass, runs locally, and is more context-aware than traditional rule-based detectors.

Scanning text in a single pass for emails, numbers, and more

For developers working with large language models (LLMs), data privacy is a long-standing concern. But with its new Privacy Filter, released on Wednesday, OpenAI is essentially opening up access to the tool it uses in-house for its own privacy-preserving workflows.

So, how does it work? 

As OpenAI explains in its announcement blog post, it starts from an autoregressive pretrained checkpoint and converts that model into a token classifier over a fixed taxonomy of privacy labels.

Rather than generating output one token at a time, Privacy Filter “labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.”
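OpenAI doesn't publish its decoding code in the announcement. Still, the general idea of constrained Viterbi decoding can be sketched. Suppose the classifier gives every token a score for each tag. Viterbi then finds the highest-scoring tag sequence that forms valid spans: a span can't start with an "inside" tag, and an "inside" tag can't follow a different label. This sketch makes several assumptions that are not from the source: a BIO tag scheme, two hypothetical entity types, per-token log-scores, and hard constraints with no learned transition scores. OpenAI's actual label scheme and scoring may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO tag set; Privacy Filter's real labels and scheme may differ.
ENTITY_TYPES = ["NAME", "EMAIL"]
TAGS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def allowed(prev: str, curr: str) -> bool:
    """Hard BIO constraint: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        return prev != "O" and prev[2:] == curr[2:]
    return True


def constrained_viterbi(scores: Sequence[Dict[str, float]]) -> List[str]:
    """Return the best-scoring tag sequence that obeys the BIO constraint.

    scores[i][tag] is a per-token log-score (for example a log-probability);
    a sequence's score is the sum of its per-token scores.
    """
    n = len(scores)
    if n == 0:
        return []
    neg_inf = -math.inf
    # best[i][tag]: best total score of any valid path ending in `tag` at token i
    best = [{t: neg_inf for t in TAGS} for _ in range(n)]
    back: List[Dict[str, str]] = [{} for _ in range(n)]
    for t in TAGS:
        if not t.startswith("I-"):  # a sequence cannot begin inside a span
            best[0][t] = scores[0][t]
    for i in range(1, n):
        for curr in TAGS:
            for prev in TAGS:
                if best[i - 1][prev] == neg_inf or not allowed(prev, curr):
                    continue
                cand = best[i - 1][prev] + scores[i][curr]
                if cand > best[i][curr]:
                    best[i][curr] = cand
                    back[i][curr] = prev
    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[n - 1][t])
    path = [tag]
    for i in range(n - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


def spans(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Group decoded BIO tags into (start, end_exclusive, type) token spans."""
    out: List[Tuple[int, int, str]] = []
    start = None
    for i, tag in enumerate(list(tags) + ["O"]):
        if start is not None and not tag.startswith("I-"):
            out.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return out
```

Taking the per-token argmax could produce an invalid sequence such as `O, I-EMAIL`. The constrained search never does. It might return `O, B-EMAIL, I-EMAIL` instead, which `spans` turns into one coherent EMAIL span over tokens 1 to 2.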

The taxonomy has eight labels, allowing Privacy Filter to mask or redact names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets (e.g., API keys or passwords).

(It’s a decent round-up, but it doesn’t catch everything; Social Security numbers and passport numbers, for example, are overlooked.)
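Once spans are labeled, masking is mostly bookkeeping: each detected span is swapped for a placeholder naming its category. The sketch below is a minimal version under several assumptions. The detector is assumed to return non-overlapping character spans. The placeholder strings are hypothetical; they mirror the eight categories listed above, not identifiers Privacy Filter is known to emit.

```python
from typing import List, Tuple

# Hypothetical label names for the eight categories listed above;
# the identifiers Privacy Filter actually emits may differ.
PLACEHOLDERS = {
    "name": "[NAME]",
    "address": "[ADDRESS]",
    "email": "[EMAIL]",
    "phone": "[PHONE]",
    "url": "[URL]",
    "date": "[DATE]",
    "account_number": "[ACCOUNT_NUMBER]",
    "secret": "[SECRET]",
}


def mask(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (start, end_exclusive, label) span with a placeholder.

    Assumes spans do not overlap; unknown labels fall back to [REDACTED].
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):  # walk spans left to right
        pieces.append(text[cursor:start])
        pieces.append(PLACEHOLDERS.get(label, "[REDACTED]"))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Building the output left to right, rather than editing the string in place, keeps every original offset valid while replacements of different lengths are inserted.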

Greater context-awareness, run locally

OpenAI claims Privacy Filter has greater context awareness, allowing it to pick up on subtler personal information and make more nuanced decisions.

“By combining strong language understanding with a privacy-specific labeling system, it can detect a wider range of PII in unstructured text, including cases where the right decision depends on context.”  

Specifically, the AI company claims its bidirectional token-classification model is a step up from traditional PII detection tools, such as regular expressions (regex) or NLP libraries, which typically rely on deterministic, format-based rules.

While these approaches may get the job done for simpler cases, like phone numbers or email addresses, they’re more likely to run into problems when context introduces more subtlety.

For example, Privacy Filter should be able to distinguish between publicly available information that it can preserve and private information that it should mask or redact, such as a public business address versus a private home address. 
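A deliberately simplified regex detector makes the contrast concrete. It judges by format alone, so it flags a public sales address as readily as a private one, and it has no way to recognize a home address written in ordinary prose. The two patterns below are illustrative assumptions, far less thorough than production rule sets.

```python
import re
from typing import List, Tuple

# Illustrative, simplified patterns; real rule-based tools use many more.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def regex_detect(text: str) -> List[Tuple[str, str]]:
    """Return (label, matched_text) pairs found purely by surface format."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.group()))
    return hits


sample = "Contact sales@acme.com or call +1 415 555 0100. My home is 12 Elm St."
print(regex_detect(sample))
```

The business email and phone number are both flagged, though they may be meant to be public. The private home address goes unnoticed, because nothing about its format marks it as sensitive.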

This focus on context also comes into play when processing lengthy documents with unstructured text. OpenAI says Privacy Filter was specifically designed to catch PII in “noisy, real-world” text: think support logs, long legal filings, and the like. To scan these long-form texts without chunking, the model supports up to 128,000 tokens of context.
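With a 128,000-token window, most documents fit in a single pass, and only unusually long inputs need splitting. The sketch below shows how a pipeline might plan this. Token counts are assumed to come from the model's tokenizer (not shown), and the 1,000-token overlap is an arbitrary choice for illustration, not an OpenAI recommendation.

```python
from typing import List, Tuple

MAX_CONTEXT = 128_000  # context length reported for Privacy Filter


def plan_windows(n_tokens: int, max_tokens: int = MAX_CONTEXT,
                 overlap: int = 1_000) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) token windows covering a document.

    Documents that fit in the context get a single window. Longer ones are
    split into overlapping windows, so any span shorter than `overlap` that
    straddles a boundary appears intact in at least one window.
    """
    if n_tokens <= 0:
        return []
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    windows = []
    start = 0
    while True:
        end = min(start + max_tokens, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            return windows
        start = end - overlap  # step back so adjacent windows share tokens
```

A 100,000-token support log yields one window. A 300,000-token filing yields three overlapping windows.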

Privacy Filter is also notably small. 

At 1.5 billion total parameters with 50 million active parameters, the model is snappy enough to run locally in a browser or on a laptop. Besides efficiency gains, this means developers can use Privacy Filter to mask and redact PII in their own environments, reducing the risk of exposing sensitive data.
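A quick back-of-the-envelope calculation shows why that size is laptop-friendly. It assumes that all 1.5 billion parameters must be held in memory and that weights are stored at 16 or 8 bits each. The article doesn't say which precision OpenAI ships.

```latex
\frac{\text{active}}{\text{total}} = \frac{50 \times 10^{6}}{1.5 \times 10^{9}} = \frac{1}{30} \approx 3.3\%

\text{weights (16-bit)} \approx 1.5 \times 10^{9} \times 2\ \text{bytes} = 3 \times 10^{9}\ \text{bytes} \approx 3\ \text{GB}

\text{weights (8-bit)} \approx 1.5 \times 10^{9} \times 1\ \text{byte} = 1.5 \times 10^{9}\ \text{bytes} \approx 1.5\ \text{GB}
```

Sparse models usually describe their active parameters as the ones used per token. Under that reading, only about one in thirty parameters does work at each step, which keeps compute low even though memory is sized by the full count.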

How it compares to the competition

In its announcement blog post, OpenAI boasts that Privacy Filter “achieves state-of-the-art performance on the PII-Masking-300k benchmark, when corrected for annotation issues we identified during evaluation.” 

What it calls “state of the art” is an F1 score of 96% (94.04% precision and 98.04% recall). 
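F1 is the harmonic mean of precision and recall, so the headline number can be checked directly from the two reported figures. The span-level helper is an assumption for illustration: it counts only exact matches on (start, end, label). The benchmark's actual matching rules may be more lenient or stricter.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1 (an assumed matching rule)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


print(round(f1_score(0.9404, 0.9804), 4))  # prints 0.96
```

The reported 94.04% precision and 98.04% recall combine to an F1 of about 0.95998, which rounds to the quoted 96%.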

Of course, OpenAI isn’t the first to offer a PII detection and redaction solution.

Microsoft’s Presidio, for example, is an open-source framework for detecting, redacting, masking, and anonymizing text, images, and structured data. Here, Microsoft might win: In its blog post, OpenAI flat-out states that Privacy Filter is not an anonymization tool but “one component in a broader privacy-by-design system.” 

Amazon’s Comprehend, meanwhile, is a managed service for PII detection and redaction in AWS workflows. 

Stacked up against existing competitors, Privacy Filter stands out for its context-aware, locally run design. 

Where Presidio may give developers more capabilities than Privacy Filter, OpenAI’s model makes up for its narrower scope with greater context awareness. And unlike Amazon’s managed service, it can be deployed locally.

What this means for developers

For developers building RAG systems, developing customer support pipelines, or orchestrating any other workflow that requires feeding user text into an LLM, OpenAI says Privacy Filter should slot in nicely. 
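One common pattern for slotting a local detector into an LLM pipeline is reversible pseudonymization. Each detected span is replaced with a numbered placeholder, and only the masked text is sent to the remote model. The placeholders are then swapped back locally in the reply. In this minimal sketch, `detect` stands in for a call to Privacy Filter and `call_llm` for any hosted model. Both are hypothetical callables, not real APIs.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def pseudonymize(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Swap each span for a numbered placeholder and remember the original."""
    mapping: Dict[str, str] = {}
    counts: Dict[str, int] = {}
    out: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):  # assumes non-overlapping spans
        counts[label] = counts.get(label, 0) + 1
        token = f"[{label.upper()}_{counts[label]}]"
        mapping[token] = text[start:end]
        out.append(text[cursor:start])
        out.append(token)
        cursor = end
    out.append(text[cursor:])
    return "".join(out), mapping


def answer_privately(text: str,
                     detect: Callable[[str], List[Span]],
                     call_llm: Callable[[str], str]) -> str:
    """Mask PII locally, query the remote model, then restore PII locally."""
    masked, mapping = pseudonymize(text, detect(text))
    reply = call_llm(masked)  # only masked text leaves the machine
    for token, original in mapping.items():
        reply = reply.replace(token, original)
    return reply
```

Numbered placeholders (rather than a bare `[NAME]`) let the remote model keep two different people apart, and let the local step restore each one correctly.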

The option to fine-tune adds extra appeal to OpenAI’s model.

And supposedly, it only takes a small amount of data to see results. In its model card, OpenAI reports that “training on 10% of the dataset is enough to drive F1 scores above 96%.” 

That means with relatively little data, developers can adapt OpenAI’s model for different data distributions, privacy policies, and domain-specific tasks. 
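Adapting a token classifier to a new domain usually starts with converting character-level annotations into per-token labels. The sketch below assumes a BIO scheme and token offsets like those a tokenizer's offset mapping provides. Privacy Filter's actual fine-tuning data format is not described in the article and may differ.

```python
from typing import List, Tuple


def to_bio(offsets: List[Tuple[int, int]],
           spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert character-level PII spans into per-token BIO labels.

    offsets: (start, end_exclusive) character offsets for each token.
    spans: (start, end_exclusive, label) annotations, assumed non-overlapping.
    Zero-width tokens (e.g. special tokens at (0, 0)) stay "O".
    """
    labels = ["O"] * len(offsets)
    for s_start, s_end, label in spans:
        inside = False
        for i, (t_start, t_end) in enumerate(offsets):
            if t_start < s_end and t_end > s_start:  # token overlaps the span
                labels[i] = ("I-" if inside else "B-") + label
                inside = True
    return labels
```

For "Call Jane Doe now", a NAME span over "Jane Doe" becomes `O, B-NAME, I-NAME, O`, the shape a token classifier trains on.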

That said, OpenAI urges caution in high-sensitivity domains, such as legal, medical, and financial workflows, reminding developers to keep human review in the loop and to plan for mistakes.
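One simple way to keep humans in the loop is to auto-redact only high-confidence detections and queue the rest for review. This sketch assumes the detector returns a confidence score per span, which is an assumed field, and uses an arbitrary 0.9 threshold. Note that it only routes uncertain detections; catching PII the model missed entirely still requires sampling and auditing outputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    start: int
    end: int
    label: str
    score: float  # assumed model confidence in [0, 1]


def triage(detections: List[Detection],
           auto_threshold: float = 0.9) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into auto-redact and needs-human-review buckets."""
    auto = [d for d in detections if d.score >= auto_threshold]
    review = [d for d in detections if d.score < auto_threshold]
    return auto, review
```

In a medical or legal workflow, a team might raise the threshold, sending more spans to reviewers in exchange for fewer unreviewed mistakes.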


One more piece in OpenAI’s stack

Privacy Filter is available today on Hugging Face and GitHub under the Apache 2.0 license. 

It arrives alongside GPT-5.5, which OpenAI released on Thursday and calls “a new class of intelligence.”

Meredith Shubel is a technical writer covering cloud infrastructure and enterprise software. She has contributed to The New Stack since 2022, profiling startups and exploring how organizations adopt emerging technologies. Beyond The New Stack, she ghostwrites white papers, executive bylines,...
TNS owner Insight Partners is an investor in: OpenAI.