Redact: A Transparent Privacy Filter for the AI Era

May 9, 2026

ai-privacypiianonymizationmitmproxyspacy

As AI tools become embedded in daily workflows — from browser-based chat interfaces to API-driven code assistants — every prompt sent to an external service is a potential data leak. Personally identifiable information (PII) routinely ends up in training data, logs, and model contexts with no easy way to recall it.

The Problem

Most privacy solutions require users to change their behavior: manually scrubbing prompts, using specialized interfaces, or routing through enterprise proxy systems that need IT involvement. This friction means they rarely get adopted outside compliance-mandated environments.

How Redact Works

Redact takes a different approach. It sits transparently between your Linux machine and AI services — both browser-based and API-based — intercepting outbound traffic and stripping PII before data reaches external servers. Built on mitmproxy for traffic interception and spaCy for NLP-based entity recognition, Redact requires no changes to existing applications or workflows.

The key design decisions:

Transparent interception — works with any browser or API client without configuration changes
NLP-based detection — uses spaCy models to identify names, addresses, emails, phone numbers, and other PII entities rather than relying on brittle regex patterns
Local processing — all anonymization happens on-device; no PII ever leaves the machine
Reversible mapping — maintains a local mapping table so responses containing anonymized tokens can be de-anonymized for the user

Architecture

Redact runs as a local proxy (via mitmproxy) that intercepts HTTPS traffic to known AI service domains. For each outbound request:

The request body is parsed to extract user-provided text content
spaCy NER models identify PII entities in the text
Detected entities are replaced with consistent placeholder tokens
The sanitized request is forwarded to the AI service
The response is scanned for placeholder tokens and de-anonymized before returning to the user

Trade-offs

Redact is designed for individual users on Linux workstations. It does not attempt to solve enterprise-scale data governance or multi-tenant privacy. NLP-based detection has inherent false negative rates — domain-specific PII (e.g., internal project names, proprietary identifiers) requires custom entity training. The transparent proxy approach also means it only protects traffic from the machine where it is installed.

Source

The project is open source and available on GitHub: drig-ai/redact