
Documentation Index

Fetch the complete documentation index at: https://docs.masker.dev/llms.txt

Use this file to discover all available pages before exploring further.

Masker works as a forward proxy. You point your voice platform at a Masker URL instead of directly at your LLM, and every conversation turn flows through the pipeline below. Your LLM provider only ever sees tokens. Your callers hear natural responses. The full pipeline adds 45–95 ms of overhead.

Request flow

Caller ↔ Voice Vendor ──► Masker proxy ──► Your LLM
                               │                │
                          detect + tokenize      │
                               │           LLM responds
                               │           with tokens
                          rehydrate ◄────────────┘

                          Audit chain
                          (hash-linked)

                          Compliance report
                          (JSON + PDF, signed)
1. Caller speaks

Your voice platform (Vapi, ElevenLabs, Bolna) handles the phone connection, runs speech-to-text, and assembles a chat-completions request with the conversation history. This is standard behavior — Masker has not changed anything yet.
2. Voice platform POSTs to Masker

Instead of posting to OpenAI directly, your voice platform sends the request to your per-agent Masker proxy URL:
POST https://masker-voice.fly.dev/proxy/{agent_id}/v1/chat/completions
The {agent_id} is generated when you create an agent in the portal. This is the only configuration change required on your voice platform — one URL swap.
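The swap can be sketched in a few lines. This is illustrative only: the agent ID below is a made-up placeholder, and the helper function is not part of any Masker SDK — it just shows that the chat-completions path stays identical and only the base URL changes.

```python
# Hypothetical agent ID; yours is generated in the Masker portal.
AGENT_ID = "agent_123"
MASKER_BASE = f"https://masker-voice.fly.dev/proxy/{AGENT_ID}/v1"

def chat_completions_url(base: str) -> str:
    """Build the chat-completions endpoint under a given base URL."""
    return base.rstrip("/") + "/chat/completions"

# Before: https://api.openai.com/v1/chat/completions
# After:  the same path, served by the per-agent Masker proxy.
print(chat_completions_url(MASKER_BASE))
```

Because the path and request body are unchanged, any client that lets you override the base URL (a "Custom LLM URL" setting, for instance) needs no other modification.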
3. Masker detects PHI

Masker runs a two-pass detection pipeline over the request body:
  • Pass 1 — regex: structured PHI with known patterns — SSN, US phone and fax numbers, email addresses, ZIP codes, dates of birth, MRNs, account numbers, URLs, IP addresses, credit card numbers (Luhn-checked), and VINs.
  • Pass 2 — NER: unstructured PHI via Gemma-4 named-entity recognition — person names, organizations, locations, and medical terms that don’t match a regex pattern.
Each detected span is recorded with its character offsets, detector ID, and confidence score before tokenization begins.
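A minimal sketch of the regex pass, under assumptions: the detector IDs mirror those shown in the audit-chain examples later on this page, but the patterns here are simplified stand-ins for Masker's real (much richer) detector set.

```python
import re

# Illustrative Pass-1 detectors; the real set also covers Luhn-checked
# cards, MRNs, VINs, IPs, URLs, and more.
DETECTORS = {
    "ssn_v1": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "usphone_v2": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email_v1": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pass1(text: str) -> list[dict]:
    """Record each span with offsets, detector ID, and confidence."""
    spans = []
    for detector_id, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            spans.append({
                "start": m.start(),
                "end": m.end(),
                "detector": detector_id,
                "confidence": 1.0,  # regex hits are treated as exact
            })
    return sorted(spans, key=lambda s: s["start"])

spans = detect_pass1("SSN 123-45-6789, call 555-867-5309")
```

Pass 2 would then run NER over the same text and append lower-confidence spans for names, organizations, and locations before tokenization begins.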
4. Masker tokenizes each span

Every detected span is replaced with a stable token. Masker supports two tokenization schemes:
  • Vault-deterministic (HMAC-SHA256): The same input always produces the same token within a session. The LLM can refer to the same person consistently across turns.
  • Reversible AEAD (AES-256-GCM-SIV): Each tokenization produces a ciphertext that can be reversed using the session key. Used when the original value must be recoverable server-side.
The token format is:
MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL
Breaking that down:
| Segment | Meaning |
| --- | --- |
| MSKV1 | Token version (Masker v1) |
| PHONE | PHI kind — the category of data that was redacted |
| K_HEALTHCARE | Key context — the key ring used for this session |
| A1B2C3... | Encoded value — the HMAC or AEAD ciphertext |
The original values are stored in a per-session SQLite vault, keyed by session ID and token. Nothing is persisted on the public demo; self-hosted deployments retain only the encrypted vault.
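The vault-deterministic scheme can be sketched with the standard library. The truncation length and base32 encoding below are assumptions chosen to produce a token of roughly the documented shape, not Masker's actual encoding.

```python
import base64
import hashlib
import hmac

def vault_deterministic_token(value: str, kind: str, key_ctx: str,
                              session_key: bytes) -> str:
    """HMAC-SHA256 over the value, truncated and base32-encoded into
    the MSKV1.<KIND>.<KEY_CTX>.<ENCODED> format."""
    digest = hmac.new(session_key, value.encode(), hashlib.sha256).digest()
    encoded = base64.b32encode(digest[:14]).decode().rstrip("=")
    return f"MSKV1.{kind}.{key_ctx}.{encoded}"

key = b"per-session key from the key ring"  # hypothetical key material
t1 = vault_deterministic_token("555-867-5309", "PHONE", "K_HEALTHCARE", key)
t2 = vault_deterministic_token("555-867-5309", "PHONE", "K_HEALTHCARE", key)
assert t1 == t2  # same input, same session -> same token
```

Determinism is the point: because the same phone number always maps to the same token within a session, the LLM can refer to it consistently across turns. The reversible AEAD scheme would instead encrypt the value so it can be decrypted server-side with the session key.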
5. Masker forwards the masked request to your LLM

The redacted request body — with all PHI spans replaced by tokens — is forwarded to your configured upstream LLM. The default is OpenAI gpt-4o-mini. Masker’s proxy endpoint is OpenAI-compatible, so any voice platform with a Custom LLM URL setting works without code changes.
6. Your LLM responds with tokens intact

The LLM treats tokens as opaque strings and responds naturally. A typical masked response looks like:
"Thanks, MSKV1.person_name.K_HEALTHCARE.a3f9. I have your appointment
booked for MSKV1.dob.K_HEALTHCARE.b7c2."
Your LLM provider’s logs — OpenAI, Anthropic, or any other — only ever contain these token strings.
7. Masker rehydrates the response

Masker walks the response, finds every token, looks it up in the per-session vault, and substitutes the original value back in. The result returned to your voice platform:
"Thanks, John. I have your appointment booked for March 14th."
The caller hears a natural response. No [REDACTED], no broken references.
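Rehydration reduces to a token-pattern substitution against the session vault. A minimal sketch, assuming the vault behaves like a token-to-value mapping (the real vault is per-session SQLite) and using a simplified token regex:

```python
import re

# Hypothetical session vault mapping tokens back to original values.
vault = {
    "MSKV1.person_name.K_HEALTHCARE.a3f9": "John",
    "MSKV1.dob.K_HEALTHCARE.b7c2": "March 14th",
}

TOKEN_RE = re.compile(r"MSKV1\.\w+\.\w+\.\w+")

def rehydrate(text: str, vault: dict[str, str]) -> str:
    """Replace every known token with its original value; unknown
    tokens are left intact rather than guessed at."""
    return TOKEN_RE.sub(lambda m: vault.get(m.group(0), m.group(0)), text)

masked = ("Thanks, MSKV1.person_name.K_HEALTHCARE.a3f9. I have your "
          "appointment booked for MSKV1.dob.K_HEALTHCARE.b7c2.")
print(rehydrate(masked, vault))
# -> Thanks, John. I have your appointment booked for March 14th.
```

Leaving unknown tokens untouched (rather than substituting a placeholder) keeps a vault miss visible in testing instead of silently papering over it.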
8. Masker writes the audit chain

Every detection and redaction event is appended to a hash-chained journal. Each entry carries a SHA-256 prev_hash linking it to the previous event and a curr_hash covering its own contents plus the previous hash:
{"seq":0,"kind":"detection","detector":"ssn_v1","placeholder":"[SSN_01]","prev_hash":"0000…","curr_hash":"a3f2…","ts":"2026-05-01T18:33:01Z"}
{"seq":1,"kind":"detection","detector":"usphone_v2","placeholder":"[USPHONE_01]","prev_hash":"a3f2…","curr_hash":"7c9e…","ts":"2026-05-01T18:33:02Z"}
{"seq":2,"kind":"redaction_applied","span":[12,23],"placeholder":"[SSN_01]","prev_hash":"7c9e…","curr_hash":"e1b4…","ts":"2026-05-01T18:33:02Z"}
A single mutated byte anywhere in the chain breaks every downstream hash. The session’s merkle_root_hex can be verified offline. POST /audit/verify re-runs the chain check and returns {ok, event_count, message} — the literal string "chain ok" is what you show an auditor.
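The chain check itself is simple enough to sketch and verify offline. The hashing scheme below (SHA-256 over the sorted-key JSON of the event minus its own curr_hash) is an assumption for illustration; Masker's exact canonicalization may differ.

```python
import hashlib
import json

def append_event(chain: list[dict], event: dict) -> None:
    """Append an event, linking it to the previous entry's hash."""
    event["prev_hash"] = chain[-1]["curr_hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True).encode()
    event["curr_hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(event)

def verify(chain: list[dict]) -> bool:
    """Re-run the chain check; any mutated byte breaks every
    downstream hash."""
    prev = "0" * 64
    for e in chain:
        if e["prev_hash"] != prev:
            return False
        body = {k: v for k, v in e.items() if k != "curr_hash"}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["curr_hash"]:
            return False
        prev = e["curr_hash"]
    return True

chain: list[dict] = []
append_event(chain, {"seq": 0, "kind": "detection", "detector": "ssn_v1"})
append_event(chain, {"seq": 1, "kind": "redaction_applied", "span": [12, 23]})
assert verify(chain)
chain[0]["detector"] = "tampered"  # one mutated byte...
assert not verify(chain)           # ...breaks verification
```

This is the property POST /audit/verify exercises on the real journal before returning "chain ok".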

Three artifacts per session

Every completed session produces three artifacts, all derived from the same event chain:
| Artifact | What it is | Who uses it |
| --- | --- | --- |
| Live firewall view | Side-by-side transcript split at the compliance boundary. Left: real PHI (patient ↔ voice vendor). Right: tokens only (Masker ↔ LLM). Animated chips show each redaction and rehydration as it happens. | Auditors, compliance reviews |
| Audit chain | Hash-linked JSONL journal of every detection, redaction, and rehydration event. Tamper-evident; verifiable offline via merkle_root_hex. | Forensic review, BAA chain documentation |
| Compliance report | Ed25519-signed JSON + auditor-ready PDF. Includes HIPAA Safe Harbor coverage (9/18 fully covered today), PCI-DSS scope, leak detection results, and retention attestation. Both formats share the same merkle_root_hex. | Auditors, compliance officers, legal |

HIPAA Safe Harbor coverage today

Masker currently fully covers 9 of the 18 HIPAA Safe Harbor identifier categories, with 3 partial:
| Category | Identifier | Status |
| --- | --- | --- |
| D | Phone number | ✅ Full |
| E | Fax number | ✅ Full |
| F | Email address | ✅ Full |
| G | Social security number | ✅ Full |
| N | URLs | ✅ Full |
| O | IP addresses | ✅ Full |
| H | Medical record number | ✅ Full |
| P | Account numbers | ✅ Full |
|  | Credit card (PCI-DSS) | ✅ Full |
| B | Geographic data / ZIP | 🟡 Partial |
| C | Dates | 🟡 Partial |
| L | VIN | 🟡 Partial |
| A | Names | 🟡 NER (Gemma-4) |
Full 18-category coverage is targeted for the May 30, 2026 production beta. Email hello@masker.dev if you need coverage for a specific PHI shape sooner.

Latency budget

| Stage | Typical latency |
| --- | --- |
| Detection — regex (Pass 1) | < 1 ms |
| Detection — Gemma-4 NER (Pass 2) | 20–50 ms |
| Tokenization + vault write | 2–5 ms |
| Rehydration on response | < 5 ms |
| Network hop overhead (voice vendor → Masker) | 15–30 ms |
| Total added latency | 45–95 ms |
This is well within Vapi’s first-token-latency budget (< 1500 ms typical) and is imperceptible to callers in real-time conversation. If you observe higher overhead in your environment, that’s a bug — report it at hello@masker.dev.

Quickstart

First masked call in under five minutes.

Beta access

Production access with a signed BAA and VPC deployment.