Masker works as a forward proxy. You point your voice platform at a Masker URL instead of directly at your LLM, and every conversation turn flows through the pipeline below. Your LLM provider only ever sees tokens. Your callers hear natural responses. The full pipeline adds 45–95 ms of overhead.
Request flow
Caller speaks
Your voice platform (Vapi, ElevenLabs, Bolna) handles the phone connection, runs speech-to-text, and assembles a chat-completions request with the conversation history. This is standard behavior — Masker has not changed anything yet.
Voice platform POSTs to Masker
Instead of posting to OpenAI directly, your voice platform sends the request to your per-agent Masker proxy URL. The {agent_id} in that URL is generated when you create an agent in the portal. This is the only configuration change required on your voice platform: one URL swap.
Masker detects PHI
Masker runs a two-pass detection pipeline over the request body:
- Pass 1 — regex: Structured PHI with known patterns: SSN, US phone and fax, email, ZIP code, date of birth, MRN, account numbers, URLs, IP addresses, credit card numbers (Luhn-checked), and VINs.
- Pass 2 — NER: Unstructured PHI using Gemma-4 named-entity recognition: person names, organizations, locations, and medical terms that don’t match a regex pattern.
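To make Pass 1 concrete, here is a minimal Python sketch of regex-based detection with a Luhn check on credit card candidates. The patterns, the `Span` type, and the function names are illustrative assumptions, not Masker's actual rules, and they cover only four of the kinds listed above. Pass 2 (NER) is not modeled.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    kind: str
    start: int
    end: int
    text: str


# Hypothetical patterns for illustration only; real rules would be stricter.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect_structured(text: str) -> list[Span]:
    """Collect regex matches, dropping card candidates that fail Luhn."""
    spans = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if kind == "CREDIT_CARD" and not luhn_ok(m.group()):
                continue
            spans.append(Span(kind, m.start(), m.end(), m.group()))
    return sorted(spans, key=lambda s: s.start)
```

Calling `detect_structured` on a transcript line returns the spans in reading order, each with its kind and character offsets, which is the information a tokenization step needs to replace them.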
Masker tokenizes each span
Every detected span is replaced with a stable token. Masker supports two tokenization schemes:
- Vault-deterministic (HMAC-SHA256): The same input always produces the same token within a session. The LLM can refer to the same person consistently across turns.
- Reversible AEAD (AES-256-GCM-SIV): Each tokenization produces a ciphertext that can be reversed using the session key. Used when the original value must be recoverable server-side.
Each token is made up of four segments:
| Segment | Meaning |
|---|---|
| MSKV1 | Token version (Masker v1) |
| PHONE | PHI kind: the category of data that was redacted |
| K_HEALTHCARE | Key context: the key ring used for this session |
| A1B2C3... | Encoded value: the HMAC or AEAD ciphertext |
The original values are stored in a per-session SQLite vault, keyed by session ID and token. Nothing is persisted on the public demo; self-hosted deployments retain only the encrypted vault.
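The vault-deterministic scheme can be sketched roughly as follows. This is an illustration under stated assumptions, not Masker's implementation: the `:` separator between segments, the truncation of the HMAC to 16 hex characters, the table layout, and the `SessionVault` class are all hypothetical. It uses an in-memory SQLite database and also shows the reverse lookup that turns tokens back into original values.

```python
import hashlib
import hmac
import re
import sqlite3

TOKEN_VERSION = "MSKV1"
# Assumed token layout: VERSION:KIND:KEY_CONTEXT:ENCODED (separator is a guess).
TOKEN_RE = re.compile(r"MSKV1:[A-Z_]+:[A-Z_]+:[0-9A-F]{16}")


class SessionVault:
    """Hypothetical per-session vault mapping tokens to original values."""

    def __init__(self, session_id: str, key: bytes, key_context: str = "K_HEALTHCARE"):
        self.session_id = session_id
        self.key_context = key_context
        self._key = key
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE vault (session_id TEXT, token TEXT, value TEXT, "
            "PRIMARY KEY (session_id, token))"
        )

    def tokenize(self, kind: str, value: str) -> str:
        # HMAC-SHA256 keyed per session: same input, same token in this session.
        mac = hmac.new(self._key, f"{kind}:{value}".encode(), hashlib.sha256)
        encoded = mac.hexdigest()[:16].upper()  # truncation is an assumption
        token = f"{TOKEN_VERSION}:{kind}:{self.key_context}:{encoded}"
        self._db.execute(
            "INSERT OR IGNORE INTO vault VALUES (?, ?, ?)",
            (self.session_id, token, value),
        )
        return token

    def rehydrate(self, text: str) -> str:
        # Replace every known token with its stored value; leave unknown ones.
        def lookup(match: re.Match) -> str:
            row = self._db.execute(
                "SELECT value FROM vault WHERE session_id = ? AND token = ?",
                (self.session_id, match.group()),
            ).fetchone()
            return row[0] if row else match.group()

        return TOKEN_RE.sub(lookup, text)
```

Because the token is derived from a keyed HMAC rather than stored state, repeating a value later in the same session yields the same token without a vault read, which is what lets the LLM refer to one entity consistently across turns.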
Masker forwards the masked request to your LLM
The redacted request body, with all PHI spans replaced by tokens, is forwarded to your configured upstream LLM. The default is OpenAI gpt-4o-mini. Masker’s proxy endpoint is OpenAI-compatible, so any voice platform with a Custom LLM URL setting works without code changes.
Your LLM responds with tokens intact
The LLM treats tokens as opaque strings and responds naturally, carrying the tokens through to its reply unchanged. Your LLM provider’s logs (OpenAI, Anthropic, or any other) only ever contain these token strings.
Masker rehydrates the response
Masker walks the response, finds every token, looks it up in the per-session vault, and substitutes the original value back in. The result returned to your voice platform contains the original values, so the caller hears a natural response: no [REDACTED], no broken references.
Masker writes the audit chain
Every detection and redaction event is appended to a hash-chained journal. Each entry carries a SHA-256 prev_hash linking it to the previous event and a curr_hash covering its own contents plus the previous hash. A single mutated byte anywhere in the chain breaks every downstream hash. The session’s merkle_root_hex can be verified offline. POST /audit/verify re-runs the chain check and returns {ok, event_count, message}; the literal string "chain ok" is what you show an auditor.
Three artifacts per session
Every completed session produces three artifacts, all derived from the same event chain:
| Artifact | What it is | Who uses it |
|---|---|---|
| Live firewall view | Side-by-side transcript split at the compliance boundary. Left: real PHI (patient ↔ voice vendor). Right: tokens only (Masker ↔ LLM). Animated chips show each redaction and rehydration as it happens. | Auditors, compliance reviews |
| Audit chain | Hash-linked JSONL journal of every detection, redaction, and rehydration event. Tamper-evident; verifiable offline via merkle_root_hex. | Forensic review, BAA chain documentation |
| Compliance report | Ed25519-signed JSON + auditor-ready PDF. Includes HIPAA Safe Harbor coverage (9/18 fully covered today), PCI-DSS scope, leak detection results, and retention attestation. Both formats share the same merkle_root_hex. | Auditors, compliance officers, legal |
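The hash-linked journal behind these artifacts can be approximated with a short Python sketch. It assumes each entry's curr_hash is the SHA-256 of the previous hash concatenated with a canonical JSON encoding of the event, and that the first entry links to an all-zero hash. Both are illustrative choices, as are the function names and the failure message. The return shape mirrors the {ok, event_count, message} response described earlier. The merkle_root_hex computation is not modeled here.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first entry


def _hash_entry(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same bytes.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_event(chain: list[dict], event: dict) -> dict:
    """Append an event, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["curr_hash"] if chain else GENESIS_HASH
    entry = {
        "event": event,
        "prev_hash": prev_hash,
        "curr_hash": _hash_entry(prev_hash, event),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> dict:
    """Recompute every hash in order and report the first mismatch."""
    prev_hash = GENESIS_HASH
    for i, entry in enumerate(chain):
        expected = _hash_entry(prev_hash, entry["event"])
        if entry["prev_hash"] != prev_hash or entry["curr_hash"] != expected:
            return {
                "ok": False,
                "event_count": len(chain),
                "message": f"chain broken at event {i}",  # hypothetical wording
            }
        prev_hash = entry["curr_hash"]
    return {"ok": True, "event_count": len(chain), "message": "chain ok"}
```

Editing any stored event changes its recomputed hash, so verification fails at that entry, and every later entry would also fail to link if the tampered hash were rewritten to match.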
HIPAA Safe Harbor coverage today
Masker currently fully covers 9 of the 18 HIPAA Safe Harbor identifier categories, with 3 partial:
| Category | Identifier | Status |
|---|---|---|
| D | Phone number | ✅ Full |
| E | Fax number | ✅ Full |
| F | Email address | ✅ Full |
| G | Social security number | ✅ Full |
| N | URLs | ✅ Full |
| O | IP addresses | ✅ Full |
| H | Medical record number | ✅ Full |
| J | Account numbers | ✅ Full |
| — | Credit card (PCI-DSS) | ✅ Full |
| B | Geographic data / ZIP | 🟡 Partial |
| C | Dates | 🟡 Partial |
| L | VIN | 🟡 Partial |
| A | Names | 🟡 NER (Gemma-4) |
Latency budget
| Stage | Typical latency |
|---|---|
| Detection — regex (Pass 1) | < 1 ms |
| Detection — Gemma-4 NER (Pass 2) | 20–50 ms |
| Tokenization + vault write | 2–5 ms |
| Rehydration on response | < 5 ms |
| Network hop overhead (voice vendor → Masker) | 15–30 ms |
| Total added latency | 45–95 ms |
What to read next
- Quickstart: First masked call in under five minutes.
- Beta access: Production access with a signed BAA and VPC deployment.