# How Masker's compliance firewall and proxy pipeline work

> Trace every conversation turn from your voice vendor through detection, tokenization, LLM forwarding, rehydration, and audit chain generation.

Masker works as a forward proxy. You point your voice platform at a Masker URL instead of directly at your LLM, and every conversation turn flows through the pipeline below. Your LLM provider only ever sees tokens. Your callers hear natural responses. The full pipeline adds 45–95 ms of overhead.

## Request flow

```
Caller ↔ Voice Vendor ──► Masker proxy ──► Your LLM
                            │                  │
                    detect + tokenize          │
                            │            LLM responds
                            │             with tokens
                        rehydrate ◄────────────┘
                            │
                Audit chain (hash-linked)
                            │
         Compliance report (JSON + PDF, signed)
```

Your voice platform (Vapi, ElevenLabs, Bolna) handles the phone connection, runs speech-to-text, and assembles a chat-completions request with the conversation history. This is standard behavior; Masker has not changed anything yet.

Instead of posting to OpenAI directly, your voice platform sends the request to your per-agent Masker proxy URL:

```
POST https://masker-voice.fly.dev/proxy/{agent_id}/v1/chat/completions
```

The `{agent_id}` is generated when you create an agent in the [portal](https://masker-voice.fly.dev/portal/login). This is the only configuration change required on your voice platform: one URL swap.

Masker runs a two-pass detection pipeline over the request body:

* **Pass 1 (regex):** Structured PHI with known patterns: SSN, US phone and fax, email, ZIP code, date of birth, MRN, account numbers, URLs, IP addresses, credit card numbers (Luhn-checked), and VINs.
* **Pass 2 (NER):** Unstructured PHI using Gemma-4 named-entity recognition: person names, organizations, locations, and medical terms that don't match a regex pattern.

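As a rough illustration of what a regex pass with a Luhn check can look like, here is a minimal Python sketch. The patterns, the `Span` record, the `detect` and `luhn_ok` names, and the fixed confidence of 1.0 are all assumptions made for this example; they are not Masker's actual detectors.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


@dataclass
class Span:
    start: int         # character offset where the match begins
    end: int           # character offset just past the match
    detector: str      # which pattern produced the match
    confidence: float  # assumed 1.0 for exact pattern matches


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> list[Span]:
    """Run every pattern over `text` and return matches sorted by offset."""
    spans = []
    for detector, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Discard digit runs that merely look like card numbers.
            if detector == "credit_card" and not luhn_ok(m.group()):
                continue
            spans.append(Span(m.start(), m.end(), detector, 1.0))
    return sorted(spans, key=lambda s: s.start)


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111, mail jane@example.com"
    for span in detect(sample):
        print(span.detector, sample[span.start:span.end])
```

The Luhn step matters because a bare digit pattern would otherwise flag order numbers and other long numeric strings as card numbers.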

Each detected span is recorded with its character offsets, detector ID, and confidence score before tokenization begins.

Every detected span is replaced with a stable token. Masker supports two tokenization schemes:

* **Vault-deterministic (HMAC-SHA256):** The same input always produces the same token within a session. The LLM can refer to the same person consistently across turns.
* **Reversible AEAD (AES-256-GCM-SIV):** Each tokenization produces a ciphertext that can be reversed using the session key. Used when the original value must be recoverable server-side.

The token format is:

```
MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL
```

Breaking that down:

| Segment        | Meaning                                           |
| -------------- | ------------------------------------------------- |
| `MSKV1`        | Token version (Masker v1)                         |
| `PHONE`        | PHI kind: the category of data that was redacted  |
| `K_HEALTHCARE` | Key context: the key ring used for this session   |
| `A1B2C3...`    | Encoded value: the HMAC or AEAD ciphertext        |

The original values are stored in a per-session SQLite vault, keyed by session ID and token. Nothing is persisted on the public demo; self-hosted deployments retain only the encrypted vault.

The redacted request body, with all PHI spans replaced by tokens, is forwarded to your configured upstream LLM. The default is OpenAI `gpt-4o-mini`. Masker's proxy endpoint is OpenAI-compatible, so any voice platform with a Custom LLM URL setting works without code changes.

The LLM treats tokens as opaque strings and responds naturally. A typical masked response looks like:

```
"Thanks, MSKV1.person_name.K_HEALTHCARE.a3f9. I have your appointment booked for MSKV1.dob.K_HEALTHCARE.b7c2."
```

Your LLM provider's logs (OpenAI, Anthropic, or any other) only ever contain these token strings.

Masker walks the response, finds every token, looks it up in the per-session vault, and substitutes the original value back in. The result returned to your voice platform:

```
"Thanks, John. I have your appointment booked for March 14th."
```

The caller hears a natural response. No `[REDACTED]`, no broken references.

Every detection and redaction event is appended to a hash-chained journal. Each entry carries a SHA-256 `prev_hash` linking it to the previous event and a `curr_hash` covering its own contents plus the previous hash:

```jsonl
{"seq":0,"kind":"detection","detector":"ssn_v1","placeholder":"[SSN_01]","prev_hash":"0000…","curr_hash":"a3f2…","ts":"2026-05-01T18:33:01Z"}
{"seq":1,"kind":"detection","detector":"usphone_v2","placeholder":"[USPHONE_01]","prev_hash":"a3f2…","curr_hash":"7c9e…","ts":"2026-05-01T18:33:02Z"}
{"seq":2,"kind":"redaction_applied","span":[12,23],"placeholder":"[SSN_01]","prev_hash":"7c9e…","curr_hash":"e1b4…","ts":"2026-05-01T18:33:02Z"}
```

A single mutated byte anywhere in the chain breaks every downstream hash. The session's `merkle_root_hex` can be verified offline. `POST /audit/verify` re-runs the chain check and returns `{ok, event_count, message}`. The literal string `"chain ok"` is what you show an auditor.

## Three artifacts per session

Every completed session produces three artifacts, all derived from the same event chain:

| Artifact | What it is | Who uses it |
| -------- | ---------- | ----------- |
| **Live firewall view** | Side-by-side transcript split at the compliance boundary. Left: real PHI (patient ↔ voice vendor). Right: tokens only (Masker ↔ LLM). Animated chips show each redaction and rehydration as it happens. | Auditors, compliance reviews |
| **Audit chain** | Hash-linked JSONL journal of every detection, redaction, and rehydration event. Tamper-evident; verifiable offline via `merkle_root_hex`. | Forensic review, BAA chain documentation |
| **Compliance report** | Ed25519-signed JSON + auditor-ready PDF. Includes HIPAA Safe Harbor coverage (9/18 fully covered today), PCI-DSS scope, leak detection results, and retention attestation. Both formats share the same `merkle_root_hex`. | Auditors, compliance officers, legal |

## HIPAA Safe Harbor coverage today

Masker currently fully covers 9 of the 18 HIPAA Safe Harbor identifier categories, with 3 partial:

| Category | Identifier             | Status           |
| -------- | ---------------------- | ---------------- |
| D        | Phone number           | ✅ Full           |
| E        | Fax number             | ✅ Full           |
| F        | Email address          | ✅ Full           |
| G        | Social security number | ✅ Full           |
| N        | URLs                   | ✅ Full           |
| O        | IP addresses           | ✅ Full           |
| H        | Medical record number  | ✅ Full           |
| J        | Account numbers        | ✅ Full           |
| n/a      | Credit card (PCI-DSS)  | ✅ Full           |
| B        | Geographic data / ZIP  | 🟡 Partial        |
| C        | Dates                  | 🟡 Partial        |
| L        | VIN                    | 🟡 Partial        |
| A        | Names                  | 🟡 NER (Gemma-4)  |

Full 18-category coverage is targeted for the May 30, 2026 production beta. Contact [hello@masker.dev](mailto:hello@masker.dev) if you need coverage for a specific PHI shape sooner.

## Latency budget

| Stage                                        | Typical latency |
| -------------------------------------------- | --------------- |
| Detection, regex (Pass 1)                    | \< 1 ms         |
| Detection, Gemma-4 NER (Pass 2)              | 20–50 ms        |
| Tokenization + vault write                   | 2–5 ms          |
| Rehydration on response                      | \< 5 ms         |
| Network hop overhead (voice vendor → Masker) | 15–30 ms        |
| **Total added latency**                      | **45–95 ms**    |

This is well within Vapi's first-token-latency budget (\< 1500 ms typical) and is imperceptible to callers in real-time conversation. If you observe higher overhead in your environment, that's a bug; report it at [hello@masker.dev](mailto:hello@masker.dev).

## What to read next

* First masked call in under five minutes.
* Production access with a signed BAA and VPC deployment.
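
Returning to the audit chain step of the request flow: the `prev_hash` / `curr_hash` linking can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The canonical form being hashed (sorted-key compact JSON of every field except `curr_hash`), the all-zero genesis value, and the `append` and `verify` helpers are hypothetical and do not describe Masker's actual serialization or the `merkle_root_hex` computation.

```python
import hashlib
import json

# Assumed genesis value; the journal excerpt shows only a truncated "0000…".
GENESIS = "0" * 64


def entry_hash(entry: dict) -> str:
    """SHA-256 over the entry's fields (including prev_hash) except curr_hash."""
    body = {k: v for k, v in entry.items() if k != "curr_hash"}
    # Assumed canonicalization: sorted keys, no extra whitespace.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list[dict], event: dict) -> dict:
    """Append `event` to `chain`, linking it to the previous entry's hash."""
    prev = chain[-1]["curr_hash"] if chain else GENESIS
    entry = {"seq": len(chain), **event, "prev_hash": prev}
    entry["curr_hash"] = entry_hash(entry)
    chain.append(entry)
    return entry


def verify(chain: list[dict]) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev = GENESIS
    for seq, entry in enumerate(chain):
        if entry["seq"] != seq or entry["prev_hash"] != prev:
            return False
        if entry["curr_hash"] != entry_hash(entry):
            return False
        prev = entry["curr_hash"]
    return True


if __name__ == "__main__":
    journal: list[dict] = []
    append(journal, {"kind": "detection", "detector": "ssn_v1"})
    append(journal, {"kind": "redaction_applied", "span": [12, 23]})
    print(verify(journal))
    journal[0]["detector"] = "tampered"  # mutate an early entry
    print(verify(journal))
```

Because each `curr_hash` covers the entry's own fields plus `prev_hash`, editing any earlier entry changes its hash and breaks the link held by every later entry, which is the tamper-evidence property the journal relies on.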