
Documentation Index

Fetch the complete documentation index at: https://docs.masker.dev/llms.txt

Use this file to discover all available pages before exploring further.

Masker works as a forward proxy. You point your voice platform at a Masker URL instead of directly at your LLM, and every conversation turn flows through the pipeline below. Your LLM provider only ever sees tokens. Your callers hear natural responses. The full pipeline adds 45–95 ms of overhead.

Request flow

Caller ↔ Voice Vendor ──► Masker proxy ──► Your LLM
                               │                │
                          detect + tokenize      │
                               │           LLM responds
                               │           with tokens
                          rehydrate ◄────────────┘

                          Audit chain
                          (hash-linked)

                          Compliance report
                          (JSON + PDF, signed)
1. Caller speaks

Your voice platform (Vapi, ElevenLabs, Bolna) handles the phone connection, runs speech-to-text, and assembles a chat-completions request with the conversation history. This is standard behavior — Masker has not changed anything yet.
2. Voice platform POSTs to Masker

Instead of posting to OpenAI directly, your voice platform sends the request to your per-agent Masker proxy URL:
POST https://masker-voice.fly.dev/proxy/{agent_id}/v1/chat/completions
The {agent_id} is generated when you create an agent in the portal. This is the only configuration change required on your voice platform — one URL swap.
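The swap can be sketched in a few lines. This is illustrative only: the agent ID below is a made-up placeholder, and the helper function is not part of any Masker SDK — it just shows that the chat-completions path stays identical and only the base URL changes.

```python
# Hypothetical agent ID; yours is generated in the Masker portal.
AGENT_ID = "agent_123"
MASKER_BASE = f"https://masker-voice.fly.dev/proxy/{AGENT_ID}/v1"

def chat_completions_url(base: str) -> str:
    """Build the chat-completions endpoint under a given base URL."""
    return base.rstrip("/") + "/chat/completions"

# Before: https://api.openai.com/v1/chat/completions
# After:  the same path, served by the per-agent Masker proxy.
print(chat_completions_url(MASKER_BASE))
```

Because the path and request body are unchanged, any client that lets you override the base URL (a "Custom LLM URL" setting, for instance) needs no other modification.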
3. Masker detects PHI

Masker runs a two-pass detection pipeline over the request body:
  • Pass 1 — regex: structured PHI with known patterns — SSN, US phone and fax numbers, email addresses, ZIP codes, dates of birth, MRNs, account numbers, URLs, IP addresses, credit card numbers (Luhn-checked), and VINs.
  • Pass 2 — NER: unstructured PHI via Gemma-4 named-entity recognition — person names, organizations, locations, and medical terms that don’t match a regex pattern.
Each detected span is recorded with its character offsets, detector ID, and confidence score before tokenization begins.
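A minimal sketch of the regex pass, under assumptions: the detector IDs mirror those shown in the audit-chain examples later on this page, but the patterns here are simplified stand-ins for Masker's real (much richer) detector set.

```python
import re

# Illustrative Pass-1 detectors; the real set also covers Luhn-checked
# cards, MRNs, VINs, IPs, URLs, and more.
DETECTORS = {
    "ssn_v1": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "usphone_v2": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email_v1": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pass1(text: str) -> list[dict]:
    """Record each span with offsets, detector ID, and confidence."""
    spans = []
    for detector_id, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            spans.append({
                "start": m.start(),
                "end": m.end(),
                "detector": detector_id,
                "confidence": 1.0,  # regex hits are treated as exact
            })
    return sorted(spans, key=lambda s: s["start"])

spans = detect_pass1("SSN 123-45-6789, call 555-867-5309")
```

Pass 2 would then run NER over the same text and append lower-confidence spans for names, organizations, and locations before tokenization begins.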
4. Masker tokenizes each span

Every detected span is replaced with a stable token. Masker supports two tokenization schemes:
  • Vault-deterministic (HMAC-SHA256): The same input always produces the same token within a session. The LLM can refer to the same person consistently across turns.
  • Reversible AEAD (AES-256-GCM-SIV): Each tokenization produces a ciphertext that can be reversed using the session key. Used when the original value must be recoverable server-side.
The token format is:
MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL
Breaking that down:
| Segment | Meaning |
| --- | --- |
| MSKV1 | Token version (Masker v1) |
| PHONE | PHI kind — the category of data that was redacted |
| K_HEALTHCARE | Key context — the key ring used for this session |
| A1B2C3... | Encoded value — the HMAC or AEAD ciphertext |
The original values are stored in a per-session SQLite vault, keyed by session ID and token. Nothing is persisted on the public demo; self-hosted deployments retain only the encrypted vault.
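The vault-deterministic scheme can be sketched with the standard library. The truncation length and base32 encoding below are assumptions chosen to produce a token of roughly the documented shape, not Masker's actual encoding.

```python
import base64
import hashlib
import hmac

def vault_deterministic_token(value: str, kind: str, key_ctx: str,
                              session_key: bytes) -> str:
    """HMAC-SHA256 over the value, truncated and base32-encoded into
    the MSKV1.<KIND>.<KEY_CTX>.<ENCODED> format."""
    digest = hmac.new(session_key, value.encode(), hashlib.sha256).digest()
    encoded = base64.b32encode(digest[:14]).decode().rstrip("=")
    return f"MSKV1.{kind}.{key_ctx}.{encoded}"

key = b"per-session key from the key ring"  # hypothetical key material
t1 = vault_deterministic_token("555-867-5309", "PHONE", "K_HEALTHCARE", key)
t2 = vault_deterministic_token("555-867-5309", "PHONE", "K_HEALTHCARE", key)
assert t1 == t2  # same input, same session -> same token
```

Determinism is the point: because the same phone number always maps to the same token within a session, the LLM can refer to it consistently across turns. The reversible AEAD scheme would instead encrypt the value so it can be decrypted server-side with the session key.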
5. Masker forwards the masked request to your LLM

The redacted request body — with all PHI spans replaced by tokens — is forwarded to your configured upstream LLM. The default is OpenAI gpt-4o-mini. Masker’s proxy endpoint is OpenAI-compatible, so any voice platform with a Custom LLM URL setting works without code changes.
6. Your LLM responds with tokens intact

The LLM treats tokens as opaque strings and responds naturally. A typical masked response looks like:
"Thanks, MSKV1.person_name.K_HEALTHCARE.a3f9. I have your appointment
booked for MSKV1.dob.K_HEALTHCARE.b7c2."
Your LLM provider’s logs — OpenAI, Anthropic, or any other — only ever contain these token strings.
7. Masker rehydrates the response

Masker walks the response, finds every token, looks it up in the per-session vault, and substitutes the original value back in. The result returned to your voice platform:
"Thanks, John. I have your appointment booked for March 14th."
The caller hears a natural response. No [REDACTED], no broken references.
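Rehydration reduces to a token-pattern substitution against the session vault. A minimal sketch, assuming the vault behaves like a token-to-value mapping (the real vault is per-session SQLite) and using a simplified token regex:

```python
import re

# Hypothetical session vault mapping tokens back to original values.
vault = {
    "MSKV1.person_name.K_HEALTHCARE.a3f9": "John",
    "MSKV1.dob.K_HEALTHCARE.b7c2": "March 14th",
}

TOKEN_RE = re.compile(r"MSKV1\.\w+\.\w+\.\w+")

def rehydrate(text: str, vault: dict[str, str]) -> str:
    """Replace every known token with its original value; unknown
    tokens are left intact rather than guessed at."""
    return TOKEN_RE.sub(lambda m: vault.get(m.group(0), m.group(0)), text)

masked = ("Thanks, MSKV1.person_name.K_HEALTHCARE.a3f9. I have your "
          "appointment booked for MSKV1.dob.K_HEALTHCARE.b7c2.")
print(rehydrate(masked, vault))
# -> Thanks, John. I have your appointment booked for March 14th.
```

Leaving unknown tokens untouched (rather than substituting a placeholder) keeps a vault miss visible in testing instead of silently papering over it.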
8. Masker writes the audit chain

Every detection and redaction event is appended to a hash-chained journal. Each entry carries a SHA-256 prev_hash linking it to the previous event and a curr_hash covering its own contents plus the previous hash:
{"seq":0,"kind":"detection","detector":"ssn_v1","placeholder":"[SSN_01]","prev_hash":"0000…","curr_hash":"a3f2…","ts":"2026-05-01T18:33:01Z"}
{"seq":1,"kind":"detection","detector":"usphone_v2","placeholder":"[USPHONE_01]","prev_hash":"a3f2…","curr_hash":"7c9e…","ts":"2026-05-01T18:33:02Z"}
{"seq":2,"kind":"redaction_applied","span":[12,23],"placeholder":"[SSN_01]","prev_hash":"7c9e…","curr_hash":"e1b4…","ts":"2026-05-01T18:33:02Z"}
A single mutated byte anywhere in the chain breaks every downstream hash. The session’s merkle_root_hex can be verified offline. POST /audit/verify re-runs the chain check and returns {ok, event_count, message} — the literal string "chain ok" is what you show an auditor.
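The chain check itself is simple enough to sketch and verify offline. The hashing scheme below (SHA-256 over the sorted-key JSON of the event minus its own curr_hash) is an assumption for illustration; Masker's exact canonicalization may differ.

```python
import hashlib
import json

def append_event(chain: list[dict], event: dict) -> None:
    """Append an event, linking it to the previous entry's hash."""
    event["prev_hash"] = chain[-1]["curr_hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True).encode()
    event["curr_hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(event)

def verify(chain: list[dict]) -> bool:
    """Re-run the chain check; any mutated byte breaks every
    downstream hash."""
    prev = "0" * 64
    for e in chain:
        if e["prev_hash"] != prev:
            return False
        body = {k: v for k, v in e.items() if k != "curr_hash"}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["curr_hash"]:
            return False
        prev = e["curr_hash"]
    return True

chain: list[dict] = []
append_event(chain, {"seq": 0, "kind": "detection", "detector": "ssn_v1"})
append_event(chain, {"seq": 1, "kind": "redaction_applied", "span": [12, 23]})
assert verify(chain)
chain[0]["detector"] = "tampered"  # one mutated byte...
assert not verify(chain)           # ...breaks verification
```

This is the property POST /audit/verify exercises on the real journal before returning "chain ok".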

Three artifacts per session

Every completed session produces three artifacts, all derived from the same event chain:
| Artifact | What it is | Who uses it |
| --- | --- | --- |
| Live firewall view | Side-by-side transcript split at the compliance boundary. Left: real PHI (patient ↔ voice vendor). Right: tokens only (Masker ↔ LLM). Animated chips show each redaction and rehydration as it happens. | Auditors, compliance reviews |
| Audit chain | Hash-linked JSONL journal of every detection, redaction, and rehydration event. Tamper-evident; verifiable offline via merkle_root_hex. | Forensic review, BAA chain documentation |
| Compliance report | Ed25519-signed JSON + auditor-ready PDF. Includes HIPAA Safe Harbor coverage (9/18 fully covered today), PCI-DSS scope, leak detection results, and retention attestation. Both formats share the same merkle_root_hex. | Auditors, compliance officers, legal |

HIPAA Safe Harbor coverage today

Masker currently fully covers 9 of the 18 HIPAA Safe Harbor identifier categories, with 3 partial:
| Category | Identifier | Status |
| --- | --- | --- |
| D | Phone number | ✅ Full |
| E | Fax number | ✅ Full |
| F | Email address | ✅ Full |
| G | Social security number | ✅ Full |
| N | URLs | ✅ Full |
| O | IP addresses | ✅ Full |
| H | Medical record number | ✅ Full |
| P | Account numbers | ✅ Full |
|  | Credit card (PCI-DSS) | ✅ Full |
| B | Geographic data / ZIP | 🟡 Partial |
| C | Dates | 🟡 Partial |
| L | VIN | 🟡 Partial |
| A | Names | 🟡 NER (Gemma-4) |
Full 18-category coverage is targeted for the May 30, 2026 production beta. Email hello@masker.dev if you need coverage for a specific PHI shape sooner.

Latency budget

| Stage | Typical latency |
| --- | --- |
| Detection — regex (Pass 1) | < 1 ms |
| Detection — Gemma-4 NER (Pass 2) | 20–50 ms |
| Tokenization + vault write | 2–5 ms |
| Rehydration on response | < 5 ms |
| Network hop overhead (voice vendor → Masker) | 15–30 ms |
| Total added latency | 45–95 ms |
This is well within Vapi’s first-token-latency budget (< 1500 ms typical) and is imperceptible to callers in real-time conversation. If you observe higher overhead in your environment, that’s a bug — report it at hello@masker.dev.

Quickstart

First masked call in under five minutes.

Beta access

Production access with a signed BAA and VPC deployment.