Voice is what Masker was built for. The hard part isn’t masking text; it’s masking streaming, low-latency, partial transcripts without adding noticeable delay to the conversation. This page walks through exactly what happens on a typical call, from first word to final audio.
End-to-end flow
Your voice platform handles speech-to-text and text-to-speech. Masker sits in the middle, between the transcript and your LLM.
Custom LLM endpoint
`POST /proxy/{agent_id}/v1/chat/completions`
Your voice platform calls this instead of OpenAI directly. Masker masks the request, forwards to your upstream model, rehydrates the response, and streams it back.
Assistant request webhook
`POST /vapi/webhook/{agent_id}`
Vapi-specific. Masker returns the configuration the platform should use for this call (system prompt, model, function definitions) and attaches audit metadata to the session.
Latency budget
Masker adds 45–95 ms of overhead on a typical chat completion request:

| Stage | Cost |
|---|---|
| Receive request, parse JSON | ~2 ms |
| Pass 1 — regex catalogue | ~1 ms |
| Pass 2 — Gemma-4 NER | 30–80 ms |
| Tokenize and write events | ~5 ms |
| Forward to upstream model | network only |
| Rehydrate response stream | ~5 ms |
| Write session record | async, off the hot path |
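As a quick sanity check on the table, the itemized stages can be summed directly. The sketch below copies the per-stage figures from the rows above and leaves out the two rows that carry no Masker-side cost ("network only" and "async"); the dictionary keys are illustrative names, not Masker identifiers.

```python
# Per-stage overhead in milliseconds as (low, high), copied from the table above.
# "Forward to upstream model" and "Write session record" are omitted because the
# table gives no Masker-side cost for them.
STAGES_MS = {
    "receive_and_parse": (2, 2),
    "pass1_regex": (1, 1),
    "pass2_ner": (30, 80),
    "tokenize_and_write_events": (5, 5),
    "rehydrate_response_stream": (5, 5),
}


def overhead_range(stages):
    """Return the (low, high) total across all itemized stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = overhead_range(STAGES_MS)
print(f"itemized overhead: {low}-{high} ms")  # prints: itemized overhead: 43-93 ms
```

The itemized rows come to 43–93 ms, slightly under the 45–95 ms headline figure, and the NER pass accounts for almost all of the spread.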
Streaming
Masker fully supports streaming chat completions (stream: true). Here is what happens:
Mask the request
The full request body arrives and Masker runs detection and tokenization before forwarding anything. The request leg is not streamed — it waits for a complete, masked payload.
Stream the response
The upstream LLM streams chunks back to Masker. Masker buffers each chunk just long enough to scan for tokens, then rehydrates inline and forwards.
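The buffering step matters because a token can be split across two streamed chunks. The following is a minimal sketch of that idea, not Masker's implementation: the `{{KIND_N}}` token format, the `MAX_TOKEN_LEN` bound, and the in-memory `vault` dictionary are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, Iterator

# Hypothetical token format, e.g. "{{PHONE_1}}". The real format is not
# specified on this page.
TOKEN_RE = re.compile(r"\{\{[A-Z]+_\d+\}\}")
MAX_TOKEN_LEN = 32  # assumed upper bound on the length of one token


def safe_cut(buffer: str) -> int:
    """Return the index up to which the buffer can be forwarded safely."""
    start = buffer.rfind("{{")
    # An opened but unclosed token near the end may be completed by the next chunk.
    if start != -1 and "}}" not in buffer[start:] and len(buffer) - start < MAX_TOKEN_LEN:
        return start
    # A single trailing "{" could be the first half of "{{".
    if buffer.endswith("{"):
        return len(buffer) - 1
    return len(buffer)


def rehydrate_stream(chunks: Iterable[str], vault: Dict[str, str]) -> Iterator[str]:
    """Rehydrate complete tokens and hold back only a possible partial token."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Swap every complete token for its original value (unknown tokens pass through).
        buffer = TOKEN_RE.sub(lambda m: vault.get(m.group(0), m.group(0)), buffer)
        cut = safe_cut(buffer)
        if cut:
            yield buffer[:cut]
            buffer = buffer[cut:]
    if buffer:
        yield buffer  # flush whatever is left when the upstream stream ends


vault = {"{{PHONE_1}}": "555-0100"}
chunks = ["Call me at {{PHO", "NE_1}} tomorrow", "."]
print(list(rehydrate_stream(chunks, vault)))
```

In this sketch only the tail that might be an incomplete token is delayed, so text ahead of it is forwarded as soon as it arrives.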
Partial transcripts
When voice platforms send partial transcripts (for example, “the user said: ‘my number is five five five’…”), Masker treats them like any other input. Detection runs, partial spans get masked, and rehydration handles the response. If the ASR corrects a partial in the next update (“…my number is five five five one two”), the updated partial is a fresh, independent request to Masker. Masker does not reconcile partials across messages; each request stands alone.
Rehydration failures
If a token cannot be rehydrated on the response leg (because a key was rotated out, a vault row is missing, or the token is malformed), Masker:
- Emits a `rehydration_failed` event to the audit log
- Replaces the token inline with `[REDACTED:KIND]`, for example `[REDACTED:PHONE]`
- Continues processing the rest of the response
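The fallback behaviour above can be sketched in a few lines. This is an illustration under stated assumptions rather than Masker's code: the `{{KIND_N}}` token format, the dictionary-backed `vault`, and the shape of the event record are hypothetical, and only the missing-vault-row case is modelled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical token format, e.g. "{{PHONE_2}}"; group 1 captures the kind.
TOKEN_RE = re.compile(r"\{\{([A-Z]+)_\d+\}\}")


def rehydrate(text: str, vault: Dict[str, str]) -> Tuple[str, List[dict]]:
    """Rehydrate tokens; on a miss, redact inline and record an audit event."""
    events: List[dict] = []

    def replace(match: re.Match) -> str:
        token, kind = match.group(0), match.group(1)
        value = vault.get(token)
        if value is None:
            # Missing vault row: log the failure and keep processing the rest.
            events.append({"type": "rehydration_failed", "kind": kind, "token": token})
            return f"[REDACTED:{kind}]"
        return value

    return TOKEN_RE.sub(replace, text), events


text, events = rehydrate(
    "Call {{PHONE_1}} or {{PHONE_2}}.",
    {"{{PHONE_1}}": "555-0100"},
)
print(text)    # prints: Call 555-0100 or [REDACTED:PHONE].
print(events)
```

The point of the design is that a failed lookup degrades to a visible placeholder instead of leaking the raw token or aborting the response.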
The three session artifacts
Every call produces three artifacts, all derived from the same event stream:
Live firewall view
A side-by-side view of the call, split by a vertical compliance firewall:
- Left of the firewall (regulated): the patient-to-voice-vendor channel. Real SSNs, phones, names, and addresses live here and only here.
- Right of the firewall (public): the Masker-to-LLM channel. Every PHI span is replaced with its token.
- Across the firewall: animated chips that visualize each redaction going out and each rehydration coming back.
Audit chain (real-time)
Every detection becomes a tamper-evident event in a hash-chained journal. Each event carries a `prev_hash` linking it to the previous event and a `curr_hash` covering its own contents. A single mutated byte breaks every downstream hash. You can verify the chain offline, or call `POST /audit/verify` to get `{"ok": true, "event_count": N, "message": "chain ok"}`.
If the durable journal append fails, Masker returns `AuditUnavailable` and does not process the call. There are no quiet drops.
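Offline verification of a hash chain like this can be sketched as follows. The field names `prev_hash` and `curr_hash` come from the text above; everything else is an assumption for illustration, including SHA-256, the canonical JSON encoding, the all-zero genesis value, the `payload` field, and the wording of the failure message.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the event contents together with the link to the previous event."""
    body = json.dumps(
        {"prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    """Append a new event whose prev_hash points at the current chain tip."""
    prev_hash = chain[-1]["curr_hash"] if chain else GENESIS
    chain.append(
        {"prev_hash": prev_hash, "payload": payload, "curr_hash": event_hash(prev_hash, payload)}
    )


def verify_chain(chain: list) -> dict:
    """Recompute every hash and check each link back to the previous event."""
    prev_hash = GENESIS
    for index, event in enumerate(chain):
        recomputed = event_hash(event["prev_hash"], event["payload"])
        if event["prev_hash"] != prev_hash or event["curr_hash"] != recomputed:
            return {"ok": False, "event_count": len(chain), "message": f"chain broken at event {index}"}
        prev_hash = event["curr_hash"]
    return {"ok": True, "event_count": len(chain), "message": "chain ok"}


chain = []
for kind in ("PHONE", "SSN", "NAME"):
    append_event(chain, {"type": "detection", "kind": kind})
print(verify_chain(chain))  # prints: {'ok': True, 'event_count': 3, 'message': 'chain ok'}

chain[1]["payload"]["kind"] = "EMAIL"  # tamper with one event
print(verify_chain(chain)["ok"])       # prints: False
```

Because each `curr_hash` in this sketch covers the previous hash, changing any one event invalidates that event and every link that follows it.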
Session compliance report (signed)
At call end, Masker mints a HIPAA Safe Harbor compliance report as two consistent artifacts from the same event chain:
- Masker Audit Schema v1 JSON — machine-checkable, shareable with automated compliance tooling
- Auditor-ready HIPAA PDF — human-readable, suitable for review by a compliance officer
Both artifacts carry the same `merkle_root_hex`, so you can prove the PDF and JSON describe identical chains. The report includes HIPAA Safe Harbor coverage, PCI-DSS scope, leak detection results, retention attestation, and BAA chain status.
Download both from the Reports tab in one click.
Platform-specific notes
Vapi
- Set Masker’s proxy URL as the Custom LLM field in your Vapi assistant.
- Set Masker’s webhook URL as the Server URL field.
- Set the Server URL Secret to a value you also configure as `MASKER_VAPI_WEBHOOK_SECRET`. Masker validates HMAC signatures on every webhook.
- Vapi’s own credit warnings pass through Masker unmodified; Masker does not filter platform-level messages.
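For readers unfamiliar with HMAC validation, the general technique looks like the sketch below. This page does not specify the exact scheme, so the details here are assumptions: HMAC-SHA256 computed over the raw request body, a hex-encoded signature, and the helper names `sign` and `is_valid_signature`.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


secret = "example-secret"  # stands in for the value of MASKER_VAPI_WEBHOOK_SECRET
body = b'{"message": {"type": "assistant-request"}}'
signature = sign(secret, body)

print(is_valid_signature(secret, body, signature))         # prints: True
print(is_valid_signature(secret, body + b" ", signature))  # prints: False
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information about how much of the signature matched.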
What Masker does not do for voice
Masker only sees text. It works on transcripts produced by your voice platform’s ASR engine, and produces text that your platform’s TTS engine speaks back.
- Masker does not run ASR. Your voice platform handles speech-to-text.
- Masker does not run TTS. Your voice platform speaks the rehydrated response to the caller.
- Masker does not record audio. Configure call recordings in Vapi, Bolna, or Retell directly, and apply a retention policy there.