This is the integration endpoint — the URL you paste into your voice platform’s “custom LLM” field. It speaks the OpenAI chat completions API exactly, so any client that can talk to OpenAI can talk to Masker without code changes. Your voice platform calls this endpoint; you do not call it directly from your application code. When a request arrives, Masker redacts PHI from all message content before forwarding the sanitized payload to the upstream LLM. The LLM response is then scanned for replacement tokens, which are rehydrated back to their original values before the response is returned to the caller. The caller sees a clean response; the upstream LLM never sees raw PHI.
This endpoint sits outside the `/api/v1` namespace because it is called by external systems that follow the OpenAI URL convention.

## Endpoint
### Path parameters

- `agent_id`: Masker agent ID in `agt_*` ULID format. Treat this like an API key — keep the proxy URL confidential.

## Authentication
This endpoint is not authenticated by the `masker_session` cookie; voice platforms calling it do not have a session. Authentication relies on two mechanisms:

- The `agent_id` in the URL acts as a shared secret. Do not expose the proxy URL publicly.
- When configured, HMAC signature verification validates the `X-Vapi-Signature` header against `MASKER_VAPI_WEBHOOK_SECRET` (see the sketch below).
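Masker performs this check server-side, but reproducing it (for example, in tests) is a few lines. In this sketch the digest algorithm (SHA-256) and hex encoding are assumptions; confirm the exact signing scheme for your deployment.

```python
import hashlib
import hmac
import os

def verify_vapi_signature(raw_body: bytes, signature_header: str) -> bool:
    """Check an X-Vapi-Signature value against MASKER_VAPI_WEBHOOK_SECRET.

    SHA-256 over the raw request body with hex encoding is an assumption
    for illustration; Masker's actual scheme may differ.
    """
    secret = os.environ["MASKER_VAPI_WEBHOOK_SECRET"].encode()
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)
```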
## Request body
The request body follows the standard OpenAI chat completions schema. Masker accepts every field OpenAI accepts and passes unrecognized fields through; a sample body follows the field list.

- `model`: The model to use. Must be compatible with the agent’s configured upstream. If the request specifies a model the agent is not allowed to use, Masker returns `422 model_not_allowed`.
- `messages`: Array of message objects (`role` + `content`). PHI is redacted from all content fields before forwarding.
- `stream`: If `true`, the response is streamed as Server-Sent Events (`text/event-stream`). Streaming is fully supported — response chunks are scanned for tokens and rehydrated inline.
- `temperature`: Sampling temperature, passed through to the upstream LLM unchanged.
- `max_tokens`: Maximum tokens in the response, passed through unchanged.
- `tools`: Tool definitions. Tool descriptions and function names that contain PHI are also redacted.
- `tool_choice`: Tool selection mode, passed through unchanged.
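For concreteness, here is a minimal request body rendered as a Python dict; the values are illustrative.

```python
request_body = {
    "model": "gpt-4o-mini",  # must be allowed by the agent's configured upstream
    "messages": [
        {"role": "system", "content": "You are a clinic scheduling assistant."},
        # PHI in content is redacted before the payload reaches the upstream LLM.
        {"role": "user", "content": "This is Jane Doe, date of birth 03/14/1962."},
    ],
    "stream": True,      # SSE chunks are scanned and rehydrated inline
    "temperature": 0.7,  # passed through unchanged
    "max_tokens": 256,   # passed through unchanged
}
```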
## Processing pipeline
1. Receive the request body.
2. Detect and redact PHI in `messages[*].content`, tool descriptions, and function names.
3. Forward the sanitized body to the upstream LLM provider.
4. Buffer or stream the response from the upstream LLM.
5. Scan the response for Masker replacement tokens and rehydrate them to their original values (sketched below).
6. Return the rehydrated response to the caller.
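The round trip is easiest to picture with a toy redactor. The `<phi_N>` token format and the date-matching regex below are illustrative inventions, not Masker’s actual detection logic or token scheme.

```python
import re

def redact(text: str, token_map: dict) -> str:
    """Toy stand-in for PHI detection: swaps dates for placeholder tokens."""
    def repl(m: re.Match) -> str:
        token = f"<phi_{len(token_map)}>"
        token_map[token] = m.group(0)
        return token
    return re.sub(r"\b\d{2}/\d{2}/\d{4}\b", repl, text)

def rehydrate(text: str, token_map: dict) -> str:
    """Reverse the substitution in the LLM's output."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text

tokens: dict = {}
sanitized = redact("Patient DOB is 03/14/1962.", tokens)
print(sanitized)  # Patient DOB is <phi_0>.  (what the upstream LLM sees)
print(rehydrate("Confirmed <phi_0> on file.", tokens))  # Confirmed 03/14/1962 on file.
```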
## Response
The response is identical in shape to an OpenAI chat completions response. PHI tokens in the LLM output are rehydrated before the response reaches the caller. For streaming requests, the response uses `text/event-stream` with standard OpenAI SSE chunks.
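Because the wire format is standard OpenAI SSE, any OpenAI-compatible client can consume the stream. Here is a sketch with the official `openai` Python SDK; the `base_url` is a placeholder, since the real proxy URL comes from your Masker dashboard.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-MASKER-HOST/...",  # placeholder: your agent's proxy URL
    api_key="unused",  # the SDK requires a value; real auth is the agent_id in the URL
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Read back the callback number."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # chunks arrive already rehydrated
```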
## Latency

Masker adds approximately 45–95 ms of end-to-end latency on top of the upstream LLM’s response time.

## Rate limit

100 requests/second sustained, burst 200. Rate-limited requests receive `429` with a `Retry-After` header.
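Honoring `Retry-After` on a `429` is a small wrapper. A sketch using `requests`:

```python
import time
import requests

def post_with_retry(url: str, payload: dict, max_attempts: int = 3) -> requests.Response:
    """POST, sleeping out 429s according to the Retry-After header."""
    resp = requests.post(url, json=payload, timeout=30)
    for _ in range(max_attempts - 1):
        if resp.status_code != 429:
            break
        # Retry-After is in seconds; fall back to one second if it is absent.
        time.sleep(float(resp.headers.get("Retry-After", 1)))
        resp = requests.post(url, json=payload, timeout=30)
    return resp
```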
## Configuration in Vapi

Set the proxy URL as your Vapi assistant’s Custom LLM URL, and set the model to one the agent’s upstream allows (for example, `gpt-4o-mini`). No other code changes are required.
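For orientation, a custom-LLM model block in a Vapi assistant typically looks like the dict below. The field names follow Vapi’s custom-LLM documentation as of this writing; verify them against current Vapi docs before relying on them.

```python
# Hypothetical Vapi assistant "model" block, expressed as a Python dict.
vapi_model_config = {
    "provider": "custom-llm",
    "url": "https://YOUR-MASKER-HOST/...",  # your agent's proxy URL; keep it secret
    "model": "gpt-4o-mini",                 # must match the agent's upstream
}
```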
## Example
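A minimal non-streaming call with the `openai` Python SDK. The `base_url` is a placeholder for your agent’s proxy URL, and the message content is invented.

```python
from openai import OpenAI

# Placeholder base_url: paste your agent's proxy URL (the SDK appends
# /chat/completions itself). The agent_id in the URL is the credential.
client = OpenAI(base_url="https://YOUR-MASKER-HOST/...", api_key="unused")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "My member ID is 12345678. When is my appointment?"},
    ],
)
# The member ID was redacted on the way in and rehydrated on the way out,
# so the content printed here contains the caller's original values.
print(resp.choices[0].message.content)
```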
## Errors

Errors are returned in OpenAI-compatible shape so existing clients handle them naturally.

| Status | Code | Meaning |
|---|---|---|
| 401 | `bad_signature` | HMAC signature verification failed |
| 404 | `agent_not_found` | The `agent_id` in the URL does not match any active agent |
| 422 | `model_not_allowed` | The requested model does not match the agent’s configured upstream |
| 429 | `rate_limited` | Account or agent quota exceeded; respect `Retry-After` |
| 502 | `upstream_error` | The upstream LLM returned an error; the error is passed through |
| 504 | `upstream_timeout` | The upstream LLM did not respond within the configured timeout |
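Because the envelope is OpenAI-compatible, the machine-readable `code` field is the thing to branch on. A sketch with `requests`; the URL and body are placeholders.

```python
import requests

PROXY_URL = "https://YOUR-MASKER-HOST/..."  # placeholder proxy URL
body = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}

resp = requests.post(PROXY_URL, json=body, timeout=30)
if not resp.ok:
    err = resp.json().get("error", {})  # OpenAI-compatible error envelope
    code = err.get("code")
    if code == "rate_limited":
        ...  # back off per Retry-After (see the rate-limit sketch above)
    elif code in ("upstream_error", "upstream_timeout"):
        ...  # transient upstream trouble; retrying is usually reasonable
    else:
        # bad_signature, agent_not_found, model_not_allowed: fix configuration.
        raise RuntimeError(f"{resp.status_code} {code}: {err.get('message')}")
```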