This is the integration endpoint: the URL you paste into your voice platform’s “custom LLM” field. It speaks the OpenAI chat completions API exactly, so any client that can talk to OpenAI can talk to Masker without code changes. Your voice platform calls this endpoint; you do not call it directly from your application code.

When a request arrives, Masker redacts PHI from all message content before forwarding the sanitized payload to the upstream LLM. The LLM response is then scanned for replacement tokens, which are rehydrated back to their original values before the response is returned to the caller. The caller sees a clean response; the upstream LLM never sees raw PHI.
This endpoint sits outside the /api/v1 namespace because it’s called by external systems that follow the OpenAI URL convention.

Endpoint

POST /proxy/{agent_id}/v1/chat/completions

Path parameters

agent_id
string
required
Masker agent ID in agt_* ULID format. Treat this like an API key — keep the proxy URL confidential.

Authentication

This endpoint is not authenticated by the masker_session cookie; voice platforms calling it do not carry a session. Authentication instead relies on two mechanisms:
  • The agent_id in the URL acts as a shared secret. Do not expose the proxy URL publicly.
  • When configured, HMAC signature verification validates the X-Vapi-Signature header against MASKER_VAPI_WEBHOOK_SECRET.
For high-security deployments, run Masker inside your VPC and add mTLS or IP allowlisting in front of the proxy.
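
Masker performs the signature check server-side; for illustration, here is a minimal sketch of the kind of verification involved. It assumes the signature is an HMAC-SHA256 hex digest of the raw request body, which is an assumption to confirm against your Vapi webhook settings:

import hashlib
import hmac
import os

def verify_vapi_signature(raw_body: bytes, signature_header: str) -> bool:
    # Assumption: X-Vapi-Signature carries an HMAC-SHA256 hex digest of
    # the raw body, keyed by MASKER_VAPI_WEBHOOK_SECRET.
    secret = os.environ["MASKER_VAPI_WEBHOOK_SECRET"].encode()
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest prevents timing side channels on the comparison.
    return hmac.compare_digest(expected, signature_header)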

Request body

The request body follows the standard OpenAI chat completions schema. Masker accepts every field OpenAI accepts and passes through unrecognized fields.
model
string
required
The model to use. Must be compatible with the agent’s configured upstream. If the request specifies a model the agent is not allowed to use, Masker returns 422 model_not_allowed.
messages
object[]
required
Array of message objects (role + content). PHI is redacted from all content fields before forwarding.
stream
boolean
default:"false"
If true, the response is streamed as Server-Sent Events (text/event-stream). Streaming is fully supported — response chunks are scanned for tokens and rehydrated inline.
temperature
number
Sampling temperature, passed through to the upstream LLM unchanged.
max_tokens
number
Maximum tokens in the response, passed through unchanged.
tools
object[]
Tool definitions. Tool descriptions and function names that contain PHI are also redacted.
tool_choice
string | object
Tool selection mode, passed through unchanged.

Processing pipeline

  1. Receive the request body.
  2. Detect and redact PHI in messages[*].content, tool descriptions, and function names.
  3. Forward the sanitized body to the upstream LLM provider.
  4. Buffer or stream the response from the upstream LLM.
  5. Scan the response for Masker replacement tokens and rehydrate them to original values.
  6. Return the rehydrated response to the caller.
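
To make steps 2 and 5 concrete, here is an illustrative sketch of the redact-and-rehydrate round trip. The [[MASKER:n]] token format and the regex-based detection are placeholders invented for the example; Masker’s real detection and token scheme are internal.

import re

# Illustrative only: the token format and the single phone-number regex
# below stand in for Masker's internal PHI detection.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def redact(text: str, vault: dict) -> str:
    # Step 2: replace detected PHI with tokens, remembering the originals.
    def repl(match: re.Match) -> str:
        token = f"[[MASKER:{len(vault)}]]"
        vault[token] = match.group(0)
        return token
    return PHONE.sub(repl, text)

def rehydrate(text: str, vault: dict) -> str:
    # Step 5: swap tokens in the LLM output back to the original values.
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

vault: dict = {}
sanitized = redact("My number is 415-555-2671.", vault)
# sanitized == "My number is [[MASKER:0]]." -- this is all the LLM sees.
llm_output = "Got it, I'll call [[MASKER:0]] tomorrow."
print(rehydrate(llm_output, vault))  # the caller sees the real number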

Response

The response is identical in shape to an OpenAI chat completions response. PHI tokens in the LLM output are rehydrated before the response reaches the caller. For streaming requests, the response uses text/event-stream with standard OpenAI SSE chunks.

Latency

Masker adds approximately 45–95 ms of end-to-end latency on top of the upstream LLM’s response time.

Rate limit

100 requests/second sustained, burst 200. Rate-limited requests receive 429 with a Retry-After header.

Configuration in Vapi

Set the proxy URL as your Vapi assistant’s Custom LLM URL:
https://masker-voice.fly.dev/proxy/agt_01HYZ.../v1/chat/completions
Set the model field to match the agent’s configured upstream (e.g. gpt-4o-mini). No other code changes are required.

Example

curl -X POST \
  -H "Content-Type: application/json" \
  https://masker-voice.fly.dev/proxy/agt_01HYZ.../v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful healthcare assistant."},
      {"role": "user", "content": "Hi, this is Sarah Chen, my number is 415-555-2671."}
    ],
    "stream": false,
    "temperature": 0.4,
    "max_tokens": 512
  }'
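
Because the proxy is OpenAI-compatible, any OpenAI SDK can target it by swapping the base URL. Here is a sketch using the official openai Python package, with stream=True to exercise the SSE path; the api_key value is a dummy placeholder, since authentication is the agent_id embedded in the URL:

from openai import OpenAI

# Base URL is everything up to and including /v1; the SDK appends
# /chat/completions. The api_key is a placeholder -- the proxy does not
# check it.
client = OpenAI(
    base_url="https://masker-voice.fly.dev/proxy/agt_01HYZ.../v1",
    api_key="unused",
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # must match the agent's configured upstream
    messages=[{"role": "user", "content": "Hi, this is Sarah Chen."}],
    stream=True,
)
for chunk in stream:
    # Chunks arrive already rehydrated; print tokens as they stream in.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)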

Errors

Errors are returned in OpenAI-compatible shape so existing clients handle them naturally.
Status  Code               Meaning
401     bad_signature      HMAC signature verification failed
404     agent_not_found    The agent_id in the URL does not match any active agent
422     model_not_allowed  The requested model does not match the agent’s configured upstream
429     rate_limited       Account or agent quota exceeded; respect Retry-After
502     upstream_error     The upstream LLM returned an error; the error is passed through
504     upstream_timeout   The upstream LLM did not respond within the configured timeout
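
Rate-limited responses are easiest to handle at the client. A minimal sketch using the requests library that honors the Retry-After header on 429; the backoff policy is illustrative:

import time
import requests

def post_with_retry(url: str, payload: dict, max_attempts: int = 3) -> requests.Response:
    # Retry only on 429, sleeping for the server-provided Retry-After
    # (falling back to exponential backoff if the header is absent).
    resp = requests.post(url, json=payload, timeout=60)
    for attempt in range(max_attempts - 1):
        if resp.status_code != 429:
            break
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
        resp = requests.post(url, json=payload, timeout=60)
    return resp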