This is the integration endpoint — the URL you paste into your voice platform’s “custom LLM” field. It speaks the OpenAI chat completions API exactly, so any client that can talk to OpenAI can talk to Masker without code changes. Your voice platform calls this endpoint; you do not call it directly from your application code. When a request arrives, Masker redacts PHI from all message content before forwarding the sanitized payload to the upstream LLM. The LLM response is then scanned for replacement tokens, which are rehydrated back to their original values before the response is returned to the caller. The caller sees a clean response; the upstream LLM never sees raw PHI.
This endpoint sits outside the `/api/v1` namespace because it is called by external systems that follow the OpenAI URL convention.

## Endpoint
### Path parameters

- `agent_id`: Masker agent ID in `agt_*` ULID format. Treat this like an API key — keep the proxy URL confidential.

## Authentication
This endpoint is not authenticated by the `masker_session` cookie; voice platforms calling it do not have a session. Authentication relies on two mechanisms:

- The `agent_id` in the URL acts as a shared secret. Do not expose the proxy URL publicly.
- When configured, HMAC signature verification validates the `X-Vapi-Signature` header against `MASKER_VAPI_WEBHOOK_SECRET` (see the sketch below).
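Masker performs this check server-side, but reproducing it (for example, in tests) is a few lines. In this sketch the digest algorithm (SHA-256) and hex encoding are assumptions; confirm the exact signing scheme for your deployment.

```python
import hashlib
import hmac
import os

def verify_vapi_signature(raw_body: bytes, signature_header: str) -> bool:
    """Check an X-Vapi-Signature value against MASKER_VAPI_WEBHOOK_SECRET.

    SHA-256 over the raw request body with hex encoding is an assumption
    for illustration; Masker's actual scheme may differ.
    """
    secret = os.environ["MASKER_VAPI_WEBHOOK_SECRET"].encode()
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)
```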
## Request body
The request body follows the standard OpenAI chat completions schema. Masker accepts every field OpenAI accepts and passes unrecognized fields through; a sample body follows the field list.

- `model`: The model to use. Must be compatible with the agent’s configured upstream. If the request specifies a model the agent is not allowed to use, Masker returns `422 model_not_allowed`.
- `messages`: Array of message objects (`role` + `content`). PHI is redacted from all content fields before forwarding.
- `stream`: If `true`, the response is streamed as Server-Sent Events (`text/event-stream`). Streaming is fully supported — response chunks are scanned for tokens and rehydrated inline.
- `temperature`: Sampling temperature, passed through to the upstream LLM unchanged.
- `max_tokens`: Maximum tokens in the response, passed through unchanged.
- `tools`: Tool definitions. Tool descriptions and function names that contain PHI are also redacted.
- `tool_choice`: Tool selection mode, passed through unchanged.
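For concreteness, here is a minimal request body rendered as a Python dict; the values are illustrative.

```python
request_body = {
    "model": "gpt-4o-mini",  # must be allowed by the agent's configured upstream
    "messages": [
        {"role": "system", "content": "You are a clinic scheduling assistant."},
        # PHI in content is redacted before the payload reaches the upstream LLM.
        {"role": "user", "content": "This is Jane Doe, date of birth 03/14/1962."},
    ],
    "stream": True,      # SSE chunks are scanned and rehydrated inline
    "temperature": 0.7,  # passed through unchanged
    "max_tokens": 256,   # passed through unchanged
}
```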
## Processing pipeline
1. Receive the request body.
2. Detect and redact PHI in `messages[*].content`, tool descriptions, and function names.
3. Forward the sanitized body to the upstream LLM provider.
4. Buffer or stream the response from the upstream LLM.
5. Scan the response for Masker replacement tokens and rehydrate them to their original values (sketched below).
6. Return the rehydrated response to the caller.
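The round trip is easiest to picture with a toy redactor. The `<phi_N>` token format and the date-matching regex below are illustrative inventions, not Masker’s actual detection logic or token scheme.

```python
import re

def redact(text: str, token_map: dict) -> str:
    """Toy stand-in for PHI detection: swaps dates for placeholder tokens."""
    def repl(m: re.Match) -> str:
        token = f"<phi_{len(token_map)}>"
        token_map[token] = m.group(0)
        return token
    return re.sub(r"\b\d{2}/\d{2}/\d{4}\b", repl, text)

def rehydrate(text: str, token_map: dict) -> str:
    """Reverse the substitution in the LLM's output."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text

tokens: dict = {}
sanitized = redact("Patient DOB is 03/14/1962.", tokens)
print(sanitized)  # Patient DOB is <phi_0>.  (what the upstream LLM sees)
print(rehydrate("Confirmed <phi_0> on file.", tokens))  # Confirmed 03/14/1962 on file.
```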
## Response
The response is identical in shape to an OpenAI chat completions response. PHI tokens in the LLM output are rehydrated before the response reaches the caller. For streaming requests, the response uses `text/event-stream` with standard OpenAI SSE chunks.
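Because the wire format is standard OpenAI SSE, any OpenAI-compatible client can consume the stream. Here is a sketch with the official `openai` Python SDK; the `base_url` is a placeholder, since the real proxy URL comes from your Masker dashboard.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-MASKER-HOST/...",  # placeholder: your agent's proxy URL
    api_key="unused",  # the SDK requires a value; real auth is the agent_id in the URL
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Read back the callback number."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # chunks arrive already rehydrated
```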
## Latency

Masker adds approximately 45–95 ms of end-to-end latency on top of the upstream LLM’s response time.

## Rate limit

100 requests/second sustained, burst 200. Rate-limited requests receive `429` with a `Retry-After` header.
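Honoring `Retry-After` on a `429` is a small wrapper. A sketch using `requests`:

```python
import time
import requests

def post_with_retry(url: str, payload: dict, max_attempts: int = 3) -> requests.Response:
    """POST, sleeping out 429s according to the Retry-After header."""
    resp = requests.post(url, json=payload, timeout=30)
    for _ in range(max_attempts - 1):
        if resp.status_code != 429:
            break
        # Retry-After is in seconds; fall back to one second if it is absent.
        time.sleep(float(resp.headers.get("Retry-After", 1)))
        resp = requests.post(url, json=payload, timeout=30)
    return resp
```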
## Configuration in Vapi

Set the proxy URL as your Vapi assistant’s Custom LLM URL, and set the model to one the agent’s upstream allows (for example, `gpt-4o-mini`). No other code changes are required.
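For orientation, a custom-LLM model block in a Vapi assistant typically looks like the dict below. The field names follow Vapi’s custom-LLM documentation as of this writing; verify them against current Vapi docs before relying on them.

```python
# Hypothetical Vapi assistant "model" block, expressed as a Python dict.
vapi_model_config = {
    "provider": "custom-llm",
    "url": "https://YOUR-MASKER-HOST/...",  # your agent's proxy URL; keep it secret
    "model": "gpt-4o-mini",                 # must match the agent's upstream
}
```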
## Example
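A minimal non-streaming call with the `openai` Python SDK. The `base_url` is a placeholder for your agent’s proxy URL, and the message content is invented.

```python
from openai import OpenAI

# Placeholder base_url: paste your agent's proxy URL (the SDK appends
# /chat/completions itself). The agent_id in the URL is the credential.
client = OpenAI(base_url="https://YOUR-MASKER-HOST/...", api_key="unused")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "My member ID is 12345678. When is my appointment?"},
    ],
)
# The member ID was redacted on the way in and rehydrated on the way out,
# so the content printed here contains the caller's original values.
print(resp.choices[0].message.content)
```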
## Errors

Errors are returned in OpenAI-compatible shape so existing clients handle them naturally.

| Status | Code | Meaning |
|---|---|---|
| 401 | `bad_signature` | HMAC signature verification failed |
| 404 | `agent_not_found` | The `agent_id` in the URL does not match any active agent |
| 422 | `model_not_allowed` | The requested model does not match the agent’s configured upstream |
| 429 | `rate_limited` | Account or agent quota exceeded; respect `Retry-After` |
| 502 | `upstream_error` | The upstream LLM returned an error; the error is passed through |
| 504 | `upstream_timeout` | The upstream LLM did not respond within the configured timeout |
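Because the envelope is OpenAI-compatible, the machine-readable `code` field is the thing to branch on. A sketch with `requests`; the URL and body are placeholders.

```python
import requests

PROXY_URL = "https://YOUR-MASKER-HOST/..."  # placeholder proxy URL
body = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}

resp = requests.post(PROXY_URL, json=body, timeout=30)
if not resp.ok:
    err = resp.json().get("error", {})  # OpenAI-compatible error envelope
    code = err.get("code")
    if code == "rate_limited":
        ...  # back off per Retry-After (see the rate-limit sketch above)
    elif code in ("upstream_error", "upstream_timeout"):
        ...  # transient upstream trouble; retrying is usually reasonable
    else:
        # bad_signature, agent_not_found, model_not_allowed: fix configuration.
        raise RuntimeError(f"{resp.status_code} {code}: {err.get('message')}")
```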