Masker is a Rust service that exposes an OpenAI-compatible /v1/chat/completions endpoint. You can run it on Masker-managed infrastructure, inside your own cloud account, or fully air-gapped with no outbound internet access. The right choice depends on where your PHI must reside and what your compliance program requires.
| Option | Best for | PHI residency | BAA |
|---|---|---|---|
| Masker hosted | Pilots, demos, early-stage teams | Fly.io US-West (sea/sjc) | Available on production beta |
| Self-hosted VPC | Production, regulated environments | Your cloud account | Not needed — PHI never leaves your VPC |
| Air-gapped / on-prem | Hospitals, DoD, strict security postures | Your own hardware | Not needed — Masker is fully offline |
## Option 1 — Masker hosted
The hosted service runs at masker-voice.fly.dev. You do not deploy or manage any infrastructure.
1. **Create an agent.** Click New Agent, choose an upstream model, and copy the generated proxy URL.
2. **Drop the URL into your voice platform.** Paste the proxy URL as the Custom LLM URL in Vapi, ElevenLabs, Bolna, or any OpenAI-compatible platform. No SDK, no code changes.
What you get: zero infrastructure, instant setup, and a BAA available to all production beta customers.
What to consider: PHI transits Masker’s servers on Fly.io US-West before being tokenized. If your compliance program requires PHI to stay in your own environment, use one of the self-hosted options below and plan a migration before going live with real patient data.
## Option 2 — Self-hosted in your VPC
Masker ships as a container image at ghcr.io/masker-dev/masker:latest. You can run it on Docker, Kubernetes, or your own Fly.io account.
### Docker
The quickest path to a self-hosted instance, suitable for single-node deployments or development environments:

```shell
docker run -d \
  --name masker \
  -p 8080:8080 \
  -v masker-data:/data \
  -e MASKER_LISTEN_ADDR=0.0.0.0:8080 \
  -e MASKER_PUBLIC_URL=https://masker.your-vpc.example \
  -e MASKER_DATABASE_URL=sqlite:///data/masker.db \
  -e MASKER_OPENAI_API_KEY=$OPENAI_API_KEY \
  --env-file ./masker.secrets.env \
  ghcr.io/masker-dev/masker:latest
```
`masker.secrets.env` must contain at minimum:

```
MASKER_KEY_K_HEALTHCARE=<base64 32-byte key>
MASKER_SESSION_SECRET=<base64 32-byte key>
MASKER_AUDIT_HMAC_KEY=<base64 32-byte key>
```
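The placeholder keys above can be generated with `openssl rand`, the same command the Fly.io instructions use. A quick sketch that writes all three in one go:

```shell
# Write the three required keys (32 random bytes each, base64-encoded).
cat > masker.secrets.env <<EOF
MASKER_KEY_K_HEALTHCARE=$(openssl rand -base64 32)
MASKER_SESSION_SECRET=$(openssl rand -base64 32)
MASKER_AUDIT_HMAC_KEY=$(openssl rand -base64 32)
EOF

# The file holds live key material; restrict it to the current user.
chmod 600 masker.secrets.env
```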
Place a TLS-terminating load balancer (ALB, GCLB, nginx) in front of the container; Masker speaks plain HTTP inside the VPC.

For multi-replica Docker Compose setups, switch to `tokenization: reversible-aead` in your mask policy so replicas don't need to share SQLite vault state. Alternatively, point `MASKER_DATABASE_URL` at a Postgres instance.
### Kubernetes

Sample manifests are available in the repo at `deploy/k8s/`. They include:

- Deployment with resource requests (1 CPU, 2 GB RAM; 1 GPU optional for NER)
- Service and Ingress with TLS termination
- PersistentVolumeClaim for the `/data` volume
- Secret and ConfigMap separation
- HorizontalPodAutoscaler keyed on request latency
For multi-replica deployments (EKS, GKE, AKS), use `tokenization: reversible-aead` in your mask policy to eliminate the need for a shared vault, or point `MASKER_DATABASE_URL` at a managed Postgres instance.

A Helm chart is on the May 30 roadmap. Until then, apply the raw manifests from `deploy/k8s/`.
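As an illustration of the Secret/ConfigMap split, the required keys map onto a Kubernetes Secret like this. The object name here is hypothetical, not necessarily what the `deploy/k8s/` manifests use:

```yaml
# Illustrative only — check deploy/k8s/ for the actual manifest names.
# stringData lets Kubernetes handle base64 encoding of the values.
apiVersion: v1
kind: Secret
metadata:
  name: masker-secrets
type: Opaque
stringData:
  MASKER_KEY_K_HEALTHCARE: "<base64 32-byte key>"
  MASKER_SESSION_SECRET: "<base64 32-byte key>"
  MASKER_AUDIT_HMAC_KEY: "<base64 32-byte key>"
  MASKER_OPENAI_API_KEY: "sk-..."
```

Non-secret settings such as `MASKER_LISTEN_ADDR` and `MASKER_PUBLIC_URL` belong in the ConfigMap instead.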
### Fly.io (your account)

Same approach as the Masker-hosted deployment, but running entirely inside your Fly.io organization; PHI stays in your account.

```toml
app = "masker-yourname"
primary_region = "sea"

[build]
image = "ghcr.io/masker-dev/masker:latest"

[http_service]
internal_port = 8080
force_https = true

[[mounts]]
source = "masker_data"
destination = "/data"

[env]
MASKER_LISTEN_ADDR = "0.0.0.0:8080"
MASKER_PUBLIC_URL = "https://masker-yourname.fly.dev"
MASKER_DATABASE_URL = "sqlite:///data/masker.db"
```
Then set secrets separately so they are never stored in `fly.toml`:

```shell
flyctl secrets set \
  MASKER_SESSION_SECRET=$(openssl rand -base64 32) \
  MASKER_KEY_K_HEALTHCARE=$(openssl rand -base64 32) \
  MASKER_AUDIT_HMAC_KEY=$(openssl rand -base64 32) \
  MASKER_OPENAI_API_KEY=sk-...
```
### Getting a self-hosted deployment
Self-hosted deployments require an activation step during the beta period. Email hello@masker.dev to request a self-hosted license. The team will provide the image credentials and walk through the initial setup.
## Option 3 — Air-gapped / on-premises
Masker has no required outbound network calls beyond your chosen upstream LLM. If your environment runs the LLM internally — Ollama, vLLM, or your own Azure OpenAI tenant — Masker can run completely offline.
Configure the internal LLM endpoint:

```
MASKER_OPENAI_BASE_URL=https://internal-llm.your-vpc.example
MASKER_OPENAI_API_KEY=internal-token
```
Masker implements the OpenAI chat-completions API, so any compatible upstream works without code changes.
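Concretely, that means clients send the standard chat-completions request body to Masker's `/v1/chat/completions` route. A sketch of the request shape, where the base URL and model name are placeholders for your own deployment and configured upstream:

```shell
# Placeholder: substitute your own Masker deployment URL.
MASKER_URL="https://masker.your-vpc.example"

# Standard OpenAI-style request body; the model name is illustrative and
# is forwarded to whichever upstream your instance is configured with.
cat > request.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Summarize: patient reports mild chest pain."}
  ]
}
EOF

# Against a live instance (add whatever auth your deployment requires):
#   curl -s "$MASKER_URL/v1/chat/completions" \
#        -H "Content-Type: application/json" -d @request.json
echo "$MASKER_URL/v1/chat/completions"
```

PHI in the message content is detected and tokenized by Masker before the request reaches the upstream LLM, and rehydrated in the response.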
The Gemma-4 NER model ships inside the container image. There are no model downloads at runtime, no telemetry, and the container does not phone home.
For on-premises activation, contact us to request the offline activation flow. It uses a signed license file in place of GitHub OAuth for portal access.
## Health checks
Every Masker deployment exposes a health endpoint you can wire into your load balancer or readiness probe:
A healthy instance returns:
```json
{"status":"ok","version":"x.y.z","db":"ok","vault":"ok","upstream":"ok"}
```
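A probe can gate on the individual fields rather than just an HTTP 200. A minimal sketch that checks the documented response body (fetching the body is left to your probe tooling):

```shell
# The documented healthy response ("x.y.z" stands in for the version).
BODY='{"status":"ok","version":"x.y.z","db":"ok","vault":"ok","upstream":"ok"}'

# Require status, db, vault, and upstream to all report "ok".
STATUS=healthy
for key in status db vault upstream; do
  echo "$BODY" | grep -q "\"$key\":\"ok\"" || STATUS=unhealthy
done
echo "$STATUS"
```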
## Upgrades
Masker follows semantic versioning. Minor and patch releases are drop-in upgrades. Major releases include a migration note in the GitHub release.
```shell
docker pull ghcr.io/masker-dev/masker:latest
docker stop masker && docker rm masker
docker run ...   # same flags, new image
```
Database migrations run automatically at boot. The vault token format (MSKV1.*) is forward-compatible across all minor versions.
## Observability
Set `MASKER_METRICS_ADDR` to expose a Prometheus metrics endpoint on that address. Available metrics:

- `masker_requests_total{agent, status}`
- `masker_request_duration_seconds{stage}` — stages: detection, tokenize, upstream, rehydrate
- `masker_redactions_total{kind, pass}`
- `masker_vault_size_bytes`
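A minimal Prometheus scrape job for that endpoint might look like this, assuming a hypothetical `MASKER_METRICS_ADDR=0.0.0.0:9090` setting and a resolvable host name; adjust both to your deployment:

```yaml
# Prometheus scrapes the conventional /metrics path unless told otherwise.
scrape_configs:
  - job_name: masker
    static_configs:
      - targets: ["masker.internal:9090"]
```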
Logs are emitted as JSON to stdout by default. Forward them with your existing pipeline — CloudWatch, Datadog, GCP Logging, or Loki all work without any Masker-specific configuration.
## Current limitations
- No Helm chart yet. Raw Kubernetes manifests only. Helm chart is on the May 30 roadmap.
- No Terraform module yet. Same timeline.
- Each Masker deployment is independent. A multi-tenant control plane is not available in the current release.