Masker is a Rust service that exposes an OpenAI-compatible /v1/chat/completions endpoint. You can run it on Masker-managed infrastructure, inside your own cloud account, or fully air-gapped with no outbound internet access. The right choice depends on where your PHI must reside and what your compliance program requires.
| Option | Best for | PHI residency | BAA |
|---|---|---|---|
| Masker hosted | Pilots, demos, early-stage teams | Fly.io US-West (sea/sjc) | Available on production beta |
| Self-hosted VPC | Production, regulated environments | Your cloud account | Not needed — PHI never leaves your VPC |
| Air-gapped / on-prem | Hospitals, DoD, strict security postures | Your own hardware | Not needed — Masker is fully offline |
## Option 1 — Masker hosted
The hosted service runs at masker-voice.fly.dev. You do not deploy or manage any infrastructure.
1. **Create an agent.** Click New Agent, choose an upstream model, and copy the generated proxy URL.
2. **Drop the URL into your voice platform.** Paste the proxy URL as the Custom LLM URL in Vapi, ElevenLabs, Bolna, or any OpenAI-compatible platform. No SDK, no code changes.
What you get: zero infrastructure, instant setup, and a BAA available to all production beta customers.
What to consider: PHI transits Masker’s servers on Fly.io US-West before being tokenized. If your compliance program requires PHI to stay in your own environment, use one of the self-hosted options below and plan a migration before going live with real patient data.
## Option 2 — Self-hosted in your VPC
Masker ships as a container image at ghcr.io/masker-dev/masker:latest. You can run it on Docker, Kubernetes, or your own Fly.io account.
### Docker
The quickest path to a self-hosted instance, suitable for single-node deployments or development environments:

```shell
docker run -d \
  --name masker \
  -p 8080:8080 \
  -v masker-data:/data \
  -e MASKER_LISTEN_ADDR=0.0.0.0:8080 \
  -e MASKER_PUBLIC_URL=https://masker.your-vpc.example \
  -e MASKER_DATABASE_URL=sqlite:///data/masker.db \
  -e MASKER_OPENAI_API_KEY=$OPENAI_API_KEY \
  --env-file ./masker.secrets.env \
  ghcr.io/masker-dev/masker:latest
```
`masker.secrets.env` must contain at minimum:

```
MASKER_KEY_K_HEALTHCARE=<base64 32-byte key>
MASKER_SESSION_SECRET=<base64 32-byte key>
MASKER_AUDIT_HMAC_KEY=<base64 32-byte key>
```
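The placeholder keys above can be generated with `openssl rand`, the same command the Fly.io instructions use. A quick sketch that writes all three in one go:

```shell
# Write the three required keys (32 random bytes each, base64-encoded).
cat > masker.secrets.env <<EOF
MASKER_KEY_K_HEALTHCARE=$(openssl rand -base64 32)
MASKER_SESSION_SECRET=$(openssl rand -base64 32)
MASKER_AUDIT_HMAC_KEY=$(openssl rand -base64 32)
EOF

# The file holds live key material; restrict it to the current user.
chmod 600 masker.secrets.env
```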
Place a TLS-terminating load balancer (ALB, GCLB, nginx) in front of the container; Masker speaks plain HTTP inside the VPC.

For multi-replica Docker Compose setups, switch to `tokenization: reversible-aead` in your mask policy so replicas don't need to share SQLite vault state. Alternatively, point `MASKER_DATABASE_URL` at a Postgres instance.
### Kubernetes

Sample manifests are available in the repo at `deploy/k8s/`. They include:

- Deployment with resource requests (1 CPU, 2 GB RAM; 1 GPU optional for NER)
- Service and Ingress with TLS termination
- PersistentVolumeClaim for the `/data` volume
- Secret and ConfigMap separation
- HorizontalPodAutoscaler keyed on request latency
For multi-replica deployments (EKS, GKE, AKS), use `tokenization: reversible-aead` in your mask policy to eliminate the need for a shared vault, or point `MASKER_DATABASE_URL` at a managed Postgres instance.

A Helm chart is on the May 30 roadmap. Until then, apply the raw manifests from `deploy/k8s/`.
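As an illustration of the Secret/ConfigMap split, the required keys map onto a Kubernetes Secret like this. The object name here is hypothetical, not necessarily what the `deploy/k8s/` manifests use:

```yaml
# Illustrative only — check deploy/k8s/ for the actual manifest names.
# stringData lets Kubernetes handle base64 encoding of the values.
apiVersion: v1
kind: Secret
metadata:
  name: masker-secrets
type: Opaque
stringData:
  MASKER_KEY_K_HEALTHCARE: "<base64 32-byte key>"
  MASKER_SESSION_SECRET: "<base64 32-byte key>"
  MASKER_AUDIT_HMAC_KEY: "<base64 32-byte key>"
  MASKER_OPENAI_API_KEY: "sk-..."
```

Non-secret settings such as `MASKER_LISTEN_ADDR` and `MASKER_PUBLIC_URL` belong in the ConfigMap instead.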
### Fly.io (your account)

Same approach as the Masker-hosted deployment, but running entirely inside your Fly.io organization; PHI stays in your account.

```toml
app = "masker-yourname"
primary_region = "sea"

[build]
image = "ghcr.io/masker-dev/masker:latest"

[http_service]
internal_port = 8080
force_https = true

[[mounts]]
source = "masker_data"
destination = "/data"

[env]
MASKER_LISTEN_ADDR = "0.0.0.0:8080"
MASKER_PUBLIC_URL = "https://masker-yourname.fly.dev"
MASKER_DATABASE_URL = "sqlite:///data/masker.db"
```
Then set secrets separately so they are never stored in `fly.toml`:

```shell
flyctl secrets set \
  MASKER_SESSION_SECRET=$(openssl rand -base64 32) \
  MASKER_KEY_K_HEALTHCARE=$(openssl rand -base64 32) \
  MASKER_AUDIT_HMAC_KEY=$(openssl rand -base64 32) \
  MASKER_OPENAI_API_KEY=sk-...
```
### Getting a self-hosted deployment
Self-hosted deployments require an activation step during the beta period. Email hello@masker.dev to request a self-hosted license. The team will provide the image credentials and walk through the initial setup.
## Option 3 — Air-gapped / on-premises
Masker has no required outbound network calls beyond your chosen upstream LLM. If your environment runs the LLM internally — Ollama, vLLM, or your own Azure OpenAI tenant — Masker can run completely offline.
Configure the internal LLM endpoint:

```
MASKER_OPENAI_BASE_URL=https://internal-llm.your-vpc.example
MASKER_OPENAI_API_KEY=internal-token
```
Masker implements the OpenAI chat-completions API, so any compatible upstream works without code changes.
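Concretely, that means clients send the standard chat-completions request body to Masker's `/v1/chat/completions` route. A sketch of the request shape, where the base URL and model name are placeholders for your own deployment and configured upstream:

```shell
# Placeholder: substitute your own Masker deployment URL.
MASKER_URL="https://masker.your-vpc.example"

# Standard OpenAI-style request body; the model name is illustrative and
# is forwarded to whichever upstream your instance is configured with.
cat > request.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Summarize: patient reports mild chest pain."}
  ]
}
EOF

# Against a live instance (add whatever auth your deployment requires):
#   curl -s "$MASKER_URL/v1/chat/completions" \
#        -H "Content-Type: application/json" -d @request.json
echo "$MASKER_URL/v1/chat/completions"
```

PHI in the message content is detected and tokenized by Masker before the request reaches the upstream LLM, and rehydrated in the response.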
The Gemma-4 NER model ships inside the container image. There are no model downloads at runtime, no telemetry, and the container does not phone home.
For on-premises activation, contact us to request the offline activation flow. It uses a signed license file in place of GitHub OAuth for portal access.
## Health checks
Every Masker deployment exposes a health endpoint you can wire into your load balancer or readiness probe:
A healthy instance returns:
```json
{"status":"ok","version":"x.y.z","db":"ok","vault":"ok","upstream":"ok"}
```
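A probe can gate on the individual fields rather than just an HTTP 200. A minimal sketch that checks the documented response body (fetching the body is left to your probe tooling):

```shell
# The documented healthy response ("x.y.z" stands in for the version).
BODY='{"status":"ok","version":"x.y.z","db":"ok","vault":"ok","upstream":"ok"}'

# Require status, db, vault, and upstream to all report "ok".
STATUS=healthy
for key in status db vault upstream; do
  echo "$BODY" | grep -q "\"$key\":\"ok\"" || STATUS=unhealthy
done
echo "$STATUS"
```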
## Upgrades
Masker follows semantic versioning. Minor and patch releases are drop-in upgrades. Major releases include a migration note in the GitHub release.
```shell
docker pull ghcr.io/masker-dev/masker:latest
docker stop masker && docker rm masker
docker run ...   # same flags, new image
```
Database migrations run automatically at boot. The vault token format (MSKV1.*) is forward-compatible across all minor versions.
## Observability
Set `MASKER_METRICS_ADDR` to expose a Prometheus metrics endpoint on that address. Available metrics:

- `masker_requests_total{agent, status}`
- `masker_request_duration_seconds{stage}` — stages: detection, tokenize, upstream, rehydrate
- `masker_redactions_total{kind, pass}`
- `masker_vault_size_bytes`
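A minimal Prometheus scrape job for that endpoint might look like this, assuming a hypothetical `MASKER_METRICS_ADDR=0.0.0.0:9090` setting and a resolvable host name; adjust both to your deployment:

```yaml
# Prometheus scrapes the conventional /metrics path unless told otherwise.
scrape_configs:
  - job_name: masker
    static_configs:
      - targets: ["masker.internal:9090"]
```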
Logs are emitted as JSON to stdout by default. Forward them with your existing pipeline — CloudWatch, Datadog, GCP Logging, or Loki all work without any Masker-specific configuration.
## Current limitations
- No Helm chart yet. Raw Kubernetes manifests only. Helm chart is on the May 30 roadmap.
- No Terraform module yet. Same timeline.
- Each Masker deployment is independent. A multi-tenant control plane is not available in the current release.