The masking policy is the YAML file that drives every decision Masker makes at runtime: which entity types to scan for, which detection passes to run, whether to tokenize or redact each entity, and which key to use when minting tokens. Masker ships a default policy at configs/mask_policy.yaml named healthcare-default that covers HIPAA Safe Harbor identifiers out of the box. You can tune that file, create per-agent policies, or switch tokenization schemes — all without touching code.

Sample mask_policy.yaml

The annotated example below matches the structure Masker expects. Every field is optional except those marked required.
mask_policy.yaml
name: healthcare-default       # required — unique name, referenced by agents
version: 1                     # required — schema version, currently 1
description: |
  HIPAA Safe Harbor coverage for voice AI agents in healthcare.
  Covers 9 of 18 categories fully, 3 partially.

# Default key ID for tokenization.
# Must match an env var MASKER_KEY_<kid> on the running server.
kid: K_HEALTHCARE              # required

# Differential privacy budget for surrogate / synthetic generation.
epsilon: 0.5                   # optional, default 0.5

# Tokenization scheme applied to every entity unless overridden.
# vault-deterministic — HMAC lookup in a SQLite vault; same input = same token
# reversible-aead     — stateless AES-256-GCM-SIV; no vault state needed
# synthetic           — generate a realistic-looking but fake value
tokenization: vault-deterministic   # required

# Detection passes — order matters.
# regex   — fast pattern matching, runs first
# gemma   — on-device NER model, catches names and context-dependent spans
# diarize — speaker attribution for audio; auto-enabled for audio webhooks
passes:                        # required
  - regex
  - gemma           # comment out to skip NER and run regex-only

# Per-entity detection and action rules
entities:                      # required

  PHONE:
    enabled: true
    regex: true
    ner: true
    confidence_threshold: 0.6   # NER hits below this score are dropped
    action: tokenize

  SSN:
    enabled: true
    regex: true
    ner: false                  # regex covers SSN fully; NER not needed
    confidence_threshold: 0.0
    action: tokenize

  NAME:
    enabled: true
    regex: false                # names don't match regex patterns reliably
    ner: true
    confidence_threshold: 0.7
    action: tokenize

  EMAIL:
    enabled: true
    regex: true
    ner: false
    confidence_threshold: 0.0
    action: tokenize

  DOB:
    enabled: true
    regex: true
    ner: true
    confidence_threshold: 0.5
    action: tokenize

  ADDRESS:
    enabled: true
    regex: true                 # ZIP codes and street patterns
    ner: true                   # full address recognition
    confidence_threshold: 0.6
    action: tokenize

  MRN:
    enabled: true
    regex: true
    ner: true
    confidence_threshold: 0.6
    action: tokenize

  ACCOUNT:
    enabled: true
    regex: true
    ner: true
    confidence_threshold: 0.7
    action: tokenize

  IP_ADDRESS:
    enabled: true
    regex: true
    ner: false
    confidence_threshold: 0.0
    action: redact              # IPs aren't useful to the LLM; just remove them

# Audit log behavior
audit:
  log_events: true
  log_payloads: false           # encrypted payload retention; off by default
  retention_days: 2555          # 7 years — the HIPAA minimum

Field reference

Top-level fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Unique policy name. Referenced by agents and shown in the portal. |
| version | int | yes | Schema version. Currently 1. |
| description | string | no | Free-form description shown in the portal. |
| kid | string | yes | Default key ID. Must match MASKER_KEY_<kid> in your environment. |
| epsilon | float | no | Differential privacy budget for synthetic surrogates. Defaults to 0.5. |
| tokenization | enum | yes | One of vault-deterministic, reversible-aead, or synthetic. |
| passes | list | yes | Ordered list of detection passes: regex, gemma, diarize. |
| entities | map | yes | Per-entity rules. See below. |
| audit | map | no | Audit log behavior. |

Per-entity fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Master switch for this entity. Set to false to skip it entirely. |
| regex | bool | true | Run the regex pass for this entity. |
| ner | bool | true | Run the NER pass (Gemma model) for this entity. |
| confidence_threshold | float | 0.6 | Minimum NER confidence score. Hits below this are discarded. |
| action | enum | tokenize | What to do with detected spans: tokenize, redact, or passthrough. |
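As a rough illustration of how the defaults in this table combine with a partial entity block, the sketch below fills in omitted fields and works out which passes would run for one entity. `EntityRule`, `resolve_rule`, and `active_passes` are hypothetical names, not part of Masker. The sketch also assumes that a pass runs only when it is both listed under `passes` and switched on for the entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRule:
    # Defaults copied from the per-entity field table.
    enabled: bool = True
    regex: bool = True
    ner: bool = True
    confidence_threshold: float = 0.6
    action: str = "tokenize"


def resolve_rule(raw: dict) -> EntityRule:
    """Fill in the table defaults for any field the policy omits."""
    return EntityRule(**raw)


def active_passes(rule: EntityRule, passes: list[str]) -> list[str]:
    """Return the detection passes that would run for one entity.

    Assumption: a pass runs only if it appears under `passes` AND the
    entity's own flag for it is true. `diarize` has no per-entity flag
    in the policy, so it is never reported here.
    """
    if not rule.enabled:
        return []
    flags = {"regex": rule.regex, "gemma": rule.ner}
    return [p for p in passes if flags.get(p, False)]


# An IP_ADDRESS block that omits `enabled` and `regex`.
ip_rule = resolve_rule({"ner": False, "confidence_threshold": 0.0, "action": "redact"})
print(ip_rule)
print(active_passes(ip_rule, ["regex", "gemma"]))
```

Under these assumptions, an entity with `ner: false` is handled by the regex pass alone even when `gemma` is listed in `passes`.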

Audit fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| log_events | bool | true | Write a per-redaction event to the audit log. |
| log_payloads | bool | false | Retain encrypted payloads alongside events. Off by default. |
| retention_days | int | 2555 | How long audit records are kept. 2555 days (7 years) is the HIPAA minimum. |

Tokenization schemes

With vault-deterministic, Masker stores a mapping of (plaintext, entity_kind) → token in a local SQLite vault. The same input always produces the same token, so a token such as MSKV1.PHONE.K_HEALTHCARE.abc123 in an LLM response can be rehydrated correctly, even across turns.

Best for: single-node deployments where vault state is easy to persist.

Drawback: requires a shared vault in multi-replica setups. Use a Postgres database via MASKER_DATABASE_URL or switch to reversible-aead instead.
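The deterministic property can be illustrated with a toy in-memory vault. This is a minimal sketch, not Masker's implementation: `ToyVault` is a hypothetical class, a dict stands in for the SQLite vault, and the token suffix is assumed to be a truncated HMAC-SHA256 digest. Only the `MSKV1.<KIND>.<kid>.<suffix>` shape is taken from the example token above.

```python
import hashlib
import hmac


class ToyVault:
    """In-memory stand-in for the SQLite vault (illustration only)."""

    def __init__(self, kid: str, key: bytes) -> None:
        self.kid = kid
        self._key = key
        self._by_token: dict[str, str] = {}

    def tokenize(self, plaintext: str, kind: str) -> str:
        # Keyed digest over (kind, plaintext): the same input always
        # yields the same token. The 12-character suffix is an assumption.
        msg = f"{kind}\x00{plaintext}".encode("utf-8")
        digest = hmac.new(self._key, msg, hashlib.sha256).hexdigest()[:12]
        token = f"MSKV1.{kind}.{self.kid}.{digest}"
        self._by_token[token] = plaintext
        return token

    def rehydrate(self, token: str) -> str:
        # Look the plaintext back up; raises KeyError for unknown tokens.
        return self._by_token[token]


vault = ToyVault("K_HEALTHCARE", b"demo-key-not-for-production")
first = vault.tokenize("555-0100", "PHONE")
second = vault.tokenize("555-0100", "PHONE")
print(first == second)
print(vault.rehydrate(first))
```

Because the lookup table lives in one process here, the sketch also shows why replicas need shared storage: a second `ToyVault` instance could compute the same token but would have no entry to rehydrate it from.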

Tuning detection sensitivity

Every entity’s confidence_threshold controls how aggressively the NER pass fires. Lower values catch more but may introduce false positives; higher values are more precise but may miss edge cases.
Start with the defaults, run masker detect against transcripts that reflect your real traffic (with actual PHI scrubbed from the samples), and raise or lower thresholds based on what you observe.
To disable NER for a specific entity and rely only on regex, set ner: false. SSN and EMAIL are good candidates: their formats are regular enough that NER adds noise rather than coverage.

To disable an entity type entirely, set enabled: false. This prevents Masker from running any detection pass for that kind.
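To make the threshold trade-off concrete, the sketch below filters a handful of made-up NER hits and then sweeps one entity's threshold to show how the number of surviving hits changes. `NerHit` and `filter_hits` are hypothetical names and the scores are invented for illustration. A hit exactly at the threshold is kept, on the reading that only hits below it are discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NerHit:
    kind: str
    text: str
    score: float


def filter_hits(hits, thresholds, default=0.6):
    """Keep hits whose score meets the per-entity confidence_threshold."""
    return [h for h in hits if h.score >= thresholds.get(h.kind, default)]


# Invented sample hits, not real detector output.
hits = [
    NerHit("NAME", "Jordan Lee", 0.82),
    NerHit("NAME", "May", 0.65),        # ambiguous: a name or a month
    NerHit("PHONE", "555-0100", 0.61),
    NerHit("PHONE", "room 4021", 0.40),
]

kept = filter_hits(hits, {"NAME": 0.7, "PHONE": 0.6})
print([h.text for h in kept])

# Sweep the NAME threshold to see how many NAME hits survive at each value.
for t in (0.5, 0.7, 0.9):
    n = sum(1 for h in filter_hits(hits, {"NAME": t}) if h.kind == "NAME")
    print(t, n)
```

A sweep like this, run over output collected from your own transcripts, is one way to choose a threshold from observed hits instead of guessing.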

Applying a policy

Global policy

Set MASKER_POLICY_PATH to point to your policy file before starting Masker. The default is configs/mask_policy.yaml. To reload a running server without restarting it:
curl -X POST https://masker-voice.fly.dev/api/v1/admin/policy/reload \
  -H "Cookie: masker_session=$MASKER_SESSION"
The reload is atomic — in-flight requests complete on the old policy; new requests immediately pick up the updated one.
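The behavior described here is what a reference swap provides. The sketch below is a generic illustration of that pattern and makes no claim about how Masker implements reload; `PolicyHolder` is a hypothetical class. Each request takes a snapshot of the current policy when it starts, and a reload replaces the reference in one step, so a request never sees a half-updated policy.

```python
import threading


class PolicyHolder:
    """Holds the active policy and swaps it atomically on reload."""

    def __init__(self, policy: dict) -> None:
        self._lock = threading.Lock()
        self._policy = policy

    def current(self) -> dict:
        # A request calls this once and keeps the returned snapshot.
        with self._lock:
            return self._policy

    def reload(self, new_policy: dict) -> None:
        # The new policy is fully built before the swap; only the
        # reference changes, and existing snapshots are left untouched.
        with self._lock:
            self._policy = new_policy


holder = PolicyHolder({"name": "healthcare-default", "passes": ["regex", "gemma"]})

in_flight = holder.current()  # a request that started before the reload
holder.reload({"name": "healthcare-default", "passes": ["regex"]})
new_request = holder.current()  # a request that started after the reload

print(in_flight["passes"])
print(new_request["passes"])
```

The pattern relies on policies being treated as immutable once loaded: a reload creates a new object and never edits the old one in place.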

Per-agent policy overrides

Each agent inherits the global policy by default. To assign a custom policy to one agent, pass policy_yaml when creating or updating the agent:
curl -X POST https://masker-voice.fly.dev/api/v1/agents \
  -H "Cookie: masker_session=$MASKER_SESSION" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "billing-bot",
    "upstream": "openai:gpt-4o-mini",
    "policy_yaml": "<contents of custom-policy.yaml>"
  }'
The custom YAML is stored alongside the agent record and loaded only for that agent’s requests.
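Because policy_yaml carries a multi-line YAML document inside a JSON string, newlines and quotes have to be escaped. Building the request body with a JSON serializer avoids hand-escaping. The sketch below only constructs the body and does not send it. The inline policy and its name billing-policy are made up for illustration; in practice the text would be read from your own policy file.

```python
import json

# Hypothetical custom policy; in practice, read this text from a file.
policy_yaml = """\
name: billing-policy
version: 1
kid: K_HEALTHCARE
tokenization: reversible-aead
passes:
  - regex
entities:
  ACCOUNT:
    action: tokenize
"""

# json.dumps escapes the newlines so the YAML survives as one JSON string.
body = json.dumps({
    "name": "billing-bot",
    "upstream": "openai:gpt-4o-mini",
    "policy_yaml": policy_yaml,
})

print(body)
```

The resulting string can be written to a file and passed to curl with `-d @body.json` in place of the inline `-d '{...}'` argument.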

CLI: validate and diff

Use the masker policy subcommands to validate and compare policies before deploying them.

Validate before deploying

masker policy validate configs/mask_policy.yaml
Validation catches the three most common errors:
  • unknown_kid — the policy references a kid with no matching MASKER_KEY_<kid> environment variable
  • invalid_pass — the passes list contains a name Masker doesn’t recognize
  • missing_entity — an entity referenced in passes is not declared under entities
Validation errors prevent boot in production mode. In development mode (MASKER_DEV=1) Masker logs the error and falls through to defaults — never rely on this in production.
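For intuition, the first two checks can be approximated in a few lines. This is a simplified sketch over an already-parsed policy dict, not the logic behind masker policy validate. `validate_policy` is a hypothetical function, and the missing_entity check is left out because its exact rule is not specified here.

```python
import os

KNOWN_PASSES = {"regex", "gemma", "diarize"}


def validate_policy(policy: dict, env=None) -> list[str]:
    """Return error codes for a parsed policy (unknown_kid, invalid_pass)."""
    env = os.environ if env is None else env
    errors = []

    # unknown_kid: the kid needs a matching MASKER_KEY_<kid> variable.
    kid = policy.get("kid", "")
    if f"MASKER_KEY_{kid}" not in env:
        errors.append("unknown_kid")

    # invalid_pass: every listed pass must be one Masker recognizes.
    if any(p not in KNOWN_PASSES for p in policy.get("passes", [])):
        errors.append("invalid_pass")

    return errors


good = {"kid": "K_HEALTHCARE", "passes": ["regex", "gemma"]}
bad = {"kid": "K_MISSING", "passes": ["regex", "spacy"]}

print(validate_policy(good, env={"MASKER_KEY_K_HEALTHCARE": "dummy"}))
print(validate_policy(bad, env={}))
```

Passing `env` explicitly keeps the check testable without touching the real process environment.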

Diff two policy versions

masker policy diff configs/mask_policy.yaml configs/mask_policy_v2.yaml
The diff shows which entities were added or removed, which thresholds changed, and which actions changed. Run this before replacing a live policy to understand the impact on detection coverage.
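The categories the diff reports can be sketched as set operations over the entities maps of two parsed policies. `diff_entities` is a hypothetical helper and its output shape is not the CLI's format. It also compares only fields that are written out explicitly, so an omitted field shows up as `None` instead of its default.

```python
def diff_entities(old: dict, new: dict) -> dict:
    """Summarize entity-level changes between two parsed policies."""
    old_e = old.get("entities", {})
    new_e = new.get("entities", {})
    shared = sorted(old_e.keys() & new_e.keys())

    def changed(field: str) -> dict:
        # Map entity name -> (old value, new value) where the field differs.
        return {
            k: (old_e[k].get(field), new_e[k].get(field))
            for k in shared
            if old_e[k].get(field) != new_e[k].get(field)
        }

    return {
        "added": sorted(new_e.keys() - old_e.keys()),
        "removed": sorted(old_e.keys() - new_e.keys()),
        "thresholds": changed("confidence_threshold"),
        "actions": changed("action"),
    }


v1 = {"entities": {
    "NAME": {"confidence_threshold": 0.7, "action": "tokenize"},
    "IP_ADDRESS": {"confidence_threshold": 0.0, "action": "redact"},
}}
v2 = {"entities": {
    "NAME": {"confidence_threshold": 0.8, "action": "tokenize"},
    "MRN": {"confidence_threshold": 0.6, "action": "tokenize"},
}}

summary = diff_entities(v1, v2)
print(summary)
```

In this made-up example, removing IP_ADDRESS and raising the NAME threshold both reduce detection coverage, which is the kind of impact worth catching before a live policy is replaced.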