# PHI token format, vault and AEAD tokenization schemes

> How Masker encodes detected PHI into typed, opaque tokens — and how it recovers the original value on the response leg without exposing PHI to your LLM.

Once detection marks a span, Masker replaces it with a **token**. The token travels to your LLM in place of the real value. On the response leg, Masker swaps it back before the text reaches your caller. The LLM never sees real PHI; your caller never hears a placeholder.

Every token Masker produces shares four properties:

* **Format-stable** — the same shape regardless of input length, so the model is never surprised by a short or long token
* **Type-aware** — the LLM can still tell a phone from a name from a date
* **Reversible** — Masker can rehydrate the original on the response leg
* **Non-revealing** — given only the token, you cannot recover the original value

## Token format

All tokens follow this structure:

```
{scheme}.{kind}.{kid}.{value}
```

| Field    | Meaning                                           | Example                                                   |
| -------- | ------------------------------------------------- | --------------------------------------------------------- |
| `scheme` | Versioned scheme identifier                       | `MSKV1` (vault-deterministic) or `MSK1` (reversible AEAD) |
| `kind`   | Entity type                                       | `PHONE`, `NAME`, `SSN`, `MRN`, `EMAIL`, `DOB`             |
| `kid`    | Key ID — which key was used to produce this token | `K_HEALTHCARE`                                            |
| `value`  | The opaque token body                             | Base32-encoded, approximately 22 characters               |

A masked phone number looks like:

```
MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL
```

The LLM sees this in the message thread, recognizes it as a phone-shaped argument, and generates a response that references it naturally. Masker rehydrates the real number when streaming the response back to your voice agent.
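As a rough illustration of the four-field structure, the sketch below splits a token on its dots. It assumes that no field contains a `.` character, which the format description implies but does not state. `Token` and `parse_token` are hypothetical names used only for this example and are not part of any Masker SDK.

```python
from typing import NamedTuple


class Token(NamedTuple):
    scheme: str  # e.g. MSKV1 or MSK1
    kind: str    # entity type, e.g. PHONE
    kid: str     # key ID, e.g. K_HEALTHCARE
    value: str   # opaque token body


def parse_token(text: str) -> Token:
    """Split a token into its four dot-separated fields (hypothetical helper)."""
    parts = text.split(".")
    # Assumption: exactly four non-empty fields, none containing a dot.
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty fields: {text!r}")
    return Token(*parts)


token = parse_token("MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL")
print(token.kind)  # PHONE
```

Because `kind` is readable without any key, downstream code can route on entity type while the value itself stays opaque.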

## Two tokenization schemes

Masker offers two schemes. You choose one per agent.

<Tabs>
  <Tab title="Vault deterministic (default)">
    **Algorithm:** HMAC-SHA256 of `(kid_secret || normalized_input)` → first 16 bytes → base32-encoded.

    **Storage:** A row is written to a SQLite vault at `/data/vault.db` on the Fly volume.

    **Scheme prefix:** `MSKV1`

    ### Properties

    | Property            | Detail                                                                                                                                                                       |
    | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Deterministic       | The same input always produces the same token within the same `kid`. The LLM can recognize "this is the same person as in the previous turn."                                |
    | Reversible          | Masker looks up the token in the vault to retrieve the original.                                                                                                             |
    | Per-agent isolation | Different agents use different `kid`s, so token namespaces don't collide across customers.                                                                                   |
    | Vault-bound         | If the vault is lost, tokens are opaque forever. This is a feature when you need hard erasure — drop the vault, and all tokens referencing it become permanently unreadable. |

    Use vault-deterministic when you want **same-value → same-token** behavior across a session — for example, so the LLM can refer to "the patient" consistently across multiple turns.
  </Tab>

  <Tab title="Reversible AEAD">
    **Algorithm:** AES-256-GCM-SIV with a 256-bit key derived per `kid`.

    **Storage:** None. The ciphertext *is* the token.

    **Scheme prefix:** `MSK1`

    ### Properties

    | Property          | Detail                                                                                                              |
    | ----------------- | ------------------------------------------------------------------------------------------------------------------- |
    | Self-contained    | No vault lookup needed to rehydrate — just decrypt with the `kid` key.                                              |
    | Non-deterministic | The same input produces a different token each time, because each encryption uses a fresh nonce.                    |
    | Stateless         | Works in horizontally scaled deployments without a shared vault volume.                                             |
    | Key-bound         | If the `kid` key is rotated and old keys are dropped, tokens minted with the old key become permanently unreadable. |

    Use reversible AEAD when you want **stateless** rehydration or you can't ship a SQLite volume to every deployment region.
  </Tab>
</Tabs>
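The vault-deterministic path can be sketched with the Python standard library. This is a minimal illustration and not Masker's implementation. It assumes that the `kid` secret is the HMAC key and the normalized input is the message, that normalization means trimming and lowercasing, and that the vault is a two-column table. The `normalize`, `mint_token`, `store`, and `lookup` names and the table schema are all hypothetical. The sketch uses unpadded RFC 4648 base32, which turns 16 bytes into 26 characters, so the production encoding that yields roughly 22 characters evidently differs.

```python
import base64
import hashlib
import hmac
import sqlite3
from typing import Optional


def normalize(value: str) -> str:
    # Hypothetical normalization; the real rules are not documented here.
    return value.strip().lower()


def mint_token(kind: str, kid: str, kid_secret: bytes, value: str) -> str:
    # Assumption: kid_secret is the HMAC key, the normalized input is the message.
    digest = hmac.new(kid_secret, normalize(value).encode("utf-8"), hashlib.sha256).digest()
    # Keep the first 16 bytes and base32-encode them without padding.
    body = base64.b32encode(digest[:16]).decode("ascii").rstrip("=")
    return f"MSKV1.{kind}.{kid}.{body}"


def store(vault: sqlite3.Connection, token: str, original: str) -> None:
    # Deterministic tokens repeat, so ignore rows that already exist.
    vault.execute("INSERT OR IGNORE INTO vault (token, original) VALUES (?, ?)", (token, original))


def lookup(vault: sqlite3.Connection, token: str) -> Optional[str]:
    row = vault.execute("SELECT original FROM vault WHERE token = ?", (token,)).fetchone()
    return row[0] if row else None


# In-memory database for the demo; the docs place the real vault at /data/vault.db.
vault = sqlite3.connect(":memory:")
vault.execute("CREATE TABLE vault (token TEXT PRIMARY KEY, original TEXT NOT NULL)")

secret = b"\x00" * 32  # demo key only, never use a constant key
t1 = mint_token("PHONE", "K_HEALTHCARE", secret, "555-0100")
t2 = mint_token("PHONE", "K_HEALTHCARE", secret, " 555-0100 ")
store(vault, t1, "555-0100")
print(t1 == t2)  # True
```

Dropping the `vault` table in this sketch leaves every token unrecoverable, since the HMAC output cannot be inverted. That mirrors the hard-erasure property described in the vault tab.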

## Choosing a scheme

| You want…                                                    | Pick                    |
| ------------------------------------------------------------ | ----------------------- |
| Same value → same token (consistent reference within a call) | **Vault deterministic** |
| No shared state across regions                               | **Reversible AEAD**     |
| Hard erasure (drop the vault, tokens are dead forever)       | **Vault deterministic** |
| Self-contained tokens that survive restarts                  | **Reversible AEAD**     |
| Default for healthcare voice agents                          | **Vault deterministic** |

The `healthcare-default` policy ships with vault-deterministic tokenization. You can change it per agent in the portal or via the create-agent API.

## Rehydration

On the response leg, Masker scans the LLM's output for any token matching the pattern `MSK*.*.*.*`. For each match:

* **Vault deterministic:** look up the original value in `/data/vault.db`, replace inline.
* **Reversible AEAD:** decrypt with the `kid` key, replace inline.

If a token cannot be rehydrated — the key was rotated out, the vault row is missing, or the token is malformed — Masker emits a `rehydration_failed` event and replaces the token with `[REDACTED:KIND]`. The failure is recorded in the audit log.

<Warning>
  A `[REDACTED:KIND]` in your TTS output means rehydration failed for that span. Check the audit log for `rehydration_failed` events to identify the cause — typically a key rotation that dropped a key while live tokens still referenced it.
</Warning>
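The scan-and-replace step described in this section might look like the sketch below. The regular expression is one assumed reading of the `MSK*.*.*.*` pattern (uppercase letters, digits, and underscores in each field), and `resolve` stands in for either the vault lookup or the AEAD decryption. `rehydrate`, `TOKEN_RE`, and the `failures` list are hypothetical names; the list plays the role of the `rehydration_failed` event. Strings too malformed to match the expression are left untouched here, which may differ from Masker's behavior.

```python
import re
from typing import Callable, List, Optional

# Assumed token shape: scheme.kind.kid.value, uppercase alphanumerics only.
TOKEN_RE = re.compile(r"\b(MSK[A-Z0-9]*)\.([A-Z]+)\.([A-Z0-9_]+)\.([A-Z0-9]+)\b")


def rehydrate(text: str, resolve: Callable[[str], Optional[str]], failures: List[str]) -> str:
    """Replace each token with its original value, or a redaction marker on failure."""

    def swap(match: "re.Match[str]") -> str:
        token = match.group(0)
        original = resolve(token)
        if original is None:
            # Stand-in for emitting a rehydration_failed event.
            failures.append(token)
            return f"[REDACTED:{match.group(2)}]"
        return original

    return TOKEN_RE.sub(swap, text)


known = {"MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL": "555-0100"}
failed: List[str] = []
out = rehydrate(
    "Call MSKV1.PHONE.K_HEALTHCARE.A1B2C3D4E5F6G7H8I9J0KL or MSKV1.NAME.K_HEALTHCARE.ZZZZ.",
    known.get,
    failed,
)
print(out)  # Call 555-0100 or [REDACTED:NAME].
```

Passing the resolver as a callable keeps the scanning logic identical for both schemes; only the lookup behind it changes.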

## Key management

Each `kid` is a logical key identifier mapped to actual key material in your environment:

```bash
MASKER_KEY_K_HEALTHCARE=base64(32-byte-key)
```

You can register multiple keys at once to support key rotation:

```bash
MASKER_KEY_K_HEALTHCARE=...new key...
MASKER_KEY_K_HEALTHCARE_OLD=...previous key...
```

Masker uses the active key for new tokenization, and tries every registered key during rehydration. To rotate:

<Steps>
  <Step title="Add the new key">
    Set `MASKER_KEY_K_HEALTHCARE` to the new key material. Keep the old key registered as `MASKER_KEY_K_HEALTHCARE_OLD`.
  </Step>

  <Step title="Update the policy">
    Switch your agent's policy to reference the new `kid`. New tokens will use the new key; existing live tokens still rehydrate via the old key.
  </Step>

  <Step title="Drop the old key">
    Once no live tokens still reference the old key — typically after the longest call session you run — remove `MASKER_KEY_K_HEALTHCARE_OLD`. Tokens minted with it will no longer rehydrate.
  </Step>
</Steps>
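To make the "try every registered key" behavior concrete, here is a minimal sketch of a key registry built from environment-style variables. It assumes only what this page states: variables are named `MASKER_KEY_` plus an identifier, and each value is the base64 encoding of a 32-byte key. `load_keys` and `try_all` are hypothetical helpers, and `attempt` stands in for whatever rehydration call is made with a given key.

```python
import base64
from typing import Callable, Dict, Optional, Tuple

PREFIX = "MASKER_KEY_"


def load_keys(env: Dict[str, str]) -> Dict[str, bytes]:
    """Collect every MASKER_KEY_* entry, decoding base64 and checking the length."""
    keys: Dict[str, bytes] = {}
    for name, encoded in env.items():
        if not name.startswith(PREFIX):
            continue
        key = base64.b64decode(encoded, validate=True)
        if len(key) != 32:
            raise ValueError(f"{name}: expected 32 bytes, got {len(key)}")
        keys[name[len(PREFIX):]] = key
    return keys


def try_all(keys: Dict[str, bytes], attempt: Callable[[bytes], Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (identifier, result) for the first key whose attempt succeeds, else None."""
    for kid, key in keys.items():
        result = attempt(key)
        if result is not None:
            return kid, result
    return None


# Demo environment; in a real process this would be os.environ.
env = {
    "MASKER_KEY_K_HEALTHCARE": base64.b64encode(b"n" * 32).decode(),
    "MASKER_KEY_K_HEALTHCARE_OLD": base64.b64encode(b"o" * 32).decode(),
    "PATH": "/usr/bin",
}
keys = load_keys(env)
print(sorted(keys))  # ['K_HEALTHCARE', 'K_HEALTHCARE_OLD']
```

Removing `MASKER_KEY_K_HEALTHCARE_OLD` from `env` in this sketch makes `try_all` return `None` for anything only the old key could open, which is the failure mode the final rotation step warns about.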

<Tip>
  For production, store key material in your secret manager of choice: Fly Secrets, AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault. Masker does not ship its own key escrow.
</Tip>
