Column Masking - Isthmus

Column masking lets you define per-column data protection rules in your policy YAML. Masked columns are automatically redacted, hashed, partially hidden, or nullified in both query results and describe_table sample rows. No code changes, no database views, no application-layer middleware — just a YAML file.

Why column masking?

AI models are powerful data analysts, but they don’t need to see raw PII to write correct SQL. With column masking:

Emails, SSNs, and phone numbers are masked before the AI ever sees them
Query results are masked in real time — the AI gets useful structure without sensitive values
Sample rows in describe_table are also masked, so table analysis doesn’t leak PII
Masking is enforced server-side on result column names. The AI cannot turn it off, although a column alias changes the result name and can sidestep a mask (see Limitations)

Enabling column masking

Column masking is part of the policy engine. Add mask directives to any column in your policy YAML:

context:
  tables:
    public.customers:
      description: "Customer accounts"
      columns:
        email:
          description: "Primary email address"
          mask: "redact"
        ssn:
          description: "Social Security Number"
          mask: "null"
        phone:
          description: "Phone number"
          mask: "partial"
        name:
          description: "Full name"
          mask: "hash"

Then point Isthmus at the file:

POLICY_FILE=./policy.yaml isthmus

At startup, Isthmus logs how many columns are masked:

{"level":"INFO","msg":"column masking enabled","masked_columns":4}

Mask types

There are four mask types. Each is designed for a different use case.

`redact`

Replaces the value with ***. Use for columns the AI never needs to see.

Input	Output
`"alice@example.com"`	`"***"`
`12345`	`"***"`
`NULL`	`NULL`

Best for: email addresses, API keys, passwords, tokens.

`hash`

Replaces the value with a deterministic SHA-256 hex string (64 characters). Same input always produces the same hash, so the AI can still detect duplicates, join on hashed values, and count distinct entries — without seeing the original data.

Input	Output
`"alice@example.com"`	`"b4a6c8d2e1f0..."` (64 hex chars)
`"alice@example.com"`	Same hash (deterministic)
`"bob@example.com"`	Different hash
`NULL`	`NULL`

Best for: columns where the AI needs to detect patterns (GROUP BY, JOIN, COUNT DISTINCT) without seeing raw values. Names, external IDs, usernames.
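The deterministic behavior described above can be sketched in a few lines of Python. This is an illustration, not Isthmus's actual implementation: the function name mask_hash is hypothetical, and converting non-string values with str() before hashing is an assumption the documentation does not confirm.

```python
import hashlib


def mask_hash(value):
    """Return a 64-character SHA-256 hex digest, or None for NULL input."""
    if value is None:
        return None  # NULL stays NULL, matching the table above
    # Assumption: non-string values are converted with str() before hashing.
    data = str(value).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


alice = mask_hash("alice@example.com")
print(len(alice))                                # 64
print(alice == mask_hash("alice@example.com"))   # True: same input, same hash
print(alice == mask_hash("bob@example.com"))     # False: different input
```

Because equal inputs yield equal digests, grouping or comparing the hashed results preserves duplicate and distinct-count structure.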

`partial`

Reveals only the last 4 characters, replacing the rest with asterisks. Works correctly with Unicode characters.

Input	Output
`"1234567890"`	`"******7890"`
`"+1-555-867-5309"`	`"***********5309"`
`"ab"`	`"***ab"`
`NULL`	`NULL`

Best for: phone numbers, credit card numbers, account numbers — where partial visibility helps the AI understand the format.
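A minimal Python sketch of the rule in the table above. The function name mask_partial is hypothetical, and the handling of values of four characters or fewer (a fixed "***" prefix) is an assumption extrapolated from the single "ab" row, not confirmed behavior.

```python
def mask_partial(value, visible=4):
    """Keep the last `visible` characters and replace the rest with asterisks."""
    if value is None:
        return None  # NULL stays NULL
    # Python strings index by Unicode code point, so slicing never splits
    # a multi-byte character.
    text = str(value)
    if len(text) <= visible:
        # Assumption: short values get a fixed "***" prefix, as the "ab" row suggests.
        return "***" + text
    return "*" * (len(text) - visible) + text[-visible:]


print(mask_partial("1234567890"))       # ******7890
print(mask_partial("+1-555-867-5309"))  # ***********5309
print(mask_partial("ab"))               # ***ab
```

Note that separators such as "+" and "-" are masked along with digits, so only the length and the final four characters survive.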

`null`

Replaces the value with NULL. The AI sees that the column exists but has no data.

Input	Output
`"alice@example.com"`	`NULL`
`12345`	`NULL`
`NULL`	`NULL`

Best for: columns that must be completely hidden. The AI can still query other columns in the same table.

How masking works

Query results

When a query tool call returns results, Isthmus applies masks to every row before sending the response to the AI:

SELECT id, email, name FROM customers LIMIT 2

Without masking:

[
  {"id": 1, "email": "alice@example.com", "name": "Alice Johnson"},
  {"id": 2, "email": "bob@example.com", "name": "Bob Smith"}
]

With email: redact and name: hash:

[
  {"id": 1, "email": "***", "name": "a4f2e8c1d3b5..."},
  {"id": 2, "email": "***", "name": "c7d9f0a2b4e6..."}
]
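The row-by-row transformation shown above can be sketched as follows. This is an illustrative Python model, not Isthmus's code: the names apply_mask and mask_rows are hypothetical, and both the str() conversion before hashing and the short-value rule for partial are assumptions.

```python
import hashlib


def apply_mask(value, mask):
    """Apply one of the four mask types to a single value."""
    if value is None:
        return None  # NULL passes through every mask type unchanged
    if mask == "redact":
        return "***"
    if mask == "null":
        return None
    text = str(value)  # Assumption: non-strings are stringified first
    if mask == "hash":
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    if mask == "partial":
        if len(text) <= 4:
            return "***" + text  # Assumption based on the "ab" example
        return "*" * (len(text) - 4) + text[-4:]
    raise ValueError(f"unknown mask type: {mask!r}")


def mask_rows(rows, masks):
    """Return new rows with masks applied by result column name."""
    return [
        {col: apply_mask(val, masks[col]) if col in masks else val
         for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"id": 1, "email": "alice@example.com", "name": "Alice Johnson"},
    {"id": 2, "email": "bob@example.com", "name": "Bob Smith"},
]
masked = mask_rows(rows, {"email": "redact", "name": "hash"})
print(masked[0]["id"], masked[0]["email"], len(masked[0]["name"]))  # 1 *** 64
```

Columns without a mask entry, such as id, pass through untouched, and the original rows are not modified.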

Describe table sample rows

The describe_table tool returns up to 5 sample rows. These are masked identically — same rules, same mask types:

{
  "schema": "public",
  "name": "customers",
  "sample_rows": [
    {"id": 1, "email": "***", "name": "a4f2e8c1d3b5...", "phone": "***********5309"}
  ]
}

Column name matching

Masking matches by result column name only, not by table. If you mask email, it applies to every result column named email, regardless of which table it comes from, including columns reached through JOINs, subqueries, and table aliases. A column alias (SELECT email AS contact_email) changes the result column name, so the mask no longer applies; see Limitations. This is by design. SQL queries with JOINs, CTEs, and subqueries make it impossible to reliably map result column names back to source tables. Rather than building a fragile runtime mapper, Isthmus uses a simple, predictable rule: same column name = same mask.

Conflict detection

Because masking is by column name, Isthmus validates at startup that no column name has conflicting mask types across tables. If two tables define different masks for the same column name, Isthmus rejects the policy file:

# This will fail validation:
context:
  tables:
    public.users:
      columns:
        email:
          mask: "redact"      # redact here...
    public.contacts:
      columns:
        email:
          mask: "hash"        # ...but hash here? Conflict!

error: validating policy: column "email" has conflicting masks: "redact" in public.users vs "hash" in public.contacts

The same column name with the same mask across multiple tables is fine:

# This is valid — both use "redact"
context:
  tables:
    public.users:
      columns:
        email:
          mask: "redact"
    public.contacts:
      columns:
        email:
          mask: "redact"
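The startup check described above could look roughly like this. It is a sketch under assumptions: the policy is taken to be already parsed into nested dicts mirroring the YAML (context.tables), the function name find_mask_conflicts is hypothetical, and the error text simply copies the message shown above.

```python
def find_mask_conflicts(tables):
    """Return one error string per column name declared with differing masks."""
    seen = {}    # column name -> (mask, table) of the first declaration
    errors = []
    for table, spec in tables.items():
        for column, col_spec in spec.get("columns", {}).items():
            # Columns written as a plain description string carry no mask.
            if not isinstance(col_spec, dict) or "mask" not in col_spec:
                continue
            mask = col_spec["mask"]
            if column in seen and seen[column][0] != mask:
                prev_mask, prev_table = seen[column]
                errors.append(
                    f'column "{column}" has conflicting masks: '
                    f'"{prev_mask}" in {prev_table} vs "{mask}" in {table}'
                )
            else:
                seen.setdefault(column, (mask, table))
    return errors


conflicting = {
    "public.users": {"columns": {"email": {"mask": "redact"}}},
    "public.contacts": {"columns": {"email": {"mask": "hash"}}},
}
valid = {
    "public.users": {"columns": {"email": {"mask": "redact"}}},
    "public.contacts": {"columns": {"email": {"mask": "redact"}}},
}
print(find_mask_conflicts(conflicting))
print(find_mask_conflicts(valid))  # []
```

Recording only the first declaration per column name keeps the check to a single pass over the policy.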

Full example

A realistic policy YAML with masking:

context:
  tables:
    public.customers:
      description: "Registered platform customers"
      columns:
        id: "Unique customer identifier (UUID)"
        email:
          description: "Primary email address, used for login"
          mask: "redact"
        name:
          description: "Full display name"
          mask: "hash"
        phone:
          description: "Phone number with country code"
          mask: "partial"
        ssn:
          description: "Social Security Number (US)"
          mask: "null"
        created_at: "Account creation timestamp (UTC)"

    public.orders:
      description: "Purchase orders"
      columns:
        id: "Unique order identifier (UUID)"
        customer_id: "FK to customers.id"
        status: "Order lifecycle: draft, pending, paid, shipped, delivered, cancelled"
        amount_cents: "Order total in cents (USD)"

    public.employees:
      description: "Internal employee records"
      columns:
        id: "Employee ID"
        email:
          description: "Corporate email"
          mask: "redact"
        salary_cents:
          description: "Annual salary in cents (USD)"
          mask: "null"

Note that email appears in both customers and employees with the same mask (redact), which is valid.

Type behavior

Masked values may change type. This is intentional and documented here for completeness:

Original type	Mask	Result type
`string`	`redact`	`string` (`"***"`)
`int`	`redact`	`string` (`"***"`)
`string`	`hash`	`string` (64 hex chars)
`int`	`hash`	`string` (64 hex chars)
`string`	`partial`	`string`
`int`	`partial`	`string`
any	`null`	`null`
`NULL`	any	`null`

Choosing the right mask type

Data type	Example columns	Recommended	Why
Credentials	`password_hash`, `api_key`, `token`	`redact`	No reason for the AI to ever see these
Government IDs	`ssn`, `tax_id`, `passport_number`	`redact` or `null`	Highly regulated, zero analytical value
Email addresses	`email`, `contact_email`	`redact` or `hash`	Use `hash` if the AI needs to detect duplicates or join across tables
Phone numbers	`phone`, `mobile`, `fax`	`partial`	The last 4 digits and the overall length stay visible, so the AI can recognize the column as a phone number; area and country codes are masked
Financial accounts	`card_number`, `iban`, `routing_number`	`partial`	Last 4 digits is industry standard (PCI DSS)
Names	`name`, `first_name`, `last_name`	`hash` or `partial`	`hash` for analytics, `partial` to preserve format
Addresses	`street`, `address_line_1`	`redact`	Low analytical value, high PII risk
Salaries / compensation	`salary`, `bonus`, `equity_grants`	`null`	Sensitive internal data, not useful for most AI queries
Internal identifiers	`user_id`, `account_id`	`hash`	Deterministic — AI can still join and group by without seeing real IDs
Non-sensitive business data	`status`, `category`, `amount_cents`	(no mask)	AI needs actual values to provide useful analysis

Compliance context

Column masking helps satisfy data protection requirements across common regulatory frameworks:

Framework	Requirement	How masking helps
GDPR	Data minimization (Art. 5(1)(c))	Mask PII so the AI processes only what’s necessary
HIPAA	Minimum necessary standard	`redact` or `null` PHI columns (patient names, diagnoses, SSNs)
PCI DSS	Mask PAN when displayed (Req. 3.3 in v3.2.1; Req. 3.4.1 in v4.0)	`partial` on card numbers shows only last 4 digits
SOC 2	Access controls on sensitive data	Server-side masking enforces data protection regardless of AI behavior
CCPA	Limit disclosure of personal information	Mask consumer PII in AI-accessible query results

Column masking is one layer of a data protection strategy — not a complete compliance solution. Combine it with a dedicated read-only database role, schema filtering, audit logging, and your organization’s data governance policies.

Interaction with other features

Feature	Interaction with masking
Schema filtering (`SCHEMAS`)	Complementary — filtering hides entire schemas, masking hides column values within visible schemas
Explain-only mode (`--explain-only`)	No interaction — EXPLAIN returns query plans, not data, so masking is not applied
Audit logging (`--audit-log`)	Audit logs record the SQL statement (input), not the results (output), so no result values, masked or unmasked, are written to the audit log
Business context (policy descriptions)	Additive — the AI sees the column description (“Primary email address”) alongside the masked value (`"***"`), giving it schema understanding without data exposure
Row limits (`MAX_ROWS`)	Independent — row limits cap the number of rows, masking transforms values within those rows
OpenTelemetry (`--otel`)	Traces record SQL statements and row counts, not result values — masking has no effect on telemetry data

Limitations

Column name scope — masks match by column name globally, not per table. You cannot mask email differently in users vs. contacts. This is a deliberate tradeoff: simplicity and predictability over per-table granularity.
SQL aliases — if a query uses SELECT email AS contact_email, the result column is named contact_email, and the email mask will not apply. The AI could theoretically use aliases to bypass masking. Mitigate this with a dedicated read-only database role that restricts access to sensitive columns at the PostgreSQL level.
Aggregations — SELECT COUNT(DISTINCT email) returns an integer count, not email values. Masking does not interfere with aggregations since the masked column is not in the result set.
WHERE clauses — masking does not affect query filters. SELECT id FROM users WHERE email = 'alice@example.com' executes against the real data. The AI can still filter by masked columns — it just cannot see the values in results.
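The alias limitation follows directly from matching on result column names. The Python sketch below models it with a hypothetical mask_row helper that redacts by name; it is a simplified illustration, not Isthmus's code, and the rows stand in for what the database would return.

```python
def mask_row(row, masked_columns):
    """Redact every non-NULL value whose result column name is masked."""
    return {
        col: "***" if col in masked_columns and val is not None else val
        for col, val in row.items()
    }


masked_columns = {"email"}

# Result of: SELECT email FROM users
direct = mask_row({"email": "alice@example.com"}, masked_columns)

# Result of: SELECT email AS contact_email FROM users
aliased = mask_row({"contact_email": "alice@example.com"}, masked_columns)

print(direct)   # {'email': '***'}
print(aliased)  # {'contact_email': 'alice@example.com'}
```

The aliased row leaves the mask untriggered because the name email never appears in the result, which is why the database-level restriction mentioned above matters.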

Tips

Start with redact — it’s the safest default for PII columns.
Use hash when the AI needs to detect patterns — GROUP BY, JOIN, or COUNT DISTINCT still work on hashed values.
Use partial for phone numbers and account numbers — the last 4 digits help the AI understand the format without exposing the full value.
Use null for columns that should be completely invisible — salaries, SSNs, medical data.
Masking + business descriptions work together — the AI sees "description": "Primary email address" alongside "***", so it knows what the column is without seeing the values.
Mask liberally, describe generously — err on the side of masking more columns. The AI writes better SQL when it understands the schema (via descriptions) than when it sees raw data.
Think about JOINs: a mask on email applies to every result column named email, so declare the same mask wherever that column appears in your policy. Isthmus rejects conflicting declarations at startup.
Combine with database-level controls — for maximum protection, mask columns in the policy YAML and revoke SELECT on those columns for the database role. This provides defense in depth.
