Column masking lets you define per-column data protection rules in your policy YAML. Masked columns are automatically redacted, hashed, partially hidden, or nullified in both query results and describe_table sample rows. No code changes, no database views, no application-layer middleware — just a YAML file.

Why column masking?

AI models are powerful data analysts, but they don’t need to see raw PII to write correct SQL. With column masking:
  • Emails, SSNs, and phone numbers are redacted before the AI ever sees them
  • Query results are masked in real time — the AI gets useful structure without sensitive values
  • Sample rows in describe_table are also masked, so table analysis doesn’t leak PII
  • Masking is enforced server-side on every result set, so the AI cannot switch it off. Masks match result column names, however, so a column alias changes what gets matched (see Limitations)

Enabling column masking

Column masking is part of the policy engine. Add mask directives to any column in your policy YAML:
context:
  tables:
    public.customers:
      description: "Customer accounts"
      columns:
        email:
          description: "Primary email address"
          mask: "redact"
        ssn:
          description: "Social Security Number"
          mask: "null"
        phone:
          description: "Phone number"
          mask: "partial"
        name:
          description: "Full name"
          mask: "hash"
Then point Isthmus at the file:
POLICY_FILE=./policy.yaml isthmus
At startup, Isthmus logs how many columns are masked:
{"level":"INFO","msg":"column masking enabled","masked_columns":4}

Mask types

There are four mask types. Each is designed for a different use case.

redact

Replaces the value with ***. Use for columns the AI never needs to see.
| Input | Output |
| --- | --- |
| `"alice@example.com"` | `"***"` |
| `12345` | `"***"` |
| `NULL` | `NULL` |
Best for: email addresses, API keys, passwords, tokens.

hash

Replaces the value with a deterministic SHA-256 hex string (64 characters). The same input always produces the same hash, so the AI can still detect duplicates, match values across result sets, and count distinct entries without seeing the original data.
| Input | Output |
| --- | --- |
| `"alice@example.com"` | `"b4a6c8d2e1f0..."` (64 hex chars) |
| `"alice@example.com"` | Same hash (deterministic) |
| `"bob@example.com"` | Different hash |
| `NULL` | `NULL` |
Best for: columns where the AI needs to detect patterns (GROUP BY, JOIN, COUNT DISTINCT) without seeing raw values. Names, external IDs, usernames.
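The behavior in the table above can be sketched in a few lines of Python. This is an illustration, not Isthmus's implementation: the function name hash_mask is hypothetical, and the sketch assumes non-string values are converted to text and encoded as UTF-8 before hashing, which the documentation does not specify.

```python
import hashlib


def hash_mask(value):
    """Return a 64-character SHA-256 hex digest, or None for NULL."""
    if value is None:
        return None  # NULL stays NULL
    # Assumption: non-string values are stringified and UTF-8 encoded first.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


alice = hash_mask("alice@example.com")
bob = hash_mask("bob@example.com")

print(len(alice))                                # 64
print(alice == hash_mask("alice@example.com"))   # True (deterministic)
print(alice == bob)                              # False
print(hash_mask(None))                           # None
```

Because equal inputs map to equal digests, equality checks on masked results behave the same as equality checks on the raw values.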

partial

Reveals only the last 4 characters, replacing the rest with asterisks. Handles Unicode characters correctly.
| Input | Output |
| --- | --- |
| `"1234567890"` | `"******7890"` |
| `"+1-555-867-5309"` | `"***********5309"` |
| `"ab"` | `"***ab"` |
| `NULL` | `NULL` |
Best for: phone numbers, credit card numbers, account numbers — where partial visibility helps the AI understand the format.
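A small Python sketch that reproduces the rows of the table above. The name partial_mask is hypothetical. The handling of short values (a fixed "***" prefix when there are 4 or fewer characters) is an assumption inferred from the "ab" row, not confirmed Isthmus behavior.

```python
def partial_mask(value, visible=4):
    """Keep the last `visible` characters and mask the rest with asterisks."""
    if value is None:
        return None  # NULL stays NULL
    text = str(value)  # non-string values become strings
    if len(text) <= visible:
        # Assumption based on the "ab" -> "***ab" row: short values are
        # kept whole behind a fixed three-asterisk prefix.
        return "***" + text
    # len() and slicing work on Unicode code points, not bytes.
    return "*" * (len(text) - visible) + text[-visible:]


print(partial_mask("1234567890"))        # ******7890
print(partial_mask("+1-555-867-5309"))   # ***********5309
print(partial_mask("ab"))                # ***ab
print(partial_mask(None))                # None
```

Note that separators such as "+" and "-" are masked along with the digits, so only the total length and the final four characters remain visible.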

null

Replaces the value with NULL. The AI sees that the column exists but has no data.
| Input | Output |
| --- | --- |
| `"alice@example.com"` | `NULL` |
| `12345` | `NULL` |
| `NULL` | `NULL` |
Best for: columns that must be completely hidden. The AI can still query other columns in the same table.

How masking works

Query results

When a query tool call returns results, Isthmus applies masks to every row before sending the response to the AI:
SELECT id, email, name FROM customers LIMIT 2
Without masking:
[
  {"id": 1, "email": "alice@example.com", "name": "Alice Johnson"},
  {"id": 2, "email": "bob@example.com", "name": "Bob Smith"}
]
With email: redact and name: hash:
[
  {"id": 1, "email": "***", "name": "a4f2e8c1d3b5..."},
  {"id": 2, "email": "***", "name": "c7d9f0a2b4e6..."}
]
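The post-query step can be pictured as a lookup from result column name to mask function, applied to every row. The Python sketch below is only an illustration under that assumption. mask_rows, redact, and sha256_hex are hypothetical names, not Isthmus internals.

```python
import hashlib


def redact(value):
    return None if value is None else "***"


def sha256_hex(value):
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Hypothetical in-memory form of "email: redact" and "name: hash".
MASKS = {"email": redact, "name": sha256_hex}


def mask_rows(rows, masks):
    """Apply the mask registered for each result column; pass others through."""
    return [
        {col: masks[col](val) if col in masks else val for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"id": 1, "email": "alice@example.com", "name": "Alice Johnson"},
    {"id": 2, "email": "bob@example.com", "name": "Bob Smith"},
]
masked = mask_rows(rows, MASKS)
for row in masked:
    print(row["id"], row["email"], row["name"][:12] + "...")
```

Unmasked columns such as id pass through untouched, which is what lets the AI keep reasoning about row identity and structure.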

Describe table sample rows

The describe_table tool returns up to 5 sample rows. These are masked identically — same rules, same mask types:
{
  "schema": "public",
  "name": "customers",
  "sample_rows": [
    {"id": 1, "email": "***", "name": "a4f2e8c1d3b5...", "phone": "***********5309"}
  ]
}

Column name matching

Masking matches by column name only, not by table. If you mask email, it applies to every result column named email, regardless of which table it comes from, including results produced through JOINs, subqueries, and table aliases. Column aliases are the exception: a query that renames the column changes the name the mask is matched against (see Limitations). Name-based matching is by design. SQL queries with JOINs, CTEs, and subqueries make it impossible to reliably map result column names back to source tables. Rather than building a fragile runtime mapper, Isthmus uses a simple, predictable rule: same column name = same mask.

Conflict detection

Because masking is by column name, Isthmus validates at startup that no column name has conflicting mask types across tables. If two tables define different masks for the same column name, Isthmus rejects the policy file:
# This will fail validation:
context:
  tables:
    public.users:
      columns:
        email:
          mask: "redact"      # redact here...
    public.contacts:
      columns:
        email:
          mask: "hash"        # ...but hash here? Conflict!
error: validating policy: column "email" has conflicting masks: "redact" in public.users vs "hash" in public.contacts
The same column name with the same mask across multiple tables is fine:
# This is valid — both use "redact"
context:
  tables:
    public.users:
      columns:
        email:
          mask: "redact"
    public.contacts:
      columns:
        email:
          mask: "redact"
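The startup check described above amounts to remembering the first mask seen for each column name and flagging any later disagreement. A rough Python sketch follows, assuming the policy has already been parsed into nested dictionaries. find_mask_conflicts is a hypothetical name, and the message format simply mirrors the error shown earlier.

```python
def find_mask_conflicts(tables):
    """Return one message per column name that has differing masks."""
    first_seen = {}  # column name -> (mask, table) of the first declaration
    conflicts = []
    for table, spec in tables.items():
        for column, col_spec in spec.get("columns", {}).items():
            # A column may be a plain description string (no mask) or a mapping.
            mask = col_spec.get("mask") if isinstance(col_spec, dict) else None
            if mask is None:
                continue
            seen_mask, seen_table = first_seen.setdefault(column, (mask, table))
            if seen_mask != mask:
                conflicts.append(
                    f'column "{column}" has conflicting masks: '
                    f'"{seen_mask}" in {seen_table} vs "{mask}" in {table}'
                )
    return conflicts


bad = {
    "public.users": {"columns": {"email": {"mask": "redact"}}},
    "public.contacts": {"columns": {"email": {"mask": "hash"}}},
}
good = {
    "public.users": {"columns": {"email": {"mask": "redact"}}},
    "public.contacts": {"columns": {"email": {"mask": "redact"}, "id": "Contact ID"}},
}

print(find_mask_conflicts(bad))
print(find_mask_conflicts(good))  # []
```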

Full example

A realistic policy YAML with masking:
context:
  tables:
    public.customers:
      description: "Registered platform customers"
      columns:
        id: "Unique customer identifier (UUID)"
        email:
          description: "Primary email address, used for login"
          mask: "redact"
        name:
          description: "Full display name"
          mask: "hash"
        phone:
          description: "Phone number with country code"
          mask: "partial"
        ssn:
          description: "Social Security Number (US)"
          mask: "null"
        created_at: "Account creation timestamp (UTC)"

    public.orders:
      description: "Purchase orders"
      columns:
        id: "Unique order identifier (UUID)"
        customer_id: "FK to customers.id"
        status: "Order lifecycle: draft, pending, paid, shipped, delivered, cancelled"
        amount_cents: "Order total in cents (USD)"

    public.employees:
      description: "Internal employee records"
      columns:
        id: "Employee ID"
        email:
          description: "Corporate email"
          mask: "redact"
        salary_cents:
          description: "Annual salary in cents (USD)"
          mask: "null"
Note that email appears in both customers and employees with the same mask (redact), which is valid.

Type behavior

Masked values may change type. This is intentional and documented here for completeness:
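| Original type | Mask | Result type |
| --- | --- | --- |
| string | redact | string (`"***"`) |
| int | redact | string (`"***"`) |
| string | hash | string (64 hex chars) |
| int | hash | string (64 hex chars) |
| string | partial | string |
| int | partial | string |
| any | null | null |
| NULL | any | null |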

Choosing the right mask type

| Data type | Example columns | Recommended | Why |
| --- | --- | --- | --- |
| Credentials | password_hash, api_key, token | redact | No reason for the AI to ever see these |
| Government IDs | ssn, tax_id, passport_number | redact or null | Highly regulated, zero analytical value |
| Email addresses | email, contact_email | redact or hash | Use hash if the AI needs to detect duplicates or match values across tables |
| Phone numbers | phone, mobile, fax | partial | The last 4 digits and the preserved length hint at the format without exposing the full number |
| Financial accounts | card_number, iban, routing_number | partial | Showing only the last 4 digits is industry standard (PCI DSS) |
| Names | name, first_name, last_name | hash or partial | hash for analytics, partial to preserve format |
| Addresses | street, address_line_1 | redact | Low analytical value, high PII risk |
| Salaries / compensation | salary, bonus, equity_grants | null | Sensitive internal data, not useful for most AI queries |
| Internal identifiers | user_id, account_id | hash | Deterministic: the AI can still join and group by without seeing real IDs |
| Non-sensitive business data | status, category, amount_cents | (no mask) | AI needs actual values to provide useful analysis |

Compliance context

Column masking helps satisfy data protection requirements across common regulatory frameworks:
| Framework | Requirement | How masking helps |
| --- | --- | --- |
| GDPR | Data minimization (Art. 5(1)(c)) | Mask PII so the AI processes only what’s necessary |
| HIPAA | Minimum necessary standard | redact or null PHI columns (patient names, diagnoses, SSNs) |
| PCI DSS | Mask PAN when displayed (Req. 3.3) | partial on card numbers shows only last 4 digits |
| SOC 2 | Access controls on sensitive data | Server-side masking enforces data protection regardless of AI behavior |
| CCPA | Limit disclosure of personal information | Mask consumer PII in AI-accessible query results |
Column masking is one layer of a data protection strategy — not a complete compliance solution. Combine it with a dedicated read-only database role, schema filtering, audit logging, and your organization’s data governance policies.

Interaction with other features

| Feature | Interaction with masking |
| --- | --- |
| Schema filtering (SCHEMAS) | Complementary: filtering hides entire schemas, masking hides column values within visible schemas |
| Explain-only mode (--explain-only) | No interaction: EXPLAIN returns query plans, not data, so masking is not applied |
| Audit logging (--audit-log) | Audit logs record the SQL statement, not the results, so result values (masked or not) are never written to the audit log |
| Business context (policy descriptions) | Additive: the AI sees the column description (“Primary email address”) alongside the masked value (`"***"`), giving it schema understanding without data exposure |
| Row limits (MAX_ROWS) | Independent: row limits cap the number of rows, masking transforms values within those rows |
| OpenTelemetry (--otel) | Traces record SQL statements and row counts, not result values, so masking has no effect on telemetry data |

Limitations

  • Column name scope: masks match by column name globally, not per table. You cannot mask email differently in users vs. contacts. This is a deliberate tradeoff: simplicity and predictability over per-table granularity.
  • SQL aliases: if a query uses SELECT email AS contact_email, the result column is named contact_email, and the email mask will not apply. The AI could therefore use aliases to bypass masking. Mitigate this with a dedicated read-only database role that restricts access to sensitive columns at the PostgreSQL level.
  • Aggregations: SELECT COUNT(DISTINCT email) returns an integer count, not email values. Masking does not interfere with aggregations since the masked column is not in the result set.
  • WHERE clauses: masking does not affect query filters. SELECT id FROM users WHERE email = 'alice@example.com' executes against the real data. The AI can still filter by masked columns; it just cannot see the values in results.
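The alias limitation follows directly from matching on result column names. The Python sketch below is a hypothetical stand-in for the masking step (redact_by_name and MASKED_COLUMNS are illustrative names, not Isthmus code). It shows the same value being redacted under its original name and passing through under an alias.

```python
# Hypothetical set of masked column names, as if "email: redact" were configured.
MASKED_COLUMNS = {"email"}


def redact_by_name(row):
    """Redact values whose result column name is in MASKED_COLUMNS."""
    return {
        col: "***" if col in MASKED_COLUMNS and val is not None else val
        for col, val in row.items()
    }


# Result row of: SELECT id, email FROM users
plain = redact_by_name({"id": 1, "email": "alice@example.com"})

# Result row of: SELECT id, email AS contact_email FROM users
aliased = redact_by_name({"id": 1, "contact_email": "alice@example.com"})

print(plain)    # {'id': 1, 'email': '***'}
print(aliased)  # {'id': 1, 'contact_email': 'alice@example.com'}
```

This is why database-level restrictions on sensitive columns remain the stronger control: they apply before a result column name exists at all.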

Tips

  • Start with redact — it’s the safest default for PII columns.
  • Use hash when the AI needs to detect patterns — GROUP BY, JOIN, or COUNT DISTINCT still work on hashed values.
  • Use partial for phone numbers and account numbers — the last 4 digits help the AI understand the format without exposing the full value.
  • Use null for columns that should be completely invisible — salaries, SSNs, medical data.
  • Masking + business descriptions work together — the AI sees "description": "Primary email address" alongside "***", so it knows what the column is without seeing the values.
  • Mask liberally, describe generously — err on the side of masking more columns. The AI writes better SQL when it understands the schema (via descriptions) than when it sees raw data.
  • Think about JOINs: a mask on email applies to every result column named email, so use the same mask type wherever you declare it. Isthmus rejects conflicting mask types at startup.
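  • Combine with database-level controls: for maximum protection, mask columns in the policy YAML and also deny the database role SELECT on those columns. In PostgreSQL this means granting column-level SELECT on the non-sensitive columns only, because revoking a single column has no effect while the role still holds table-level SELECT. This provides defense in depth.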