Column masking lets you define per-column data protection rules in your policy YAML. Masked columns are automatically redacted, hashed, partially hidden, or nullified in both query results and describe_table sample rows. No code changes, no database views, no application-layer middleware — just a YAML file.
Why column masking?
AI models are powerful data analysts, but they don’t need to see raw PII to write correct SQL. With column masking:
- Emails, SSNs, phone numbers are redacted before the AI ever sees them
- Query results are masked in real time — the AI gets useful structure without sensitive values
- Sample rows in describe_table are also masked, so table analysis doesn’t leak PII
- Masking is enforced server-side on query results, so the AI cannot turn it off; the known gap is column aliasing, described under Limitations
Enabling column masking
Column masking is part of the policy engine. Add mask directives to any column in your policy YAML:
context:
  tables:
    public.customers:
      description: "Customer accounts"
      columns:
        email:
          description: "Primary email address"
          mask: "redact"
        ssn:
          description: "Social Security Number"
          mask: "null"
        phone:
          description: "Phone number"
          mask: "partial"
        name:
          description: "Full name"
          mask: "hash"
Then point Isthmus at the file:
POLICY_FILE=./policy.yaml isthmus
At startup, Isthmus logs how many columns are masked:
{"level":"INFO","msg":"column masking enabled","masked_columns":4}
Mask types
There are four mask types. Each is designed for a different use case.
redact
Replaces the value with ***. Use for columns the AI never needs to see.
| Input | Output |
|---|---|
| "alice@example.com" | "***" |
| 12345 | "***" |
| NULL | NULL |
Best for: email addresses, API keys, passwords, tokens.
hash
Replaces the value with a deterministic SHA-256 hex string (64 characters). Same input always produces the same hash, so the AI can still detect duplicates, join on hashed values, and count distinct entries — without seeing the original data.
| Input | Output |
|---|---|
| "alice@example.com" | "b4a6c8d2e1f0..." (64 hex chars) |
| "alice@example.com" | Same hash (deterministic) |
| "bob@example.com" | Different hash |
| NULL | NULL |
Best for: columns where the AI needs to detect patterns (GROUP BY, JOIN, COUNT DISTINCT) without seeing raw values. Names, external IDs, usernames.
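The determinism described above can be illustrated with a short Python sketch. It is not Isthmus's implementation: the function name `mask_hash` is hypothetical, and hashing the UTF-8 bytes of the value's string form is an assumption, since this page does not specify the exact hash input.

```python
import hashlib
from collections import Counter


def mask_hash(value):
    """Return a 64-character SHA-256 hex digest, or None for NULL."""
    if value is None:
        return None  # NULL stays NULL, as in the table above
    # Assumption: hash the UTF-8 bytes of str(value); the real input
    # encoding used by Isthmus is not documented on this page.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


emails = ["alice@example.com", "bob@example.com", "alice@example.com", None]
hashed = [mask_hash(e) for e in emails]

# Duplicates are still detectable because equal inputs give equal hashes.
counts = Counter(h for h in hashed if h is not None)
duplicates = [h for h, n in counts.items() if n > 1]

print(len(hashed[0]))          # 64
print(hashed[0] == hashed[2])  # True
print(hashed[0] == hashed[1])  # False
print(len(duplicates))         # 1
```

The consumer of the masked output can count and group on the digests without ever holding the original strings.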
partial
Reveals only the last 4 characters, replacing the rest with asterisks. Works correctly with unicode characters.
| Input | Output |
|---|---|
| "1234567890" | "******7890" |
| "+1-555-867-5309" | "***********5309" |
| "ab" | "***ab" |
| NULL | NULL |
Best for: phone numbers, credit card numbers, account numbers — where partial visibility helps the AI understand the format.
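As a rough illustration of the partial rule, here is a Python sketch that reproduces the rows in the table above. The function name `mask_partial` is hypothetical. The handling of values with four or fewer characters is a guess that simply mirrors the "ab" row; the general rule for short values is not documented here.

```python
def mask_partial(value, visible=4):
    """Keep the last `visible` characters; replace the rest with asterisks."""
    if value is None:
        return None  # NULL stays NULL
    text = str(value)  # non-string values are rendered as text first
    # len() and slicing work on code points, so multi-byte characters
    # are never split in the middle.
    hidden = len(text) - visible
    if hidden <= 0:
        # Guess: mirrors the "ab" -> "***ab" row in the table above.
        return "***" + text
    return "*" * hidden + text[-visible:]


print(mask_partial("1234567890"))       # ******7890
print(mask_partial("+1-555-867-5309"))  # 11 asterisks followed by 5309
print(mask_partial("ab"))               # ***ab
print(mask_partial("日本語テキスト"))     # ***テキスト
```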
null
Replaces the value with NULL. The AI sees that the column exists but has no data.
| Input | Output |
|---|---|
| "alice@example.com" | NULL |
| 12345 | NULL |
| NULL | NULL |
Best for: columns that must be completely hidden. The AI can still query other columns in the same table.
How masking works
Query results
When a query tool call returns results, Isthmus applies masks to every row before sending the response to the AI:
SELECT id, email, name FROM customers LIMIT 2
Without masking:
[
  {"id": 1, "email": "alice@example.com", "name": "Alice Johnson"},
  {"id": 2, "email": "bob@example.com", "name": "Bob Smith"}
]
With email: redact and name: hash:
[
  {"id": 1, "email": "***", "name": "a4f2e8c1d3b5..."},
  {"id": 2, "email": "***", "name": "c7d9f0a2b4e6..."}
]
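The row-by-row transformation can be sketched in Python as follows. This is a simplified model, not Isthmus code: `mask_rows`, `redact`, `sha256_hex`, and the `MASKS` mapping are hypothetical names, and the hash input encoding is an assumption.

```python
import hashlib


def redact(value):
    # NULL passes through unchanged; everything else becomes "***".
    return None if value is None else "***"


def sha256_hex(value):
    # Assumption: hash the UTF-8 bytes of the value's string form.
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Column name -> mask function, matching "email: redact, name: hash".
MASKS = {"email": redact, "name": sha256_hex}


def mask_rows(rows, masks):
    """Apply the mask for each column name; unlisted columns pass through."""
    return [
        {col: masks[col](val) if col in masks else val for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"id": 1, "email": "alice@example.com", "name": "Alice Johnson"},
    {"id": 2, "email": "bob@example.com", "name": "Bob Smith"},
]

for row in mask_rows(rows, MASKS):
    print(row)
```

Only the values change. The row count, column names, and unmasked columns such as id are returned as they came from the database.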
Describe table sample rows
The describe_table tool returns up to 5 sample rows. These are masked identically — same rules, same mask types:
{
  "schema": "public",
  "name": "customers",
  "sample_rows": [
    {"id": 1, "email": "***", "name": "a4f2e8c1d3b5...", "phone": "***********5309"}
  ]
}
Column name matching
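Masking matches by column name only, not by table. If you mask email, the mask applies to every result column named email, regardless of which table it comes from, including columns that reach the result through JOINs and subqueries. The match is made on the name in the result set, so a column renamed with an alias (SELECT email AS contact_email) no longer matches; see Limitations.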
This is by design. SQL queries with JOINs, CTEs, and subqueries make it impossible to reliably map result column names back to source tables. Rather than building a fragile runtime mapper, Isthmus uses a simple, predictable rule: same column name = same mask.
Conflict detection
Because masking is by column name, Isthmus validates at startup that no column name has conflicting mask types across tables. If two tables define different masks for the same column name, Isthmus rejects the policy file:
# This will fail validation:
context:
  tables:
    public.users:
      columns:
        email:
          mask: "redact"   # redact here...
    public.contacts:
      columns:
        email:
          mask: "hash"     # ...but hash here? Conflict!
error: validating policy: column "email" has conflicting masks: "redact" in public.users vs "hash" in public.contacts
The same column name with the same mask across multiple tables is fine:
# This is valid: both use "redact"
context:
  tables:
    public.users:
      columns:
        email:
          mask: "redact"
    public.contacts:
      columns:
        email:
          mask: "redact"
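A startup check of this kind could look roughly like the Python sketch below. It works on a plain dictionary that mirrors the policy YAML; the function name `collect_masks` and the dictionary representation are assumptions made for illustration, and the actual validator in Isthmus may differ.

```python
def collect_masks(policy):
    """Build {column_name: mask}; raise ValueError on conflicting masks."""
    seen = {}  # column name -> (mask, table where it was first declared)
    for table, spec in policy["context"]["tables"].items():
        for column, col_spec in spec.get("columns", {}).items():
            # Columns may be a plain description string with no mask.
            if not isinstance(col_spec, dict) or "mask" not in col_spec:
                continue
            mask = col_spec["mask"]
            if column in seen and seen[column][0] != mask:
                prev_mask, prev_table = seen[column]
                raise ValueError(
                    f'column "{column}" has conflicting masks: '
                    f'"{prev_mask}" in {prev_table} vs "{mask}" in {table}'
                )
            seen.setdefault(column, (mask, table))
    return {column: mask for column, (mask, _) in seen.items()}


conflicting = {"context": {"tables": {
    "public.users": {"columns": {"email": {"mask": "redact"}}},
    "public.contacts": {"columns": {"email": {"mask": "hash"}}},
}}}
valid = {"context": {"tables": {
    "public.users": {"columns": {"email": {"mask": "redact"}}},
    "public.contacts": {"columns": {"email": {"mask": "redact"}}},
}}}

print(collect_masks(valid))  # {'email': 'redact'}
try:
    collect_masks(conflicting)
except ValueError as err:
    print(err)
```

Failing fast at load time keeps the runtime rule simple: once the policy is accepted, one column name maps to exactly one mask.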
Full example
A realistic policy YAML with masking:
context:
  tables:
    public.customers:
      description: "Registered platform customers"
      columns:
        id: "Unique customer identifier (UUID)"
        email:
          description: "Primary email address, used for login"
          mask: "redact"
        name:
          description: "Full display name"
          mask: "hash"
        phone:
          description: "Phone number with country code"
          mask: "partial"
        ssn:
          description: "Social Security Number (US)"
          mask: "null"
        created_at: "Account creation timestamp (UTC)"
    public.orders:
      description: "Purchase orders"
      columns:
        id: "Unique order identifier (UUID)"
        customer_id: "FK to customers.id"
        status: "Order lifecycle: draft, pending, paid, shipped, delivered, cancelled"
        amount_cents: "Order total in cents (USD)"
    public.employees:
      description: "Internal employee records"
      columns:
        id: "Employee ID"
        email:
          description: "Corporate email"
          mask: "redact"
        salary_cents:
          description: "Annual salary in cents (USD)"
          mask: "null"
Note that email appears in both customers and employees with the same mask (redact), which is valid.
Type behavior
Masked values may change type. This is intentional and documented here for completeness:
| Original type | Mask | Result type |
|---|---|---|
| string | redact | string ("***") |
| int | redact | string ("***") |
| string | hash | string (64 hex chars) |
| int | hash | string (64 hex chars) |
| string | partial | string |
| int | partial | string |
| any | null | null |
| NULL | any | null |
Choosing the right mask type
| Data type | Example columns | Recommended | Why |
|---|---|---|---|
| Credentials | password_hash, api_key, token | redact | No reason for the AI to ever see these |
| Government IDs | ssn, tax_id, passport_number | redact or null | Highly regulated, zero analytical value |
| Email addresses | email, contact_email | redact or hash | Use hash if the AI needs to detect duplicates or join across tables |
| Phone numbers | phone, mobile, fax | partial | The last 4 digits and the preserved length hint at the format without exposing the full number |
| Financial accounts | card_number, iban, routing_number | partial | Showing only the last 4 digits is industry standard (PCI DSS) |
| Names | name, first_name, last_name | hash or partial | hash for analytics, partial to preserve format |
| Addresses | street, address_line_1 | redact | Low analytical value, high PII risk |
| Salaries / compensation | salary, bonus, equity_grants | null | Sensitive internal data, not useful for most AI queries |
| Internal identifiers | user_id, account_id | hash | Deterministic — AI can still join and group by without seeing real IDs |
| Non-sensitive business data | status, category, amount_cents | (no mask) | AI needs actual values to provide useful analysis |
Compliance context
Column masking helps satisfy data protection requirements across common regulatory frameworks:
| Framework | Requirement | How masking helps |
|---|---|---|
| GDPR | Data minimization (Art. 5(1)(c)) | Mask PII so the AI processes only what’s necessary |
| HIPAA | Minimum necessary standard | redact or null PHI columns (patient names, diagnoses, SSNs) |
| PCI DSS | Mask PAN when displayed (Req. 3.3 in v3.2.1, Req. 3.4.1 in v4.0) | partial on card numbers shows only last 4 digits |
| SOC 2 | Access controls on sensitive data | Server-side masking enforces data protection regardless of AI behavior |
| CCPA | Limit disclosure of personal information | Mask consumer PII in AI-accessible query results |
Column masking is one layer of a data protection strategy — not a complete compliance solution. Combine it with a dedicated read-only database role, schema filtering, audit logging, and your organization’s data governance policies.
Interaction with other features
| Feature | Interaction with masking |
|---|---|
| Schema filtering (SCHEMAS) | Complementary: filtering hides entire schemas, masking hides column values within visible schemas |
| Explain-only mode (--explain-only) | No interaction: EXPLAIN returns query plans, not data, so masking is not applied |
| Audit logging (--audit-log) | Audit logs record the SQL statement, not the results, so result values (masked or unmasked) are never written to the audit log |
| Business context (policy descriptions) | Additive: the AI sees the column description ("Primary email address") alongside the masked value ("***"), giving it schema understanding without data exposure |
| Row limits (MAX_ROWS) | Independent: row limits cap the number of rows, masking transforms values within those rows |
| OpenTelemetry (--otel) | Traces record SQL statements and row counts, not result values, so masking has no effect on telemetry data |
Limitations
- Column name scope: masks match by column name globally, not per table. You cannot mask email differently in users vs. contacts. This is a deliberate tradeoff that favors simplicity and predictability over per-table granularity.
- SQL aliases: if a query uses SELECT email AS contact_email, the result column is named contact_email, and the email mask will not apply. The AI could theoretically use aliases to bypass masking. Mitigate this with a dedicated read-only database role that restricts access to sensitive columns at the PostgreSQL level.
- Aggregations: SELECT COUNT(DISTINCT email) returns an integer count, not email values. Masking does not interfere with aggregations since the masked column is not in the result set.
- WHERE clauses: masking does not affect query filters. SELECT id FROM users WHERE email = 'alice@example.com' executes against the real data. The AI can still filter by masked columns; it just cannot see the values in results.
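The alias limitation follows directly from name-based matching, as this small Python sketch shows. The names `MASKED_COLUMNS` and `mask_row` are hypothetical, and the masking logic is reduced to a plain redact for brevity.

```python
# Column names that carry a mask in the policy (simplified to redact only).
MASKED_COLUMNS = {"email"}


def mask_row(row):
    """Redact values whose result column name is listed in the policy."""
    return {
        col: "***" if col in MASKED_COLUMNS and val is not None else val
        for col, val in row.items()
    }


# Result row of: SELECT email FROM users LIMIT 1
direct = {"email": "alice@example.com"}

# Result row of: SELECT email AS contact_email FROM users LIMIT 1
aliased = {"contact_email": "alice@example.com"}

print(mask_row(direct))   # {'email': '***'}
print(mask_row(aliased))  # {'contact_email': 'alice@example.com'}
```

The aliased row passes through untouched because the masking layer only sees the name contact_email, which is why the database-level restriction is the recommended backstop.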
Tips
- Start with redact. It’s the safest default for PII columns.
- Use hash when the AI needs to detect patterns: GROUP BY, JOIN, or COUNT DISTINCT still work on hashed values.
- Use partial for phone numbers and account numbers: the last 4 digits help the AI understand the format without exposing the full value.
- Use null for columns that should be completely invisible, such as salaries, SSNs, and medical data.
- Masking and business descriptions work together: the AI sees "description": "Primary email address" alongside "***", so it knows what the column is without seeing the values.
- Mask liberally, describe generously. Err on the side of masking more columns. The AI writes better SQL when it understands the schema (via descriptions) than when it sees raw data.
- Think about JOINs: a mask on email applies to every result column named email, so declare the same mask wherever the column appears. Isthmus rejects conflicting definitions at startup.
- Combine with database-level controls: for maximum protection, mask columns in the policy YAML and revoke SELECT on those columns for the database role. This provides defense in depth.