PII Detection

Rime includes an automatic PII detection scanner that analyses column data against a library of patterns to identify personally identifiable information. Detected PII is flagged for human review — the scanner never auto-classifies columns or changes masking policies on its own.

PII detection works alongside the masked-by-default model. Because all columns start masked, detected PII does not represent an immediate exposure risk. The scanner’s role is to accelerate classification by surfacing columns that likely contain personal data so governance administrators can prioritise their review.

How scanning works

When a scan runs, Rime samples rows from each unclassified column and evaluates the sample data against its pattern library. The process:

  1. Column selection — the scanner identifies columns that have not been classified or that have been modified since their last scan
  2. Row sampling — Rime reads a sample of rows from each column (typically 100-1000 rows, depending on table size). Sampling uses the Rime service account’s Snowflake role, which has access to unmasked data
  3. Pattern matching — each sampled value is tested against the full pattern library. Multiple patterns can match a single column
  4. Confidence scoring — based on the match rate across sampled rows, the scanner assigns a confidence level
  5. Flagging — columns that exceed the minimum confidence threshold are flagged for review in the data classification column browser
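As a rough illustration of steps 3 to 5, the sketch below computes a per-pattern match rate over a sample and keeps only the patterns that reach a minimum rate. The two regular expressions, the `PATTERNS` dictionary, and the `scan_column` function are hypothetical stand-ins and not Rime's actual pattern library or API. The sketch also assumes that null values count toward the sample size.

```python
import re
from typing import Iterable

# Hypothetical two-entry pattern library, for illustration only.
# The real library is larger and its expressions are not documented here.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone_international": re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}"),
}


def scan_column(values: Iterable[object], min_match_rate: float = 0.10) -> dict[str, float]:
    """Return {pattern name: match rate} for patterns at or above the threshold."""
    sample = list(values)
    if not sample:
        return {}
    rates: dict[str, float] = {}
    for name, pattern in PATTERNS.items():
        # Nulls stay in the denominator, so they lower the match rate.
        hits = sum(
            1 for v in sample
            if v is not None and pattern.fullmatch(str(v).strip())
        )
        rate = hits / len(sample)
        if rate >= min_match_rate:
            rates[name] = rate
    return rates
```

Because every pattern is tested independently, a single column can come back with more than one entry, which corresponds to the note in step 3 that multiple patterns can match a single column.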

Scans run automatically when new data sources are connected and when schema changes are detected. You can also trigger a manual scan at any time.

Scans also run on a configurable schedule, daily by default.

Detection patterns

The pattern library includes both general patterns and New Zealand / Australia-specific patterns:

General patterns

| Pattern | Description | Example matches |
| --- | --- | --- |
| Email address | Standard email format matching | [email protected], [email protected] |
| Phone (international) | International format with country code | +64 21 123 4567, +61 400 123 456 |
| Phone (NZ local) | New Zealand local formats | 021 123 4567, 09 123 4567, (09) 123-4567 |
| Physical address | Street address patterns with number, street type, and locality | 123 Queen Street, Auckland |
| Date of birth | Date patterns in common formats, filtered by plausible age range | 1985-03-15, 15/03/1985 |
| Passport number | Alphanumeric patterns matching passport formats | LN123456, AB1234567 |

New Zealand-specific patterns

| Pattern | Description | Validation |
| --- | --- | --- |
| IRD number | Inland Revenue Department number (8-9 digits) | Mod-11 check digit validation. Only values that pass the check digit algorithm are flagged, significantly reducing false positives |
| NHI number | National Health Index number | Format: 3 alpha characters followed by 4 digits (e.g., ABC1234). Alphabetic prefix is validated against known NHI character ranges |
| NZ bank account | Bank account number | Format: BB-bbbb-AAAAAAA-SSS (bank, branch, account, suffix). Bank and branch codes are validated against known ranges |
| NZ driver licence | Driver licence number | Alphanumeric format matching NZTA licence patterns |
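
To show why check digit validation cuts false positives, here is a minimal sketch of a modulus-11 check for IRD numbers. It follows the two-pass weighting scheme commonly published for IRD numbers. Whether Rime uses exactly these weights is an assumption, and the function name `is_valid_ird` is hypothetical. The sketch deliberately omits any check on the valid numeric range.

```python
# Weights as commonly published for the IRD modulus-11 scheme
# (assumed here, not confirmed as Rime's implementation).
PRIMARY_WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)
SECONDARY_WEIGHTS = (7, 4, 3, 2, 5, 2, 7, 6)


def _check_digit(base: str, weights: tuple[int, ...]) -> int:
    """Weighted sum mod 11, mapped to a check digit (may return 10)."""
    remainder = sum(int(d) * w for d, w in zip(base, weights)) % 11
    return 0 if remainder == 0 else 11 - remainder


def is_valid_ird(value: str) -> bool:
    """Return True if value is an 8 or 9 digit number with a valid check digit."""
    digits = value.replace("-", "").replace(" ", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 9):
        return False
    # The last digit is the check digit; the rest is left-padded to 8 digits.
    base, expected = digits[:-1].zfill(8), int(digits[-1])
    check = _check_digit(base, PRIMARY_WEIGHTS)
    if check == 10:
        # A result of 10 triggers a second pass with the secondary weights.
        check = _check_digit(base, SECONDARY_WEIGHTS)
    # A result of 10 on the second pass means the number is invalid.
    return check != 10 and check == expected
```

With this check in place, an arbitrary 8 or 9 digit value such as an order number passes only about one time in eleven, so a column of unrelated numeric IDs is unlikely to reach a high match rate.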

Australian patterns

| Pattern | Description | Validation |
| --- | --- | --- |
| Phone (AU) | Australian phone formats | +61 prefix, valid area codes |
| TFN | Tax File Number (9 digits) | Check digit validation |
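
The TFN check digit validation can be sketched in a similar way. The weights below are the commonly published ones for 9-digit TFNs. That Rime applies this exact scheme is an assumption, and `is_valid_tfn` is a hypothetical name.

```python
# Commonly published weights for 9-digit Tax File Numbers
# (assumed, not confirmed as Rime's implementation).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(value: str) -> bool:
    """Return True if value has 9 digits and its weighted sum is divisible by 11."""
    digits = value.replace(" ", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) != 9:
        return False
    return sum(int(d) * w for d, w in zip(digits, TFN_WEIGHTS)) % 11 == 0
```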

Confidence levels

Each flagged column receives a confidence level based on what percentage of sampled rows matched the pattern:

| Confidence | Match rate | Meaning |
| --- | --- | --- |
| High | 70% or more of sampled rows match | Very likely contains this PII type. The pattern consistently matches across the sample |
| Medium | 30-69% of sampled rows match | Probably contains this PII type, but some rows do not match, possibly due to mixed data, nulls, or formatting variations |
| Low | 10-29% of sampled rows match | Possibly contains this PII type. The match rate is low enough that the column may contain coincidental matches rather than actual PII |

Columns with a match rate below 10% are not flagged. This threshold reduces noise from columns where a few values happen to match a pattern (for example, a product code column where some codes coincidentally look like phone numbers).
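
The mapping from match rate to confidence level can be expressed as a small function. This is a sketch with a hypothetical name, `confidence_level`. It assumes the bands are contiguous, so a rate of 69.5% counts as Medium, and it uses integer arithmetic to avoid floating-point edge cases at the band boundaries.

```python
from typing import Optional


def confidence_level(matched: int, sampled: int) -> Optional[str]:
    """Map a match count to "High", "Medium", "Low", or None (not flagged)."""
    if sampled <= 0:
        return None
    # Compare matched/sampled against each percentage without dividing.
    scaled = matched * 100
    if scaled >= 70 * sampled:
        return "High"
    if scaled >= 30 * sampled:
        return "Medium"
    if scaled >= 10 * sampled:
        return "Low"
    return None  # below 10%: the column is not flagged
```

For a 1000-row sample, 700 matches gives High, 699 gives Medium, and 99 falls below the flagging threshold.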

Review workflow

Flagged columns appear in the data classification column browser with a PII detection indicator showing the detected PII type and confidence level.

To review flagged columns:

  1. Navigate to Governance > Classifications and filter by “Flagged by PII detection”
  2. Select a flagged column to see the detection details: which pattern matched, the confidence level, and sample matched values
  3. Decide whether the detection is accurate:
    • If accurate, classify the column with the appropriate privacy level and PII type. Select Accept Detection to pre-fill the classification from the detection result
    • If inaccurate, select Dismiss to mark the detection as a false positive. The column will not be re-flagged for the same pattern unless you re-scan manually
  4. Repeat for remaining flagged columns

You can also bulk-review: select multiple flagged columns, accept all detections, or dismiss all as false positives.

Re-scanning

Automatic scans run in these situations:

  • New connector — when a new data source is connected and its first extraction completes
  • Schema change — when Rime detects that a table has new or renamed columns
  • Scheduled — a periodic scan runs daily (configurable) to catch changes that may not trigger event-based scans

To trigger a manual scan:

  1. Navigate to Governance > PII Detection
  2. Select Run Scan
  3. Choose the scope: full account, specific database, schema, or table
  4. Select Start

Manual scans are useful after bulk data loads, migrations, or when you want to re-evaluate columns that were previously dismissed as false positives.

False positive handling

False positives are inevitable in pattern-based detection. Rime provides several mechanisms to manage them:

  • Dismiss — mark a specific detection on a specific column as a false positive. The column will not be re-flagged for the same pattern unless you trigger a manual re-scan
  • Exclude column — permanently exclude a column from PII scanning. Useful for columns that consistently trigger false matches (e.g., product codes that resemble phone numbers)
  • Exclude table — exclude an entire table from scanning. Useful for reference tables or lookup tables that contain no customer data
  • Adjust threshold — raise the minimum confidence level required for flagging. Setting it to “High only” eliminates most false positives at the cost of potentially missing some true positives

Dismissed false positives are logged in the audit log for compliance purposes.
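
The four mechanisms above can be thought of as successive filters on a detection. The sketch below models that logic for automatic scans only. The `Detection` and `Suppressions` types and the `should_flag` function are hypothetical and do not reflect Rime's internal data model. In practice an excluded column or table would presumably be skipped before sampling rather than filtered afterwards.

```python
from dataclasses import dataclass, field

# Ordering of confidence levels, used for the threshold comparison.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Detection:
    table: str
    column: str
    pattern: str
    confidence: str  # "Low", "Medium", or "High"


@dataclass
class Suppressions:
    dismissed: set = field(default_factory=set)         # {(table, column, pattern)}
    excluded_columns: set = field(default_factory=set)  # {(table, column)}
    excluded_tables: set = field(default_factory=set)   # {table}
    min_confidence: str = "Low"                         # "High" means "High only"


def should_flag(d: Detection, s: Suppressions) -> bool:
    """Return True if the detection should appear in the review queue."""
    if d.table in s.excluded_tables:
        return False
    if (d.table, d.column) in s.excluded_columns:
        return False
    if (d.table, d.column, d.pattern) in s.dismissed:
        return False
    return LEVELS[d.confidence] >= LEVELS[s.min_confidence]
```

Note that a dismissal is keyed by pattern: dismissing a phone-number detection on a column does not stop the same column from being flagged later for a different pattern, whereas excluding the column suppresses all patterns.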

Tier availability

PII detection is available at Business tier and above. Free/Trial and Small Business tiers do not include automatic PII scanning. See Masked by Default for full tier details.

Next steps