Skip to content

JSON Unpacking

Many data sources produce JSON-structured data — API responses, event payloads, webhook bodies, and NoSQL exports. When this data lands in Snowflake, it is stored in VARIANT columns as semi-structured data. JSON unpacking lets you extract fields from these VARIANT columns and transform them into standard typed columns that dimensions, facts, and marts can reference.

How it works

JSON unpacking is a configuration option available in the staging layer of both Kimball and Data Vault transformation projects. When you configure a staging model for a source table that contains VARIANT columns, Rime presents a JSON unpacking interface for each one.

Rime scans a sample of the VARIANT column data to detect the JSON structure, then lets you select which paths to extract. The extracted values become regular columns in your staging table.

Path selection

JSON data is often nested. Rime uses dot-notation paths to reference fields within the structure.

Given this JSON in a VARIANT column called payload:

{
"user": {
"id": 42,
"name": "Jane Smith",
"address": {
"city": "Auckland",
"country": "NZ"
}
},
"event_type": "purchase",
"amount": 149.99
}

The available paths are:

PathValue
payload.event_type"purchase"
payload.amount149.99
payload.user.id42
payload.user.name"Jane Smith"
payload.user.address.city"Auckland"
payload.user.address.country"NZ"

To configure extraction:

  1. In the staging model editor, find the VARIANT column
  2. Click Configure JSON Unpacking
  3. Rime displays the detected structure as a tree. Expand nodes to see nested fields.
  4. Check the fields you want to extract
  5. For each selected field, set the output column name and data type (see Type Casting below)
  6. Click Save

Each selected path becomes a column in the staging table. Unselected paths are ignored — they remain in the VARIANT column but do not appear as typed columns.

Flattening arrays

When a JSON field contains an array, extracting it as a single column is not useful. Instead, you can flatten the array so that each element becomes its own row.

Given this JSON:

{
"order_id": "ORD-1001",
"items": [
{ "product": "Widget A", "quantity": 3, "price": 12.50 },
{ "product": "Widget B", "quantity": 1, "price": 24.99 }
]
}

Flattening items produces two rows:

order_iditem_productitem_quantityitem_price
ORD-1001Widget A312.50
ORD-1001Widget B124.99

To flatten an array:

  1. In the JSON unpacking tree, locate the array field
  2. Click the Flatten toggle next to it
  3. Select which fields within each array element to extract
  4. Set column names and types for each extracted field
  5. Non-array fields from the parent object (like order_id) are repeated on each row automatically

Flattening changes the grain of your staging table. One source row with an array of five elements becomes five staging rows. Keep this in mind when designing downstream dimensions and facts.

Nested arrays

If your JSON contains arrays within arrays, you can flatten at multiple levels. Each level of flattening multiplies the row count. For example, an order with 3 items where each item has 2 discount codes produces 6 rows. Use nested flattening carefully and consider whether a separate staging model for the inner array is more appropriate.

Type casting

Extracted JSON values default to VARCHAR. You should cast them to appropriate Snowflake types for downstream use:

Target typeUse for
NUMBERIntegers, counts, IDs
FLOATDecimal numbers, prices, measurements
BOOLEANTrue/false flags
DATEDate strings (ISO 8601 format: 2024-03-15)
TIMESTAMP_NTZDatetime strings without timezone
TIMESTAMP_TZDatetime strings with timezone
VARCHARText values (the default)

To set a type, select it from the dropdown next to each extracted field in the JSON unpacking editor. Rime generates the appropriate Snowflake CAST expression. If a value cannot be cast (e.g., a non-numeric string to NUMBER), Snowflake returns null for that row.

Handling nulls and missing keys

JSON data is rarely uniform. Some records may be missing keys that others have, or a key may be present but set to null.

Rime handles these cases as follows:

  • Missing key — the extracted column value is NULL for that row
  • Key present with null value — the extracted column value is NULL
  • Key present with empty string — the extracted column value is '' (empty string, not null)
  • Key present with empty array [] — flattening produces zero rows for that record
  • Key present with empty object {} — extracting any child path returns NULL

If a key is consistently missing across all sampled records, it will not appear in the structure tree. You can manually add a path by typing it in the Custom Path field. This is useful when a key appears in only a small percentage of records that the sample did not include.

Default values

For extracted fields that are frequently null, you can set a default value. When the extracted value is null (either because the key is missing or explicitly null), Rime substitutes the default. Configure this in the Default column of the extraction table.

Common patterns

API response data

REST API connectors often return data wrapped in an envelope:

{
"status": "success",
"data": {
"id": 1,
"name": "Acme Corp",
"metadata": { "region": "APAC", "tier": "enterprise" }
},
"pagination": { "page": 1, "total_pages": 5 }
}

Extract from data.* paths. Ignore the envelope fields (status, pagination) unless you need them for auditing. Extract nested metadata fields individually (data.metadata.region, data.metadata.tier).

Event payloads

Event-driven systems produce records where the core data is in a payload column while metadata sits alongside it:

{
"event_id": "evt_abc123",
"timestamp": "2024-03-15T10:30:00Z",
"type": "user.signup",
"payload": {
"user_id": 42,
"email": "[email protected]",
"plan": "business"
}
}

Extract top-level fields (event_id, timestamp, type) as regular columns alongside payload fields (payload.user_id, payload.email, payload.plan). Cast timestamp to TIMESTAMP_TZ.

Nested objects with variable structure

Some JSON fields contain objects whose keys vary per record (e.g., custom form fields, tag maps). These are difficult to unpack into fixed columns. Two approaches:

  1. Extract known keys — if certain keys appear in most records, extract those and accept nulls for the rest
  2. Keep as VARIANT — leave the column as-is and handle it in a downstream model or mart. Not every VARIANT column needs to be unpacked at the staging layer.

Next steps