JSON Unpacking
Many data sources produce JSON-structured data — API responses, event payloads, webhook bodies, and NoSQL exports. When this data lands in Snowflake, it is stored in VARIANT columns as semi-structured data. JSON unpacking lets you extract fields from these VARIANT columns and transform them into standard typed columns that dimensions, facts, and marts can reference.
How it works
JSON unpacking is a configuration option available in the staging layer of both Kimball and Data Vault transformation projects. When you configure a staging model for a source table that contains VARIANT columns, Rime presents a JSON unpacking interface for each one.
Rime scans a sample of the VARIANT column data to detect the JSON structure, then lets you select which paths to extract. The extracted values become regular columns in your staging table.
Path selection
JSON data is often nested. Rime uses dot-notation paths to reference fields within the structure.
Given this JSON in a VARIANT column called payload:
{ "user": { "id": 42, "name": "Jane Smith", "address": { "city": "Auckland", "country": "NZ" } }, "event_type": "purchase", "amount": 149.99}The available paths are:
| Path | Value |
|---|---|
payload.event_type | "purchase" |
payload.amount | 149.99 |
payload.user.id | 42 |
payload.user.name | "Jane Smith" |
payload.user.address.city | "Auckland" |
payload.user.address.country | "NZ" |
To configure extraction:
- In the staging model editor, find the VARIANT column
- Click Configure JSON Unpacking
- Rime displays the detected structure as a tree. Expand nodes to see nested fields.
- Check the fields you want to extract
- For each selected field, set the output column name and data type (see Type Casting below)
- Click Save
Each selected path becomes a column in the staging table. Unselected paths are ignored — they remain in the VARIANT column but do not appear as typed columns.
Flattening arrays
When a JSON field contains an array, extracting it as a single column is not useful. Instead, you can flatten the array so that each element becomes its own row.
Given this JSON:
{ "order_id": "ORD-1001", "items": [ { "product": "Widget A", "quantity": 3, "price": 12.50 }, { "product": "Widget B", "quantity": 1, "price": 24.99 } ]}Flattening items produces two rows:
| order_id | item_product | item_quantity | item_price |
|---|---|---|---|
| ORD-1001 | Widget A | 3 | 12.50 |
| ORD-1001 | Widget B | 1 | 24.99 |
To flatten an array:
- In the JSON unpacking tree, locate the array field
- Click the Flatten toggle next to it
- Select which fields within each array element to extract
- Set column names and types for each extracted field
- Non-array fields from the parent object (like
order_id) are repeated on each row automatically
Flattening changes the grain of your staging table. One source row with an array of five elements becomes five staging rows. Keep this in mind when designing downstream dimensions and facts.
Nested arrays
If your JSON contains arrays within arrays, you can flatten at multiple levels. Each level of flattening multiplies the row count. For example, an order with 3 items where each item has 2 discount codes produces 6 rows. Use nested flattening carefully and consider whether a separate staging model for the inner array is more appropriate.
Type casting
Extracted JSON values default to VARCHAR. You should cast them to appropriate Snowflake types for downstream use:
| Target type | Use for |
|---|---|
NUMBER | Integers, counts, IDs |
FLOAT | Decimal numbers, prices, measurements |
BOOLEAN | True/false flags |
DATE | Date strings (ISO 8601 format: 2024-03-15) |
TIMESTAMP_NTZ | Datetime strings without timezone |
TIMESTAMP_TZ | Datetime strings with timezone |
VARCHAR | Text values (the default) |
To set a type, select it from the dropdown next to each extracted field in the JSON unpacking editor. Rime generates the appropriate Snowflake CAST expression. If a value cannot be cast (e.g., a non-numeric string to NUMBER), Snowflake returns null for that row.
Handling nulls and missing keys
JSON data is rarely uniform. Some records may be missing keys that others have, or a key may be present but set to null.
Rime handles these cases as follows:
- Missing key — the extracted column value is
NULLfor that row - Key present with
nullvalue — the extracted column value isNULL - Key present with empty string — the extracted column value is
''(empty string, not null) - Key present with empty array
[]— flattening produces zero rows for that record - Key present with empty object
{}— extracting any child path returnsNULL
If a key is consistently missing across all sampled records, it will not appear in the structure tree. You can manually add a path by typing it in the Custom Path field. This is useful when a key appears in only a small percentage of records that the sample did not include.
Default values
For extracted fields that are frequently null, you can set a default value. When the extracted value is null (either because the key is missing or explicitly null), Rime substitutes the default. Configure this in the Default column of the extraction table.
Common patterns
API response data
REST API connectors often return data wrapped in an envelope:
{ "status": "success", "data": { "id": 1, "name": "Acme Corp", "metadata": { "region": "APAC", "tier": "enterprise" } }, "pagination": { "page": 1, "total_pages": 5 }}Extract from data.* paths. Ignore the envelope fields (status, pagination) unless you need them for auditing. Extract nested metadata fields individually (data.metadata.region, data.metadata.tier).
Event payloads
Event-driven systems produce records where the core data is in a payload column while metadata sits alongside it:
{ "event_id": "evt_abc123", "timestamp": "2024-03-15T10:30:00Z", "type": "user.signup", "payload": { "user_id": 42, "plan": "business" }}Extract top-level fields (event_id, timestamp, type) as regular columns alongside payload fields (payload.user_id, payload.email, payload.plan). Cast timestamp to TIMESTAMP_TZ.
Nested objects with variable structure
Some JSON fields contain objects whose keys vary per record (e.g., custom form fields, tag maps). These are difficult to unpack into fixed columns. Two approaches:
- Extract known keys — if certain keys appear in most records, extract those and accept nulls for the rest
- Keep as VARIANT — leave the column as-is and handle it in a downstream model or mart. Not every VARIANT column needs to be unpacked at the staging layer.
Next steps
- Return to the Transformation Overview for the full transformation flow
- Learn about Hierarchy Models for recursive data structures
- View dependencies in the Lineage graph after building JSON-based models