Skip to content

Kimball Methodology

Kimball dimensional modelling is the default transformation methodology in Rime. It organises data into four layers: staging, dimensions, facts, and marts. This approach is well-suited to most analytics use cases and is the recommended starting point for new transformation projects.

How Kimball modelling works

The Kimball methodology structures data around business processes. Each process is represented as a fact table (events, transactions, measurements) surrounded by dimension tables (the who, what, where, when, and why of those events). Data flows through layers:

Raw tables -> Staging -> Dimensions + Facts -> Marts
  • Staging cleans and standardises raw data without changing its grain
  • Dimensions describe the entities involved in business processes
  • Facts capture measurable events at a specific grain
  • Marts aggregate facts and dimensions into views tailored to specific business domains

Staging layer

The staging layer is the first transformation applied to raw data. Each source table gets a corresponding staging model that handles:

  • Column renaming — apply consistent naming conventions (e.g., cust_nm becomes customer_name)
  • Type casting — convert string columns to appropriate types (dates, numbers, booleans)
  • Null handling — replace nulls with defaults or filter incomplete rows
  • Deduplication — remove duplicate rows based on a key you specify
  • Column selection — exclude columns you do not need downstream

To configure staging for a source table:

  1. Open your transformation project and go to the Staging tab
  2. Click Add Staging Model
  3. Select the source table from the dropdown
  4. For each column, configure renaming, type casting, and null handling as needed
  5. If the source contains duplicates, specify the deduplication key
  6. Click Save

Rime generates the staging model and adds it to the project. Staging models are the foundation for everything downstream — dimensions and facts reference staging tables, not raw tables.

Dimension tables

Dimension tables contain descriptive attributes about business entities. Common examples include customers, products, employees, locations, and dates.

Configuring a dimension

  1. Go to the Dimensions tab and click Add Dimension
  2. Enter a name (e.g., dim_customer)
  3. Select the staging table that feeds this dimension
  4. Choose the grain — the column or columns that uniquely identify each row (the business key)
  5. Select the attributes — the descriptive columns to include (name, email, region, etc.)
  6. Configure a surrogate key — Rime generates an integer surrogate key for each dimension row. This key is used in fact table joins instead of the business key.
  7. Choose the SCD type (see below)
  8. Click Save

Surrogate keys

Every dimension table gets a surrogate key column — a system-generated integer that uniquely identifies each row. Fact tables reference dimensions through these surrogate keys rather than business keys. This provides:

  • Consistent join performance regardless of business key complexity
  • Support for slowly changing dimensions (multiple rows per business key, each with its own surrogate key)
  • Insulation from business key changes in source systems

Slowly Changing Dimensions (SCD)

Rime supports two SCD strategies:

Type 1 — Overwrite. When a dimension attribute changes, the existing row is updated in place. No history is preserved. Use this for attributes where you only care about the current value (e.g., a customer’s current email address).

Type 2 — Add new row. When a dimension attribute changes, a new row is inserted with the updated values. The previous row is marked as inactive with an effective_end_date. This preserves full history. Rime automatically manages the effective_start_date, effective_end_date, and is_current columns.

Choose SCD Type 2 when historical accuracy matters — for example, if you need to report on what region a customer belonged to at the time of an order, not their current region.

Fact tables

Fact tables capture measurable business events. Each row represents a single event at a defined grain — one order line, one page view, one transaction.

Configuring a fact table

  1. Go to the Facts tab and click Add Fact
  2. Enter a name (e.g., fct_orders)
  3. Select the staging table that feeds this fact
  4. Define the grain — what does each row represent? This is typically a combination of keys (e.g., order_id + product_id for an order line fact)
  5. Select measures — the numeric columns to include (quantity, amount, duration, etc.)
  6. Add dimension references — for each foreign key, select the dimension table it references. Rime creates the join using surrogate keys.
  7. Add a date dimension reference to enable time-based analysis
  8. Click Save

Grain

Getting the grain right is the most important decision in fact table design. The grain determines:

  • What each row represents
  • Which dimensions are applicable
  • What measures make sense

If your grain is “one row per order line item”, then quantity and unit price are valid measures, and customer, product, and date are valid dimensions. If your grain is “one row per order”, line-level detail is aggregated away.

Data marts

Marts are aggregated views that combine facts and dimensions for specific business domains. They are the final layer that analysts query.

Configuring a mart

  1. Go to the Marts tab and click Add Mart
  2. Enter a name (e.g., mart_monthly_sales)
  3. Select the fact table to aggregate
  4. Choose which dimensions to include and at what level (e.g., product category rather than individual product)
  5. Define aggregations — sum, count, average, min, max for each measure
  6. Set the time grain if applicable (daily, weekly, monthly)
  7. Click Save

Marts are optional. Some organisations query facts and dimensions directly. Others build marts to simplify reporting and reduce query complexity for analysts.

Configuration walkthrough: orders example

Here is a complete example of building a Kimball project for an e-commerce orders dataset.

Source tables (in your Snowflake raw schema):

  • raw_orders — order_id, customer_id, order_date, status, total_amount
  • raw_order_items — item_id, order_id, product_id, quantity, unit_price
  • raw_customers — customer_id, name, email, region, signup_date
  • raw_products — product_id, name, category, brand, unit_cost

Step 1: Create staging models for each source table. Rename columns, cast types (e.g., order_date from string to date), and set deduplication keys.

Step 2: Create dimensions:

  • dim_customer — grain: customer_id, attributes: name, email, region, signup_date. SCD Type 2 on region (track when customers move).
  • dim_product — grain: product_id, attributes: name, category, brand, unit_cost. SCD Type 1 (overwrite on change).
  • dim_date — Rime provides a built-in date dimension. Enable it and set the date range.

Step 3: Create facts:

  • fct_order_items — grain: item_id, measures: quantity, unit_price, line_total (quantity * unit_price). Dimension references: dim_customer (via order -> customer_id), dim_product (via product_id), dim_date (via order_date).

Step 4: Create marts:

  • mart_daily_sales — aggregate fct_order_items by date and product category: total_quantity, total_revenue, order_count.
  • mart_customer_summary — aggregate by customer: total_orders, total_spend, first_order_date, last_order_date.

Step 5: Preview and execute. Rime shows the full model list with expected table structures. Click Execute to run the transformation. Review test results when it completes.

Once built, you can view the full dependency graph on the Lineage page and automate execution via a pipeline.

Next steps

  • Explore Data Vault if you need full history tracking with flexible schema evolution
  • Learn about JSON Unpacking for handling nested data in source tables
  • Set up automated transformation runs in Building Pipelines