Loading...

Building a Retail Semantic Layer with Unity Catalog Metric Views

In today’s retail environment, knowledge work depends on fast, consistent access to trusted information—and that’s exactly what a semantic layer powered by Databricks Unity Catalog Metric Views delivers. By translating complex data models into business-friendly concepts like “Total Sales” or “On-Time In-Full,” Metric Views eliminate the reporting chaos of hundreds of one-off SQL views and ensure every team works from the same, governed definitions of key metrics.

This architecture abstracts away technical complexity, allowing business leaders and analysts to focus on insights, not data plumbing. Because Metric Views live within Unity Catalog, they inherit fine-grained governance and access control automatically, creating a single, trusted information layer across the enterprise. The result is a self-service analytics environment that accelerates decision-making, improves coordination across merchandising, logistics, and operations, and transforms raw retail data into revenue-driving knowledge work—at scale and without dependency on data engineers for every new question.

What is a Semantic Layer?

A semantic layer is a business-friendly translation layer. It sits between complex, raw data tables and the end-users who need to consume that data, such as an analyst using a BI tool. (Learn more at the Databricks Glossary: What is a Semantic Layer?).

Think of it as a "data menu" for a restaurant. The kitchen (your data platform) has complex recipes (fct_sales, dim_customer). The customer just wants to order "Total Sales for the Midwest." The semantic layer is the menu that defines "Total Sales" with a specific recipe, allowing the user to get what they need by name.

As Databricks defines it, a semantic layer "creates a unified business view of data across an organization, regardless of where the data resides or how it’s structured.”

This layer is crucial because it:

  • Abstracts Complexity: Hides complex joins and technical table names, presenting users with familiar terms like "Revenue" or "Region."
  • Ensures Consistency: Provides a single source of truth. The definition of "Active Users" is defined once, ensuring all departments use the same calculation.
  • Enables Self-Service: Empowers business users to build their own reports without needing a data engineer for every new question.

Databricks Metric Views & Unity Catalog

In Databricks, the Metric View is the specific object you create to build your semantic layer. It’s a tangible metadata object where you formally define your key metrics (like "Total Sales") and dimensions (like "Product Category") on top of your Lakehouse tables. (See the documentation: Unity Catalog Metric Views).

This feature is built directly into Unity Catalog, which is the foundation for governance and metadata for your data and AI assets on the Lakehouse. (See the product page: Databricks Unity Catalog).

This integration is the key. Because Metric Views live within Unity Catalog, all your data governance is applied automatically. UC manages the fine-grained access control, so you can decide who is allowed to see sensitive metrics (like "Employee Salaries") or data from specific regions.

In summary: Unity Catalog provides the single, governed foundation, and Metric Views build the unified semantic layer on top of it.

Why Are Metric Views So Useful?

The primary value of a Metric View is that it separates the definition of a metric from how an end-user queries it. This solves one of the biggest challenges in analytics: inconsistent logic and report sprawl.

Without a metric view, you'd use a standard SQL VIEW. You might create v_sales_by_quarter. But when the business needs "sales by region," you have to create another view, v_sales_by_region. This grows into hundreds of views, each with slightly different logic, making the data difficult to manage.

Metric Views fix this by letting you define your "building blocks" one time:

  • Measure: Total Revenue (defined as SUM(fct_sales.amount))
  • Dimension: Region (from dim_customer.region)
  • Dimension: Time (from dim_date.calendar_date)

Now, a BI tool can ask for any combination. When a user requests "Total Revenue by Region," the system is smart enough to "generate the correct query based on the user’s selection, while preserving consistent, centralized metric logic.”

You define the logic once, and it can answer hundreds of questions.

A metric view is like the "swiss army knife" of views.

Key Benefits

  • Powers Truly Self-Service Dashboards: A BI tool (like Tableau or Power BI) can connect to a single Metric View and see a clean list of measures and dimensions. An analyst can drag-and-drop "Total Revenue" and "Region" to build a report without writing any SQL.
  • Creates a "Storefront" for Your Gold Layer: In a Medallion Data Architecture, your Gold tables are your ultimate source of truth. A Metric View is the consumption layer, or "storefront," that sits on top of those tables. It translates your clean data into interactive, easy-to-use assets for the entire organization.

Building a Practical Example: The "OTIF" Metric View

Let's continue building out a data lakehouse for the mythical retail company, "Big Cloud Dealz," running on the Medallion Data Architecture we started in the previous article.

The logistics team needs to track a complex KPI: On-Time In-Full (OTIF). This metric requires joining shipment data (was it on time?) with purchase order data (was it in full?) and enriching it with carrier and location details. This is a perfect use case for a metric view.

Defining the Metric (The "How")

Let's create a single metric view, big_cloud_dealz_development.gold.metric_view_otif, in YAML. Here's the text YAML definition:


version: 1.1
source: |
  WITH shipments AS (
    SELECT
      s.po_id,
      s.appt_ts,
      s.actual_arrival_ts,
      s.from_location_id,
      s.to_location_id,
      s.service_level,
      s.carrier_id
    FROM big_cloud_dealz_development.silver.fact_shipments s
  ), po_promises AS (
    SELECT
      h.po_id,
      h.promised_arrival_end
    FROM big_cloud_dealz_development.silver.fact_po_header h
  ), merged AS (
    SELECT
      sh.po_id,
      sh.appt_ts,
      CASE
        WHEN sh.actual_arrival_ts IS NOT NULL
         AND pr.promised_arrival_end IS NOT NULL
         AND sh.actual_arrival_ts <= pr.promised_arrival_end
        THEN 1 ELSE 0
      END AS on_time,
      sh.from_location_id,
      sh.to_location_id,
      sh.service_level,
      sh.carrier_id
    FROM shipments sh
    LEFT JOIN po_promises pr
      ON sh.po_id = pr.po_id
  ), po_line_sums AS (
    SELECT po_id, SUM(qty_ordered) AS qty_ordered
    FROM big_cloud_dealz_development.silver.fact_po_line
    GROUP BY po_id
  ), receipt_sums AS (
    SELECT po_id, SUM(received_quantity) AS received_quantity
    FROM big_cloud_dealz_development.silver.fact_shipment_receipts
    GROUP BY po_id
  ), in_full_po AS (
    SELECT
      pl.po_id,
      pl.qty_ordered,
      COALESCE(rc.received_quantity, 0) AS received_quantity,
      CASE WHEN COALESCE(rc.received_quantity, 0) >= pl.qty_ordered THEN 1 ELSE 0 END AS in_full
    FROM po_line_sums pl
    LEFT JOIN receipt_sums rc USING (po_id)
  ) SELECT
    CAST(m.appt_ts AS DATE) AS metric_date,
    m.po_id,
    m.on_time,
    f.in_full,
    CASE WHEN m.on_time = 1 AND f.in_full = 1 THEN 1 ELSE 0 END AS otif_flag,
    m.from_location_id,
    m.to_location_id,
    m.service_level,
    m.carrier_id
  FROM merged m LEFT JOIN in_full_po f
    ON m.po_id = f.po_id
joins:
  - name: from_loc
    source: big_cloud_dealz_development.silver.dim_locations
    "on": from_loc.location_id = source.from_location_id AND from_loc.is_current =
      TRUE
  - name: to_loc
    source: big_cloud_dealz_development.silver.dim_locations
    "on": to_loc.location_id = source.to_location_id AND to_loc.is_current = TRUE
  - name: carriers
    source: big_cloud_dealz_development.silver.dim_carriers
    "on": carriers.carrier_id = CAST(source.carrier_id AS STRING) AND carriers.is_current
      = TRUE
comment: |
  OTIF (On Time In Full) KPI with shipment-level flags, aligned to appointment date. Dimensions include multiple time grains and both origin/destination locations. Source logic mirrors Python compute_otif and the SQL view flags used in gold.view_otif_by_date.
dimensions:
  - name: date
    expr: metric_date
    display_name: Date
    format:
      type: date
      date_format: year_month_day
      leading_zeros: true
    synonyms:
      - day
      - appointment date
      - appt date
  - name: week
    expr: "DATE_TRUNC('week', metric_date)"
    display_name: Week
    synonyms:
      - iso week
      - week start
  - name: month
    expr: "DATE_TRUNC('month', metric_date)"
    display_name: Month
    synonyms:
      - calendar month
      - order month
  - name: quarter
    expr: "DATE_TRUNC('quarter', metric_date)"
    display_name: Quarter
  - name: year
    expr: "DATE_TRUNC('year', metric_date)"
    display_name: Year
  - name: from_location_id
    expr: from_location_id
    display_name: From Location ID
    synonyms:
      - origin id
      - ship-from id
  - name: from_location_name
    expr: from_loc.name
    display_name: From Location
    synonyms:
      - origin
      - ship-from
  - name: from_location_type
    expr: from_loc.location_type
    display_name: From Location Type
    synonyms:
      - origin type
  - name: from_region
    expr: from_loc.region
    display_name: From Region
  - name: from_state
    expr: from_loc.state
    display_name: From State
  - name: to_location_id
    expr: to_location_id
    display_name: To Location ID
    synonyms:
      - destination id
      - ship-to id
  - name: to_location_name
    expr: to_loc.name
    display_name: To Location
    synonyms:
      - destination
      - ship-to
  - name: to_location_type
    expr: to_loc.location_type
    display_name: To Location Type
    synonyms:
      - destination type
  - name: to_region
    expr: to_loc.region
    display_name: To Region
  - name: to_state
    expr: to_loc.state
    display_name: To State
  - name: carrier_id
    expr: carrier_id
    display_name: Carrier ID
  - name: carrier_name
    expr: carriers.carrier_name
    display_name: Carrier
  - name: mode
    expr: carriers.mode
    display_name: Transport Mode
    synonyms:
      - ship mode
      - freight mode
  - name: service_level
    expr: service_level
    display_name: Service Level
    synonyms:
      - svc level
      - promise tier
measures:
  - name: otif_rate
    expr: AVG(otif_flag)
    display_name: OTIF %
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 2
    synonyms:
      - on-time in full
      - otif percent
      - otif rate
  - name: otif_count
    expr: SUM(otif_flag)
    display_name: OTIF Count
    format:
      type: number
      decimal_places:
        type: all
      hide_group_separator: false
    synonyms:
      - count otif
      - num otif
  - name: shipment_rows
    expr: COUNT(1)
    display_name: Shipment Rows
    format:
      type: number
      decimal_places:
        type: all
  - name: on_time_rate
    expr: AVG(on_time)
    display_name: On-time %
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 2
    synonyms:
      - on-time rate
      - punctuality %
  - name: in_full_rate
    expr: AVG(in_full)
    display_name: In-full %
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 2
    synonyms:
      - fill rate
      - in full percent
  - name: po_count
    expr: COUNT(DISTINCT po_id)
    display_name: PO Count (distinct)
    format:
      type: number
      decimal_places:
        type: all
    synonyms:
      - unique pos
      - distinct po

Let's break down the major sections of the yaml definition of the metric view above.

source

This section is defined by a SQL query that pulls from Silver-layer tables. It joins fact_shipments with fact_po_header to calculate an on_time flag. It also joins fact_po_line with fact_shipment_receipts to calculate an in_full flag. This is where all the complex business logic lives, hidden from the user.

joins

The joins section defines how the core OTIF metric logic connects to related dimension tables—such as locations and carriers—so the metric view can enrich shipment data with descriptive business context. For example, it links each shipment’s origin and destination IDs to dim_locations to retrieve region, state, and location type, and joins carrier IDs to dim_carriers for names and transport modes. These joins ensure that every OTIF record is not just a numerical flag but a fully described business event, enabling users to analyze performance by carrier, route, or region without manually writing complex SQL joins.

dimensions

The dimensions section defines the contextual attributes—such as date, week, month, locations, carrier, and service level—that allow the OTIF metric to be sliced, filtered, and analyzed across different business perspectives. Each dimension maps to a specific column or derived expression in the underlying data (for example, DATE_TRUNC('month', metric_date) for month-level rollups), and includes user-friendly display names and synonyms so that AI/BI tools and conversational interfaces like Genie can understand natural language queries (“show OTIF by carrier” or “OTIF by region last quarter”). In short, dimensions turn raw data fields into business-intuitive handles for exploring performance and driving insight.

measures

In the measures section of this Databricks metric view, each entry defines a quantitative KPI that the platform can automatically compute, aggregate, and visualize for business users—essentially translating raw operational data into trusted performance metrics. For example, otif_rate calculates the percentage of purchase orders that were delivered both on time and in full, serving as the core KPI for logistics reliability. Supporting metrics like on_time_rate and in_full_rate break that down further, showing punctuality and fulfillment quality separately. Meanwhile, otif_count and po_count give absolute counts of compliant shipments and unique purchase orders, and shipment_rows provides a baseline measure of total records available for analysis.

This section formalizes logic that would otherwise live in ad hoc SQL or spreadsheet formulas. Each measure is not just a calculation but a governed, reusable semantic definition that’s enforced across AI/BI dashboards, Genie queries, and analytics layers. By centralizing these in the metric view YAML, Databricks ensures every tool and user—from engineers to executives—references the same logic for OTIF and related KPIs, dramatically improving consistency, data governance, and time-to-insight across the enterprise.

Querying Metric Views

When querying a metric view in Databricks, think of it as querying a business-ready semantic layer, not a raw table. Each metric view already contains the necessary joins, calculations, and metadata to produce accurate, consistent KPIs—so analysts or AI/BI tools can simply SELECT dimensions (like region, carrier, or month) and measures (like otif_rate or on_time_rate) without rebuilding logic.

In practice, this means you query it the same way as any SQL view, but the results come pre-governed and semantically enriched, making it ideal for dashboards, Genie prompts, and executive reporting.

When querying measures from a Databricks metric view, you need to use an aggregate function — such as SUM(), AVG(), COUNT(), or similar — because measures are designed to be aggregated across the chosen dimensions. For example:


SELECT
  DATE_TRUNC('month', date) AS month,
  AVG(otif_rate) AS avg_otif
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month;

If you want to just use the default measure logic with no aggregation function, then measures must be wrapped in MEASURE() aggregate functions when queried, since they represent numeric KPIs summarized over groups of dimension values.

How the MEASURE Function Differs from AVG() and SUM() in Databricks SQL

When you write SUM(column) or AVG(column) in a conventional SQL query, you are explicitly telling the query engine how to aggregate a column — either by summing or averaging all values. You must decide what metric to calculate and include the correct GROUP BY clause if you’re slicing by dimensions. These functions operate directly on the data and require you to define the aggregation logic manually each time you query.

By contrast, in a Databricks metric view, the MEASURE() function is used to reference pre-defined business metrics without redefining their aggregation logic. Each measure is already defined in the YAML configuration with its own expression (for example, AVG(otif_flag) or SUM(otif_flag)). When you query the metric view, calling MEASURE(otif_rate) tells Databricks to automatically apply that governed definition. This means you’re consuming a standardized KPI rather than recreating it, ensuring consistency across dashboards, BI tools, and Genie queries.

This abstraction allows teams to separate metric definition from metric consumption. Analysts and AI systems can safely use MEASURE() to retrieve accurate, governed results without worrying about how the measure is calculated internally — dramatically reducing risk, duplication, and inconsistency across the analytics stack.


-- Traditional SQL with explicit aggregation
SELECT
  DATE_TRUNC('month', date) AS month,
  AVG(otif_flag) AS otif_rate
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month;

-- Databricks metric view query using MEASURE()
SELECT
  DATE_TRUNC('month', date) AS month,
  MEASURE(otif_rate) AS otif_percentage
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month;

Ok, with all that explanation out of the way, let's get down to some example use cases of this handy OTIF metric view.

Example 1: Monthly OTIF Trend by Carrier

Business Question: "Which carriers are consistently meeting their delivery promises?"

SELECT
  month,
  carrier_name,
  MEASURE(otif_rate) AS otif_rate,
  MEASURE(on_time_rate) AS on_time_rate,
  MEASURE(in_full_rate) AS in_full_rate
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month, carrier_name
ORDER BY month ASC, otif_rate DESC;

Why it's simple: The analyst just listed the dimensions and measures they wanted. The Metric View automatically generates the complex SQL to join all tables, calculate all three flags, and aggregate the results.

Example 2: Regional OTIF Comparison by Transport Mode

Business Question: "How do our different transportation modes perform across our destination regions?"

SELECT
  to_region,
  mode,
  MEASURE(otif_rate) AS otif_rate,
  MEASURE(shipment_rows) AS shipments
FROM big_cloud_dealz_development.gold.metric_view_otif
WHERE date >= DATE '2025-01-01'
GROUP BY to_region, mode
ORDER BY otif_rate DESC;

Why it's simple: The user is combining dimensions from two different joined tables (dim_locations and dim_carriers), and the Metric View handles these "fan-out" joins automatically.

Example 3: Daily OTIF Diagnostics

Business Question: "Which destination regions had the worst OTIF performance last month?"

-- Find the 50 worst-performing region-days last month
SELECT
  date,
  to_region,
  MEASURE(otif_rate),
  MEASURE(otif_count)
FROM big_cloud_dealz_development.gold.metric_view_otif
WHERE date BETWEEN DATE '2025-09-01' AND DATE '2025-09-30'
GROUP BY date, to_region
ORDER BY MEASURE(otif_rate) ASC
LIMIT 50;

Why it's simple: This ad-hoc query is easy to write. The user can filter, group, and order by any combination of metrics without needing to know anything about the underlying tables or logic.

Summary

In this article I showed how to build metric views for our semantic layer to power retail logistics knowledge work.

In the next article in this series we'll show how to start putting these information building blocks to work by integrating them into AI / BI Dashboards for management in a retail logistics division.

Next in Series

Next-Generation Retail Logistics Analytics with Databricks AI / BI Dashboards

By surfacing key performance indicators like On-Time In-Full (OTIF) and Stockout Exposure through dynamic visualizations and natural-language insights, the dashboard transforms complex warehouse, carrier, and inventory data into actionable intelligence for the COO, logistics leads, and merchandising teams alike.

Read next article in series