Building a Retail Semantic Layer with Unity Catalog Metric Views
In today’s retail environment, knowledge work depends on fast, consistent access to trusted information—and that’s exactly what a semantic layer powered by Databricks Unity Catalog Metric Views delivers. By translating complex data models into business-friendly concepts like “Total Sales” or “On-Time In-Full,” Metric Views eliminate the reporting chaos of hundreds of one-off SQL views and ensure every team works from the same, governed definitions of key metrics.
This architecture abstracts away technical complexity, allowing business leaders and analysts to focus on insights, not data plumbing. Because Metric Views live within Unity Catalog, they inherit fine-grained governance and access control automatically, creating a single, trusted information layer across the enterprise. The result is a self-service analytics environment that accelerates decision-making, improves coordination across merchandising, logistics, and operations, and transforms raw retail data into revenue-driving knowledge work—at scale and without dependency on data engineers for every new question.
What is a Semantic Layer?
A semantic layer is a business-friendly translation layer. It sits between complex, raw data tables and the end-users who need to consume that data, such as an analyst using a BI tool. (Learn more at the Databricks Glossary: What is a Semantic Layer?).
Think of it as a "data menu" for a restaurant. The kitchen (your data platform) has complex recipes (fct_sales, dim_customer). The customer just wants to order "Total Sales for the Midwest." The semantic layer is the menu that defines "Total Sales" with a specific recipe, allowing the user to get what they need by name.
As Databricks defines it, a semantic layer "creates a unified business view of data across an organization, regardless of where the data resides or how it's structured."
This layer is crucial because it:
- Abstracts Complexity: Hides complex joins and technical table names, presenting users with familiar terms like "Revenue" or "Region."
- Ensures Consistency: Provides a single source of truth. The definition of "Active Users" is defined once, ensuring all departments use the same calculation.
- Enables Self-Service: Empowers business users to build their own reports without needing a data engineer for every new question.
Databricks Metric Views & Unity Catalog
In Databricks, the Metric View is the specific object you create to build your semantic layer. It’s a tangible metadata object where you formally define your key metrics (like "Total Sales") and dimensions (like "Product Category") on top of your Lakehouse tables. (See the documentation: Unity Catalog Metric Views).
This feature is built directly into Unity Catalog, which is the foundation for governance and metadata for your data and AI assets on the Lakehouse. (See the product page: Databricks Unity Catalog).
This integration is the key. Because Metric Views live within Unity Catalog, all your data governance is applied automatically. UC manages the fine-grained access control, so you can decide who is allowed to see sensitive metrics (like "Employee Salaries") or data from specific regions.
In summary: Unity Catalog provides the single, governed foundation, and Metric Views build the unified semantic layer on top of it.
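As a concrete sketch of what that governance looks like in practice, access to a Metric View is managed with the same GRANT statements as any other Unity Catalog securable. The catalog, schema, view, and group names below are hypothetical:

```sql
-- Hypothetical: analysts may query the governed metric view,
-- but receive no access to the underlying raw tables
GRANT SELECT
ON VIEW main.gold.metric_view_sales
TO `analysts`;
```

Because the grant is on the Metric View rather than on the source tables, consumers see only the curated measures and dimensions, never the raw data underneath.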
Why Are Metric Views So Useful?
The primary value of a Metric View is that it separates the definition of a metric from how an end-user queries it. This solves one of the biggest challenges in analytics: inconsistent logic and report sprawl.
Without a metric view, you'd use a standard SQL VIEW. You might create v_sales_by_quarter. But when the business needs "sales by region," you have to create another view, v_sales_by_region. This grows into hundreds of views, each with slightly different logic, making the data difficult to manage.
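To make the sprawl concrete, here is a sketch of two such one-off views; the join keys are assumed for illustration:

```sql
-- One-off view #1: revenue by quarter
CREATE VIEW v_sales_by_quarter AS
SELECT DATE_TRUNC('quarter', d.calendar_date) AS quarter,
       SUM(f.amount) AS total_revenue
FROM fct_sales f
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY 1;

-- One-off view #2: the same revenue logic, copied for a new question
CREATE VIEW v_sales_by_region AS
SELECT c.region,
       SUM(f.amount) AS total_revenue
FROM fct_sales f
JOIN dim_customer c ON f.customer_id = c.customer_id
GROUP BY 1;
```

If the definition of revenue ever changes (say, to exclude returns), every copy of SUM(f.amount) must be hunted down and updated by hand.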
Metric Views fix this by letting you define your "building blocks" one time:
- Measure: Total Revenue (defined as SUM(fct_sales.amount))
- Dimension: Region (from dim_customer.region)
- Dimension: Time (from dim_date.calendar_date)
Now, a BI tool can ask for any combination. When a user requests "Total Revenue by Region," the system is smart enough to "generate the correct query based on the user's selection, while preserving consistent, centralized metric logic."
You define the logic once, and it can answer hundreds of questions.
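Assuming those three building blocks were registered as a metric view called sales_metrics (a hypothetical name), any combination becomes just a different GROUP BY against the same definition, using the MEASURE() function discussed later in this article:

```sql
-- "Total Revenue by Region"
SELECT region, MEASURE(total_revenue) AS revenue
FROM sales_metrics
GROUP BY region;

-- "Total Revenue over Time" — no new view required
SELECT time, MEASURE(total_revenue) AS revenue
FROM sales_metrics
GROUP BY time;
```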
A Metric View is like the "Swiss Army knife" of views.
Key Benefits
- Powers Truly Self-Service Dashboards: A BI tool (like Tableau or Power BI) can connect to a single Metric View and see a clean list of measures and dimensions. An analyst can drag-and-drop "Total Revenue" and "Region" to build a report without writing any SQL.
- Creates a "Storefront" for Your Gold Layer: In a Medallion Data Architecture, your Gold tables are your ultimate source of truth. A Metric View is the consumption layer, or "storefront," that sits on top of those tables. It translates your clean data into interactive, easy-to-use assets for the entire organization.
Building a Practical Example: The "OTIF" Metric View
Let's continue building out a data lakehouse for the mythical retail company, "Big Cloud Dealz," running on the Medallion Data Architecture we started in the previous article.
The logistics team needs to track a complex KPI: On-Time In-Full (OTIF). This metric requires joining shipment data (was it on time?) with purchase order data (was it in full?) and enriching it with carrier and location details. This is a perfect use case for a metric view.
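Before formalizing it in YAML, the heart of the KPI can be sketched in a few lines of SQL over some made-up purchase orders (the rows below are illustrative only):

```sql
-- Three hypothetical POs: (po_id, on_time, in_full)
SELECT po_id,
       CASE WHEN on_time = 1 AND in_full = 1 THEN 1 ELSE 0 END AS otif_flag
FROM VALUES (101, 1, 1), (102, 1, 0), (103, 0, 1) AS t(po_id, on_time, in_full);
-- Only PO 101 is both on time and in full, so the OTIF rate for this set is 1/3
```

A shipment counts toward OTIF only when both flags are 1; the metric view below computes exactly these flags at scale from the Silver tables.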
Defining the Metric (The "How")
Let's create a single metric view, big_cloud_dealz_development.gold.metric_view_otif, in YAML. Here's the full YAML definition:
version: 1.1
source: |
  WITH shipments AS (
    SELECT
      s.po_id,
      s.appt_ts,
      s.actual_arrival_ts,
      s.from_location_id,
      s.to_location_id,
      s.service_level,
      s.carrier_id
    FROM big_cloud_dealz_development.silver.fact_shipments s
  ), po_promises AS (
    SELECT
      h.po_id,
      h.promised_arrival_end
    FROM big_cloud_dealz_development.silver.fact_po_header h
  ), merged AS (
    SELECT
      sh.po_id,
      sh.appt_ts,
      CASE
        WHEN sh.actual_arrival_ts IS NOT NULL
          AND pr.promised_arrival_end IS NOT NULL
          AND sh.actual_arrival_ts <= pr.promised_arrival_end
        THEN 1 ELSE 0
      END AS on_time,
      sh.from_location_id,
      sh.to_location_id,
      sh.service_level,
      sh.carrier_id
    FROM shipments sh
    LEFT JOIN po_promises pr
      ON sh.po_id = pr.po_id
  ), po_line_sums AS (
    SELECT po_id, SUM(qty_ordered) AS qty_ordered
    FROM big_cloud_dealz_development.silver.fact_po_line
    GROUP BY po_id
  ), receipt_sums AS (
    SELECT po_id, SUM(received_quantity) AS received_quantity
    FROM big_cloud_dealz_development.silver.fact_shipment_receipts
    GROUP BY po_id
  ), in_full_po AS (
    SELECT
      pl.po_id,
      pl.qty_ordered,
      COALESCE(rc.received_quantity, 0) AS received_quantity,
      CASE WHEN COALESCE(rc.received_quantity, 0) >= pl.qty_ordered THEN 1 ELSE 0 END AS in_full
    FROM po_line_sums pl
    LEFT JOIN receipt_sums rc USING (po_id)
  )
  SELECT
    CAST(m.appt_ts AS DATE) AS metric_date,
    m.po_id,
    m.on_time,
    f.in_full,
    CASE WHEN m.on_time = 1 AND f.in_full = 1 THEN 1 ELSE 0 END AS otif_flag,
    m.from_location_id,
    m.to_location_id,
    m.service_level,
    m.carrier_id
  FROM merged m
  LEFT JOIN in_full_po f
    ON m.po_id = f.po_id
joins:
  - name: from_loc
    source: big_cloud_dealz_development.silver.dim_locations
    "on": from_loc.location_id = source.from_location_id AND from_loc.is_current = TRUE
  - name: to_loc
    source: big_cloud_dealz_development.silver.dim_locations
    "on": to_loc.location_id = source.to_location_id AND to_loc.is_current = TRUE
  - name: carriers
    source: big_cloud_dealz_development.silver.dim_carriers
    "on": carriers.carrier_id = CAST(source.carrier_id AS STRING) AND carriers.is_current = TRUE
comment: |
  OTIF (On Time In Full) KPI with shipment-level flags, aligned to appointment date.
  Dimensions include multiple time grains and both origin/destination locations.
  Source logic mirrors Python compute_otif and the SQL view flags used in gold.view_otif_by_date.
dimensions:
  - name: date
    expr: metric_date
    display_name: Date
    format:
      type: date
      date_format: year_month_day
      leading_zeros: true
    synonyms:
      - day
      - appointment date
      - appt date
  - name: week
    expr: "DATE_TRUNC('week', metric_date)"
    display_name: Week
    synonyms:
      - iso week
      - week start
  - name: month
    expr: "DATE_TRUNC('month', metric_date)"
    display_name: Month
    synonyms:
      - calendar month
      - order month
  - name: quarter
    expr: "DATE_TRUNC('quarter', metric_date)"
    display_name: Quarter
  - name: year
    expr: "DATE_TRUNC('year', metric_date)"
    display_name: Year
  - name: from_location_id
    expr: from_location_id
    display_name: From Location ID
    synonyms:
      - origin id
      - ship-from id
  - name: from_location_name
    expr: from_loc.name
    display_name: From Location
    synonyms:
      - origin
      - ship-from
  - name: from_location_type
    expr: from_loc.location_type
    display_name: From Location Type
    synonyms:
      - origin type
  - name: from_region
    expr: from_loc.region
    display_name: From Region
  - name: from_state
    expr: from_loc.state
    display_name: From State
  - name: to_location_id
    expr: to_location_id
    display_name: To Location ID
    synonyms:
      - destination id
      - ship-to id
  - name: to_location_name
    expr: to_loc.name
    display_name: To Location
    synonyms:
      - destination
      - ship-to
  - name: to_location_type
    expr: to_loc.location_type
    display_name: To Location Type
    synonyms:
      - destination type
  - name: to_region
    expr: to_loc.region
    display_name: To Region
  - name: to_state
    expr: to_loc.state
    display_name: To State
  - name: carrier_id
    expr: carrier_id
    display_name: Carrier ID
  - name: carrier_name
    expr: carriers.carrier_name
    display_name: Carrier
  - name: mode
    expr: carriers.mode
    display_name: Transport Mode
    synonyms:
      - ship mode
      - freight mode
  - name: service_level
    expr: service_level
    display_name: Service Level
    synonyms:
      - svc level
      - promise tier
measures:
  - name: otif_rate
    expr: AVG(otif_flag)
    display_name: OTIF %
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 2
    synonyms:
      - on-time in full
      - otif percent
      - otif rate
  - name: otif_count
    expr: SUM(otif_flag)
    display_name: OTIF Count
    format:
      type: number
      decimal_places:
        type: all
      hide_group_separator: false
    synonyms:
      - count otif
      - num otif
  - name: shipment_rows
    expr: COUNT(1)
    display_name: Shipment Rows
    format:
      type: number
      decimal_places:
        type: all
  - name: on_time_rate
    expr: AVG(on_time)
    display_name: On-time %
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 2
    synonyms:
      - on-time rate
      - punctuality %
  - name: in_full_rate
    expr: AVG(in_full)
    display_name: In-full %
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 2
    synonyms:
      - fill rate
      - in full percent
  - name: po_count
    expr: COUNT(DISTINCT po_id)
    display_name: PO Count (distinct)
    format:
      type: number
      decimal_places:
        type: all
    synonyms:
      - unique pos
      - distinct po
Let's break down the major sections of the YAML definition of the metric view above.
source
This section is defined by a SQL query that pulls from Silver-layer tables. It joins fact_shipments with fact_po_header to calculate an on_time flag. It also joins fact_po_line with fact_shipment_receipts to calculate an in_full flag. This is where all the complex business logic lives, hidden from the user.
joins
The joins section defines how the core OTIF metric logic connects to related dimension tables—such as locations and carriers—so the metric view can enrich shipment data with descriptive business context. For example, it links each shipment’s origin and destination IDs to dim_locations to retrieve region, state, and location type, and joins carrier IDs to dim_carriers for names and transport modes. These joins ensure that every OTIF record is not just a numerical flag but a fully described business event, enabling users to analyze performance by carrier, route, or region without manually writing complex SQL joins.
dimensions
The dimensions section defines the contextual attributes—such as date, week, month, locations, carrier, and service level—that allow the OTIF metric to be sliced, filtered, and analyzed across different business perspectives. Each dimension maps to a specific column or derived expression in the underlying data (for example, DATE_TRUNC('month', metric_date) for month-level rollups), and includes user-friendly display names and synonyms so that AI/BI tools and conversational interfaces like Genie can understand natural language queries (“show OTIF by carrier” or “OTIF by region last quarter”). In short, dimensions turn raw data fields into business-intuitive handles for exploring performance and driving insight.
measures
In the measures section of this Databricks metric view, each entry defines a quantitative KPI that the platform can automatically compute, aggregate, and visualize for business users—essentially translating raw operational data into trusted performance metrics. For example, otif_rate calculates the percentage of purchase orders that were delivered both on time and in full, serving as the core KPI for logistics reliability. Supporting metrics like on_time_rate and in_full_rate break that down further, showing punctuality and fulfillment quality separately. Meanwhile, otif_count and po_count give absolute counts of compliant shipments and unique purchase orders, and shipment_rows provides a baseline measure of total records available for analysis.
This section formalizes logic that would otherwise live in ad hoc SQL or spreadsheet formulas. Each measure is not just a calculation but a governed, reusable semantic definition that’s enforced across AI/BI dashboards, Genie queries, and analytics layers. By centralizing these in the metric view YAML, Databricks ensures every tool and user—from engineers to executives—references the same logic for OTIF and related KPIs, dramatically improving consistency, data governance, and time-to-insight across the enterprise.
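To actually create the object, the YAML above is registered in Unity Catalog with a CREATE VIEW ... WITH METRICS statement along these lines (exact syntax may vary with your Databricks runtime version, so treat this as a sketch):

```sql
CREATE OR REPLACE VIEW big_cloud_dealz_development.gold.metric_view_otif
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1
...
$$;
```

The body between the $$ delimiters is the full YAML definition shown above, pasted in verbatim.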
Querying Metric Views
When querying a metric view in Databricks, think of it as querying a business-ready semantic layer, not a raw table. Each metric view already contains the necessary joins, calculations, and metadata to produce accurate, consistent KPIs—so analysts or AI/BI tools can simply SELECT dimensions (like region, carrier, or month) and measures (like otif_rate or on_time_rate) without rebuilding logic.
In practice, this means you query it the same way as any SQL view, but the results come pre-governed and semantically enriched, making it ideal for dashboards, Genie prompts, and executive reporting.
When querying measures from a Databricks metric view, you do not apply ordinary aggregate functions such as SUM() or AVG() to them. Each measure already carries its own aggregation logic from the YAML definition, so instead you wrap the measure name in the MEASURE() aggregate function, and Databricks applies that governed logic over whatever dimensions you group by. Dimensions, by contrast, are selected and grouped directly. For example:
SELECT
month,
MEASURE(otif_rate) AS avg_otif
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month;
Because MEASURE() is itself an aggregate function, these queries follow the normal SQL aggregation rules: every non-measure column in the SELECT list must appear in the GROUP BY clause.
How the MEASURE Function Differs from AVG() and SUM() in Databricks SQL
When you write SUM(column) or AVG(column) in a conventional SQL query, you are explicitly telling the query engine how to aggregate a column — either by summing or averaging all values. You must decide what metric to calculate and include the correct GROUP BY clause if you’re slicing by dimensions. These functions operate directly on the data and require you to define the aggregation logic manually each time you query.
By contrast, in a Databricks metric view, the MEASURE() function is used to reference pre-defined business metrics without redefining their aggregation logic. Each measure is already defined in the YAML configuration with its own expression (for example, AVG(otif_flag) or SUM(otif_flag)). When you query the metric view, calling MEASURE(otif_rate) tells Databricks to automatically apply that governed definition. This means you’re consuming a standardized KPI rather than recreating it, ensuring consistency across dashboards, BI tools, and Genie queries.
This abstraction allows teams to separate metric definition from metric consumption. Analysts and AI systems can safely use MEASURE() to retrieve accurate, governed results without worrying about how the measure is calculated internally — dramatically reducing risk, duplication, and inconsistency across the analytics stack.
-- Traditional SQL: explicit aggregation against row-level flag data
-- (e.g., the underlying gold view; the metric view itself does not expose otif_flag)
SELECT
DATE_TRUNC('month', metric_date) AS month,
AVG(otif_flag) AS otif_rate
FROM big_cloud_dealz_development.gold.view_otif_by_date
GROUP BY month;
-- Databricks metric view query using MEASURE() and the built-in month dimension
SELECT
month,
MEASURE(otif_rate) AS otif_percentage
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month;
Ok, with all that explanation out of the way, let's get down to some example use cases of this handy OTIF metric view.
Example 1: Monthly OTIF Trend by Carrier
Business Question: "Which carriers are consistently meeting their delivery promises?"
SELECT
month,
carrier_name,
MEASURE(otif_rate) AS otif_rate,
MEASURE(on_time_rate) AS on_time_rate,
MEASURE(in_full_rate) AS in_full_rate
FROM big_cloud_dealz_development.gold.metric_view_otif
GROUP BY month, carrier_name
ORDER BY month ASC, otif_rate DESC;
Why it's simple: The analyst just listed the dimensions and measures they wanted. The Metric View automatically generates the complex SQL to join all tables, calculate all three flags, and aggregate the results.
Example 2: Regional OTIF Comparison by Transport Mode
Business Question: "How do our different transportation modes perform across our destination regions?"
SELECT
to_region,
mode,
MEASURE(otif_rate) AS otif_rate,
MEASURE(shipment_rows) AS shipments
FROM big_cloud_dealz_development.gold.metric_view_otif
WHERE date >= DATE '2025-01-01'
GROUP BY to_region, mode
ORDER BY otif_rate DESC;
Why it's simple: The user is combining dimensions sourced from two different joined dimension tables (dim_locations and dim_carriers), and the Metric View resolves both joins automatically behind the scenes.
Example 3: Daily OTIF Diagnostics
Business Question: "Which destination regions had the worst OTIF performance last month?"
-- Find the 50 worst-performing region-days last month
SELECT
date,
to_region,
MEASURE(otif_rate),
MEASURE(otif_count)
FROM big_cloud_dealz_development.gold.metric_view_otif
WHERE date BETWEEN DATE '2025-09-01' AND DATE '2025-09-30'
GROUP BY date, to_region
ORDER BY MEASURE(otif_rate) ASC
LIMIT 50;
Why it's simple: This ad-hoc query is easy to write. The user can filter, group, and order by any combination of metrics without needing to know anything about the underlying tables or logic.
Summary
In this article I showed how to build metric views for our semantic layer to power retail logistics knowledge work.
In the next article in this series we'll show how to start putting these information building blocks to work by integrating them into AI / BI Dashboards for management in a retail logistics division.
Next in Series
Next-Generation Retail Logistics Analytics with Databricks AI / BI Dashboards
By surfacing key performance indicators like On-Time In-Full (OTIF) and Stockout Exposure through dynamic visualizations and natural-language insights, the dashboard transforms complex warehouse, carrier, and inventory data into actionable intelligence for the COO, logistics leads, and merchandising teams alike.
Read next article in series