Cloud giants like Google and Snowflake, unicorns like dbt Labs, and a host of venture-backed startups are now talking about a critical new layer in the data and analytics stack. Some call it a “metrics layer,” a “metrics hub,” or “headless BI,” but most call it a “semantic layer.” I prefer “semantic layer” because it best describes a business-friendly interface to data that serves a variety of use cases and user personas.
What Does a Semantic Layer Do?
A semantic layer makes data usable for everyone and presents a consistent, business-friendly interface to corporate data. It also does the following:
- Connects users to live data, of any shape and size, wherever it lives
- Delivers queries at the “speed of thought” on any size of data
- Governs user access to sensitive data for every query, regardless of the tool used
- Connects and blends data across silos, from on-premises systems to cloud platforms to SaaS applications
- Bridges the business and data science teams by integrating historical and predictive data
In the following sections, we’ll discuss the core capabilities of a semantic layer platform that you can use as a guide when evaluating vendors and solutions.
The Seven Capabilities of a Semantic Layer
A semantic layer platform needs to deliver on seven main vectors of value. The following diagram illustrates the core capabilities:
1. Consumption Integration
A semantic layer needs to be truly universal. This means it must support a variety of use cases and personas, including business analysts, data scientists, and application developers. It also needs to support a wide range of query tools using their native protocols and interfaces, including SQL, MDX, DAX, Python, REST, JDBC, and ODBC.
2. Semantic Modeling
The core of the semantic layer is the data model. A semantic layer maps the logical elements (dimensions, metrics, hierarchies, KPIs) to the physical entities of databases, tables, and relationships. To deliver a digital twin of the business, a semantic layer must support reusable models and components that drive a hub-and-spoke (data mesh) style of analytics management, backed by a CI/CD-compatible markup language and a GUI-based modeling environment.
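To make the logical-to-physical mapping concrete, here is a minimal Python sketch. It is not any vendor's modeling language: the class names, the table sales.fact_orders, and the columns region_cd and net_amount are all hypothetical, chosen only to show how a business-friendly request ("Revenue by Region") could be translated into SQL against a physical table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str    # business-friendly name shown to users
    column: str  # physical column it maps to (hypothetical)


@dataclass(frozen=True)
class Metric:
    name: str         # business-friendly name
    column: str       # physical column being aggregated (hypothetical)
    aggregation: str  # e.g. "SUM" or "AVG"


@dataclass(frozen=True)
class SemanticModel:
    table: str
    dimensions: dict  # logical name -> Dimension
    metrics: dict     # logical name -> Metric

    def to_sql(self, metric_name, dimension_names):
        """Translate a logical request into SQL on the physical table."""
        metric = self.metrics[metric_name]
        dims = [self.dimensions[d] for d in dimension_names]
        select = [f'{d.column} AS "{d.name}"' for d in dims]
        select.append(
            f'{metric.aggregation}({metric.column}) AS "{metric.name}"'
        )
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dims:
            sql += " GROUP BY " + ", ".join(d.column for d in dims)
        return sql


model = SemanticModel(
    table="sales.fact_orders",
    dimensions={"Region": Dimension("Region", "region_cd")},
    metrics={"Revenue": Metric("Revenue", "net_amount", "SUM")},
)
sql = model.to_sql("Revenue", ["Region"])
print(sql)
```

Because the definitions of "Revenue" and "Region" live in one model, every tool that asks for them gets the same SQL, which is the reuse the paragraph above describes.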
3. Multi-Dimensional Calculation Engine
The semantic layer data model must be backed by a scalable, multi-dimensional engine that can express a wide range of business concepts in a variety of contexts. The engine must support matrix-style calculations (time intelligence, multi-pass, etc.) using a multi-dimensional expression language like MDX or DAX, and it must query the underlying cloud data platforms “live,” without data movement or a separate data store.
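The two calculation styles named above can be illustrated in plain Python. This is only a sketch of the arithmetic, not MDX or DAX, and the function names and sample figures are made up for illustration. The first function is a simple time-intelligence calculation (year-over-year growth); the second is a two-pass calculation (share of total), where the first pass computes a grand total and the second divides each member by it.

```python
def year_over_year(revenue_by_year):
    """Return {year: growth vs. the prior year}.

    Years whose prior year is missing or zero are skipped.
    """
    growth = {}
    for year, value in revenue_by_year.items():
        prior = revenue_by_year.get(year - 1)
        if prior:  # skips None and 0
            growth[year] = (value - prior) / prior
    return growth


def share_of_total(revenue_by_member):
    """Two-pass calculation: total first, then each member's share."""
    total = sum(revenue_by_member.values())  # pass 1
    if total == 0:
        return {}
    return {m: v / total for m, v in revenue_by_member.items()}  # pass 2


# Illustrative figures only.
yoy = year_over_year({2021: 100.0, 2022: 120.0, 2023: 150.0})
shares = share_of_total({"East": 30.0, "West": 70.0})
print(yoy)
print(shares)
```

In a real engine these calculations would be pushed down to the data platform as queries rather than computed in application code.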
4. Performance Optimization
Without query acceleration, a semantic layer will likely be bypassed using BI tool extracts and imports, which defeats its purpose. As such, a semantic layer must automatically tune and improve performance using machine learning and user query patterns without moving data outside the native cloud data platform or requiring a separate cluster for managing aggregates.
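As a rough illustration of learning from query patterns, the sketch below substitutes simple frequency counting for machine learning: it finds the dimension combination users request most often, treats it as a candidate aggregate, and routes later queries to that aggregate when it covers the requested dimensions. The table names (fact_orders, agg_...) and both functions are hypothetical.

```python
from collections import Counter


def recommend_aggregates(query_log, top_n=1):
    """Return the top_n most frequently requested dimension sets."""
    counts = Counter(frozenset(dims) for dims in query_log)
    return [dims for dims, _ in counts.most_common(top_n)]


def pick_table(query_dims, aggregates, base_table="fact_orders"):
    """Route to the smallest aggregate covering the query's dimensions.

    Falls back to the base fact table when no aggregate covers them.
    """
    covering = [a for a in aggregates if set(query_dims) <= a]
    if not covering:
        return base_table
    best = min(covering, key=len)
    return "agg_" + "_".join(sorted(best))


# Each entry is the list of dimensions one past query grouped by.
query_log = [
    ["region", "month"],
    ["region"],
    ["month", "region"],
    ["product"],
]
aggregates = recommend_aggregates(query_log)
print(pick_table(["region"], aggregates))
print(pick_table(["product"], aggregates))
```

The point of the design is that users never name the aggregate themselves; the routing happens behind the same logical model, so there is less incentive to fall back on extracts.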
5. Analytics Governance
A semantic layer needs to satisfy a wide range of data governance scenarios. It must integrate with corporate directory services (e.g., AD, LDAP, Okta) for user identity management, apply row-level security to every query, and be able to hide and mask data columns based on user, group, and role-based access control (RBAC) rules.
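A minimal sketch of row-level security and column masking follows. The policy structure, role names, and sample rows are assumptions made for illustration; a real deployment would resolve the user's role from the directory service and enforce the policy inside the query rather than on fetched rows. Unknown roles see no rows, so the sketch denies by default.

```python
def apply_policy(rows, user, policy):
    """Filter rows by the user's allowed regions and mask restricted columns."""
    role = user["role"]
    allowed_regions = policy["row_filters"].get(role, set())  # default: none
    masked = policy["masked_columns"].get(role, set())
    result = []
    for row in rows:
        if row["region"] not in allowed_regions:
            continue  # row-level security: drop rows outside the user's scope
        # Column masking: replace restricted values with a placeholder.
        result.append(
            {k: ("***" if k in masked else v) for k, v in row.items()}
        )
    return result


# Hypothetical policy and data.
policy = {
    "row_filters": {
        "emea_analyst": {"EMEA"},
        "admin": {"EMEA", "APAC"},
    },
    "masked_columns": {"emea_analyst": {"email"}},
}
rows = [
    {"region": "EMEA", "customer": "Acme", "email": "ops@acme.example"},
    {"region": "APAC", "customer": "Birch", "email": "it@birch.example"},
]
visible = apply_policy(rows, {"name": "dana", "role": "emea_analyst"}, policy)
print(visible)
```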
6. Data Integration
Data lives in multiple silos, including on-premises systems, legacy data warehouses, data lakes, cloud data warehouses, and SaaS applications. A semantic layer must be capable of accessing and modeling data across these multiple sources and must support a variety of data types, including nested data like JSON.
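To show what modeling nested data can involve, the sketch below flattens a nested JSON record into dotted column names that a model could then map to dimensions and metrics. The record and the dotted naming convention are assumptions for illustration, and lists are left as-is for brevity.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # scalars and lists are kept unchanged
    return flat


# Hypothetical nested order record.
raw = (
    '{"order_id": 7, '
    '"customer": {"name": "Acme", "address": {"country": "DE"}}}'
)
row = flatten(json.loads(raw))
print(row)
```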
A Semantic Layer – All or Nothing
A universal semantic layer is quickly becoming a critical component in a modern data and analytics stack. However, when evaluating semantic layer options, it’s important to keep one thing in mind: If any of the above requirements is missing, a semantic layer is unusable. In other words, it’s binary – it either works 100% or it doesn’t work at all. Don’t let this be an impediment, though, because a universal semantic layer makes everyone a data-driven decision-maker.