Data Product
Please note that the Data Products concept still has some missing and some beta properties.
Definition
A Data Product is a data set exposed via APIs for consumption outside the boundaries of the producing application or service. It is described through high-quality metadata that can be accessed through the Data Product Directory (ORD Aggregator).
The Data Product concept is based on Data Mesh Principles (see also this book).
While that is a concise definition, the following points clarify it further:
The following aspects of the definition are essential: (1) data, (2) APIs, (3) metadata and (4) product. If they are not covered, it's not a Data Product. Optionally, a Data Product can also have (5) business semantics.
Data Aspect
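As a minimal sketch of this rule, the check below accepts a candidate only when all four essential aspects are present and treats business semantics as optional. The class Candidate and its field names are hypothetical and chosen for illustration; they are not ORD properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical model for illustration; these field names are not ORD properties.
@dataclass
class Candidate:
    data_set: Optional[str] = None                          # (1) data
    api_ids: List[str] = field(default_factory=list)        # (2) APIs or Events
    has_metadata: bool = False                              # (3) metadata
    owner: Optional[str] = None                             # (4) product (ownership)
    entity_types: List[str] = field(default_factory=list)   # (5) optional semantics


def missing_aspects(candidate: Candidate) -> List[str]:
    """Return the essential aspects the candidate does not cover."""
    checks = {
        "data": bool(candidate.data_set),
        "apis": bool(candidate.api_ids),
        "metadata": candidate.has_metadata,
        "product": bool(candidate.owner),
    }
    return [name for name, covered in checks.items() if not covered]


def is_data_product(candidate: Candidate) -> bool:
    """A candidate qualifies only if no essential aspect is missing."""
    return not missing_aspects(candidate)


complete = Candidate("sales orders", ["orders-api"], True, "team-sales")
incomplete = Candidate("sales orders", [], True, None)
print(is_data_product(complete))    # True
print(missing_aspects(incomplete))  # ['apis', 'product']
```

Business semantics (entity_types) is deliberately left out of the check, since the definition treats it as optional.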
- Applications produce data within a domain. An application takes responsibility for the data it produces, and the application offers data for consumption outside the original context in the form of a Data Product.
- A data product is a "data set", which can include:
  - Business Objects: master data, transaction data
  - Other objects, e.g.: config data
  - Analytical data, including cubes, measures and dimensions
  - Graph data (e.g. who knows whom, recommendations)
  - Documents (e.g. raw log entries, events, multi-level-aggregates, hierarchies)
  - Spatial data
- A data product is exposed by a "producer" to fulfill the needs of "consumers".
- The data set is optimized toward "intensive reads" and consumed in a read-only fashion.
API Aspect
- Above we say that Data Products are consumed via APIs, but to be precise, they are consumed via APIs or Events (we treat events as a special form of API). In this doc, we generally use the term APIs to include Events (it is just more readable than always saying "APIs and/or Events").
- There is a clear expectation that the APIs are described via metadata for machine- and human-readable documentation.
- For Data Products only certain types of API Protocols and qualities (performant mass read) are adequate. E.g. SAP uses Delta Sharing, which we additionally describe with CSN Interop for richer metadata.
- Data Products are also expected to describe their data lineage. This is done via Data Product input ports, which are described in detail as an ORD Integration Dependency.
Metadata Aspect
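The lineage idea in the last bullet can be pictured as a graph walk: each Data Product declares its inputs, and following them transitively yields everything it depends on. The sketch below uses a simplified, hypothetical mapping from product IDs to upstream product IDs. In ORD, input ports reference Integration Dependency resources, whose actual shape is not reproduced here, and all identifiers are invented.

```python
# Hypothetical, simplified lineage data: data product ID -> IDs of the upstream
# data products reached through its input ports. Real ORD input ports reference
# Integration Dependency resources rather than other products directly.
INPUTS = {
    "dp:revenue-report": ["dp:sales-orders", "dp:customers"],
    "dp:sales-orders": ["dp:customers"],
    "dp:customers": [],
}


def upstream(product_id, inputs):
    """Return all direct and transitive upstream products, sorted by ID."""
    seen = set()
    stack = list(inputs.get(product_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(inputs.get(current, []))
    return sorted(seen)


print(upstream("dp:revenue-report", INPUTS))  # ['dp:customers', 'dp:sales-orders']
```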
- A data product is described by the producer via ORD, which fulfills the role of its discoverability API / port. Through this, the discoverability of a Data Product is decentralized and therefore "shifted left": it is the data product's responsibility to describe itself. The ORD Aggregator(s) take on the responsibility of the Data Product Directory.
- Please note that ORD is only used to describe Data Products on the (slow-changing) metadata level. It is not intended as an active control API or as an API to fetch fast-moving runtime data (e.g. log metrics).
  - However, those can be added to the Data Product as dedicated APIs, which follow a standardized SPI contract and are marked as such via the ORD implementationStandard. This way, such APIs can still be discovered via API, but are treated as a separate concern.
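To illustrate how a consumer might separate such SPI-style APIs from the APIs that serve the data itself, the sketch below filters API resource entries by implementationStandard. It assumes the entries are available as plain dictionaries; the ordId values and the standard identifier example:runtime-metrics-spi:v1 are invented for illustration and are not defined by ORD.

```python
# Illustrative API resource entries; all values are invented.
api_resources = [
    {"ordId": "acme.demo:apiResource:OrdersShare:v1"},
    {
        "ordId": "acme.demo:apiResource:OrdersMetrics:v1",
        "implementationStandard": "example:runtime-metrics-spi:v1",
    },
]


def split_by_standard(resources, standard):
    """Partition resources into (those marked with the standard, all others)."""
    marked = [r for r in resources if r.get("implementationStandard") == standard]
    others = [r for r in resources if r.get("implementationStandard") != standard]
    return marked, others


spi_apis, data_apis = split_by_standard(api_resources, "example:runtime-metrics-spi:v1")
print([r["ordId"] for r in spi_apis])   # ['acme.demo:apiResource:OrdersMetrics:v1']
print([r["ordId"] for r in data_apis])  # ['acme.demo:apiResource:OrdersShare:v1']
```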
Product Aspect
- The word "Product" does not imply that it's something on the price list. Instead it only implies a product mindset towards its consumers. Typically, Data Products are not independent "products" but are available as part of a larger product that produces them.
- Data Products have owners that are responsible for defining what Data Products to produce to meet the needs of consumers. All data products have owners.
- The owners of the data product (at least of its definition) are ideally the domain owners / the same team that is responsible for the operational data (decentralized data products).
Business Semantic Aspect
- In ORD, it is possible to describe more than the APIs and, through them, the data model / schema / syntax of the data. There are also Entity Types, which can be used to describe the semantic model (the underlying conceptual model) and map it to the technical API / data model.
Data Products at SAP
At SAP, the minimum required metadata is the description of the Data Product as an ORD resource. Additional metadata, e.g. CSN, can also be provided.
There is internal guidance on which qualities and protocols a Data Product needs to or should have. These are currently being worked out during the beta phase. Once they are clear, they may find their way into an SAP-specific policy level.
Architecture Overview
Model
The diagram is not a complete ER model, but highlights the most important relationships from Data Product perspective.
Roles
Data Products are exposed by Producers so that they can be used by Consumers. Consumers can use Aggregators / Data Product Directories to discover, explore and understand Data Products.
- Data Product Producers are applications or services that expose data via one or more APIs and describe relevant contracts and information via metadata. Note that there are various types of producers.
- Data Product Consumers are applications or services that access and use the data from Data Products. Consumers can be of various types and cover both transactional and analytical applications. An application that processes operational data can be a Data Product consumer, as can analytical products like SAP Datasphere and SAP Analytics Cloud (SAC).
- The Data Product Directory (ORD Aggregator) is used by Consumers to find and discover available Data Products.
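The interplay of the three roles can be sketched with a toy in-memory directory. This is not the API of a real ORD Aggregator: the class DataProductDirectory, its methods ingest and search, and the "id" and "title" keys are hypothetical. However the descriptions reach the directory in practice, ingest simply stands in for that step.

```python
class DataProductDirectory:
    """Toy in-memory stand-in for a directory; not a real ORD Aggregator API."""

    def __init__(self):
        self._products = {}

    def ingest(self, producer, description):
        # The directory stores the metadata a producer provides for a Data Product.
        self._products[description["id"]] = {**description, "producer": producer}

    def search(self, keyword):
        # A consumer discovers Data Products whose title mentions the keyword.
        needle = keyword.lower()
        return sorted(
            p["id"] for p in self._products.values() if needle in p["title"].lower()
        )


directory = DataProductDirectory()
directory.ingest("orders-service", {"id": "dp:sales-orders", "title": "Sales Orders"})
directory.ingest("crm-service", {"id": "dp:customers", "title": "Customers"})
print(directory.search("orders"))  # ['dp:sales-orders']
```

The consumer never talks to the producer during discovery. It only needs the directory, which matches the decentralized description model outlined in the Metadata Aspect.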
Current Status
Please note that the Data Product concept still contains some BETA properties.
This has the following implications:
- The beta-level properties are potentially subject to change, although we aim to avoid breaking changes if possible.
- Many attributes relevant to Data Products are not yet explicitly defined in the specification.
  - Some attributes should be handled via documentation, e.g. Service Level Agreements via dataProductLinks of type: service-level-agreement.
  - Such attributes need to be defined through generic extensibility mechanisms like labels and documentationLabels, or added as text to the documentation.
  - We do this to gain more experience on what information we need to collect and how best to structure it. Later ORD Data Product releases will add more standardized properties or define a dedicated Data Product definition specification that can be attached.
  - Which information needs to be added as additional extensibility attributes is currently only defined as SAP internal guidance.
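As a rough illustration of the extensibility mechanisms named above, the fragment below models part of a Data Product description as a Python dictionary. The property names dataProductLinks, labels and documentationLabels and the link type service-level-agreement are taken from the text. The "url" key, the label keys, the list-of-strings value shape and all values are assumptions made for this sketch and should be checked against the ORD specification.

```python
# Fragment of a Data Product description as a Python dict.
# Assumptions: the "url" key, the label keys, and every value are illustrative.
data_product = {
    "title": "Sales Orders",
    "dataProductLinks": [
        {"type": "service-level-agreement", "url": "https://example.com/sla"},
    ],
    "labels": {"refresh-interval": ["daily"]},
    "documentationLabels": {"Data retention": ["24 months"]},
}


def links_of_type(product, link_type):
    """Return the URLs of all dataProductLinks entries with the given type."""
    return [
        link["url"]
        for link in product.get("dataProductLinks", [])
        if link.get("type") == link_type
    ]


print(links_of_type(data_product, "service-level-agreement"))  # ['https://example.com/sla']
```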