Given a few days off, now seems like a good time to take what we’ve learned and extrapolate what we can do. While the concepts of Data Mesh and Data Fabric are not new, we are seeing an increase in adoption. In particular, I have heard more discussions with partners and clients around Data Mesh. As for what’s next, I think AI/ML will play a larger role in data operations. In this first of a series of articles, we will look at Data Mesh.
Like fruit in a supermarket, data in a Data Mesh is treated as a consumable product.
What is Data Mesh?
While Data Mesh is not suitable for every organization, the goals of reducing the time to actionable intelligence and serving data as a curated product certainly sound promising. Data Mesh is not a technology; rather, it is an architecture or methodology for how to approach curating data for analytics. The four main tenets of the Data Mesh architecture are:
1. Data is a product to be consumed.
2. Data products are consumed through self-service offerings.
3. Data is curated by Subject Matter Experts (SMEs) of data domains.
4. Data governance is federated across all domains.
What is Data Mesh, really?
Data as a Product
The best analogy for a well-implemented Data Mesh is a supermarket. Data products are equivalent to bread products, fresh produce and canned goods. Shopping in a supermarket is largely a self-service task. The products are produced by SMEs, e.g., bread is produced from raw ingredients which are refined, baked and packaged into a self-service product. Federated governance comes in the form of labels and conventions. For example, products in the US display nutrition labels and sell-by or expiration dates.
The application of this methodology in data analytics will vary but must adhere to the tenets. The end result must be self-service data products. Like the supermarket, a consumer might purchase bread to be consumed directly or the bread might be combined with other products to produce a sandwich. Regardless, all products are consumable with minimal additional preparation.
Decentralized Data Domains
As in a supermarket, SMEs produce the data products. The production of bread and the production of dairy are separate domains. In a modern organization, an ERP might be a central data source, but more and more SaaS offerings fill in gaps, for example for HR and CRM applications. With Data Mesh, data production is no longer centralized in a Business Intelligence (BI) department that is responsible for ingesting, transforming and curating data.
Data Mesh leverages decentralized SMEs to produce the data product. The decentralization of domains addresses two common problems. First, the centralized BI team often suffers a backlog of data to produce. Second, the centralized BI team often does not have the expertise to quickly deliver data products. Both points contribute to delays in actionable intelligence.
While the data domains and SMEs are decentralized, the Data Mesh architecture does not require a decentralized platform. All of the data should reside in a data lake (supermarket) to abstract storage and compute, but the production of the data (bread, dairy, canned goods, etc.) might use different tools.
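As a rough illustration of decentralized production on a shared platform, the sketch below models two domains that publish products into one lake. The `DataProduct` class, its fields, the team names and the `lake://` paths are all hypothetical, invented for this example rather than taken from any Data Mesh tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one data product (all fields are assumptions)."""
    name: str
    domain: str      # owning data domain, e.g. "hr" or "sales"
    owner: str       # SME team accountable for the product
    built_with: str  # tool the domain chose; not dictated centrally
    lake_path: str   # location in the shared data lake


# Each domain curates its own products with its own tools...
products = [
    DataProduct("headcount_by_month", "hr", "hr-analytics", "dbt",
                "lake://hr/headcount_by_month"),
    DataProduct("orders_daily", "sales", "sales-ops", "spark",
                "lake://sales/orders_daily"),
]


def products_by_domain(items):
    """Group product names by owning domain."""
    grouped = {}
    for product in items:
        grouped.setdefault(product.domain, []).append(product.name)
    return grouped


def on_shared_platform(items, prefix="lake://"):
    """...while every product still lands on the same shared platform."""
    return all(product.lake_path.startswith(prefix) for product in items)
```

The point of the sketch is that `built_with` can differ per domain while `lake_path` always points into the same lake.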
Federated Data Governance
Just as a supermarket is organized to make products easier to find, data in a Data Mesh must be easy to find. Generally, this need is addressed by a centralized data catalog. The data catalog also provides data labels, definitions and lineage to facilitate self-service data consumption. Federated data governance ensures data quality and consistency. For example, data governance can require all dates to follow a specific format or to be based in the same time zone. Federated governance also provides consistent data definitions for terms such as “Year to Date” and “Price Per Unit.” Federated governance does not, however, dictate how data are ingested and transformed, as those tasks are left to the data domains.
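A governance rule of this kind can be sketched as a check run against a catalog entry. This is a minimal sketch under stated assumptions: the required labels, the entry layout and the rule "ISO 8601 timestamps in UTC" are illustrative choices, not part of any Data Mesh standard.

```python
from datetime import datetime, timedelta

# Assumed label set that every catalog entry must carry.
REQUIRED_LABELS = ("definition", "owner", "lineage")


def is_utc_iso8601(value):
    """Return True if value parses as an ISO 8601 timestamp with a UTC offset."""
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    # Naive timestamps have no offset (None), so they fail the comparison.
    return parsed.utcoffset() == timedelta(0)


def governance_violations(entry):
    """List the federated rules a (hypothetical) catalog entry breaks."""
    problems = []
    for label in REQUIRED_LABELS:
        if not entry.get(label):
            problems.append(f"missing label: {label}")
    for column, sample in entry.get("timestamp_samples", {}).items():
        if not is_utc_iso8601(sample):
            problems.append(f"{column}: not an ISO 8601 UTC timestamp")
    return problems
```

The check inspects only the published product and its labels. It does not look at how the domain ingested or transformed the data.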
Self-Service Data Products
Perhaps the most delicate tenet of Data Mesh is the supermarket itself. How will data consumers access the data? Ideally, all data in a Data Mesh is stored in a data lake in an open-source format such as Parquet, thus affording data consumers the choice of a multitude of tools. While the tenets of Data Mesh do not prescribe tools, some implementations might ease consumer access to data through the use of semantic layers and data set collections. To continue the analogy, the data supermarket might offer pre-cooked or ready-to-heat meals of data using data found on the shelves in the supermarket. In other words, semantic layers and data sets should build on the data products as prepared by the domain SMEs, and should not manipulate them in ways that fall outside the federated governance.
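To make the "ready-to-heat meal" idea concrete, here is a minimal sketch of a derived data set that combines two products without changing either one. The sample records, field names and the `revenue_by_region` function are hypothetical. A real semantic layer would typically sit on top of files in the lake rather than in-memory tuples.

```python
# Hypothetical curated products as published by two domains (sample data).
ORDERS = (
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 40.0},
)
CUSTOMERS = (
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
)


def revenue_by_region(orders, customers):
    """Combine two products into a ready-to-use data set without altering either."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals = {}
    for order in orders:
        region = region_of[order["customer_id"]]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals
```

The function only reads from the shelf products and returns a new result, so the data as curated by the domain SMEs stays intact.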
The Data Supermarket
In summary, Data Mesh is what’s new in big data and builds on data lakes and data lakehouses to provide self-service data as a product. Simply put, Data Mesh provides guidance to build data supermarkets which allow business users, data analysts and data scientists to prepare delicious actionable intelligence. As for what’s next, a follow-on article will look at how developments such as ChatGPT might impact data analytics.