Don’t Chase Data Mesh, Yet

Data mesh is a nice but wishy-washy set of ideas for improving the current state of data. The principles are based on sound reasoning and are well intended, but I find the story incomplete for radically transforming the current state of data. There is plenty of money flowing into the industry, so there are companies to be funded, differentiated products to be built, customer segments to be carved out, and conferences to be held. And yet we may still miss the opportunity to drive change.

Here is why I say this.

  1. You need modern developer-facing abstractions to decentralize data ownership to domain teams. Period. Without them, decentralizing centralized functions like data engineering will only be more expensive and chaotic.
  2. Those primitives need to expose automated polyglot access patterns to support different types of use cases and users. These patterns include CRUD, search, real-time analytics, streaming to data lakes, generating business events, etc.
  3. More importantly, you need to argue for data mesh in terms of value, not pain. What’s broken is clear, but that alone is insufficient to motivate and drive radical change. Opportunity drives innovation at a far greater scale and pace than pain does. Today’s data mesh arguments are, by and large, pain-based.

Let me dive into each of these arguments. This article is a continuation of my previous article on the broken state of data.

Modern Data Abstractions

Let’s look back to see forward. Today, thanks to systems like Kubernetes, containers, and cloud APIs, most developers interact with and make changes to complex infrastructure without realizing the complexity underneath. The developer- and operator-facing abstractions of systems like Kubernetes eliminated several time-consuming, friction-filled steps in deploying, scaling, remediating, and managing apps. They made principles like automating everything, infrastructure as code, immutability, and repeatability easy to follow. Successive generations of tools made implementing these principles progressively easier, which is what made DevOps successful. In other words, principles alone won’t drive innovation; you need developer-facing abstractions that make it easy to do the right things.
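
To make this concrete, here is a rough sketch using the official Kubernetes Python client (the app name, image, and namespace are illustrative). A dozen declarative lines of intent stand in for capacity planning, scheduling, health management, and process supervision:

```python
# Sketch: declaring a scaled, managed app with the Kubernetes Python
# client. The platform handles scheduling, restarts, and rollout.
from kubernetes import client, config

config.load_kube_config()  # use local kubeconfig credentials

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="orders-api"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # desired state; Kubernetes reconciles toward it
        selector=client.V1LabelSelector(match_labels={"app": "orders-api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "orders-api"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="orders-api",
                                   image="registry.example.com/orders-api:1.4"),
            ]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_deployment(namespace="default",
                                                body=deployment)
```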

So, what type of abstractions do we need on the data side? I surmise that these include the usual CRUD operations, search, real-time analytics, schema tracking, creating business events, transforming such events, and so on. Such abstractions should let domain teams own domain data and do most things without handoffs across org silos or an army of people putting the pieces together manually. The sketch below shows the shape I have in mind.
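
Here is a minimal, self-contained sketch of what such a primitive could feel like to a domain team. Every name is hypothetical; there is no DataProduct library, and in-memory structures stand in for the managed storage, indexing, and event streams a real platform would provide:

```python
# Hypothetical sketch of a developer-facing data primitive. A domain team
# declares its data product once; CRUD, search, and business events come
# along for free, with no handoffs to a central data engineering team.
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List

@dataclass
class Order:  # the domain schema; a real platform would version and track it
    order_id: str
    customer_id: str
    total: float

class DataProduct:
    def __init__(self, name: str, schema: type):
        self.name = name
        self.schema = schema
        self._rows: Dict[str, object] = {}      # stand-in for managed storage
        self._subscribers: List[Callable] = []  # stand-in for an event stream

    def put(self, key: str, row) -> None:       # CRUD write
        self._rows[key] = row
        self._emit({"event": f"{self.name}.upserted", "data": asdict(row)})

    def get(self, key: str):                    # CRUD read
        return self._rows.get(key)

    def search(self, **filters):                # stand-in for a search index
        return [r for r in self._rows.values()
                if all(getattr(r, k) == v for k, v in filters.items())]

    def subscribe(self, handler: Callable) -> None:  # business events
        self._subscribers.append(handler)

    def _emit(self, event: dict) -> None:
        for handler in self._subscribers:
            handler(event)

orders = DataProduct(name="orders", schema=Order)
orders.subscribe(print)  # a downstream consumer attaches without a handoff
orders.put("o-1", Order("o-1", "c-42", 99.50))
print(orders.search(customer_id="c-42"))
```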

Polyglot interfaces and multi-sided abstractions

Data is always a shared resource. Regardless of how you organize data ownership, you will end up with multiple teams producing and consuming related data. Modern data abstractions must let multiple teams contribute and use data for their needs without manually shoveling data around. For instance, some of your operational systems might use a proprietary query language for CRUD operations, your analytics teams might use SQL, and some app teams might use GraphQL to access the same data. That does not mean you need to explicitly clone the data, create glue layers, or rewrite database engines to support polyglot access. Your core data store might do the internal reorganization for you and expose simple operational knobs. Such multi-sided abstractions push complexity down to where it is manageable.
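
As a toy illustration, the sketch below serves the same table to two audiences without copying it: a CRUD-style call for an operational app, and plain SQL for an analyst. SQLite stands in for whatever core store does the internal reorganization, and the wrapper names are invented:

```python
# Toy polyglot access: one store, two interfaces, zero data copies.
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the core data store
db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, "
           "customer_id TEXT, total REAL)")

# Operational interface: CRUD calls, no SQL knowledge required
def put_order(order_id: str, customer_id: str, total: float) -> None:
    db.execute("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
               (order_id, customer_id, total))

def get_order(order_id: str):
    return db.execute("SELECT * FROM orders WHERE order_id = ?",
                      (order_id,)).fetchone()

# Analytical interface: plain SQL over the very same rows
def analytics(sql: str):
    return db.execute(sql).fetchall()

put_order("o-1", "c-42", 99.50)
put_order("o-2", "c-42", 12.00)
print(get_order("o-1"))
print(analytics("SELECT customer_id, SUM(total) FROM orders "
                "GROUP BY customer_id"))
```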

This trend is already visible in some cloud-based offerings. Companies like MongoDB and Datastax seem to be headed in this direction, and there must be others. We should expect continued innovation in this space to make polyglot access easy. I suspect that disaggregating database engines from storage will accelerate this trend and lower the performance and cost penalties.

Value

My final point is about value. We tend to accept known pains. Yup, the current state is painful, but why drive a change when you have ten other value-generating projects in your enterprise? If you lead a data organization in an enterprise, you need value-based arguments to drive socio-technical changes. Without them, your initiative won’t make it into your CEO’s top-10 key results.

Remember that DevOps did not happen just because we said so a thousand times. There was a clear value driver: reducing time to value. That was the single metric to go after. Once you aligned on that metric, change was possible. You were releasing once or twice a month, and now suddenly, you get to release hundreds or thousands of times a day. You take less time to recover from incidents, and your teams learn from production systems. Such stories broke the dev vs. ops silos and made cultural and organizational changes possible. We had some remarkable examples to show in the early days, like "10+ Deploys per Day: Dev and Ops Cooperation at Flickr" in 2009.

But we don’t hear such clear value arguments today. There are plenty of articles on "what is data mesh," but I could not find any on "why data mesh" that propose a singular value argument. For example, in one of those "what is" articles, the author says, "the faster access to query data directly translates into faster time to value without needing data transportation." But how? What’s the return on investment? Another similar article highlights "greater data experimentation and innovation while lessening the burden on data teams to field the needs of every data consumer through a single pipeline." That’s a pain argument. There is nothing wrong with these points, but they are not enough to drive step-function changes.

Quantifying value from data is already hard, and quantifying value for a significant socio-technical transformation around data is harder still. But that’s what you need to drive momentum. I’ve not seen arguments and examples along these lines. Sorry, I don’t know what that looks like either.

No socio-technical change will happen in one go, and winners won’t emerge overnight. We will need multiple attempts over the next several years. Incumbents in the data landscape might not be willing to lead the way initially, as it would impact their current business models. Open source will likely need to play a role in making it easier to put things together from standard parts. That’s my hypothesis for the future.
