because writing is clarifying

Subbu Allamaraju’s Journal

Don’t Build Private Clouds

I’ve been noodling on this post for over a year. During this time, I discussed, debated and explained parts of what I write below with several folks. I also changed jobs this year. From mid 2012 to early this year, I led a team that built one of the largest mid-sized, fairly successful private clouds. I now lead an effort to migrate several large-scale mission-critical systems from on-prem enterprise data centers to a public cloud. This transition gave me the time and opportunity to refine and expand the scope of my thinking. So, here is my appeal. Slow down on your private cloud projects, and get out of enterprise data centers as fast as you can. You may be shooting for a local optimum with your private cloud strategy, and not the global maximum for the business.

You don’t need to own data centers unless you’re special

There are very few enterprises on the planet right now that need to own, operate and automate data centers. Unless you have at least 200,000 servers in multiple locations, or you’re in specific technology industries like communications, networking, media delivery, power, etc., you shouldn’t be in the data center and private cloud business. If you’re below this threshold, you should be spending most of your time and effort on getting out of the data center and not on automating and improving your on-prem data center footprint.

While the overall demand for compute has grown across the industry, the number of enterprises that need to build and operate data centers to host that compute has been steadily shrinking. There are multiple factors at play behind this trend.

  • The scale, quality and breadth of cloud services have increased manifold in the last few years. There are very few use cases that the big three public clouds can’t deal with today.
  • You no longer go to a public cloud because you need virtual machines on demand. You go to a public cloud to consume a large buffet of services.
  • Physical compute, storage and network infrastructure is brittle, prone to failure and not malleable. Automating these infrastructure primitives and making them ready to host apps and data is an as-a-service exercise. These services are large distributed systems that require talent, focus, trial and error, and years of learning and operational experience. Typical enterprise IT departments are not set up to attack such problems. Trying to emulate the same within your data centers takes years, and most likely shifts your focus away from your core business. More about that below.
  • Despite what infrastructure vendors claim in their brochureware, there is no single vendor that can provide you with a full stack of capabilities that meet or exceed what a public cloud can provide.
  • There are fewer snowflake workloads that require special purpose-built hardware today than there were a decade ago. In most cases, the choice you get with designing servers is illusory and likely backward-looking. With each passing year, it is getting cheaper and less time-consuming to solve problems using commodity software building blocks running on commodity compute.
  • Despite hundreds of millions of dollars of capex investments, most private clouds are not resilient to common infrastructure or software failures. Services to enable modern resiliency patterns rarely exist in private clouds. Consequently, resiliency remains a pipe dream.

Private cloud makes you procrastinate doing the right things

When executed to its completion, a typical private cloud journey involves four key phases:

  • Phase 1: Build a private cloud, starting with compute, then storage and network. Then scale out to several independent fault domains (like public cloud regions), and automate the network to make it possible to implement load balancing, DNS, and various failover patterns.
  • Phase 2: Move your stateless monoliths to the private cloud. Most enterprises have at least one generation of such monoliths.
  • Phase 3: Then deal with the stateful monoliths. These are your large monolithic databases running on handcrafted hardware. This is usually where the private cloud journey hits the wall due to the risk and complexity in making such monoliths cloud native.
  • Phase 4: Then transform your culture to operate as a cloud native organization.

This is a multi-year journey with each phase involving several hurdles and taking years to execute.

Would you start with Phase 1 in an on-prem data center, or go directly to Phase 2 on a public cloud?

Private cloud cost models are misleading

A typical server with modern specs can cost between $5,000 and $10,000 and can last for 4 years. A public cloud virtual machine with comparable specs can cost between $1,000 and $1,500 per month. Such comparisons make private cloud strategies look compelling. However, there are additional costs to account for:

  1. Engineering costs to build and operate cloud services
  2. Cost of automating the network (note that no network vendor wants you to automate with open APIs)
  3. Cost of lost agility due to long planning, procurement and onboarding cycles
  4. Cost of lost business opportunity due to time spent building a private cloud
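
To make the comparison concrete, here is a rough sketch of the arithmetic using the price ranges quoted above. The article gives no figures for the four additional costs, so the code does not invent any. Instead it computes how much per-server monthly overhead (a hypothetical stand-in for engineering, network automation, lost agility and lost opportunity combined) a private cloud can absorb before it costs as much as the public cloud VM. The function names and the straight-line amortization are assumptions made for illustration, not an established cost model.

```python
def amortized_monthly_hardware_cost(purchase_price: float, lifetime_years: float) -> float:
    """Spread a server's purchase price evenly over its service life (straight-line)."""
    return purchase_price / (lifetime_years * 12)


def break_even_overhead(purchase_price: float, lifetime_years: float,
                        public_vm_monthly: float) -> float:
    """Monthly per-server overhead at which private and public cloud cost the same.

    'Overhead' is a hypothetical bucket for the four additional costs listed
    above; the article does not quantify them.
    """
    hardware = amortized_monthly_hardware_cost(purchase_price, lifetime_years)
    return public_vm_monthly - hardware


if __name__ == "__main__":
    # Price ranges taken from the paragraph above; lifetime is 4 years.
    for price, vm in [(5_000, 1_500), (10_000, 1_000)]:
        hw = amortized_monthly_hardware_cost(price, 4)
        slack = break_even_overhead(price, 4, vm)
        print(f"server ${price}: ${hw:.2f}/month hardware, "
              f"break-even overhead ${slack:.2f}/month vs ${vm}/month VM")
```

On hardware alone the server looks far cheaper, which is why the naive comparison is so persuasive. The question the list above raises is whether the unpriced costs stay below that break-even figure per server, every month, for the full four years.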

Don’t underestimate the influence of on-prem data centers on your organization’s culture

The state of infrastructure influences your organizational culture. A modern enterprise running on programmable cloud fosters autonomous teams, rapid learning, and faster iterations of ideas. Brittle, time-consuming, operator-driven, ticket-based on-premises infrastructure, on the other hand, brews a culture of mistrust, centralization, dependency and control.

Say, for instance, a team wants to enable TLS all the way from load balancers to their app servers. Such a team will likely have to deal with networking teams, security teams, and potentially several middle managers to execute the change over a period of several weeks if not months. The same team could execute this change on a public cloud in under a week and move on to the next thing. There are numerous examples like this.

These differences between on-premises data centers and public clouds influence how teams think, plan and execute. These are nothing but attributes of culture.
