Private Cloud Operating Principles
Monday, September 16, 2013
Given the modular architecture of OpenStack as cloud controller software, operators of OpenStack-based private clouds have many choices in how to shape their cloud as a service. For public cloud operators the answer is clear: emulate AWS. The answer can be difficult for companies building private clouds, particularly if those companies have been around for a while. What operating principles would you choose in such cases and why?
When we faced the same question at eBay last year, we chose a simple operating principle that we continue to practice every day.
Think and act like a public cloud provider to provide unfettered self-service access to native OpenStack capabilities.
Why should a private cloud operator choose such an operating principle? Isn’t it constraining? Yes, it can be constraining in the short term, but in the long run, this simple principle allows for more innovation and agility, as well as increased operational efficiency. Here is why.
Self-service: A public cloud mindset forces you to care about self-service. No capability exists in a public cloud without an API that is integrated with all other cloud capabilities and works the same for every user. Users get access to those capabilities on their own, with no tickets and no approvals. For instance, in our cloud, users not only get compute, network and storage on their own, they can also bring in their own images and customize those to meet their needs, unbeknownst to us. We in fact chose to put a lightly customized version of the Horizon dashboard, along with all the public APIs, in front of users and let them figure out what to do with those. This turned out to be infectious.
In the private cloud context, self-service brings some challenges, the biggest of which is compliance with the various policies and processes in place. Most hurdles that users in large enterprises face start with processes designed to enforce those policies. But policy enforcement is solvable. Instead of putting a person or committee in charge of approving and verifying compliance, you will need to build some software and put it in charge of compliance.
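As a rough illustration of putting software in charge of compliance, the sketch below runs a set of policy rules against a provisioning request before it is accepted. Everything in it is hypothetical: the ServerRequest type, the two rules, and the allowed flavor names are assumptions made for the example, not part of OpenStack or of our cloud.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ServerRequest:
    """Hypothetical shape of a self-service provisioning request."""
    tenant: str
    image: str
    flavor: str
    security_groups: List[str] = field(default_factory=list)


# A rule returns a violation message, or None when the request complies.
Rule = Callable[[ServerRequest], Optional[str]]


def require_security_group(req: ServerRequest) -> Optional[str]:
    # Assumed policy: every server must join at least one security group.
    if not req.security_groups:
        return "at least one security group is required"
    return None


def restrict_flavors(req: ServerRequest) -> Optional[str]:
    # Assumed policy: only a fixed set of flavors may be requested.
    allowed = ("m1.small", "m1.medium")
    if req.flavor not in allowed:
        return f"flavor {req.flavor!r} is not allowed"
    return None


RULES: List[Rule] = [require_security_group, restrict_flavors]


def check_compliance(req: ServerRequest, rules: List[Rule] = RULES) -> List[str]:
    """Run every rule and collect the violations; an empty list means compliant."""
    violations = []
    for rule in rules:
        message = rule(req)
        if message is not None:
            violations.append(message)
    return violations
```

A check like this could run automatically on every request, so a compliant user is never blocked on a person, and a non-compliant request is rejected with a specific reason rather than a ticket.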
In addition to productivity gains due to agility, self-service helps democratize capacity planning and increase efficiency. I once worked in a company where teams used to go in front of a committee a few months in advance to get their capacity. When you’re faced with such a process, you tend to over-estimate capacity needs out of worry that you may not get the capacity you need when you need it. This is capacity hoarding. Self-service, on the other hand, improves data center utilization and reduces waste, since users know they will get the capacity they need when they need it.
Abstractions: Self-service also forces a well-defined API layer that abstracts the infrastructure from the applications above. In the case of OpenStack, users don’t need to ask us how to use those APIs since they can google for help on their own. Consequently, we get to focus on building and operationalizing the cloud while our users get to take charge of their use cases on top of the APIs. The challenge for the operator is to ensure compatibility with the documented public APIs. For the opportunist, this challenge helps improve the quality of the OpenStack customizations needed to suit the business.
Unified Control Plane: One of the tempting patterns for private cloud operators is to spin up a separate instance of the cloud for each use case, such as a cloud for developers, a different cloud for QA, and an entirely different and isolated cloud for production use cases. In this model, the cloud is treated as a piece of software and tenants go to different places for different needs. Each cloud will have a different user experience, a different set of capabilities, and a different set of rules to play by. Such a model is less attractive for a public cloud operator due to user confusion, increased cost of operations and reduced flexibility to move capacity around. When you think and act like a public cloud provider, users don’t see multiple clouds. They see multiple regions and availability zones behind a unified control plane and can pick regions and availability zones to meet performance and availability constraints.
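To make the idea concrete, here is a minimal sketch of how a user-facing tool might pick availability zones from a single catalog exposed by one control plane. The Zone type, the catalog entries, and the latency and capacity figures are all invented for illustration; they do not reflect any real OpenStack API or deployment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Zone:
    """Hypothetical catalog entry for one availability zone."""
    region: str
    name: str
    latency_ms: int   # assumed round-trip latency from the caller
    free_cores: int   # assumed spare compute capacity


# One catalog for every use case (dev, QA, production), not one cloud per use case.
CATALOG = [
    Zone("region-a", "az1", 5, 0),
    Zone("region-a", "az2", 6, 120),
    Zone("region-b", "az1", 40, 800),
]


def pick_zones(catalog: List[Zone], cores: int, max_latency_ms: int,
               count: int = 1) -> List[Zone]:
    """Return up to `count` zones that satisfy both constraints, lowest latency first."""
    candidates = [
        zone for zone in catalog
        if zone.free_cores >= cores and zone.latency_ms <= max_latency_ms
    ]
    candidates.sort(key=lambda zone: zone.latency_ms)
    return candidates[:count]
```

Asking for more than one zone, for example pick_zones(CATALOG, 16, 50, count=2), is one simple way to express an availability constraint: the same request yields placements in two different zones without the user ever leaving the one cloud.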
In the end, it turns out that what’s good for the business is no different from what is good for cloud users, even when the cloud is private with the control plane behind a firewall.