
OpenStack is not Cloud

In recent weeks, I’ve heard a number of opinions about OpenStack. Some want OpenStack to become compatible with AWS, while others don’t care. Some say that you need vendor-built, production-grade OpenStack distributions to succeed, while others want you to buy solutions – with software and people – to help you build and operate an OpenStack cloud. Still others want you to buy racks of OpenStack. But here is the crux. OpenStack is not cloud. AWS is cloud. The difference is extremely significant.

My team at eBay builds and operates an OpenStack-based private cloud. Note that the opinions I express here are my own and do not represent eBay. Except for the network virtualization layer, we use publicly available OpenStack and other open-source software. We offer cloud primitives such as virtual compute, network, and storage on demand, directly to anyone who wants them. Like any OpenStack power user, we’ve got our cuts and bruises amidst happy user testimonials, but this post is not about those experiences.

A cloud is a service, and not just software. As far as the users of the service are concerned, a cloud is a set of APIs and tools backed by an elastic infrastructure that delivers what the APIs and tools promise. Users care about availability of the cloud, elasticity of infrastructure, and on-demand self-service access to maintain business agility. APIs and dashboards are critical components of the user experience, but they are just a small part of a cloud.

AWS is certainly a cloud. It includes things that drive user experience such as APIs, dashboards, and an ecosystem around those. Behind the scenes it includes an elastic infrastructure. Users remain agnostic of how that infrastructure is built and managed except for certain qualities like availability, performance, scalability, elasticity, and efficiency.

OpenStack, however, is cloud controller software. Though the community did a nice job of putting this software together, an OpenStack installation does not make a cloud. As an operator you will be dealing with many additional activities, not all of which users see. These include infra onboarding, bootstrapping, remediation, config management, patching, packaging, upgrades, high availability, monitoring, metrics, user support, capacity forecasting and management, billing or chargeback, reclamation, security, firewalls, DNS, integration with other internal infrastructure and tools, and on and on and on. These activities are bound to consume a significant amount of time and effort. OpenStack gives you some key ingredients to build a cloud, but it is not cloud in a box.

Since a cloud is a service, you can’t approach it as boxed software, as some pundits want us to believe. When you run a service, stuff happens. For instance, your hypervisors run out of disk space late at night due to an obscure bug in the version of the OpenStack distribution you’re using. Or RabbitMQ gets into a split-brain state and the control plane freezes. These are not OpenStack problems, but operational incidents that are bound to happen.

Is AWS API compatibility what an operator should worry about above all of these? Nah. You can fix API incompatibility with glue code.
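As a rough illustration of what such glue code might look like, here is a minimal sketch that translates EC2-style RunInstances parameters into a Nova-style server-create request body. The function name, the default server name, and both lookup tables (with their image and flavor IDs) are made up for the example; a real deployment would fill them in from its own image and flavor catalogs, and would still need to handle authentication, errors, and the response format.

```python
# Hypothetical lookup tables (assumptions for illustration only): a real
# deployment would build these from its own image and flavor catalogs.
IMAGE_BY_AMI = {"ami-123": "11111111-2222-3333-4444-555555555555"}
FLAVOR_BY_INSTANCE_TYPE = {"m1.small": "2", "m1.medium": "3"}


def ec2_run_instances_to_nova(params, name="ec2-compat"):
    """Translate EC2-style RunInstances parameters into a Nova-style
    server-create request body. Only a handful of fields are handled."""
    ami = params["ImageId"]
    if ami not in IMAGE_BY_AMI:
        raise ValueError(f"no image mapped for {ami!r}")

    instance_type = params.get("InstanceType", "m1.small")
    if instance_type not in FLAVOR_BY_INSTANCE_TYPE:
        raise ValueError(f"no flavor mapped for {instance_type!r}")

    server = {
        "name": name,
        "imageRef": IMAGE_BY_AMI[ami],
        "flavorRef": FLAVOR_BY_INSTANCE_TYPE[instance_type],
        # EC2 query parameters arrive as strings, so normalize the counts.
        "min_count": int(params.get("MinCount", 1)),
        "max_count": int(params.get("MaxCount", 1)),
    }
    if "KeyName" in params:
        server["key_name"] = params["KeyName"]
    return {"server": server}


if __name__ == "__main__":
    request = {"ImageId": "ami-123", "InstanceType": "m1.medium", "MaxCount": "2"}
    print(ec2_run_instances_to_nova(request))
```

The point is not that this is hard or easy, but that it is ordinary translation code, which is a much smaller problem than running the service behind it.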