
My Gripes with Cloud Foundry

I tried setting up Cloud Foundry this week using Micro Cloud Foundry. See my post on ql.io on Cloud Foundry for details. The setup is not that hard – it took two late nights to figure out the details. But I’m not sure it is ready for prime time. Here are my observations.

  • Cloud Foundry has the notion of multiple apps on a given VM, but is this the right model? I would let the VM worry about isolating CPU, memory and disk so that apps remain independent of one another and tuning can be done independently. The multi-app model may be okay for several small apps running on a single VM, but not otherwise.
  • Cloud Foundry uses an L7 router (nginx) between clients and apps. The router needs to parse HTTP before it can route requests to apps. This approach does not work for non-HTTP protocols like WebSockets. Folks running node.js are going to run into this issue, and there are no easy fixes in the current architecture of Cloud Foundry.
  • As of now, I can’t get chunked responses to work for node.js apps deployed on a Cloud Foundry VM. See https://gist.github.com/2164855 for an example.
  • In the real world, I need to explicitly control routing and load balancing at a tier in front of my apps. But Cloud Foundry adds one more unnecessary routing tier on each VM.
  • There is no support for rolling upgrades. Maybe there are ways to enable this, but it is not clear from the documentation, so availability becomes a manual exercise.
  • The story about logging, monitoring and metrics is vague. I see some references to third-party plug-ins for certain kinds of apps, but support seems to be missing in the PaaS layer.

If you find that my understanding is incorrect, please help clarify and set us in the right direction. My interest is in getting the node.js stack to run well on Cloud Foundry.

Update: A work-around for the chunked encoding issue is to always add the Content-Length to responses.

    • What can I say! The node.js app is serving a chunked response. Without CF, the response is

      < HTTP/1.1 200 OK
      < X-Powered-By: ql.io/node.js v0.6.10
      < vary: accept-encoding
      < Server: ql.io-0.4.12/node.js-v0.6.10
      < Date: Thu, 22 Mar 2012 22:08:29 GMT
      < Connection: keep-alive
      < Transfer-Encoding: chunked
      < content-type: application/json

  1. Maybe try running both back-end URLs through REDbot to see if it catches anything. If not, it looks like they’re not properly de-chunking responses — although I’d be really surprised if this were an nginx bug, given how widely deployed it is. Maybe they’re using some sort of custom client side?

  2. I agree there is room for a few improvements. Here are a few counter points from my own observations:

    * Cloud Foundry itself doesn’t do anything with VMs. It doesn’t provide anything at the IaaS level. You might be mislabeling it as the DEA. The DEAs have some built-in handling for ensuring instances are balanced between nodes, but I think some of that changed in the latest release, supposedly for the better.

    * Although it would be nice to have more control over the nginx layer, if you wanted lower level routing of requests, you’d basically need your own IP per application or non-standard external ports. This isn’t necessarily practical either.

    * If running your own Cloud Foundry setup, you could customize the process to have the router in front, or behind something non-nginx like an F5, and then look at running nginx on each DEA with a port/config per app. I’ve been looking at doing something along these lines, putting nginx closer to the app to make it more customizable.

    * Restarts (or deploys overall), metrics, and monitoring are definitely things that are on my mind with the PaaS I’ve been building.

    • * Cloud Foundry itself doesn’t do anything with VMs. It doesn’t provide anything at the IaaS level. You might be mislabeling it as the DEA. The DEAs have some built-in handling for ensuring instances are balanced between nodes, but I think some of that changed in the latest release, supposedly for the better.

      You’re right about VM vs DEA. Thanks for correcting.

      * Although it would be nice to have more control over the nginx layer, if you wanted lower level routing of requests, you’d basically need your own IP per application or non-standard external ports. This isn’t necessarily practical either.

      I would prefer having no L7-level interception. AFAIU, the router is a consequence of using a particular architecture for the DEA, not a requirement in itself.

  3. Good observations, Subbu. Our experiences, in building our private cloud for X.commerce, align well with Ken’s:

    * We’re building from the IaaS up, so the application/DEA/VM allocation is completely within our control. We’re currently doing 1-1-1 allocation to simplify our resource management, but will likely move to Linux containers in the future, to improve efficiency.

    * Like Ken notes, we’re going to, over time, need to look at changing where nginx sits in the architecture. Right now, it’s worked fine (aside from minor issues), and has let us get up and running very quickly. Our application architecture is uniform, such that we can push routing and load balancing out of the application’s control.

    * Rolling upgrades, logging, monitoring, etc. are things we’re implementing as part of our cloud build.

    And thanks for getting chunked responses working quickly!