
Nodejs vs Play for Front-End Apps

Mar 29, 2011: The source used for these tests is now available at https://github.com/s3u/ebay-srp-nodejs and https://github.com/s3u/ebay-srp-play.

Mar 27, 2011: I updated the charts based on new runs and some feedback. If you have any tips for improving numbers for either Nodejs or Play, please leave a comment, and I will rerun the tests.

We often see “hello world” style apps used for benchmarking servers. A “hello world” app can produce low-latency responses under several thousand concurrent connections, but such tests do not help in making choices for building real-world apps. Here is a test I did at eBay recently comparing a front-end app built using two different stacks:

  1. nodejs (version 0.4.3) as the HTTP server, using Express (with NODE_ENV=production) as the web framework, EJS templates, and cluster for launching node instances (cluster launches 8 instances of nodejs on the machine I used for testing; a minimal sketch follows this list)

  2. Play framework (version 1.1.1) as the web framework in production mode on Java 1.6.0_20.
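
For reference, here is a minimal sketch of the kind of clustered launch described in item 1 above. It is illustrative only: it uses the built-in cluster API of current node for readability, whereas the tests ran Node 0.4.3 with the separate cluster npm module and an older Express, so the exact calls differ.

var cluster = require('cluster');
var os = require('os');
var express = require('express');

if (cluster.isMaster) {
  // One worker per core; the test box had 8 logical CPUs, hence 8 instances.
  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  var app = express();
  app.set('view engine', 'ejs'); // EJS templates, as in the article
  // Routes are registered here; see the request-flow sketch further down.
  app.listen(3000);
}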

The intent behind my choice of the Play framework was to pick a stack that uses Rails-style controllers and view templates for front-end apps but runs on the JVM. Java-land is littered with a large number of complex legacy frameworks that don’t even get HTTP right, but I found Play easy to work with. I spent nearly equal amounts of time (under two hours each) building the same app on nodejs and Play.

The test app is purpose-built. It consists of a single search results page that renders results fetched from a backend source. The flow is simple: the user submits some text, the front end fires off a request to the backend, the backend responds with JSON, and the front end parses the JSON and renders the results using a set of HTML templates. The idea is to represent front-end apps that produce markup with and without backend IO.

In my test setup, the average response from the backend is about 150 KB of uncompressed JSON. The results page consists of 8 templates, one for each part of the page (header, footer, sidebar, and so on), with template files ranging from 250 bytes to under 2 KB. To ensure that backend latency does not influence the tests, search requests are proxied through Apache Traffic Server acting as a forward proxy, with the cache tuned to always generate a hit. Such a high cache hit rate is not realistic, but it helped me isolate the cost of having to go through the uncontrolled public Internet to get search results.
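
To make the flow concrete, here is a hypothetical sketch of the node-side request path: fetch JSON from the backend through the forward proxy, parse it, and render the EJS templates. Host names, ports, and the template name are illustrative, not the actual test configuration.

var http = require('http');

function fetchResults(q, callback) {
  var options = {
    host: '127.0.0.1',   // forward proxy (Traffic Server) on the same box
    port: 8080,
    // The proxy receives an absolute URI and serves it from its cache.
    path: 'http://backend.example.com/search?q=' + encodeURIComponent(q || ''),
    headers: { Host: 'backend.example.com' }
  };
  http.get(options, function (backendRes) {
    var body = '';
    backendRes.setEncoding('utf8');
    backendRes.on('data', function (chunk) { body += chunk; });
    backendRes.on('end', function () {
      callback(null, JSON.parse(body));   // ~150 KB of JSON in these tests
    });
  }).on('error', callback);
}

function searchHandler(req, res) {
  fetchResults(req.query.q, function (err, results) {
    if (err) {
      res.writeHead(502);
      return res.end('backend error');
    }
    // 'results' is rendered by the 8 templates (header, footer, sidebar, ...).
    res.render('results', { results: results });
  });
}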

Note that the test environment is not ideal – the test client, the server, and the cache were all running on the same box: a quad-core Xeon with 12 GB of RAM running Fedora 14 (2.6.35.6-45.fc14.x86_64 kernel).

I ran the tests using ab, with keep-alive enabled, 300 concurrent connections, and 200,000 requests per run:

ab -k -c 300 -n 200000 {URI}

The tests cover the following configurations (a rough mapping to routes follows the list):

  • Render – No IO: Render the page without any IO – this configuration generates HTML from the templates with empty results.
  • IO + Render: Render the page with results.
  • IO – No Render: Fetch results but don’t render – this is an unrealistic case, but it helps highlight the cost of IO vs cost of template processing.
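
As a rough illustration (the route names are hypothetical, not the actual test URIs), the three configurations map onto the handler pieces from the earlier sketch like this:

app.get('/render-no-io', function (req, res) {
  // Templates only: render the full page with empty results.
  res.render('results', { results: [] });
});

app.get('/io-render', searchHandler);   // fetch backend JSON, then render

app.get('/io-no-render', function (req, res) {
  // Fetch and parse the backend JSON, but skip the templates entirely.
  fetchResults(req.query.q, function (err) {
    res.end(err ? 'error' : 'ok');
  });
});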

The charts below show requests per second and mean response time.

From these, you can see that nodejs beats Play on both latency and throughput. However, in the pure IO case, I would not discount non-blocking IO on the JVM. I plan to post more results dealing with IO + computation scenarios.

The charts below show the percentage of requests completed within a certain amount of time in msec. The shorter the bars, the better. Less variance as you read from left to right on each chart is also better – I would ignore the last set of bars on the right (time to complete 100% of the requests) as it may contain outliers.

When the workload involves generating HTML from templates off the file system without performing any other IO, nodejs does twice as well as JVM-based Play. As we introduce IO, performance suffers across the board, but more so with blocking IO on Play. Play is, however, able to catch up with non-blocking IO (via continuations).

I’m unable to make the source code for the test apps available publicly at this time. But I plan to create and post some new tests on github soon.


149 Comments

  1. how many cluster workers? NODE_ENV=production? seems like lots of information is missing… I can get better numbers than this on my machine, but if you leave NODE_ENV set to development with Express, view rendering will be much, much slower

  2. I should note as well that ejs is not highly optimized due to the use of with. I don’t know of anything faster for node, but it would be easy to write something like mustache that is very, very fast; the restricted syntax would allow removal of the slow with(){}

  3. “Any idea why the rendering phase is so slow compared to Node? If that was solved, Play! would be a clear winner when using NBIO.”

    @Frank: I’m going to find out this week, and try to post an independent test.

  4. BTW, what does that “IO” consist of that is referenced in the article? I think it means “backend IO” but it’s not clear (at least to me) what it means?

    • @Ile: The same – it involves making an HTTP GET request to get JSON-formatted response through a proxy cache (Traffic Server) using play.libs.WS. I chose Traffic Server because, from my experience at Yahoo!, it handles large number of concurrent connections very well.

  5. hi,
    this test is a good initiative.
    but i have got some remarks:
    -about the jvm:
    can you publish the arguments of the command line used to launch the jvm?
    which garbage collector is used? have you done some tweaks on the jvm?

    is it an openJDK jvm, or the sun/oracle jvm (‘java -version’ can help you).

    -about the charts:
    your charts are confusing, because the Y axis does not start at 0.
    for example, in the second chart (render no io), it seems that play! has a mean time per request close to 100 msec, and nodejs close to 55 msec: a ratio of 2.
    but your chart implies a ratio of 10, because the y axis starts at 50 and not 0.
    so the chart shows bars of 5 msec for nodejs and 50 msec for play!, when the actual values are 55 msec (50+5) for nodejs and 100 msec (50+50) for play!

    despite these two remarks, thank you for this informative post.
    best regards, charles.

    • @Charles

      - JVM: The VM is OpenJDK

      java version "1.6.0_20"
      OpenJDK Runtime Environment (IcedTea6 1.9.7) (fedora-52.1.9.7.fc14-x86_64)
      OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

      - VM settings – defaults set by Play. Note that neither CPU nor memory was too busy to warrant any tuning.

      - Regarding the graphs, those were generated by Google docs. See http://tinyurl.com/3vgvfw2 for the raw data.

      • In production mode, Play will use the ‘server’ JVM, which needs a pretty long time to warm up since it does a lot of dynamic analysis and just-in-time compilation during the first requests.

        Do the results change if you run your tests a second time? (without stopping the servers).

  6. Hi,
    thanks Subbu for publishing the google docs used to build the graphics.
    i’ve copied the spreadsheet, and updated the graphics with a ‘y’-axis origin starting from 0.
    the information given by these graphics seems more impartial.
    here is the link to the updated spreadsheet:
    http://tinyurl.com/62p67yj
    are you ok with publishing the updated graphics?
    best regards,
    Charles.

  7. Hi,

    There has been some optimizing in Play 1.2 – some related to groovy-template-rendering, so you might try to rerun the test using Play 1.2

    -Morten

    • yeah, me too!
      Play has seen a lot of development in the past couple of years, and so has node.js.
      I’m eager to see a comparison.

  8. Yes, me too

    Play2 already has an RC4 version, so I guess a public release will be ready pretty soon. They rewrote the core in Scala, with Akka, and the templates are now statically compiled, so I think it should perform better.

  9. It looks like the Play execution pool was left at the default setting [1], meaning Play requests would have been processed over 5 threads. That is already a worker-concurrency deficit compared to the 8 workers allocated for the node app.

    Being that the workload is more IO than CPU oriented, and that threads are in ways a higher level abstraction over event loops [2] it would be interesting to see the benchmarks on an even playing field (8 worker threads). I also suspect at this level CPU usage at max throughput on the play app would still be quite low, you could probably increase this number quite a bit (16? 32?) and allow the thread scheduler to handle the non blocking IO. This should easily give you a much higher level of throughput, while still keeping a nice, non-callback oriented programming model.

    [1] https://github.com/s3u/ebay-srp-play/blob/master/conf/application.conf#L172
    [2] It’s a rant, but it covers the area pretty well: http://teddziuba.com/2011/10/straight-talk-on-event-loops.html

  10. I created a Play 2 version of this app:
    https://github.com/jamesward/ebay-srp-play/tree/play2

    The WS parts don’t work yet. But the basic index page is testable. On my laptop the compiled Scala templates get about the same requests/second as the Groovy templates in Play 1.

    I also tested the Play 1 app with just a very minimal server-side template and it went from about 2800 requests/second (the original index page) to 9155.21 requests/second. So there is definitely a lot of overhead as the complexity of the template increases.

    Interesting stuff.

    -James