MessagePack Anyone?

by Subbu Allamaraju on June 9, 2011 · 22 comments

I came across MessagePack from multiple sources in recent weeks. MessagePack claims to be a "binary-based efficient object serialization library". The bar chart on MessagePack’s site claims that MessagePack is four times faster than JSON serialization and deserialization.

Really? If true, it is able to optimize for both speed and space at the same time. I needed to find out. I did a quick test using @pgriess's node-msgpack implementation for Node (if you’re using recent versions of Node, use the fork https://github.com/jmars/node-msgpack instead). Here are the numbers for 18k of JSON for a total of 10000 test runs on Node v0.4.8 on Ubuntu.

msgpack pack:   12705 ms
msgpack unpack: 1668 ms
json    pack:   1575 ms
json    unpack: 1271 ms

For this size, MessagePack’s serialization is roughly 8 times slower than v8’s JSON serialization. Here are the results for a very small piece of data (48 bytes, the same as the one used by Peter’s bench.js) for 10000 runs.

msgpack pack:   108 ms
msgpack unpack: 69 ms
json    pack:   31 ms
json    unpack: 19 ms
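
The slowdown factors implied by the two tables can be checked with a few lines of JavaScript. This is only an arithmetic sketch: the object and function names are made up for illustration, the timings are copied from the tables above, and the syntax assumes a much newer Node than the v0.4.8 used for the measurements.

```javascript
// Timings copied from the two tables above (milliseconds for 10000 runs).
const results = {
  "18k JSON": { msgpackPack: 12705, msgpackUnpack: 1668, jsonPack: 1575, jsonUnpack: 1271 },
  "48 bytes": { msgpackPack: 108, msgpackUnpack: 69, jsonPack: 31, jsonUnpack: 19 },
};

// How many times slower msgpack is than JSON for each operation.
function slowdown(r) {
  return {
    pack: r.msgpackPack / r.jsonPack,
    unpack: r.msgpackUnpack / r.jsonUnpack,
  };
}

for (const [name, r] of Object.entries(results)) {
  const s = slowdown(r);
  console.log(`${name}: pack ${s.pack.toFixed(1)}x, unpack ${s.unpack.toFixed(1)}x`);
}
```

For the 18k payload this gives about 8.1x for pack and 1.3x for unpack; for the 48-byte payload, about 3.5x and 3.6x.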

In other words, I’m unable to reproduce Peter’s numbers. The differences are likely due to changes in Node and v8 since the time Peter did his tests. You can try it out yourself by doing the following:

git clone https://github.com/jmars/node-msgpack
cd node-msgpack
make
# before the next step, edit bench.js to require lib/msgpack
node bench.js
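
For readers who want to see the shape of such a measurement without reading bench.js, here is a minimal timing sketch. It is not the code from bench.js: the `timeRuns` helper, the sample payload, and the run count are assumptions made for illustration, and it needs a modern Node (it uses `process.hrtime.bigint()`). It only times the built-in JSON functions; a MessagePack binding would be timed the same way.

```javascript
// Hypothetical harness: names, payload shape, and run count are illustrative.
const RUNS = 10000;

// Call fn `runs` times and return the elapsed wall-clock time in milliseconds.
function timeRuns(fn, runs) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) fn();
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Small sample payload; not the data used in the original test.
const payload = { id: 1, name: "test", values: [1, 2, 3, 4] };
const encoded = JSON.stringify(payload);

const report = {
  "json pack": timeRuns(() => JSON.stringify(payload), RUNS),
  "json unpack": timeRuns(() => JSON.parse(encoded), RUNS),
};

// To compare against MessagePack, time the binding's encode and decode
// functions in the same way, e.g. timeRuns(() => msgpack.pack(payload), RUNS),
// assuming the binding exposes pack() and unpack().
for (const [label, ms] of Object.entries(report)) {
  console.log(`${label}: ${ms.toFixed(0)} ms`);
}
```

A single pass like this is noisy, so it is worth repeating the run a few times and with payloads of different sizes before drawing conclusions.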

This is no good. There are other features that concern me as well. It is not clear why the developers of MessagePack chose to bundle together RPC (an application issue), pipelining (a protocol-level issue), connection pooling (again an application issue), etc. with a format design.

Frank Denis (@jedisct1) June 9, 2011 at 3:51 pm

MessagePack 9 times slower than v8’s JSON serialization? http://www.subbu.org/blog/2011/06/messagepack-anyone

Enterprise4J (@enterprise4j) June 9, 2011 at 5:31 pm

MessagePack Anyone?: subbu.org: I came across MessagePack from multiple sources in recent weeks. MessagePack claim… http://bit.ly/mDVBOx

cowtowncoder June 9, 2011 at 5:33 pm

I think performance may not even be the biggest challenge: although the Java version appears to be about the same speed as the latest Thrift or the fastest JSON codecs (and slightly below protobuf), its lack of documentation is nasty.
Trying to actually use data binding resulted in lots of cursing, because if you don’t know to add all those @Optionals for data that may be null, or to register bean types (or annotate them), you will get seemingly random exceptions.

And I completely agree with concern over bundling unrelated concerns, starting with bundling of serialization/databinding with other aspects.

WayneB June 9, 2011 at 6:39 pm

I like bundling as long as it’s modular. Give me an end-to-end system that works well for general purpose and don’t make me think too much so I can focus on the problem that I’m actually trying to solve, not which pipelining technique I’m going to use. If it matters, I’ll switch out that module later.

Subbu Allamaraju June 9, 2011 at 7:08 pm

No, not those kinds of features. RPC pushes you to the dark ages of distributed systems. Its role is very limited. Moreover, pipelining belongs to HTTP and that gets you interoperability. Connection pooling is again related to persistent connections in HTTP. There is no need to reinvent those features.

Flinn June 9, 2011 at 7:42 pm

It’s great that node json serialization is fast (it should be) but that doesn’t make it fast for any other systems that it has to interoperate with.

As far as the comparison of a binary protocol+connection pooling+event handling to http+pipelining that’s silly. Your own writing points out the problems with pipelining.

MsgPack is packaged separately from MsgPackRPC. The fact RPC adds value to the format is reason enough to bundle them in the same project, I really don’t see that as a legitimate complaint.

Subbu Allamaraju June 9, 2011 at 7:54 pm

> It’s great that node json serialization is fast (it should be) but that doesn’t make it fast for any other systems that it has to interoperate with.

Agreed. However, the claim about speed is misleading. That claim is what caught many people’s attention.

> As far as the comparison of a binary protocol+connection pooling+event handling to http+pipelining that’s silly.

You’re taking that out of context. Reinventing the wheel is not necessary.

Cowtowncoder June 12, 2011 at 9:01 pm

But JSON is rather efficient on many many platforms; not just on node.js.
On Java, for example, it is competitive with msgpack, as well as all other binary protocols, even regarding performance.
In fact, all measurements I have seen that are done by people other than msgpack authors cast doubt on msgpack’s performance claims — where are independent verifications of alleged performance superiority? Or even details on exactly how authors have done their tests? On what platform, comparing against which other libraries?

Flinn June 14, 2011 at 6:46 pm

First – @Cowtowncoder, can you give a specific example of speed claims from the author that aren’t substantiated? Here Subbu questions a test that isn’t even from the MsgPack author, but rather from the author of node-msgpack. In fact this post tries to call into question the Message Pack author’s claim of 4x over Protocol Buffers and 12x over JSON by running a totally different benchmark on a totally different platform?

The primary claim from the MsgPack author is right on the MsgPack.org homepage:

“In this test (source code), it measured the elapsed time of serializing and deserializing 200,000 target objects. The target object consists of the three integers and 512 bytes string.”

The source code links here (not the node-msgpack benchmark):
http://msgpack.org/releases/benchmark/msgpack-protobuf-json-speed-test.tar.gz

Second – The comment on the RPC bundle is silly. Serialization formats are very commonly accompanied by RPC solutions. Avro, BERT, Thrift, Protocol Buffers all either explicitly provide RPC interfaces or suggest the use of protocol definitions as RPC interfaces. The author is promoting REST as an alternative but it’s been my experience that message serialization and RPC is more performant but less flexible.

Subbu Allamaraju June 14, 2011 at 6:50 pm

“The author is promoting REST as an alternative but it’s been my experience that message serialization and RPC is more performant but less flexible.”

Do you have numbers to back this up? I’m curious because I’m aware of systems with efficient connection management in place but still using HTTP as a transfer protocol.

Flinn June 15, 2011 at 3:32 am

I can’t point to many HTTP implementations that are slower and do not implement efficient connection management. I’m not saying HTTP by nature is slow, but I am saying binary serialization+RPC protocols such as Message Pack are generally more efficient than HTTP and therefore more performant. I will say, though, that HTTP is far more open, enjoys ubiquitous adoption, and certain implementations are very performant.

Flinn June 15, 2011 at 3:35 am

That should be “I can point to many HTTP implementations that are slower and do not implement efficient connection management.” sorry too early.

Subbu Allamaraju June 14, 2011 at 7:02 pm

“The primary claim from the MsgPack author is right on the MsgPack.org homepage:”

Understood, but claims should not be made based on such a limited test.

Flinn June 15, 2011 at 3:21 am

Neither should claims to the contrary. There are actually a number of other tests besides the one quoted on the home page, which by the way your results do not invalidate.

The point is you can’t call the whole serialization protocol into question because you can’t verify a single third party implementation. You also note this is likely due to some improvement in node.js since the original test. To follow that acknowledgement up with “This is no good.” implying the whole project is bad doesn’t make sense.

The only thing you can say with your test is, someone implemented a node.js Message Pack binding some time ago and it was faster than JSON. According to my test that may no longer be true.

Harsh June 10, 2011 at 7:07 am

What’s your take on Apache Avro then?

Subbu Allamaraju June 10, 2011 at 7:57 am

Avro is meant for large datasets, but I would recommend running https://github.com/eishay/jvm-serializers/wiki/ which now includes MessagePack to see how it fares against Avro and others on the JVM.

Ikeda Takafumi (@ikeike443) June 12, 2011 at 3:46 am

An article saying that messagepack is slow, but I’m not in a position to evaluate it / MessagePack Anyone? http://htn.to/4D59gZ

Shantanu Kumar (@kumarshantanu) June 13, 2011 at 12:32 pm

Subbu Allamaraju refutes MessagePack’s performance claim over plain old JSON http://j.mp/khuOkH

Oleg Sukhoroslov (@osukhoroslov) July 18, 2011 at 10:24 pm

http://bit.ly/nxISc8 (MessagePack Anyone?)

Ryan R. July 21, 2011 at 8:56 pm

Has anyone here run the msgpack-protobuf-json-speed-test benchmark on the MessagePack site (src link under chart)? The version of yajl I had to install was 1.0.12 which seems a bit outdated. Maybe better results for JSON with 2.0.1 parser?

Yi Wang (@Yi_Wang) August 30, 2011 at 2:00 pm

MessagePack Anyone? http://t.co/XoDYXSR
