Obvious Choices

by Subbu Allamaraju on June 18, 2008 · 6 comments

One downside of working with experienced people is that some of them tend to be opinionated, and the trouble with opinions is that they are, well, instantaneous reactions based on things learned in the past. You can wake up an opinionated person and ask a question that you know this person has an opinion on, and that person (self included, on occasion) will most likely blurt out that opinion without even understanding the context. Unless you are careful, it is easy to get trapped in that mindset, start enjoying being opinionated, and consequently start making or advocating "obvious" choices. Welcome to my soft rant!

Here is one opinion heard several times on the blogosphere. If X says that he/she had performance issues with this new cool website, the "opinionated" would immediately ask, "is that Rails based?" and without even waiting for an answer, would offer his/her expert opinion that "Rails is slow" and that X should use Grails, PHP, or whatever that person is currently a fan of.

No offense to the commenter, but I received such a comment when I blogged that I had performance issues with DreamHost. I did not buy the comment that Rails is slow then, and I don’t buy it now. There are other things that influence scalability and these vary from case to case.

Here is another example. In response to FaceStat scales!, the first commenter asks "Why on earth didn’t you use something like Grails that actually DOES scale?". I doubt that this commenter fully understands the architecture of FaceStat. To me, that is an automatic reaction, so I would be suspicious of the "obvious" claim that Grails or some other alternative would magically fix the scalability problems.

Say, for instance, that a PHP or a Rails app is slow. An "obvious" choice is to throw memcached at it. But no amount of caching will rescue a poorly architected app. In fact, MySQL query caching may be sufficient for typical caching needs.

After writing the initial prototype of Cyclogz, I wanted to check if there are similar sites, and if so, how they are approaching the problem. The problem was simple – retrieve a large amount of data from a GPS, and then extract lots of info from it for both presentation and analysis purposes. I came across two sites (one of which is MotionBased, owned by Garmin). Both these sites did what seemed like an "obvious choice". Both sites upload all the data (in megabytes) to a server, store it in the database, and then kick off a background process to analyze and extract the data. This approach requires more hardware on the server side, more code to manage the background work, and more importantly, takes away instant gratification from the user. I won’t describe alternatives here in detail, but better and more scalable design choices do exist. It was also fun thinking about the alternatives, and implementing those.

In the REST community, there is a similar pattern. Every now and then someone comes along with a need to do batch processing asserting that making "n" connections is slower than POSTing all requests in a single batch. When told of the problems with this approach, that person would immediately conclude that REST and/or HTTP is broken and that he/she "needs" to do batch processing.

So, if a choice seems too obvious to be incorrect, and is given by a smart but opinionated person, I say, take a step back, question and ponder. The obvious is not necessarily the best choice. The fun in writing software is figuring out those non-obvious choices. Repeating obvious choices is not fun.


Mike Amundsen June 18, 2008 at 11:33 am

Good Post.


Will Sargent June 18, 2008 at 5:22 pm

“In the REST community, there is a similar pattern. Every now and then someone comes along with a need to do batch processing asserting that making “n” connections is slower than POSTing all requests in a single batch. When told of the problems with this approach, that person would immediately conclude that REST and/or HTTP is broken and that he/she “needs” to do batch processing.”

So it isn’t slower? Or the slowness isn’t important? I’m confused.


Subbu Allamaraju June 19, 2008 at 8:00 am

It is not necessarily slower. More importantly, it is necessary to consider a few questions:

a. Optimistic concurrency: How will optimistic concurrency work when you want to batch a number of changes into a single request?

b. How will clients be notified of partial success or failures?

c. What is the client expected to do on partial failures? How will the client know what parts of the request can be resubmitted without worrying about duplicate processing, i.e., is the bulk request idempotent as a whole, or is each update within the bulk idempotent?

Some of these may be easy to address in certain specific use cases, but I am not sure if the solution can be generalized.

If these questions can be answered well, and if it is proven that the client/server cannot pipeline connections, batch processing over HTTP can be justified.
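To make questions (a) through (c) concrete, here is a minimal in-memory sketch, not a recommendation for any particular API. It assumes each update in a batch carries the version the client last saw (playing the role of an ETag in an If-Match header), and that the server answers with one HTTP-style status per item. The names `Update`, `Store`, and `apply_batch` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Update:
    key: str        # resource identifier (hypothetical)
    if_match: int   # version the client last saw, like an ETag in If-Match
    value: str      # new representation


class Store:
    """In-memory stand-in for a server that versions each resource."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, value):
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)
        return version

    def apply_batch(self, updates):
        """Apply each update independently and report a status per item."""
        results = []
        for u in updates:
            current = self.data.get(u.key)
            if current is None:
                results.append((u.key, 404))
            elif current[0] != u.if_match:
                # Someone else changed the resource: precondition failed.
                results.append((u.key, 412))
            else:
                self.put(u.key, u.value)
                results.append((u.key, 200))
        return results


store = Store()
store.put("a", "v1")   # version 1
store.put("b", "v1")   # version 1
store.put("b", "v2")   # version 2, so a client holding version 1 is stale

batch = [Update("a", 1, "new-a"), Update("b", 1, "new-b"), Update("c", 1, "new-c")]
results = store.apply_batch(batch)

# The client must inspect every item to learn which ones to fix and resubmit.
retry = [key for key, status in results if status != 200]

# Replaying the whole batch is safe here only because each item carries a
# version precondition: "a" has moved on to version 2, so it is not applied twice.
replayed = store.apply_batch(batch)
```

Even in this toy version, the client has to handle a per-item result list, and safe resubmission depends on every update carrying its own precondition. That is the kind of machinery a batch design has to reinvent on top of HTTP.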


Subbu Allamaraju June 19, 2008 at 8:37 am

To add to my previous reply, for use cases needing bulk GETs, proper use of caching can avoid the need for such GETs without losing on performance.
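As a rough illustration of the caching point, the sketch below simulates conditional GETs against an in-memory stand-in for a server. No real network or HTTP library is involved, and `Origin` and `CachingClient` are hypothetical names. It assumes the server derives an ETag from the body and answers a matching If-None-Match with 304 and no body.

```python
import hashlib


class Origin:
    """Stand-in for an HTTP server; counts how many bodies it sends."""

    def __init__(self, resources):
        self.resources = resources
        self.bodies_sent = 0

    def get(self, path, if_none_match=None):
        body = self.resources[path]
        etag = hashlib.sha256(body.encode()).hexdigest()
        if if_none_match == etag:
            return 304, etag, None      # Not Modified: no body on the wire
        self.bodies_sent += 1
        return 200, etag, body


class CachingClient:
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # path -> (etag, body)

    def get(self, path):
        etag = self.cache.get(path, (None, None))[0]
        status, new_etag, body = self.origin.get(path, if_none_match=etag)
        if status == 304:
            return self.cache[path][1]  # reuse the cached body
        self.cache[path] = (new_etag, body)
        return body


origin = Origin({"/items/1": "one", "/items/2": "two"})
client = CachingClient(origin)
paths = ["/items/1", "/items/2"]

first = [client.get(p) for p in paths]    # two full responses
origin.resources["/items/2"] = "two, revised"
second = [client.get(p) for p in paths]   # one 304, one full response
```

This sketch only shows revalidation, where each request still happens but unchanged resources cost no body transfer. With a freshness lifetime on the responses, a cache could skip those requests entirely.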


Anand June 20, 2008 at 11:02 pm

Great post Subbu.

Scalability is a factor of the architecture, not the framework or language – that is the single biggest thing I have learnt.

In my current job [for one more week], we use Java and all the nice stuff, only to stumble at the O-R layer for large datasets by doing silly things very similar to what the Twitter guys did initially: badly formed queries for M-2-M relationships and breaking the tiering in a few places.

I think choosing a language / framework is a factor of 4 things.

- ease of development
- raw performance
- maintainability
- hardware requirements

Rails, to me, is the best in 2 out of 4 of those categories. With Ruby 1.9 [and the 2.0 roadmap], raw performance will at least be comparable to Java, which leaves us with hardware requirements.

Unless you are Google and have 10s of millions of servers [even they use Python], what difference does it make between using 4 vs 6 servers?


