
Obvious Choices

One downside of working with experienced people is that some of them tend to be opinionated, and the trouble with opinions is that they are, well, instantaneous reactions based on things learned in the past. You can wake up an opinionated person and ask a question you know that person has an opinion on, and that person (self included, on occasion) will most likely blurt out the opinion without even understanding the context. Unless you are careful, it is easy to get trapped in that mindset, start enjoying being opinionated, and consequently end up making or advocating "obvious" choices. Welcome to my soft rant!

Here is one opinion heard several times in the blogosphere. If X says that he/she had performance issues with some new cool website, the "opinionated" would immediately ask, "is that Rails based?" and, without even waiting for an answer, would offer his/her expert opinion that "Rails is slow" and that X should use Grails, PHP, or whatever that person is currently a fan of.

No offense to the commenter, but I received such a comment when I blogged that I had performance issues with DreamHost. I did not buy the comment that Rails is slow then, and I don’t buy it now. There are other things that influence scalability, and these vary from case to case.

Here is another example. In response to FaceStat scales!, the first commenter asks "Why on earth didn’t you use something like Grails that actually DOES scale?". I doubt that this commenter fully understands the architecture of FaceStat. To me, that is an automatic reaction, and so I would be suspicious of the "obvious" choice that Grails or some other alternative would magically fix the scalability problems.

Say, for instance, a PHP or Rails app is slow. An "obvious" choice is to throw memcache at it. But no amount of caching will rescue a poorly architected app. In fact, MySQL query caching may be sufficient for typical caching needs.
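To make the point concrete, here is a minimal read-through cache sketch in plain Ruby (the class and its names are hypothetical; a real app would use memcached through a client library). It shows what caching actually buys you: repeated identical reads skip the expensive computation. It does nothing for the first read, or for an app whose queries are badly structured to begin with.

```ruby
# Hypothetical read-through cache, for illustration only.
class ReadThroughCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, or compute it via the block and store it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = ReadThroughCache.new
calls = 0
expensive = lambda { calls += 1; 42 }  # stand-in for a slow query

cache.fetch(:answer) { expensive.call }  # miss: runs the expensive work
cache.fetch(:answer) { expensive.call }  # hit: served from the cache
```

After both calls, the expensive work has run exactly once. That is the entire benefit; if the underlying architecture forces a fresh expensive computation for every request, a cache layer cannot help.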

After writing the initial prototype of Cyclogz, I wanted to check whether there were similar sites and, if so, how they were approaching the problem. The problem was simple: retrieve a large amount of data from a GPS, and then extract lots of information from it for both presentation and analysis purposes. I came across two sites (one of which is MotionBased, owned by Garmin). Both made what seemed like an "obvious" choice: upload all the data (in megabytes) to a server, store it in the database, and then kick off a background process to analyze and extract the data. This approach requires more hardware on the server side, more code to manage the background work, and, more importantly, takes away instant gratification from the user. I won’t describe the alternatives here in detail, but better and more scalable design choices do exist. It was also fun thinking about the alternatives and implementing them.

In the REST community, there is a similar pattern. Every now and then, someone comes along with a need to do batch processing, asserting that making "n" connections is slower than POSTing all the requests in a single batch. When told of the problems with this approach, that person immediately concludes that REST and/or HTTP is broken and that he/she "needs" to do batch processing.
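The usual alternative to a batch POST is simply to issue the "n" requests concurrently, each remaining an independent HTTP request with its own status, caching, and retry semantics. Here is a hedged sketch in Ruby; the `fetch` method is a stand-in for a real HTTP GET over a persistent connection, not a real API.

```ruby
# Stand-in for an HTTP GET over a keep-alive connection (hypothetical).
def fetch(resource)
  "response for #{resource}"
end

resources = %w[/a /b /c]

# Issue the requests concurrently; each one succeeds or fails on its own,
# so there is no "partial batch failure" problem to invent semantics for.
threads = resources.map { |r| Thread.new { fetch(r) } }
responses = threads.map(&:value)
```

Because each request stands alone, the questions raised in the comments below (partial failure, idempotency of the whole batch) never arise; they are artifacts of bundling independent operations into one request.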

So, when a choice seems too obvious to be incorrect, and is offered by a smart but opinionated person, I say: take a step back, question it, and ponder. The obvious is not necessarily the best choice. The fun in writing software is figuring out the non-obvious choices. Repeating obvious choices is not fun.

Comments

  1. “In the REST community, there is a similar pattern. Every now and then someone comes along with a need to do batch processing asserting that making “n” connections is slower than POSTing all requests in a single batch. When told of the problems with this approach, that person would immediately conclude that REST and/or HTTP is broken and that he/she “needs” to do batch processing.”

    So it isn’t slower? Or the slowness isn’t important? I’m confused.

  2. It is not necessarily slower. More importantly, it is necessary to consider a few questions:

    a. Optimistic concurrency: How will optimistic concurrency work when you want to batch a number of changes into a single request?

    b. How will clients be notified of partial success or failures?

    c. What is the client expected to do on partial failures? How will the client know what parts of the request can be resubmitted without worrying about duplicate processing, i.e., is the bulk request idempotent as a whole, or is each update within the bulk idempotent?

    Some of these may be easy to address in certain specific use cases, but I am not sure if the solution can be generalized.

    If these questions can be answered well, and if it is proven that the client/server cannot pipeline connections, batch processing over HTTP can be justified.

  3. Great post Subbu.

    Scalability is a function of the architecture, not the framework or language – that is the single biggest thing I have learnt.

    In my current job [for one more week], we use Java and all the nice stuff, only to stumble at the O-R mapping layer with large datasets, doing silly things very similar to what the Twitter guys did initially: badly formed queries for many-to-many relationships and breaking the tiering in a few places.

    I think choosing a language/framework is a function of four things:

    - ease of development
    - raw performance
    - maintainability
    - hardware requirements

    Rails, to me, is the best in two out of four of those categories. With Ruby 1.9 [and the 2.0 roadmap], raw performance will at least be comparable to Java, which leaves us with hardware requirements.

    Unless you are Google and have tens of millions of servers [even they use Python], what difference does it make whether you use 4 or 6 servers?
