TSS recently had an active thread on Why most large-scale Web sites are not written in Java. This is a provocative title and naturally drew a lot of passionate reader comments. The thread was started in response to a posting by Nati Shalom, which in turn seems to have been prompted by a similar posting at highscalability.com. While I don’t disagree with the analysis presented in Nati Shalom’s post, I don’t think Java, the programming language, or JEE is to blame. The key driver behind most successful large-scale web sites is that they are designed to perform, often taking the most unorthodox approaches possible toward scalability, and most JEE developers have neither the incentive nor the experience to make their sites scalable. The techniques used by large-scale web sites often seem bizarre to JEE web developers.
For the record, I worked at BEA Systems for over seven years until I joined Yahoo about a month ago, and I have great respect and admiration for BEA’s middleware stack. Given the complexity and general-purpose nature of the JEE specifications, WebLogic Server does a great job of providing a scalable and manageable runtime for general-purpose enterprise applications. Note the words general-purpose here. Highly scalable sites are not general-purpose, and you, the developer or the architect, are to blame if your site is supposed to scale but doesn’t because you did not make the right choices.
Of course, JEE and layered frameworks have their share of bad choices when it comes to running large scale web sites. Here are a few that come to my mind:
- Sessions: HTTP sessions don’t, in general, scale. However, most JEE-based applications, including those using frameworks like Struts and JSF, rely heavily on sessions. Developers who learn their web development skills via a framework like Struts or JSF automatically rely on sessions without realizing how sessions impact performance and scalability. Add the cost of fail-over to this. Most commonly used fail-over strategies rely on replicating in-memory data over a cluster of nodes, which is not cheap. The way to provide a better fail-over experience is not replication, but designing the application so that it does not keep interesting data in memory in the first place. When an application runs on a framework like JSF, the problem gets worse: you now have your own application data plus all the data that the JSF implementation keeps around to run itself. So it is important to know not only your application’s needs for things like sessions, but also the needs of any underlying frameworks.
- Deployment model: The web deployment model in JEE needs a serious overhaul. Over the years, JEE web development has become less and less productive, mostly because iterative development has gotten worse. Application server makers and tool makers are spending a lot of money and time to improve the iterative development experience, but I suspect that the problem is more fundamental and tied to the structure of web applications in JEE. Here is a simple example. In a typical LAMP deployment, you don’t hear things like redeploy or restart, whereas a typical JEE web developer does these tasks a zillion times during the development process. In a LAMP deployment, you keep making changes to your PHP scripts but never worry about restarting your Apache server. Nor are you required to hit a button someplace to take down the currently running application and replace it with an updated one. JEE web deployment has gotten so complex that some large companies maintain armies of developers just to make sure artifacts are moved from development to staging to production systems. It is time to rethink the current model at a fundamental level.
- Too many things: Application servers have gotten monolithic and complex for two reasons. First, JEE has too many specs to support, from servlets and JSPs to JMX to JAX-WS, and not every application needs all of them. Second, application servers have to deploy and manage large applications, and over time this deployment and management infrastructure itself becomes a large piece of the application server runtime. So on one end of the spectrum we have small, lightweight web containers with less emphasis on application management, while on the other end we have heavy application servers with everything bundled. Of course, application servers should modularize their runtime so that each usage scenario can use the right set of modules. Although application server vendors tend to offer versions like Tall, Grande, and Venti, such offerings unfortunately don’t meet real-world needs. They miss the bigger picture and focus on licensing and pricing, not realizing that what is needed is a LAMP-like, loosely coupled, modularized runtime and not a 400 MB monolith with license restrictions on components you don’t want to use.
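To make the sessions point concrete, here is a minimal sketch, with entirely hypothetical names and nothing taken from any particular framework, of one way to avoid parking state in HTTP sessions: keep only a small HMAC-signed token on the client and look everything else up per request. With no interesting data held in server memory, there is nothing to replicate for fail-over.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;

// Hypothetical sketch: instead of storing user state in an HttpSession
// (which must be replicated across the cluster for fail-over), keep a
// small signed token on the client and fetch everything else per request.
public class SignedToken {
    // Assumption: a real deployment would load this key from configuration.
    private static final byte[] KEY = "demo-secret-key".getBytes();

    static String sign(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(KEY, "HmacSHA256"));
        String sig = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(payload.getBytes()));
        return payload + "." + sig;
    }

    static boolean verify(String token) throws Exception {
        int dot = token.lastIndexOf('.');
        if (dot < 0) return false;
        // Re-sign the payload and compare; a tampered payload won't match.
        return sign(token.substring(0, dot)).equals(token);
    }

    public static void main(String[] args) throws Exception {
        String token = sign("user=42");
        System.out.println("valid=" + verify(token));
        System.out.println("tampered=" + verify("user=43." + token.split("\\.")[1]));
    }
}
```

Any server in the cluster can verify the token, so requests can land anywhere and node failures lose nothing.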
Is the LAMP environment free of its own limitations? Of course not. It has its own quirks, but that is not the point. The key point is that building large web sites requires a different mindset than the one we typically find behind JEE architectures. Nati Shalom mentions a few techniques like caching and data partitioning (sharding). These are valid. At a more fundamental level, what is required is the willingness to take unorthodox approaches to developing software and to avoid typical design patterns and abstractions. Here are some examples:
- Pull out the required data as fast as possible, with as few steps as possible, to render a page. At the end of the day, rendering the page as fast as possible improves scalability, while introducing a zillion layers of abstraction in the form of O-R mappings and similar features does the exact opposite. In fact, if you are interested in scalability, fire all the developers who speak passionately about design patterns, object-relational mappings, and such frameworks. All those are good in theory, but they don’t necessarily help serve pages faster.
- Be ready to change course, or to dump existing solutions and invent your own, for performance reasons. When I was at BEA, after some performance analysis, we threw away JAX-RPC in favor of lower-level web service APIs (including DOM and SAAJ) to pass SOAP messages around. This gave us a huge performance boost. JAX-RPC is well and good when you prefer OO abstractions over web services, but not necessarily when you care about performance.
- A relational database is not the only data storage solution. Look for other solutions and be prepared to mix and match. Most large-scale web sites like Yahoo, Google, and Amazon have their own versions of distributed file systems and runtimes for churning through data. See, e.g., Yahoo’s efforts behind Hadoop and Amazon’s Dynamo.
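The first point above, pulling data in as few steps as possible, can be sketched in plain Java (all names hypothetical): do the extra work on the write path so the read path that renders a page is a single lookup rather than a walk through layered abstractions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a denormalized "view" prepared at write time,
// so rendering a page is one step with no joins and no O-R mapping.
public class DenormalizedRead {
    // page key -> ready-to-render data, maintained on the write path
    static final Map<String, String> pageView = new HashMap<>();

    static void onOrderPlaced(String userId, String item) {
        // The write path does the extra work once...
        pageView.merge(userId, item, (a, b) -> a + ", " + b);
    }

    static String renderOrdersPage(String userId) {
        // ...so the read path is a single lookup.
        return "Orders for " + userId + ": " + pageView.getOrDefault(userId, "none");
    }

    public static void main(String[] args) {
        onOrderPlaced("u42", "book");
        onOrderPlaced("u42", "lamp");
        System.out.println(renderOrdersPage("u42"));
    }
}
```

In a real site the map would be a cache or a precomputed store, but the trade, cheap reads bought with more expensive writes, is the same.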
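On the second point: the sketch below shows, purely as an illustration, what dropping from a generated JAX-RPC stub down to raw DOM-level message construction can look like. It uses only APIs shipped with the JDK (javax.xml.parsers, javax.xml.transform); the getQuote operation is made up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch: build a SOAP envelope directly with DOM,
// skipping the marshalling machinery of a generated stub.
public class RawSoap {
    public static void main(String[] args) throws Exception {
        String ns = "http://schemas.xmlsoap.org/soap/envelope/";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element env = doc.createElementNS(ns, "soap:Envelope");
        doc.appendChild(env);
        Element body = doc.createElementNS(ns, "soap:Body");
        env.appendChild(body);
        Element op = doc.createElement("getQuote"); // hypothetical operation
        op.setTextContent("YHOO");
        body.appendChild(op);

        // Serialize the message; this string would be POSTed over HTTP.
        StringWriter out = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out);
    }
}
```

You give up the OO abstraction, but you also give up its reflection and mapping overhead on every call.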
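On the third point: systems like Amazon’s Dynamo partition data across nodes with consistent hashing rather than a single relational store. Here is a minimal, illustrative sketch; the hash function and replica count are arbitrary choices for the demo, not anything from Dynamo itself.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative consistent-hashing sketch: keys and servers hash onto the
// same ring; a key belongs to the first server clockwise from its hash.
// Adding a node remaps only a small fraction of the keys.
public class ConsistentHash {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private static final int REPLICAS = 100; // virtual nodes per server

    void addServer(String server) {
        for (int i = 0; i < REPLICAS; i++)
            ring.put(hash(server + "#" + i), server);
    }

    String serverFor(String key) {
        if (ring.isEmpty()) return null;
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return ring.get(tail.isEmpty() ? ring.firstKey() : tail.firstKey());
    }

    private static int hash(String s) {
        // FNV-1a, purely for demonstration; real systems use stronger hashes.
        int h = 0x811c9dc5;
        for (byte b : s.getBytes()) { h ^= b; h *= 0x01000193; }
        return h & 0x7fffffff;
    }

    public static void main(String[] args) {
        ConsistentHash ch = new ConsistentHash();
        ch.addServer("node-a");
        ch.addServer("node-b");
        ch.addServer("node-c");
        System.out.println("owner=" + ch.serverFor("user:42"));
        System.out.println("ring-size=" + ch.ring.size());
    }
}
```

The same ring structure lets you mix storage engines behind a uniform partitioning scheme, which is exactly the mix-and-match attitude described above.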
By no means are these tips for building scalable sites. I am just listing some possible design choices to highlight the point that, once you focus on performance and scalability, it is not difficult to find solutions to meet those demands. It does not matter whether you are using Ruby on Rails, a LAMP stack, or good old JEE.
One last point. JEE application servers are general-purpose in nature and often face competing requirements. Besides implementing the right set of JEE specifications and providing ways to deploy, manage, and monitor large applications, they also have a lot of mechanisms built in to pardon developers’ mistakes and wrong choices. Think about sessions again. As a developer you are freed from cleaning up data in sessions, since you know that your servlet container is going to clean them up for you. But this extra cushion comes at some expense, because the container needs to spend cycles monitoring sessions and cleaning them up. This is just one example; there are several others like it. All such features are required for a general-purpose server runtime. Let me repeat: highly scalable sites are not general-purpose, and they need special consideration for scalability. Technology can help scalability, but it won’t automatically make your code scale.
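As an illustration of what that cushion costs, here is roughly the kind of housekeeping a container performs on your behalf. This is a deliberately simplified sketch using a logical clock rather than a real background thread: every sweep touches every live session, cycles that a session-free design never spends.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of container-style session expiry: a table of
// last-access times swept periodically. Cost is O(live sessions) per sweep.
public class SessionSweeper {
    static final long TIMEOUT = 30; // idle time before expiry (logical units)
    static final Map<String, Long> lastAccess = new ConcurrentHashMap<>();

    static void touch(String id, long now) { lastAccess.put(id, now); }

    // The cleanup pass the container runs for you behind the scenes.
    static void sweep(long now) {
        for (Iterator<Map.Entry<String, Long>> it =
                lastAccess.entrySet().iterator(); it.hasNext(); ) {
            if (now - it.next().getValue() > TIMEOUT) it.remove();
        }
    }

    public static void main(String[] args) {
        touch("sess-1", 0);  // last touched long ago
        touch("sess-2", 90); // recently active
        sweep(100);          // sess-1 has been idle past the timeout
        System.out.println("live=" + lastAccess.keySet());
    }
}
```

A real container also fires lifecycle listeners and coordinates with replication on every sweep, so the actual overhead is larger than this sketch suggests.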