Google Scalability Conference
I attended the Google Scalability Conference today, and for the most part I was very pleased with it. Valuable information, food, drink and networking for the cost of a Saturday. Fair trade, I'd say.
There were two keynotes and four breakout sessions, each with two talks. I chose my breakout sessions fairly well, and luckily they said video of each presentation will be made available on YouTube, so I can catch the ones I missed.
Here is a breakdown of what I attended and my thoughts:
- Keynote I: MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets by Jeff Dean, Google, Inc.
This was a great talk, giving a fairly in-depth breakdown of the major distributed computing systems that Google uses, from GFS (the Google File System) to BigTable to MapReduce. Working at Google must be like being a kid in a candy store for a dataphile like me.
- Breakout I: Building A Scalable Resource Management Layer for Grid Computing by Khalid Ahmed, Platform Computing.
I chose wrong on this one... corporate shill. He discussed their computing platform and some of its specifics, but I probably would have been better off in the other session.
- Breakout II: Using MapReduce on Large Geographic Datasets by Barry Brumitt, Software Engineer, Google, Inc. / Google Talk: Lessons in Building Scalable Systems by Reza Behforooz, Google, Inc.
Two for the price of one in this breakout. Barry's talk was great; he gave some real examples of how they have used MapReduce to solve problems that would be much more difficult through other means. I get the feeling that MapReduce is a hammer Google engineers pound away at anything with, but from what I can tell, any efficiency lost by using the wrong tool is made up for by sheer computational scale. He also mentioned Hadoop, which is an open-source MapReduce knock-off. I'll have to take a look at that.
The second half covered some interesting scaling issues they've had with Google Talk. Their method of phasing in new partners is to make all the backend calls without exposing the UI elements, so they can test the load without degrading the user experience. It's so simple, yet such a good idea. I'll have to remember that one.
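To make the phasing-in trick concrete, here is a minimal sketch of the general idea (often called a "dark launch"). Everything in it is hypothetical and only illustrates the pattern as I understood it from the talk: the `DarkLaunch` class, its `load_fraction` and `ui_enabled` settings, and the stubbed `_call_partner_backend` are my own stand-ins, not anything Google described.

```python
import hashlib


class DarkLaunch:
    """Hypothetical dark-launch gate: hit the new backend, hide the results."""

    def __init__(self, load_fraction: float, ui_enabled: bool = False):
        self.load_fraction = load_fraction  # share of users whose requests reach the new backend
        self.ui_enabled = ui_enabled        # whether the results are shown to users yet
        self.backend_calls = 0              # stand-in for real load metrics

    def in_rollout(self, user_id: str) -> bool:
        # Deterministic bucketing, so the same user always lands in the same bucket.
        digest = hashlib.sha256(user_id.encode()).digest()
        bucket = int.from_bytes(digest[:2], "big") / 65536  # value in [0, 1)
        return bucket < self.load_fraction

    def partner_contacts(self, user_id: str) -> list:
        if not self.in_rollout(user_id):
            return []
        contacts = self._call_partner_backend(user_id)  # real call, so real load
        return contacts if self.ui_enabled else []      # hidden until launch day

    def _call_partner_backend(self, user_id: str) -> list:
        # Placeholder for a request to the (hypothetical) partner service.
        self.backend_calls += 1
        return [f"{user_id}-partner-contact"]
```

With `load_fraction=1.0` every user generates backend traffic but sees nothing new; flipping `ui_enabled` later exposes the feature on a backend that has already absorbed production load. Raising `load_fraction` gradually is one way to ramp the load, under the assumption that partner traffic scales roughly with the number of users in the rollout.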
- Keynote II: [Something to the effect of] Scaling User Experience by Marissa Mayer, Google, Inc.
- Breakout III: Scalable Test Selection Using Source Code Deltas by Ryan Gerard, Symantec Corporation.
There was some interesting food for thought in this one, but some of the more interesting elements weren't the intended take-home message: for example, using a requirements tracking system that associates requirements with tests, in addition to your defect tracking system. Oh man, I wish we were doing that now. I'll have to remember that one for the future. Other than that, it was interesting to hear him talk about building an optimized system to test only what you need, as fast as possible, when his team's test suite takes only two hours to run. I guess they are planning for the future.
- Breakout IV: Challenges in Building an Infinitely Scalable Datastore by Swami Sivasubramanian and Werner Vogels, Amazon.com.
I really enjoyed this one; it may have been my favorite breakout session. Werner took the corporate side of the talk, covering the SLAs and the overall design of Amazon Dynamo, and Swami covered the technical details. Swami's was the most technical presentation of the sessions I attended, and I really appreciated that; I was hoping to see more of this at the conference today. I will definitely be taking a close look at what S3 and EC2 can do for me and my future plans.