2 posts tagged “programming”
Monday, Day 1 of OSCON, is a tutorial day, made up of two four-hour in-depth tutorials. The tutorial days (Monday and Tuesday) cost a pretty penny more than the Wednesday-to-Friday session days, but can be worth it. Plus, they give you two extra days for networking.
The two tutorials I registered for were 'pthreads Programming' and 'A Taste of Haskell'.
1. pthreads Programming - Adrien Lamothe
I chose this session to branch out; I haven't done any C in a very long time (college), and even then I didn't get into threaded programming. This was my chance to get a pretty good introduction to it, and the description indicated it would be hands-on.
It did turn out to be pretty hands-on, but the interesting thing was that Adrien had discovered that O'Reilly published a brand-new book just prior to OSCON about TBB (Threading Building Blocks), Intel's new threaded programming framework. He had also heard through the grapevine that Intel was going to announce something during the conference. This led him to make three predictions:
A. Intel's announcement would be that they are open-sourcing TBB.
B. They wouldn't open-source the compiler, where much of the speed and optimization occurs.
C. He was being set up as a straw man to allow Intel to show how much easier their framework is.
The verdict? Correct on all three counts. However, his predictions allowed him to get in the first strike, and thus steal their thunder a bit. He worked TBB into his presentation, mentioning that from what he could tell it did seem like a pretty good framework, and that the book looked decent as well. He talked about the somewhat empty gesture of open-sourcing the framework without the compiler, and noted that for the government to use threaded applications they must be POSIX compliant.
The tutorial itself was decent. Despite a few technical glitches (a power outage and a ceiling leak, which gave him opportunities to build on his straw-man joke), he was able to both give a good introduction to pthreads programming and distribute a set of example programs for us to work through along with him.
Beyond pthreads, some other interesting topics were brought up, foremost being the future of CPUs and the upcoming multi-core chips. This digressed into a discussion of the Cell chip in the PS3, where one woman (Rachel Madsen) mentioned that she uses it for scientific computing and is able to destroy other multi-core chips with it; she claimed it was better for scientific computing than for gaming. Interestingly, it sounded like the dev kit for the Cell processor is really well liked and those using the system are very pleased, but apparently the legal terms of the dev kit say you aren't allowed to publish any testimonials about it, so they are barred from soapboxing about it.
Some tidbits:
MySQL's source is a great place to see pthreads programming done well.
The Pthreads Programming book is still the best place to learn about it, despite being 10 years old.
I have a copy of the presentation slides and the sample code, but I'm still looking for a place to host it.
2. A Taste of Haskell - Simon Peyton Jones
I chose this session because I've been interested in Functional Programming for a while, but haven't had an opportunity to take a class on it. I figured it would give me a good opportunity to bend my brain a bit by thinking about things in a new way.
Haskell is a functional programming language, in a very pure form. It has a freely available compiler, GHC (the Glasgow Haskell Compiler).
It was a treat to see Simon talk; he really knows his stuff (he is one of the language's designers) and is a fantastic educator. The sheer brain-power in the room was impressive, too: the audience was filled with the Who's Who of the Open Source world. I also have to give O'Reilly props here: Simon mentioned he wished he had a whiteboard, and not 5 minutes later a whiteboard arrived for him.
* Pattern Matching
This is a really slick tool for functional languages: you have a data structure, and you provide the matching logic that determines what actions to take with your data based on the pattern.
You use it when defining functions to get overloaded behavior: depending on what data you call the function on, different code is executed.
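As a minimal sketch of what that looks like in practice (the Shape type and the function names here are made up for illustration, not taken from the tutorial):

```haskell
-- A small data structure with two possible forms (hypothetical example type).
data Shape
  = Circle Double        -- radius
  | Rect Double Double   -- width and height

-- One equation per pattern: which one runs depends on the shape passed in.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

-- Patterns work on built-in structures too: the empty list and
-- the "head followed by tail" form get separate equations.
total :: [Int] -> Int
total []       = 0
total (x : xs) = x + total xs
```

Calling `area (Rect 2 3)` picks the second equation, while `area (Circle 1)` picks the first, so the "overloading" is really just a choice between equations based on the data's shape.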
* Laziness
Nothing is done in Haskell until it is asked for. This allows you to do the least work possible to get a result, and work with infinite lists so long as you don't ask for the last item in the list.
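A small sketch of the infinite-list idea, assuming nothing beyond the standard Prelude (the names squares, firstSquares and firstSquareAbove are invented for this example):

```haskell
-- An infinite list of square numbers. Defining it costs nothing;
-- elements are only computed when something asks for them.
squares :: [Integer]
squares = map (^ 2) [1 ..]

-- Asking for a finite prefix forces only that many elements.
firstSquares :: Int -> [Integer]
firstSquares n = take n squares

-- Searching also stops as soon as a match is found.
firstSquareAbove :: Integer -> Integer
firstSquareAbove limit = head (dropWhile (<= limit) squares)
```

Something like `length squares` or `last squares`, on the other hand, would never return, which is the "don't ask for the last item" caveat.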
* Side-Effects and Monads
Pure functional code has no side-effects; that is, it doesn't interfere with anything going on in other places in the system. Threads running pure code don't need locks because they don't do anything that might impact another thread. This is all well and good, except that with no side-effects you can't actually do anything (output is considered a side-effect). So you do occasionally need side-effects, but in Haskell they must be done in a special way, using a monad. Monads are tools by which you can specify that things need to happen in a given order.
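To make the split between pure code and side-effecting code a bit more concrete, here is a minimal sketch using the IO monad (the function names double and greet are hypothetical, chosen only for this example):

```haskell
-- A pure function: same input, same output, no side-effects.
-- Its type has no IO in it, so it cannot print or touch the outside world.
double :: Int -> Int
double x = x * 2

-- An IO action: the "IO" in the type marks that side-effects happen here.
-- Inside the do block, the steps run in the order they are written.
greet :: String -> IO Int
greet name = do
  putStrLn ("Hello, " ++ name)   -- first side-effect
  putStrLn "Doubling 21..."      -- second side-effect, always after the first
  return (double 21)             -- hand back a result computed by pure code
```

The type system keeps the two worlds apart: `greet` may call `double`, but `double` has no way to call `greet`.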
* QuickCheck
Simon spent some time talking about QuickCheck, a testing tool that was originally thought of in Haskell and, due to its utility, is being ported to other functional languages.
"if you are going to sit in the bath for an hour and ponder Haskell, these are a good four lines"
instance (Arbitrary a, Testable b)
      => Testable (a -> b) where
  test f r = test (f (arby r1)) r2
    where (r1, r2) = split r
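Those four lines lean on definitions that aren't shown. As a rough, self-contained sketch of how they could fit together: the Seed type, the step and split functions, the Int instance and the check helper below are all hypothetical stand-ins invented for illustration, not the real QuickCheck API.

```haskell
-- A toy random seed (assumption: the real library uses a proper generator).
newtype Seed = Seed Int

-- Advance a simple linear congruential generator by one step.
step :: Int -> Int
step s = (s * 1103515245 + 12345) `mod` 2147483648

-- Derive two different seeds from one.
split :: Seed -> (Seed, Seed)
split (Seed s) = (Seed (step s), Seed (step (s + 1)))

-- Types we know how to invent a value for, given a seed.
class Arbitrary a where
  arby :: Seed -> a

instance Arbitrary Int where
  arby (Seed s) = (s `div` 65536) `mod` 201 - 100   -- a value in -100..100

-- Things that can be tested, given a seed.
class Testable a where
  test :: a -> Seed -> Bool

-- A plain Bool is already a test result.
instance Testable Bool where
  test b _ = b

-- The four lines from the talk: to test a function, invent an argument
-- with one seed and test the result with the other.
instance (Arbitrary a, Testable b) => Testable (a -> b) where
  test f r = test (f (arby r1)) r2
    where (r1, r2) = split r

-- Try a property against n different seeds.
check :: Testable p => Int -> p -> Bool
check n p = all (test p . Seed) (take n (iterate step 42))
```

Because the result type `b` only has to be Testable, the instance applies to itself: a two-argument property such as `\x y -> x + y == y + (x :: Int)` is a function returning a function returning a Bool, and the instance peels off one argument at a time.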
Tidbits:
There is a Portland, OR company hiring Haskell Programmers
The presentation was filmed, and the film is available here and here.
When you program in Haskell, your input is the universe, which is destroyed, with a new universe as output. (I guess you should make sure your code works!)
The slides are available here.
3. Erlang BoF - Patrick Logan
That night I attended an Erlang Birds of a Feather session. I was pleased to find that this was more of a tutorial than a session, and Patrick had prepared a code example for us that demonstrated Erlang's concurrency system. I had been interested in Erlang enough in the past to install it and run through the start of the online tutorials, but this motivated me to buy my first Erlang Book. (The book is great so far, btw.)
While Erlang isn't as pure and complex as Haskell, it seems to have a much larger install base, and is perhaps a little more pragmatic. If I decide to learn Haskell later, learning Erlang now will probably make it all the easier.
* (In)Variables
Variables in Erlang are not variable. This was a bit bizarre at first, but the more I thought about it, the more it made sense. It is more like math: if you have some function f(x), the function produces output for a given value of x, and the variable x does not change for the life of the calculation of f(x). Just like when learning math, you could replace all instances of x with whatever value was chosen for x and determine the result.
* Mailboxes
All of the different processes communicate by sending messages to each other's mailboxes. This gives you a nice queue without having to worry about locks or mutexes or any of the other ugly things normally associated with threaded programming. (In procedural languages, that is)
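This is not Erlang code, but the mailbox idea can be approximated in Haskell with a channel from the base library. It is a rough analogy only: the Msg type and the worker and sumViaMailbox names are invented for this sketch, and real Erlang processes offer much more (selective receive, links, distribution).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Messages the worker understands (hypothetical protocol).
data Msg = Add Int | Stop

-- The worker owns its running total; no other thread can touch it.
-- It handles one message at a time, in the order they arrived.
worker :: Chan Msg -> Int -> IO Int
worker mailbox total = do
  msg <- readChan mailbox
  case msg of
    Add n -> worker mailbox (total + n)
    Stop  -> return total

-- Start a worker, post messages to its mailbox, and wait for the final total.
sumViaMailbox :: [Int] -> IO Int
sumViaMailbox xs = do
  mailbox <- newChan
  done    <- newEmptyMVar
  _ <- forkIO (worker mailbox 0 >>= putMVar done)
  mapM_ (writeChan mailbox . Add) xs
  writeChan mailbox Stop
  takeMVar done
```

The sender never locks anything itself; it just drops messages in the queue, and the state lives entirely inside the receiving thread.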
* Hot Swapping Code
This was one of the cooler things; since Erlang was designed for high availability systems (embedded telecommunications), you can actually swap code out while it is running without interrupting any of the services.
I attended the Google Scalability Conference today, and for the most part I was very pleased with it. Valuable information, food, drink and networking for the cost of a Saturday. Fair trade, I'd say.
There were two keynotes and four break-out sessions, each break-out with two talks. I chose fairly well for my break-out sessions, and luckily they said video of each presentation will be made available on YouTube.
Here is a break-down of what I attended and my thoughts:
- Keynote I: MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets by Jeff Dean, Google, Inc.
This was a great talk, giving a fairly in-depth breakdown of the major distributed computing systems that Google uses, from GFS (the Google File System) to BigTable to MapReduce. For a dataphile like myself, working at Google must be like being a kid in a candy store.
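For anyone who hasn't met MapReduce, the programming model can be sketched on a single machine in a few lines. This is only a toy illustration of the map, group-by-key and reduce phases, not Google's implementation; the mapReduce, shuffle and wordCount names are made up for the example, and the real system's value lies in spreading these phases across many machines.

```haskell
import Data.Char (toLower)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Map phase emits key/value pairs, the shuffle groups them by key,
-- and the reduce phase collapses each group to one result.
mapReduce :: Ord k => (a -> [(k, v)]) -> (k -> [v] -> r) -> [a] -> [(k, r)]
mapReduce mapper reducer inputs =
  [ (k, reducer k vs) | (k, vs) <- shuffle (concatMap mapper inputs) ]

-- Group all values that share a key (sorting brings equal keys together).
shuffle :: Ord k => [(k, v)] -> [(k, [v])]
shuffle pairs =
  [ (k, map snd grp)
  | grp@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst pairs) ]

-- The classic example: count how often each word appears.
wordCount :: [String] -> [(String, Int)]
wordCount = mapReduce emit (\_ ones -> sum ones)
  where
    -- Emit (word, 1) for every word on a line, ignoring case.
    emit line = [ (map toLower w, 1) | w <- words line ]
```

Each call to the mapper looks at one input in isolation and each call to the reducer looks at one key in isolation, which is what makes the work easy to split up.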
- Breakout I: Building A Scalable Resource Management Layer for Grid Computing by Khalid Ahmed, Platform Computing.
I chose wrong on this one... corporate shill. Discussed their computing platform and some of the specifics, but I probably would have been better off in the other session.
- Breakout II: Using MapReduce on Large Geographic Datasets, Barry Brumitt, Software Engineer, Google, Inc. / Google Talk: Lessons in Building Scalable Systems by Reza Behforooz, Google, Inc.
Two for the price of one in this breakout. Barry's talk was great; he gave some real examples of how they have used MapReduce to solve problems that would be much more difficult through other means. I get the feeling that MapReduce is a hammer that Google engineers pound away at anything with, but from what I can tell, any efficiency lost by using the wrong tool is made up for in computational scale. He also mentioned Hadoop, which is an open-source implementation of MapReduce. I'll have to take a look at that.
The second half was also interesting, covering some of the scaling issues they've had with Google Talk. When phasing in a new partner, they make all the calls but don't expose the UI elements, which tests the load while avoiding any degradation of the user experience. It's so simple, yet such a good idea. I'll have to remember that one.
- Keynote II: [Something to the effect of] Scaling User Experience by Marissa Mayer, Google, Inc.
- Breakout III: Scalable Test Selection Using Source Code Deltas by Ryan Gerard, Symantec Corporation.
There was some interesting food for thought in this one, but some of the more interesting elements weren't the intended take-home message. For example: in addition to your defect tracking system, use a requirements tracking system that associates requirements with tests. Oh man, I wish we were doing that now. I'll have to remember that one for the future. Other than that, it was interesting to hear him talk about building this optimized, test-only-what-you-need, as-fast-as-possible system when his team has a test suite that only takes two hours. I guess they are planning for the future.
- Breakout IV: Challenges in Building an Infinitely Scalable Datastore, Swami Sivasubramanian and Werner Vogels, Amazon.com.
I really enjoyed this one; it may have been my favorite breakout session. Werner took the corporate part of the talk and discussed the SLAs and the overall aspects of Amazon Dynamo, and Swami took the technical details. Swami's was the most technical piece of the day (of the sessions I attended), and I really appreciated that. I was hoping to see more of this at the conference today. I will definitely be taking a close look at what S3 and EC2 can do for me and my future plans.