Tom Limoncelli's LISA 2009 Training: "Design Patterns for System Administrators"
Tom Limoncelli is one of the preeminent system administrators in the world right now. He is the co-author of "The Practice of System and Network Administration", which many consider something akin to a religious tome, and he has presented training sessions at LISA for several years on the subject of his second book, "Time Management for System Administrators". This year, he is offering a new training course, titled "Design Patterns for System Administrators". Tom and I sat down and discussed design patterns, and how they can help system administrators become better and more efficient in their positions.
Q: What exactly is a ‘design pattern’ and what will it do for us?
Tom Limoncelli: A "design pattern" is a little bit of knowledge that is worth sharing and useful for repeating. For example, the design pattern of "a network" or "active-active redundancy" or "RAID-0" or "helpdesk". Ever visit an IT shop where there was no way for people to get help? I have. So I explained the design pattern of a "helpdesk" and without having to re-invent the wheel, they were able to implement one.
Some of the things we do have benefits that are hard to explain in full. Maybe there are a few obvious benefits, but there are plenty of benefits that can't be explained. They can only be understood through experience. That's called "tacit knowledge". In an effort to record "tacit knowledge", researchers developed the concept of recording "design patterns": the things that experts do, recorded in a way that others can replicate. They don't have to know 100% of why this is "the right way to do something", they just have to know that in a certain situation with a certain problem, experts solve it a particular way most of the time. It doesn't prevent you from doing it your own way, but gives a guide or instruction book.
For example, in architecture there is a design pattern of putting the men's room and women's room back-to-back. It makes the plumbing less expensive to install, obviously, but think about the other benefits that one might not readily see until after the building is built: putting signs in the building to lead people to the bathrooms becomes easier, men waiting outside the bathroom for women (and vice versa) have an obvious place to stand, etc. When you hear someone say "it just works better", they are describing tacit knowledge.
In software engineering an example design pattern is "the iterator". When you make a data structure that stores multiple things, people need a way to iterate over those things. If you don't provide it, people will make their own iterators, constantly reinventing the wheel, usually badly.
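As a minimal sketch of the iterator pattern in Python (the `HostGroup` class and the host names are hypothetical and do not come from the interview), a container can provide iteration once so that callers never reach into its internal storage:

```python
from typing import Iterator, List


class HostGroup:
    """A hypothetical container that stores host names."""

    def __init__(self) -> None:
        self._hosts: List[str] = []  # internal storage, hidden from callers

    def add(self, host: str) -> None:
        self._hosts.append(host)

    def __iter__(self) -> Iterator[str]:
        # Provide the iterator once, so callers do not write their own
        # index-based loops over the private list.
        return iter(self._hosts)


group = HostGroup()
group.add("web1.example.com")
group.add("web2.example.com")

# Any code can now loop over the group without knowing how it is stored.
visited = [host for host in group]
```

If the storage later changes from a list to something else, only `__iter__` needs to change; the loops that use the group stay the same.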
In system administration we have design patterns such as "automate what you do often" and "multiple machines with the same purpose should have one configuration". These things seem obvious to experienced sysadmins, but are flashes of brilliance to new people. A friend tells me the story of mentoring a new sysadmin who had to set up a package on 5 machines. On each machine she (the new sysadmin) downloaded the source to a different place, ran ./configure, and installed the software. Obviously one improvement would be to download it to the same place on each machine ("do the same thing the same way on multiple machines"). If each machine had been running a different operating system, this would have made things less confusing and eased the installation. However, the machines were all the same architecture, so a better pattern would have been to make a binary package and install it on all the machines. The concept of making and using packages is a design pattern.
Design patterns don't just help new sysadmins. They help experienced sysadmins too. I'm an experienced sysadmin and yet I've never directly had to configure a SAN myself. If at my next job I had to build a SAN from scratch I could guess my way through it based on my understanding of RAID and general storage know-how. However, my hope is that I'm able to find a series of SAN patterns that I can use to guide me.
Q: What do you think is the best way to guard against "Design Patterns" becoming "Cargo Cult Administration"?
Tom Limoncelli: Cargo Cult refers to doing something because you've seen other people do it, even though you don't know why. This can lead to disasters, or at least wasted time.
Design Patterns prevent "cargo cult system administration" because they explain why and when to use a pattern. I'm sure there will come a day when design patterns are so pervasive that we find someone using them blindly. I look forward to that kind of problem. In the meantime, don't throw the baby out with the bathwater.
(See Cargo Cult programming (http://en.wikipedia.org/wiki/Cargo_cult_programming) if you've never heard of this term.)
Q: What was it that first made you aware that there were patterns in administration that seemed consistent across a range of disciplines?
Tom Limoncelli: I think I've always known it but I didn't have the words for it. Unix/Linux is great because you can configure things any way you want. You have tons of choices. The negative side of this is there are so many decisions to make. Nobody knows enough to get all those decisions right. What I soon learned is that there is a difference between having the ability to do things a million different ways and deciding that only certain ways are worth doing. If you look closely at most vendor-based certifications, they're really teaching you the 2-5 configurations that they want to see when you call their tech-support phone line. They've taken a product that can be configured a zillion ways (it's just simple combinatorics), figured out the best configuration for certain situations (usually "small, medium, and large"), and are encouraging people to use those patterns.
In the open source world we're proud of the flexibility we have. It is part of the freedom we want. However, it gets us in trouble. Instead we should be proud of the fact that we have openly shared our experience and helped other people benefit. The analogy I make is: What would you do if a stranger asked you, "Can I drink this?" and you knew it was poison. Would you say, "Yes" and be satisfied that you are technically correct (there is no law of physics preventing the person from drinking it, the ramifications are terrible but he can do what he wants) or would you say "No, it is poison. Don't do it!". In real life, no sane person would recommend that a person drink poison. In system administration our tools permit us to construct configurations that will come back to bite us. I would rather live in a world where we pride ourselves on sharing our tacit knowledge and experience as freely as we share our source code.
I wish I had understood the power of design patterns when Christine and I wrote The Practice of System and Network Administration. If you read the book, each section describes what we would now call a design pattern, but without the formalism of patterns. A friend recently pointed out that most pattern systems use a template with up to 10 items. He quoted Chin-Hsiang Chang et al., 2008's list of: (a) name, (b) problem, (c) context, (d) solution, (e) forces, (f) rationale, (g) resulting context, (h) relative pattern, (i) examples, and (j) known use. When you have all that data, the decisions become more scientific. (I have to admit that in the tutorial I usually provide only 2-3 of those elements.)
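To make the ten-item template concrete, here is a rough Python sketch of a record with one field per item. The `PatternRecord` class, the `missing_items` helper, and the example wording are assumptions made for illustration; only the ten item names come from the list quoted above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PatternRecord:
    """One pattern, with a field for each of the ten template items."""
    name: str = ""
    problem: str = ""
    context: str = ""
    solution: str = ""
    forces: str = ""
    rationale: str = ""
    resulting_context: str = ""
    relative_pattern: str = ""
    examples: str = ""
    known_use: str = ""

    def missing_items(self) -> List[str]:
        # Names of template items that have not been written down yet.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A partially documented pattern with only three items filled in.
# The wording of each item is illustrative, not quoted from any source.
checklist = PatternRecord(
    name="Checklist",
    problem="A repeated task is done differently each time.",
    solution="Record the steps and update the list as the task improves.",
)
remaining = checklist.missing_items()
```

A record like this makes it easy to see which parts of a pattern are still undocumented, which matches the situation described above where only two or three items are supplied.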
Q: Do you have a favorite pattern that you'd like to share?
Tom Limoncelli: The checklist is my favorite pattern. I find that repetition is the source of all operational efficiency. If we do something over and over, it can be improved a little each time. It can be taught to others. It can be practiced and we can get better at doing it. A checklist is a simple tool that helps us do all that. It helps us record the right way to do something, improve how we do it over time (by updating the checklist), and helps us convey knowledge (teach a person each item in the checklist, now they can do the task). They are portable: You can keep checklists on a wiki page, in a book, or even put one in your wallet. I have a checklist of who to call and what to do in case of certain major outages. I have a wiki page that is a checklist of the training new co-workers receive. In The Practice of System and Network Administration we include dozens of checklists for various situations (the "Look Inside" text on Amazon includes a chapter that is nothing but checklists http://www.amazon.com/o/ASIN/0321492668/tomontime-20).
Using a checklist reduces errors and conserves brainpower for more important things. Also, once a checklist exists, delegating to junior people becomes easier, taking the task off your plate entirely (w00t!). Automation becomes easier. You don't have to automate the entire process, just the most painful steps!
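One way to picture a checklist where only the painful steps are automated is the Python sketch below. The `Step` class, the `run_checklist` function, and the new-hire steps are all hypothetical; the automated step is simulated by appending to a list rather than calling any real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    description: str
    # If set, the step is automated; otherwise a person does it by hand.
    action: Optional[Callable[[], None]] = None


def run_checklist(steps: List[Step]) -> List[str]:
    """Run the automated steps and return the descriptions left for a human."""
    manual = []
    for step in steps:
        if step.action is not None:
            step.action()
        else:
            manual.append(step.description)
    return manual


log: List[str] = []  # stands in for the side effects of real automation

new_hire = [
    Step("Create account", action=lambda: log.append("account created")),
    Step("Issue building badge"),
    Step("Walk through the outage call list"),
]

todo = run_checklist(new_hire)
```

Here `todo` holds the steps that still need a person, so the same checklist serves as both the automation entry point and the hand-off list for whoever is doing the task.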
Sometimes we don't see the opportunities for repetition because we are too close to the situation. Even smart people miss these opportunities. When we turn chaos into repetition everyone benefits. At a previous company I had a team of highly experienced people that spent half of each meeting deciding when the next meeting would be. They "wanted to be flexible". I put my foot down and demanded that we set aside 10am each Wednesday and cancel if there was nothing to talk about. It saved us 30 minutes each week (multiplied by 15 people) and the people that "valued their flexibility" (whatever the hell that meant) found they could plan better because they always knew when the next meeting would be. They could save their seemingly biological need for "flexibility" for when they did something important (like write code) instead of when they met to talk about writing code.
Q: Are smaller infrastructures going to be able to take advantage of these techniques, too?
Tom Limoncelli: There are patterns that only work in small companies. Then again, most big companies that I've visited are really the amalgamation of many small entities.
Extremely small sites benefit from patterns because patterns give them access to knowledge that is otherwise sequestered outside the company. Wouldn't we all work together better if we had a common language for describing what we are doing? In server architecture we have terms like "active-active", "load balancer", and "RAID". Imagine if we didn't have those patterns and had to re-explain RAID every time we used multiple disks to simulate a larger volume in a way that increases capacity, reliability, or both. How powerful it will be to expand our vocabulary to other things like helpdesk management, monitoring, and user-notification!
Q: What would you say is the biggest mistake that admins make in their designs?
Tom Limoncelli: I hope that some day the biggest problem is that people are re-inventing the wheel instead of paying attention to the libraries of patterns that are already available. Sadly, first we need to build those libraries.
Q: In what ways, if any, does "Design Patterns..." tie in with your other training session, “Time Management for System Administrators”?
Tom Limoncelli: Time Management is taught in the morning session and Design Patterns is in the afternoon. That's because one builds on the next. There is a lot of new material in my newly revised Time Management tutorial. The overflow moved into the Patterns tutorial. You'll see a lot of patterns related to checklists, for example, in the afternoon session.
Each tutorial stands alone. You don't have to attend one to understand the other.
Q: How do you decide what topics to cover when you only have half a day to present them?
Tom Limoncelli: I carefully study hundreds of IT shops, lovingly hand-picking patterns by hand based on utility, understandability, ripeness, color, texture, shape and flavor.
I spent 5 years writing down patterns when I saw them. My biggest fear is that I won't be able to get through all the examples. (In that case, people can read the slides.)
Q: When we leave the “Design Patterns for System Administrators” training, what new skills and knowledge will we be bringing back to our jobs in the real world?
Tom Limoncelli: Hopefully everyone will have a bunch of new things to try, a dozen new ways to work, and an entirely new way to think about how they do system administration. Most of all, I hope everyone starts recording their own patterns and sharing them.
Tom Limoncelli Blog: Everything Sysadmin
Tom Limoncelli LISA Profile: http://www.usenix.org/event/lisa09/training/tutinstructors.html#limoncelli
Design Patterns for System Administrators Training Program: http://www.usenix.org/events/lisa09/training/tutonefile.html#t11
Register for LISA'09: http://www.usenix.org/event/lisa09/registration/