An Interview with Ted Ts'o

Congratulations to Theodore Ts'o, who was just awarded the 2014 LISA Award for Outstanding Achievement in System Administration at LISA14. We recently sat down with Ted, and had the chance to ask him a few questions.

JBM: Why did you start teaching, and why Linux performance tuning?

The first tutorial class that I taught was “Kerberos and Network Security”. At the time there were a lot of people who were interested in learning how Kerberos worked, especially after Microsoft announced they were going to be adopting Kerberos as their network security protocol. I was then the tech lead of the MIT Kerberos Development Team, and so my mentor, Jeff Schiller, and I were the best people to teach a two-day class in which people could learn the basics of cryptography as it applied to network security, and how Kerberos worked. I found that I liked teaching, and Jeff and I ended up teaching that class not only at Usenix, but also at the Microsoft PDC and even to some Microsoft field engineers as an internal class. I later taught classes about Kerberos, network security, Linux, and file systems not just at many Usenix conferences, but also on several Geek Cruises, where I would teach while a cruise ship was at sea on various cruises of Alaska, the Caribbean, and the Mediterranean. The Geek Cruise concept faded away at the end of the dot-com bubble, but those were good times while it lasted!

The history behind the Linux Performance Tuning class is that originally I taught a class on Linux kernel internals. But the Linux kernel was constantly changing, so keeping that class up to date was a challenge. In addition, I found that while most of the attendees were interested in what the latest development kernels looked like, what they really needed was information about the version of Linux found in the Red Hat or SuSE distribution they were using in their data center; in particular, the most common problems they needed to solve were performance-related. So I retooled the class to be much more focused on what system administrators really needed.

JBM: You’re currently working with file systems and storage at Google. What do you want every sysadmin/engineer/IT professional to better understand about your field and/or ext4?

There’s something which I like to call the Highlander Myth of File Systems --- “There can be only one!” That might have been more true in the days of time-sharing systems, when a single system might be used by many users and many programs at the same time. Today, however, most servers run only one application, and every file system has its strengths and weaknesses. One workload may run better on file system A, while for another workload file system B will be better; there will always be tradeoffs in file system designs. As a result, statements such as “the last word in file systems” are more marketing slogans than signs of engineering or practical sysadmin truth.

A related observation is that file system benchmarks can be horribly misleading. Most of the time, the benchmark will not be a good representation of how your applications will be using the file system, and the hardware used by the benchmarker will probably not be representative of your hardware. Even if the software and hardware in a benchmark are a good match for yours, the performance of a freshly created file system may be very different from that of an aged file system which has been in use for a long time, or which is almost full. This effect is especially pronounced on copy-on-write file systems such as btrfs, f2fs, and zfs. Fortunately, in many cases the file system is not the performance bottleneck, so it often makes sense to use the file system that you’re most familiar with, or which has the more mature user space tools.

JBM: The software that you’ve helped to create is used around the world (and beyond!) from Antarctica to the International Space Station. Tell me about some of your most amazing, head-shaking, experiences when users detailed how they’ve used the tools you’ve helped to create.

I must be a little jaded, because Linux is so ubiquitous that in almost any environment where you might find a computer, whether it be a wearable device, a mobile handset, a server, or an embedded system, it’s very likely that Linux gets used. One of the use cases of Linux that I do think is very cool is the Linux real-time patchset. Linux was originally designed to be a desktop OS, it then expanded to be an enterprise server OS, and then it grew to encompass the embedded and mobile space --- but I don’t think I could have ever anticipated initially that Linux would make inroads into use cases that were traditionally the domain of hard real-time systems. So the fact that Linux with the real-time patchset is being used to control laser-wielding robots, or the missile fire control systems on the US Navy’s next-generation destroyer, amuses me to no end.

JBM: With the rise of automation via configuration management, release engineering, and package management, we’re in the midst of a fascinating evolution in system administration. Do you envision your research and work changing over the next 3-5 years?

There is an oft-quoted metaphor which compares treating servers as pets versus cattle. In the servers-as-pets world, servers are unique and “lovingly hand-raised and cared for”, and when they get ill they are nursed back to health. In the servers-as-cattle world, when a server gets sick, you shoot it and get another one[1].

This observation is not new. Large-scale system administrators will recognize that this strategy is decades old. For example, “The Anatomy of an Athena Workstation”, presented at the 1998 LISA conference, describes a system that had been in use since the early 1990s.

However, the rise of the cloud has made this change almost impossible to ignore. In addition, two decades ago, even in the largest deployments with thousands of client workstations, file service was typically provided by a single highly reliable, high-performance, and extremely costly server --- whether an NFS server with a large RAID array, a NetApp filer, or a multi-million-dollar storage appliance from EMC or IBM. Now the same principle of using a large number of scale-out servers, which are individually less reliable (with perhaps only 99% uptime instead of paying $$$$ for 99.9999% uptime), is also being applied to storage systems, and the rise of cluster file systems has in turn imposed very different requirements on local file systems.

For example, a large number of the contributions to the ext4 file system over the past three or four years have come from companies which use ext4 as the local disk file system underneath some kind of cluster file system, whether it is Google’s cluster file system or a Hadoop file system at a company like Taobao. For that use case, RAID isn’t terribly important, because the aggregation of a large number of disks (and storage servers) happens at the cluster file system layer. Similarly, because the cluster file system handles data reliability --- either by using checksums and replication of objects on multiple servers, or by using Reed-Solomon encoding --- the end-to-end principle tells us it is not necessary, and in fact a waste of resources, for the local disk file system to try to provide those services at its level of the storage stack.
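One way that idea shows up in practice is formatting ext4 with redundant machinery turned off. The sketch below is purely illustrative --- the image name and size are made up, and whether dropping a given feature (here, the journal) is actually safe depends entirely on what guarantees the cluster layer above provides:

```shell
# Hypothetical cluster-FS backend: format ext4 without a journal, on the
# assumption that the layer above already replicates and checksums data.
# (An 8 MB file-backed image, so no root privileges are needed.)
dd if=/dev/zero of=backend.img bs=1M count=8 2>/dev/null
mke2fs -q -F -t ext4 -O ^has_journal backend.img

# Confirm that the has_journal feature is absent.
dumpe2fs -h backend.img 2>/dev/null | grep 'Filesystem features'
```

The `-O ^feature` syntax clears a feature at mkfs time; running ext4 without a journal trades crash-recovery work at the local layer for the redundancy provided higher up the stack.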

JBM: Are there any cool tips or tricks that you haven’t been able to integrate into your training sessions, but would like to share on the blog?

Hmm.   The e2image program is a relatively new addition to the e2fsprogs suite of programs.   It was originally intended so that users could bundle up just the metadata portion of the file system so they can send it to the e2fsprogs developers as part of a bug report.   But it can also be used to only copy the metadata and in-use data blocks of a file system image from one location to another, which is really handy when manipulating disk images for a virtual machine.
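A minimal sketch of both uses (the file names and 8 MB size are arbitrary; the exact options available depend on your e2fsprogs version):

```shell
# Build a small file-backed ext4 image to experiment on (no root needed).
dd if=/dev/zero of=fs.img bs=1M count=8 2>/dev/null
mke2fs -q -F -t ext4 fs.img

# Dump just the metadata into an e2image file -- the compact form you
# would attach to a bug report for the e2fsprogs developers.
e2image fs.img fs.e2i

# Produce a raw, sparse image instead: metadata plus in-use blocks, with
# unused blocks left as holes, which keeps VM disk images small.
e2image -r fs.img fs.raw

# Compare allocated blocks (first column) against apparent size.
ls -ls fs.e2i fs.raw
```

Because the raw image is sparse, copying it with a sparse-aware tool (e.g. `cp --sparse=always`) moves only the blocks the file system actually uses.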

JBM: What do you consider your biggest achievement?

I don’t know if it’s my biggest achievement, but one of the things of which I’m very proud is the fact that e2fsck (the file system consistency checker for ext2, ext3, and ext4) is the only fsck program that has a regression test suite. Whenever someone reports a bug or a file system inconsistency that e2fsck doesn’t handle properly, I make sure we create a new test which demonstrates the problem, and then fix the bug. That way, we can make sure there are no regressions later on, and it helps increase the overall code coverage.

Something else of which I’ve been quite proud is the turn-key regression tester, kvm-xfstests, which allows ext4 developers to run smoke tests and full test suites very easily. I very strongly believe that time spent creating better tools for developers and system administrators --- whether a fully featured test framework which allows developers to easily run tests, or a program like debugfs which allows developers to easily create test cases, and allows experts to manually recover data from a corrupted file system --- is time very well spent that pays back ten or a hundred times the time invested in creating the tool.
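As a small taste of the debugfs side of that (the image name and size here are arbitrary, and these are read-only commands, so they are safe to try on a scratch image):

```shell
# Create a throwaway ext4 image to inspect (no root required).
dd if=/dev/zero of=test.img bs=1M count=8 2>/dev/null
mke2fs -q -F -t ext4 test.img

# Run one-shot debugfs commands with -R: print a superblock summary,
# then stat the root directory's inode (inode 2 on ext2/3/4).
debugfs -R 'stats -h' test.img
debugfs -R 'stat <2>' test.img
```

The `<2>` syntax addresses an inode by number rather than by path, which is exactly what you need when the directory tree itself is too damaged to traverse.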