LISA14 - Day Five

Day (counts fingers) 5 started out with Gene Kim's presentation on DevOps. It's good stuff, and he's summarized things very well. One thing I asked him about was whether he and his collaborators will publish the data set and methods used for the PuppetLabs "State of DevOps" report; there are a lot of strong findings there, and it would be really good to have those replicated by other people. The answer was yes, they'll put it on their backlog (ha! Gene Kim uses Kanban); it was punctuated by a strong thumbs up and a shout of "YES, PEER REVIEW!" from LISA14 chair Nicole Forsgren Velasquez (who did the analysis).

After that I sat down in the LISA Labs for the first time; Caskey Dickson was doing Office Hours on "The State of Monitoring", and it was really, REALLY incredible to be able to pick the brain of an expert. In the process of answering questions like "What monitoring should I use?" he recapped his presentation from last year's LISA, which I need to find a link for; it breaks down "monitoring" into seven different things, and shows how (nearly) all the tools we have tackle one, two, maybe three categories -- but not all. (You should read that now.) The exception is Shinken, which is unique because it explicitly attempts to solve all the problems; "everything you see in Shinken comes out of someone's pain point." He likes OpenTSDB a lot, wishes there were standardized communication between Nagios/Graphite/Logstash/Sensu/Ganglia, and pointed me to Twitter's blog post about monitoring. This was an incredible opportunity, and I was scribbling notes as fast as I could.

Next up: Kyle Brandt's presentation on Bosun, the new monitoring framework from Stack Exchange. There's a TON of stuff that's cool about this, but briefly: it treats tweaking alerts as a code development problem. That means you want fast iteration, tests (which are done against previously logged data), and frictionless changes. It'll send informative alerts, bundle up alerts (if the web server's down, who cares that the power supply, hard drive, eth0, SNMP, and CPU load checks are all returning UNKNOWN?), and you can do a crapton of math when deciding whether or not to alert. On top of that, you can use that math to do predictions and alert on that -- not "HALP the hard drive's full!" but "Whoah...if this goes on, in eight hours that hard drive's probably gonna be full." The agent runs on Windows and Linux -- both are first-class citizens. I'm very, very interested in this.
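To make the "alert on the prediction" idea concrete for myself, here's a rough sketch in Python. To be clear, this is NOT Bosun's code or its expression language; it's just a plain least-squares line fitted to some made-up disk usage samples, and the names (`hours_until_full`, `should_alert`, the eight-hour horizon) are my own placeholders.

```python
def hours_until_full(samples, capacity=100.0):
    """Estimate hours from the last sample until usage reaches capacity.

    samples: list of (hour, percent_used) pairs, oldest first.
    Returns None if there is too little data or usage is flat/shrinking.
    Assumption: growth is roughly linear, which real disks may not obey.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


def should_alert(samples, horizon_hours=8.0):
    """Alert when the fitted line hits capacity within the horizon."""
    remaining = hours_until_full(samples)
    return remaining is not None and remaining <= horizon_hours


# Hypothetical data: disk usage climbing 2 points per hour.
slow = [(0, 60.0), (1, 62.0), (2, 64.0), (3, 66.0)]
# Hypothetical data: disk usage climbing 5 points per hour.
fast = [(0, 70.0), (1, 75.0), (2, 80.0), (3, 85.0)]

print(should_alert(slow), should_alert(fast))
```

The point is the shape of the check: the alert fires on where the trend is heading, not on the current value crossing a threshold. And because it's just a function over logged data, you can test it against history before it ever pages anyone.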

I buttonholed one of the folks from PuppetLabs in the hall to clarify a point in my n00b head, then took a break to write up notes. I heard a strong recommendation to watch the Librato talk -- sad I missed it, but I got to meet Dave Josephsen as he was doing office hours in the LISA Lab. It was great to meet him in person -- he's a very, very good writer, and his monthly column on monitoring for ;login: is wonderful. He'd just been to the Nagios World conference, so I got to ask him about the state of things there -- again, clarified a few things. I met Nicole Forsgren Velasquez and got to thank her for the stats course and her work on LISA this year.

I'd just like to point out that this is one of the great things about LISA. You run around and you bump into these people who are experts in their field, and you can say things like "Oh, there's this problem that's been bugging me..." and they say "Oh, yeah, there's this fix". Or "Yeah, that's a real problem and no one has figured that out yet" -- and that can be even more valuable.

But another, equally wonderful part is the chance encounters with people that leave you agog at how large our profession is, and how many fun problems there are to be solved. Like the admin I met who used to work at a farm analytics company. Me: "I'm sorry, a what now?"

That's right. Farms want to make the most efficient use of resources like water and soil. So you don't just go out, spit in the ground and say "I'm planting 40 acres of corn right here." No, you get sunshine and precipitation data from the last four years; you get data on UV light from drone flyovers of your land; you plug in soil data and what your crop is. And then you send all that data to a services company that fires up a Hadoop cluster in AWS to crunch that data, and it spits out a watering plan for the next year. It takes into account the irrigation system you have, how fancy it is and how many controls it has, and what kind of leakage you've experienced before. And that is how you do it.

I would never in a million years have guessed this. But every single LISA I've been to has had a moment like this, where I have to scrape my jaw off the floor and then scribble down notes about this thing I just found out that people do, apparently. I love this. I love being able to peek behind the curtain to see what's going on that I never knew about.

And I also love riding the Monorail with the folks from the EFF, and having them get a picture of me and USENIX wonder angel Julie Miller:

I'm way behind on this post and there are a dozen things to do, so I'll have to skip over the reception -- except to say it was awesome, and kudos to the organizers for picking such a great venue.