while (true) do; How hard can it be to keep running?

Mini Tutorial
Wednesday, November 12, 2014 - 2:00pm-3:30pm

Caskey L. Dickson, Google, Inc.

Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he writes and maintains monitoring services that operate at "Google scale." Before coming to Google he was a senior developer at Symantec, wrote software for various internet startups such as CitySearch and CarsDirect, ran a consulting company, and taught undergraduate and graduate computer science at Loyola Marymount University. He has an undergraduate degree in Computer Science, a Master's in Systems Engineering, and an MBA from Loyola Marymount.

Description: 

At Google we have more than a handful of servers and must use our administration time as effectively as possible. Between custom in-house software and off-the-shelf daemons, there are many parts to running a reliable, distributed, redundant service. The most fundamental is starting the software and keeping it running. Through reboots, crashes, upgrades, downgrades, bugs, canaries, and outages, myriad forces conspire to end your process and keep it stopped or, worse, keep it alive but not functioning.

Tools such as init, upstart, rc scripts, cron, and at provide mechanisms to run programs unattended, but each of them can fail in different ways. When you have dozens or hundreds of servers, they will fail in many different ways. I will discuss the obvious and not-so-obvious failure modes of popular packages like upstart and cron, as well as how we've worked with and around them to ensure that when we run a daemon, it stays running. Special emphasis will be given to how virtual hosts create new challenges that can trip up launch strategies and services written for bare metal.
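The abstract does not show the mechanism the talk arrives at, so the following is only a minimal sketch of the idea in the title: a supervisor that keeps a child in the foreground, waits for it to exit, and starts it again with a capped exponential backoff. The names `supervise`, `max_runs`, `base_delay`, `max_delay`, and `healthy_after` are assumptions made for illustration, not anything taken from the talk or from a particular tool.

```python
import subprocess
import sys
import time


def supervise(argv, max_runs=None, base_delay=0.1, max_delay=30.0,
              healthy_after=5.0):
    """Run argv in the foreground and start it again each time it exits.

    Returns the list of exit codes seen. max_runs bounds the loop so the
    sketch can terminate; with max_runs=None it loops forever.
    All parameter names here are hypothetical.
    """
    exit_codes = []
    delay = base_delay
    while max_runs is None or len(exit_codes) < max_runs:
        if exit_codes:
            # Back off before a restart so a crash loop does not spin the CPU.
            time.sleep(delay)
        started = time.monotonic()
        # The child is a direct child of the supervisor, so wait() reports
        # its exit without consulting a pidfile.
        child = subprocess.Popen(argv)
        exit_codes.append(child.wait())
        if time.monotonic() - started >= healthy_after:
            delay = base_delay                   # ran long enough: reset
        else:
            delay = min(delay * 2, max_delay)    # died quickly: back off more
    return exit_codes


def demo():
    """Supervise a child that exits at once with status 3, three times."""
    crashing_child = [sys.executable, "-c", "raise SystemExit(3)"]
    return supervise(crashing_child, max_runs=3, base_delay=0.01)
```

Calling `demo()` runs the crashing child three times and returns its exit codes. A loop like this only notices processes that exit; a process that is alive but not functioning needs a separate health check.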

Who should attend: 

Administrators who manage fleets of virtual or physical machines and use automated tools to run essential daemons on them will benefit from the technique described.

Take back to work: 

A simple technique for running daemons (services) reliably on large fleets of machines, with upgrades and rollbacks performed in an automated fashion.
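The abstract does not say how upgrades and rollbacks are implemented. One widely used approach, shown here purely as an assumed illustration, keeps each release in its own directory and atomically repoints a `current` symlink, so rolling back is the same operation as rolling forward. The directory layout and the `activate` helper are hypothetical.

```python
import os
import tempfile


def activate(release_dir, current_link):
    """Atomically repoint current_link at release_dir (POSIX only)."""
    link_dir = os.path.dirname(os.path.abspath(current_link))
    tmp_link = os.path.join(link_dir, ".current.tmp.%d" % os.getpid())
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)          # clear debris from an interrupted run
    os.symlink(release_dir, tmp_link)
    # rename(2) replaces the old symlink itself in one step, so readers see
    # either the old release or the new one, never a missing link.
    os.replace(tmp_link, current_link)


def demo():
    """Install v1, roll forward to v2, then roll back to v1."""
    with tempfile.TemporaryDirectory() as root:
        v1 = os.path.join(root, "releases", "v1")
        v2 = os.path.join(root, "releases", "v2")
        os.makedirs(v1)
        os.makedirs(v2)
        current = os.path.join(root, "current")
        seen = []
        for release in (v1, v2, v1):
            activate(release, current)
            seen.append(os.path.basename(os.readlink(current)))
        return seen
```

In this sketch a supervisor would start the daemon through the `current` path and restart it after each switch; that restart step is left out here.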

Topics include: 
  • Packaging configurations for distribution
  • Process management
  • Recovery from failure
  • Init script design
  • Pitfalls of pidfiles
  • Why daemonization is bad for you
  • Roll forward vs. roll back
  • Canaries and monitoring
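To make one of the listed topics concrete, here is a hedged sketch of a well-known pidfile pitfall; the talk may cover others. A daemon that dies without removing its pidfile leaves a stale pid behind, and once the kernel reuses that pid for an unrelated process, the usual "is it running?" check answers yes. The file name `mydaemon.pid` and the function names are hypothetical.

```python
import os
import tempfile


def pid_is_alive(pid):
    """Return True if some process with this pid exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)   # signal 0 checks existence, delivers nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True       # the process exists but belongs to another user
    return True


def naive_is_running(pidfile):
    """The classic init-script check: trust whatever pid the file holds."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    return pid_is_alive(pid)


def demo():
    """Show the check reporting 'running' for a daemon that never ran."""
    with tempfile.TemporaryDirectory() as d:
        pidfile = os.path.join(d, "mydaemon.pid")
        # Simulate a stale pidfile whose pid has since been reused by an
        # unrelated process (here, this very Python process).
        with open(pidfile, "w") as f:
            f.write("%d\n" % os.getpid())
        return naive_is_running(pidfile)
```

`demo()` returns `True` even though no daemon was ever started, which is why a supervisor that holds its child directly and waits on it avoids this class of mistake.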

© USENIX


LISA is a registered trademark of the USENIX Association.