Linux Systems Troubleshooting

Wednesday, October 31, 2018 - 2:00 pm-3:30 pm

Thomas Uphill, Narrabilis


In the days of immutable infrastructure, what's the point of troubleshooting anything?

In my experience, when there is a problem with the image, application, or container, immutable infrastructure will just keep recreating the problem. Knowing how to diagnose common problems is still an important skill. Moreover, the machines that control images and containers run longer than the containers they host, and problems on these machines still need to be found and fixed.

In this tutorial we will look at common problems seen on Linux systems. We'll start with how various subsystems work: networking, filesystems, users/groups, and permissions. We'll then move on to tools for inspecting running systems, including, of course, strace. The focus will be on "off the shelf" tools and how to use them.

Thomas Uphill, Narrabilis

Thomas is a veteran System Administrator who has recently switched to a development role. He's the author of several books on Puppet and has spoken at past LISA conferences on a variety of topics. His primary work environment is Linux and he's a VIM user.

@conference {221822,
author = {Thomas Uphill},
title = {Linux Systems Troubleshooting},
year = {2018},
address = {Nashville, TN},
publisher = {{USENIX} Association},
}

Who should attend: 
  • System Administrators
  • DevOps Engineers
  • Developers who need to support what they release
Take back to work: 
  • Look at the whole picture when looking for problems
  • Document your steps
  • Know how subsystems interact when trying to figure out a problem
  • Never underestimate basic Unix permissions
Topics include: 
  • Brief overview of how Linux works (high level)
  • Boot problems
  • /proc filesystem and running processes
  • Users, groups, limits, and basic permissions
  • lsof and open files
  • gdb: debugging running processes and core files
  • Networking

Attendees should have experience with Linux: there will be a brief introduction to each topic, but some background will be assumed. Basic programming experience and some networking knowledge will also be assumed.