An Introduction to R for System Administrators
Thurgood Marshall West
The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for ensuring reproducible and repeatable results.
This tutorial is designed to
- motivate you to pick up R,
- demonstrate useful techniques using R, and
- illustrate ways to simplify your life by automating data analysis and reporting.
Examples will be based on situations that the instructor encountered during routine system operations. Additional exercises and data sets that students can explore following the workshop will be provided. The instructor will be available in the LISA Lab after the workshop.
System administrators who are awash in operational data and want to do a more effective job of understanding their data and communicating their findings should attend this class. Prior knowledge of R is not required, but if you are already working with R, you are welcome! Facility with programming and a knowledge of basic descriptive statistics will be assumed.
This introduction to R and its ecosystem provides a walk along the R main line—coming up to speed on R, accessing data, analyzing data, and getting the message out. The key points include:
- Acquaintance with R, R packages, and R Studio
- Understanding where R fits into the system administrator’s tool set
- Familiarity with basic R data manipulation techniques
- Basic principles for ensuring reproducible and automated analyses
- Motivation to learn or improve your R skills
- Next steps in mastering R
Topics include:
- Introduction to the R ecosystem (R, R Studio, CRAN)
- Why should you consider R?
- The R programming model: functions, tables, and packages
- The basic data analysis workflow
- Reading and writing data from files and pipes
- Data frames and data frame manipulations
- Using the plyr and dplyr packages to slice and dice data
- Using the ggplot2 package for graphing
- Overview of the R package system
- Other useful R packages
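As a rough illustration of the read, manipulate, and summarize steps named in the topics above, here is a minimal sketch in base R. The host names and load values are invented sample data, and base R functions stand in for plyr, dplyr, and ggplot2 only so that the sketch runs without extra packages; it is not taken from the class materials.

```r
# Hypothetical sample data: load averages per host, inlined as text
# so the sketch is self-contained (a real analysis would read a file or pipe).
csv_text <- "host,load
web01,0.50
web01,1.50
db01,2.00
db01,4.00"

# Read: the text= argument lets read.csv parse a string instead of a file.
loads <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Manipulate: keep only the rows with a load of at least 1.
busy <- loads[loads$load >= 1, ]

# Summarize: mean load per host, returned as a data frame.
mean_load <- aggregate(load ~ host, data = loads, FUN = mean)
print(mean_load)
```

The same shape (a data frame in, a smaller data frame out) is what packages such as dplyr are designed to express more compactly.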
The following software packages should be installed in advance of the tutorial:
R: Version 3 or later, from CRAN or as supported by your OS. Some distributions already provide R packages; for others, both pre-compiled binaries and source code are available from CRAN. R requires about 160MB (installed) on Mac OS X. License: GNU General Public License (GPL).
Optional: R Studio. Requires R, plus an additional 305MB on Mac OS X. Binary installations are available for Ubuntu, Fedora, Mac OS X, and Windows. Source code is available. R Studio does not appear to be supported for the BSD distributions. License: GNU Affero General Public License.
Once you have installed R and, optionally, R Studio, you can download contributed packages from CRAN. Class demonstrations will use plyr and ggplot2. Other packages will be introduced as needed.
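One possible way to fetch the contributed packages ahead of time is sketched below. The helper names `missing_packages` and `install_missing` are hypothetical, and the mirror URL is an assumption; any CRAN mirror should work. Running the block only reports what is missing; nothing is downloaded until `install_missing` is called.

```r
# Packages named in the class description.
wanted <- c("plyr", "ggplot2")

# Hypothetical helper: return the subset of pkgs not yet installed
# in any of the current library paths.
missing_packages <- function(pkgs) {
  pkgs[!(pkgs %in% rownames(installed.packages()))]
}

# Hypothetical helper: install whatever is missing from a CRAN mirror.
# Needs network access and a writable library directory.
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# Report which of the wanted packages still need installing.
print(missing_packages(wanted))
```

Calling `install_missing(wanted)` from an R session would then install only the packages that are not already present.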