SRE University—Practical Large System Design
In this class, you will learn about large system design. Truly large-scale systems are still rare, and in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system, so they outsource the detailed design to external contractors. But an organization that lacks the expertise to design a large system is also unlikely to have the expertise to confirm that a proposal is fit for purpose and cost-effective.
While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who can also fill in the details are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to meet an SLA. You will learn how requirements like queries per second, multi-site reliability, and data security affect the cost of implementation. You will work out these designs in classroom exercises in small groups, each with its own Google SRE mentor.
This class is for system administrators, SREs, and DevOps engineers who have some familiarity with distributed systems, server hardware, and systems programming, especially those who would like to work with, procure, or build large distributed systems.
You will come away knowing how to design large distributed systems, how to evaluate design proposals, and how to explain such designs to third parties.
Topics include:
- Design patterns for large distributed systems
- Monitoring large-scale systems
- Large-scale design workshop and presentations
- Non-abstract design: taking a design and producing a "bill of materials"
- Designing to fail: how to work around rack, networking, and datacenter failures
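
To give a flavor of what turning a design into a "bill of materials" can involve, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the class material: the function name, the parameters, and every number are hypothetical, and the model is deliberately simplistic. It assumes load is spread evenly, that the system must survive the loss of one whole site, and that each site carries one spare server.

```python
import math


def bill_of_materials(peak_qps, qps_per_server, servers_per_rack,
                      sites, max_utilization=0.5, spares_per_site=1):
    """Rough server and rack counts for a multi-site service.

    Hypothetical model: if any one site fails, the remaining
    (sites - 1) sites must still serve peak_qps between them.
    """
    if sites < 2:
        raise ValueError("need at least two sites to survive a site failure")

    # Load each surviving site must absorb after one site is lost.
    qps_per_site = peak_qps / (sites - 1)

    # Run servers below full capacity to leave headroom for spikes.
    usable_qps = qps_per_server * max_utilization

    servers = math.ceil(qps_per_site / usable_qps) + spares_per_site
    racks = math.ceil(servers / servers_per_rack)

    return {
        "servers_per_site": servers,
        "racks_per_site": racks,
        "total_servers": servers * sites,
        "total_racks": racks * sites,
    }


# Illustrative numbers only, not measurements from any real system.
example = bill_of_materials(peak_qps=10_000, qps_per_server=500,
                            servers_per_rack=20, sites=3)
print(example)
```

A real estimate would also account for storage, network bandwidth, and power, and would use measured per-server capacity rather than a guessed figure.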