Site Reliability Engineer

Udacity
Paid Course
English
Certificate Available
17 weeks long, 5-10 hours a week

Overview

Master the job-ready skills you need to be a successful site reliability engineer and start designing systems that automate responses to issues on software sites.

Syllabus

  • Foundations of Observability
    • Get a practical introduction to what observability requires in terms of people and tools. Learn about site reliability engineering, its roles and responsibilities, and how they differ from those of other teams. See how the role helps an enterprise improve, discuss the associated costs, and learn about the types of team members and the tools a team may use.
  • Planning for High Availability and Incident Response
    • This course covers monitoring, high availability (HA) and disaster recovery (DR), infrastructure as code, and database recovery and availability. Learn the basics of service level objectives (SLOs) and service level indicators (SLIs) and how to translate them into queries and, finally, graphs. Also learn how to design and deploy highly available databases on AWS.
  • Self-Healing Architecture
    • Learn how to deploy microservices or cloud architecture that is resilient enough to withstand failures and predictable enough to resolve issues through automation, without human intervention. Understand self-healing system design fundamentals, deployment strategies, implementation steps, and use cases. Learn cloud automation to increase the resiliency of systems.
  • Establishing a Culture of Reliability
    • Learn how to develop processes and frameworks that drive workplaces toward putting reliability first, working through the incident management process and learning how to run effective on-call rotations. Understand how to perform reliability reviews at various phases of your system, how to manage system capacity effectively, and how to reduce toil.

Taught by

Nathan Anderson, Travis Scotto, Emmanuel Apau and Sonny Sevin