Site Reliability Engineers must have the right tools and strategies to perform in a technical, fast-paced environment. IBM Cloud SRE is guided by nine competency areas that lead to the successful practice of the discipline:
● Applying Site Reliability Engineering principles
● Operations
● Monitoring and incident management
● Security and compliance
● Compute infrastructure
● Networking
● Storage and data management
● Reliability and resiliency
● Deployment automation
In this first course of the three-part Professional Certificate in Site Reliability Engineering (SRE), you will focus on the first four SRE competencies:
● Applying Site Reliability Engineering principles
● Operations
● Monitoring and incident management
● Security and compliance
NOTE: The remaining five SRE competencies are covered in Course 2: SRE Infrastructure, Resiliency and Deployment Automation.
This course covers approximately 50% of the content required to help you prepare for the “IBM Certified Professional SRE - Cloud V2” certification exam.
If you are interested in pursuing the “IBM Certified Professional SRE - Cloud V2” certification, we recommend that you complete all three offerings of the Professional Certificate in Site Reliability Engineering (SRE) to ensure a successful certification exam experience.
Module 1: Welcome and Introduction
You will cover the following topics:
● An introduction to the IBM Professional SRE role
Module 2: SRE Fundamentals and Terminology
You will cover the following topics:
● A deeper dive into the SRE role
● SRE principles
● Managing trade-offs between change, velocity, and reliability
● Negotiating service level objectives, service level indicators, error budgets, and the user experience
● IBM Cloud tools and technology across the Software Development Life Cycle
● Applying software engineering principles to drive reliability
Module 3: Operations
You will cover the following topics:
● Performing operational readiness reviews (ORR) on IBM Cloud
● Creating an ORR checklist
● Employing cost-optimization strategies
● Managing backups and recoveries on IBM Cloud
Module 4: Monitoring
You will cover the following topics:
● Monitoring overview
● Creating and maintaining metrics, traces, and alerts on IBM Cloud
● Collecting, analyzing, and managing logs on IBM Cloud
● Identifying key metrics for service health on IBM Cloud
● Using performance and availability metrics to measure the health of services on IBM Cloud
Module 5: Incident Management
You will cover the following topics:
● Managing incidents on IBM Cloud
● Developing a balanced action plan to mitigate future incidents
● Performing the post-incident review
Module 6: Security and Compliance
You will cover the following topics:
● Monitoring and managing security threats on IBM Cloud
● Implementing and managing security policies on IBM Cloud
● Implementing encryption models
● Managing role-based access control on IBM Cloud