#96 - Practical Guide to Implementing SRE and SLOs - Alex Hidalgo
“Reliability is the most important thing. Your users define your reliability, so make sure you’re measuring the right thing. And 100% is out of the question, so pick the right target.”

Alex Hidalgo is the Principal Reliability Advocate at Nobl9 and the author of “Implementing Service Level Objectives”. In this episode, we discussed a practical guide to implementing SRE and SLOs. Alex started by explaining the basic concepts of service reliability and service truths. He then explained the reliability stack, which comprises the well-known SRE concepts of SLIs, SLOs, and error budgets. Alex then shared his insights on how to define a service reliability target, why a higher reliability target is expensive, and the risk of a service being too reliable. Towards the end, Alex shared his tips on building an SRE culture and using the error budget as a communication tool within the organization.

Listen out for:
Career Journey - [00:07:19]
Understanding SRE & SLO - [00:14:17]
Service