#SlightReliability

Stephen Townshend thekiwisre@hachyderm.io
2023-11-28

This week on #SlightReliability I had the honour of chatting with @honeycombio Field CTO @lizthegrey about the role of developer advocacy in #SRE.

#DevRel

🗣️ What is developer relations (DevRel)?
🎵 Is DevRel and developer advocacy the same thing?
💰 What value does developer advocacy add to organisations and the community?
📖 Storytelling and the power of visuals
🥇 ...and some tips on getting SRE traction in your organisation!

youtube.com/watch?v=loVZWgVpnF

2023-11-27

@paigerduty I was listening to your appearance on #SlightReliability last week and you mentioned that in #OpenTelemetry all of your services should send data to the same collector. Is this always the case? I was thinking that some ingestion platforms would be able to correlate all the traces, but I could easily be wrong there.
I was thinking I could have one collector per service in a separate container, or even one per eng group. Maybe that isn't feasible; what are the best practices, though?

Stephen Townshend thekiwisre@hachyderm.io
2023-11-21

This week on Slight Reliability @paigerduty is back! This time we dive into sampling of distributed traces. We cover...

🕸️ What is distributed tracing? What are spans?
🧪 What is sampling? And why do we need it?
🤯 What constitutes an interesting trace?
🦘 No sampling VS head based VS tail based
👩🏾‍🔬 Non-traditional use cases of tracing such as CI/CD
🧻 The power of napkin math to make informed decisions
...and much more.

#SRE #observability #SlightReliability

youtube.com/watch?v=GYwjeE9reb

Stephen Townshend thekiwisre@hachyderm.io
2023-10-10

This week on #SlightReliability I chat with Dr. Vlad Ukis (author of the book "Establishing SRE Foundations" and head of R&D at Siemens Healthineers) about implementing #SRE.

One of my big takeaways from the conversation was the power of selling SRE practices internally, showcasing success, and the "SRE marketing funnel". The social side of SRE is overlooked but very important.

Also in this episode: SLOs and how to get started with them.

#SLO #SLI #DevOps

youtube.com/watch?v=PPiCm_k03H

Stephen Townshend thekiwisre@hachyderm.io
2023-10-03

This week on #SlightReliability Amin Astaneh from Certo Modo is back! This time we discuss his #sre (production engineering) experiences at #meta. We cover:

🏢 What it's like interviewing for big tech
🦶 Voting with your feet (as an incentive to prioritise reliability)
💍 SRE engagement models
🏅 Socialising SRE wins to grow the practice (the sales part of SRE)
🇹 Wide VS deep skillsets in different sized orgs
🚒 The time Amin burned down a data centre...

(and much more!)

youtube.com/watch?v=YIptrW0SZa

2023-09-20

I can't tell you how fun it was being invited as a guest onto the #SlightReliability podcast with @the_kiwi_sre talking about modern #dashboards, the 3 phases of #cloudnative #observability, and so much more! #chronosphereio youtube.com/watch?v=-annvqpYCA

Stephen Townshend thekiwisre@hachyderm.io
2023-09-12

This week on Slight Reliability I revisit the concept of the single pane of glass (#SPOG) with Jamie Allen from EPAM Systems and Adam Kinniburgh from SquaredUp.

👁️ What is a SPOG supposed to be?
🌏 Can it work at massive scale?
💼 Is it a tool for engineers or executives?
🤖 What is the future of dashboards in the #AI era?

(and much more) #SRE #observability #dashboard #monitoring #SlightReliability

youtube.com/watch?v=H5bsC8CvQh

Stephen Townshend thekiwisre@hachyderm.io
2023-09-05

This week on #SlightReliability I drill into the myths and truths about #AI with Kyle Forster from RunWhen.

Can we bring single player mode to pair programming using AI? Are IT jobs at risk of being displaced? How (as consumers) do we make informed decisions about purchasing products with AI? (and of course, much more).

I hope you enjoy my drawing of Vision (from the MCU)... it took me quite some time :)

youtube.com/watch?v=CvsljSP1Xf

Stephen Townshend thekiwisre@hachyderm.io
2023-08-29

This week on Slight Reliability I had the honour of interviewing Courtney Nash about why mean time to recover (#MTTR) is an unhelpful metric, what she learned by analysing 10+ incident reports, and much more.

🕵🏽‍♀️ Instead of MTTR, let's focus on learning from incidents, observing patterns and themes, involving leadership, and adding an "accident investigator" lens after the fact to enhance the learning.

#SRE #DevOps #incidents #SlightReliability

youtube.com/watch?v=k-tuE9aMg3

Stephen Townshend thekiwisre@hachyderm.io
2023-08-22

This week on #SlightReliability I chat with Martin Thwaites from Honeycomb.io about #observability during #development (#ODD). Some of my takeaways:

💻 How observability in development frees up developers to spend less time debugging and more time writing code.

🤖 That manual instrumentation is where the power is.

💰 Keeping the cost of observability data down through a combination of head and tail based sampling. "Keeping every span of trace data is irresponsible".

youtube.com/watch?v=dsLVtqILbH

Stephen Townshend thekiwisre@hachyderm.io
2023-08-15

This week on #SlightReliability... how do we prevent #observability from only generating value for a small set of engineers? How do executives, product managers, and other stakeholders leverage its power?

youtube.com/watch?v=rH0U1sKr-T

(You can also listen to Slight Reliability via most podcast platforms, or check out slightreliability.com/)

An mspaint drawing of a hand squeezing the juice out of an orange (the juice representing the essence of the fruit).

Stephen Townshend thekiwisre@hachyderm.io
2023-05-30

Unfortunately there is no #SlightReliability episode this week... So as is tradition, I have a haiku for you. #sre

The haiku reads...
You need to see more
In reliability
Than technology

Stephen Townshend thekiwisre@hachyderm.io
2023-05-22

Who else is going to be at AWS Summit in London on June 7th? Would be great to meet some of the community in person. #awssummit #aws #slightreliability aws.amazon.com/events/summits/

An mspaint drawing of Stephen climbing a mountain with an AWS logo at the top of it.

Stephen Townshend thekiwisre@hachyderm.io
2023-04-04

This week on Slight Reliability I chat to Ivan Merrill about his experiences implementing #observability in the real world. We discuss making observability part of onboarding, discussing risk to get leadership buy-in, inviting over inflicting practices, and much more.

#sre #SlightReliability #reliability

youtube.com/watch?v=6osDq8DSxc

An mspaint.exe picture of Stephen and Ivan climbing steps up to a red flag like the Super Mario Brothers games.

Stephen Townshend thekiwisre@hachyderm.io
2023-03-30

Yesterday #SlightReliability reached 1k subscribers on YouTube! Just wanted to say thank you to everyone who has listened and joined in the discussion about #sre!

The meme about the guy who gets a bronze medal but celebrates way harder than the one getting gold.

Stephen Townshend thekiwisre@hachyderm.io
2023-03-21

This week on #SlightReliability... what is "insight" in #observability? Are tool vendors lying to us about being able to provide it? Is it science? Art? Or magic? #sre youtube.com/watch?v=i2GFEobj2g

An mspaint picture of Stephen as a ninja in a blue outfit, blindfolded, holding a sword.

Stephen Townshend thekiwisre@hachyderm.io
2023-03-08

This week on #SlightReliability I reminisce from my #performancetesting days when I used to analyse complete sets of raw data using scatterplots, and ponder how we could apply this in #observability #sre youtube.com/watch?v=f1GSGWGUEG

An mspaint picture of Stephen chewing on a raw steak.

Stephen Townshend thekiwisre@hachyderm.io
2023-02-28

Last week on #SlightReliability I chatted to Paige Cruz from Chronosphere about cognitive overload in #SRE. We chatted about how SREs are often used as the Swiss army knives of the IT department, how as humans our RAM is maxed out, why you shouldn’t give your team a name like “The Lobsters”, and a whole lot more.

This was one of my very favourite interviews I've ever done. youtube.com/watch?v=CDhGgnIGGQ

An mspaint drawing of Stephen Townshend and Paige Cruz gasping as their heads explode into mushroom clouds.

Stephen Townshend thekiwisre@hachyderm.io
2023-02-14

This week on #SlightReliability I talk about how I think #observability promises more than what we're getting. I argue that it needs to look at more than technology in order to help us negotiate the ocean of chaos in the Digital Era. #sre youtube.com/watch?v=da3o2QSxVe

An mspaint picture of Stephen with a third eye (inspired by Doctor Strange), glowing yellow like a Dragonball Z character, with eyes in the darkness behind him.

Stephen Townshend thekiwisre@hachyderm.io
2023-01-24

This week on #SlightReliability... what do we do with all our #telemetry data? Should we put it all in a data lake? Or is there another way we can pull insight together? #sre #observability youtube.com/watch?v=Mv55p1kXz6

An mspaint drawing of a man with half his head submerged under water with bubbles gurgling up to the surface.
