Just after 1400 today I set up a 10s loop that grabs the homepage of the mesostic app. Neat to see it show up in the span duration graph, and how it pulls up the durations for other spans.
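For anyone wanting to try the same thing, here is a minimal sketch of such a polling loop in Python, standard library only. The names `fetch_homepage` and `poll` are made up for illustration, the URL is whatever you pass in, and the loop only times requests on the client side; the spans in the graph come from the app's own instrumentation, which is not shown here.

```python
import time
import urllib.request
from typing import Callable, List


def fetch_homepage(url: str, timeout: float = 5.0) -> int:
    """Fetch the page once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def poll(
    url: str,
    interval: float = 10.0,
    iterations: int = 3,
    fetch: Callable[[str], object] = fetch_homepage,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call fetch(url) every `interval` seconds; return client-side durations."""
    durations = []
    for i in range(iterations):
        start = time.perf_counter()
        try:
            fetch(url)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses;
            # a failed request is still timed, the loop keeps going.
            pass
        durations.append(time.perf_counter() - start)
        if i < iterations - 1:
            sleep(interval)  # wait between requests, not after the last one
    return durations
```

Calling `poll("https://your-app.example/", interval=10.0, iterations=6)` would hit the page six times over roughly a minute. `fetch` and `sleep` are parameters so the loop can be exercised without network access or real waiting.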
System Administration, Week 1: UNIX History
We're borrowing this video from our "Advanced Programming in the UNIX Environment" class to give a brief summary of the history of the UNIX family of operating systems.
Oh and trying to keep OTEL top-of-mind as well, cascading traces!
Obviously our highest latency is the call to NASA. :p
🔍 Does your team look for someone to blame when something fails? The blameless postmortem shifts the focus: analyze errors without pointing at people. Learn, improve, and build trust 🚀
Read more 👉 https://www.soloingenieria.org/ingenieria-de-software/postmortem-blameless/
Image created with AI.
#PostmortemBlameless #DevOps #IngenieríaDeSoftware #CulturaDevOps #SRE #MejoraContinua
The teams that learn the most are not the ones that never fail. They are the ones that analyze their mistakes without fear or blame. That is blameless culture 💡
#PostmortemBlameless #DevOps #IngenieríaDeSoftware #CulturaDevOps #SRE #MejoraContinua
Agents enter the room.
Quick! What do you do?
..
(Don't look at me.)
Agents are non-embodied actors in human systems.
Agents receive context as input.
Agents decide, execute, loop.
Agents reduce complexity.
Until the END.
Who pays the embodied cost of AI-driven sense-making?
And why is it never the systems that scale it?
I write about language, technology and human systems.
👉 https://systemic.engineering/who-invited-the-agent-oh-god-smith-will-suffice/
#SystemicEngineering #SRE #SREforHumans #SiteReliabilityEngineering #Agents #AI #AIEthics #AI
Instant DevOps Labs: Hands-On & Free! 🚀
Get 15 minutes of real Linux environment practice with genuine scenarios.
No signup. No email. No credit card required.
24 hours until the CfP for "SREday London 2026 Q1" closes: https://papercall.io/cfps/6456/submissions/new
System Administration, Week 1: Core Principles
In this video, we present a few core principles that will guide us throughout the semester: Scalability, Security, and Simplicity. We'll also get to know a few basic "laws", well known by any System Administrator. If you're wondering what all this has to do with Legos, please tune in...
🚀 The Best of #CloudComputing & #DevOps in 2025
#InfoQ published some serious heavy hitters last year. These 5 deep dives are essential reading for engineers who want to #StayAhead of the curve 👇
➡️ Designing Resilient Event-Driven Systems at Scale by Rajesh Kumar Pandey
https://bit.ly/3HlYOpa
➡️ Being Functionless: How to Develop a Serverless Mindset to Write Less Code! by Sheen Brisals
https://bit.ly/4rhWXmM
➡️ Checklist for Kubernetes in Production: Best Practices for SREs by Utku Darilmaz
https://bit.ly/43GZ4rO
➡️ When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale by Mitendra Mahto
https://bit.ly/4nZJTR3
➡️ Why Is My Docker Image So Big? A Deep Dive with “dive” to Find the Bloat by Chirag Agrawal
https://bit.ly/44os5ar
📚 Knowledge is power! 💪
#SystemDesign #Serverless #Kubernetes #SRE #Docker #CloudNative
A lot of “scalability work” is really “making side effects predictable.”
Idempotency, retries, timeouts, and clear ownership of state sound boring until your first incident teaches you they were the product all along.
When a system is calm under failure, it is not because it never fails.
It is because 𝗶𝘁 𝗳𝗮𝗶𝗹𝘀 𝗶𝗻 𝘄𝗮𝘆𝘀 𝘆𝗼𝘂 𝗽𝗹𝗮𝗻𝗻𝗲𝗱 𝗳𝗼𝗿.
#SoftwareEngineering #DistributedSystems #Reliability #SRE #SystemDesign #EngineeringBasics #ByernNotes
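A rough sketch of how idempotency and retries fit together, assuming a made-up in-memory service (`FlakyPaymentService`) and a made-up helper (`call_with_retries`); neither comes from a real library. The service applies the side effect and then loses the first response, which is the case where a blind retry would charge twice.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class FlakyPaymentService:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self, lose_first_response: bool = True) -> None:
        self.lose_next_response = lose_first_response
        self.processed: Dict[str, str] = {}  # idempotency key -> receipt
        self.charges = 0                     # side effects actually applied

    def charge(self, key: str, amount: int) -> str:
        if key in self.processed:
            # Same key seen before: return the stored result, charge nothing.
            return self.processed[key]
        self.charges += 1
        receipt = f"receipt-{self.charges}-{amount}"
        self.processed[key] = receipt
        if self.lose_next_response:
            self.lose_next_response = False
            # The charge happened, but the caller never hears about it.
            raise TimeoutError("response lost after the side effect was applied")
        return receipt


def call_with_retries(
    op: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry op on TimeoutError with exponential backoff; re-raise on the last attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


service = FlakyPaymentService()
key = "order-1234"  # chosen once by the caller and reused on every retry
receipt = call_with_retries(lambda: service.charge(key, 500))
```

The retry is only safe because the key is generated once, outside the retry loop. Generating a fresh key per attempt would turn the lost response into a second charge.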
I’m currently looking for full-time or contract work in SRE / DevOps / IT Operations.
Portland, OR. Open to hybrid, on-site, or remote. Willing to relocate to Seattle.
Schedule: Any
Tools: Python, Bash, PowerShell, Terraform, Jenkins, Puppet, Ansible, Splunk, Grafana, BigPanda
CI/CD: Jenkins, Bitbucket, container builds with Docker/Podman, deployments to Openshift.
I have worked as an IT Operations Engineer in enterprise production environments, supporting on-prem VMware (RHEL and Windows) alongside Azure and AWS. My role included on-call rotations and incident command for high-severity outages.
My responsibilities included monitoring, alert triage, and root cause analysis across infrastructure and application layers, coordinating with infrastructure, development, and product teams to isolate failures, restore service, and prevent recurrence.
My focus was developing Python tooling for automation and production support, with Ansible used for routine infrastructure tasks.
I worked extensively with Splunk, Grafana, and BigPanda, building dashboards for investigation, event correlation, and metrics and trends.
Additional experience includes:
Terraform for cloud provisioning and Puppet for configuration enforcement
Network troubleshooting across Cisco and Arista environments
Production database support: Oracle, SQL Server, MongoDB, Postgres
My DMs are open! Feel free to message me for my resume.
Git: github.com/Aleph0x
Web: https://www.al3f.com
A huge thank you to our #opensource community for landing Coroot among the top 30 most popular observability projects on GitHub (out of 3,300+ entries!)
Love #Coroot and want to help share it with the world? Add your ⭐️ to the galaxy: https://github.com/coroot/coroot
#linux #ebpf #observability #softwarelibre #devops #sre #tech
System Administration, Week 1: The Job of a System Administrator
In this video, we try to capture the job of a System Administrator. We show what things SysAdmins may encounter in their day to day routine, ranging from blade servers and routers to cable ties and power tools and everything in between. As we try to define the job, we find out it's not quite that easy...
It's duct tape and WD40 all the way down.
After years in DevOps, I learned the most not from certifications, but from 2AM production outages and bulk-dollar cloud mistakes.
This post breaks down what 100 real incidents taught me about reliability, cost, and calm decision-making.
#DevOps #CloudEngineering #AWS #SRE #ProductionIncidents #CloudCosts #FinOps
Some real-world production bugs neither crash the system nor surface a clear error, yet they still leave it in a wrong state: users get blocked, transactions don't go through, webhooks aren't delivered... Data "silently" drifts while everything still looks normal. These bugs hide in glue code, environment differences, timing edge cases, or forgotten fallback paths. Is the product really "haunted"? 🤯 #SoftwareEngineering #SystemReliability #Debugging #SRE #LỗiẨn #K
Rejected again. Picked someone over me, again. Sisyphean rollercoaster all over again. If this happens one more time, I will have to start counting on two hands.
Seven months unemployed now.
The depression is really smacking me around.
I appreciate all the help with leads and roles that didn't work out. Still trying to find work.
System Administration, Week 1: Introduction
In this video, we cover a number of administrative issues relating to our course: we discuss why and how System Administration is covered in an academic Computer Science curriculum and outline the course syllabus.
Lovely quote: "Software engineering is programming over time"
It's even more true in the time of vibe-coded software (and AI-assisted programming). The operational costs of creating a feature might be going down, but at the same time the costs of running and changing the system rise.
While the programming part of the quote gets easier and faster, the long-term operation becomes harder and effort might shift to this discipline.
#ai #software #programming #vibecoding #sre #softwareengineering
New features are here! 🚀
We’ve just launched two highly requested improvements:
- Skill-level filtering in Scenarios: find the right challenges for you.
- GitHub login support: seamless access using your GitHub account.
Ready to dive in and get hands-on? https://www.learnbyfixing.com/scenarios/