Lmst

@obfuscurity hi Jason, do you have dates for #Monitorama 2025?

It was a drunk guy, and our neighbor seated at the bar (also a #monitorama attendee!) and the bartender both had our backs. The dude got cut off, and would have gotten thrown out had he pushed it any further. Good times.

How bad of an outage do we need to have before people are happy to pay for the software we build? [fin] #monitorama

(5) but there's a lot of demand for good product design: large-scale, heterogeneous data, and low-latency feedback paths (including over longer periods of time, not just instantaneous), and nobody wants to pay. #monitorama

(3) Use boring technology and combine it in innovative ways. [ed: although I... worry about older storage engines, and think that their cost economics may not be up to snuff] (4) It's a crowded market, and the "best" product may not win. #monitorama

(2) New query languages are rarely the solution. A new query language is not likely to succeed. Everyone uses SQL, use it too unless you have a good reason. [ed: and this is why I made @honeycombio's builder say VISUALIZE _ WHERE _ GROUP BY _ ORDER BY _ LIMIT _] #monitorama

So the lessons: (1) logs (inverted indices, little aggregation) and timeseries (data loss tolerant, compresses well, just a bucket of numbers) are different challenges for storage. Often you need separate engines. [ed: although he thinks @honeycombio is interesting] #monitorama

[ed: because I think we're finally at the end of the journey.] So now we're at Google. So what does Stackdriver use to measure itself/GCP and do planet-scale observability? "mostly good," he says, [ed: and I'd agree based on my 8-month-stale knowledge] #monitorama

So enter InfluxDB, and suddenly being able to measure everything, including high-cardinality dimensions (which could have solved Loggly's problems). Every generation can monitor the previous generation, but not itself... [ed: & why I demoed Honeycomb debugging itself] #monitorama

so he wound up going to work at Loggly in 2012. There was a huuuge volume of logs to index, but at least there was partial visibility. And by 2014 it still wasn't solved :/ #monitorama

In this thread, I help a Google engineer with using Google Slides to project slides and have separate speaker notes. This repeatedly happens and is one of the most painful UX issues of Google Slides that bites every non-professional presenter. :( :( :( #monitorama

Finally is @general_order24 on building o11y platforms over the past 10 years, and how it brought him eventually to Stackdriver... #monitorama

Homin concedes: instead of pre-training your data, maybe just label everything explicitly and correctly to begin with. This functionality is for situations where you have org drift, different schemas, lack of labels, etc. #monitorama

They build the graphs automatically and don't surface them directly to the user; instead, they intend to make them part of Datadog's next generation of ML-powered assistive features. #monitorama

[ed: by that point... to me, that says, have fewer alerts/alarms...] We then use an o11y graph to get to the bottom of what's happening and expose relevant info. Conclusion: we need to study how people interact with data to improve system o11y. [fin] #monitorama

[ed: I think this talk is... probably a fine talk for people with many potential-cause-based metrics/alerts, but I'm finding it hard to translate to my symptom-and-trace based debugging world] We're trying to hypothesize why all our 5+ alerts are going off... #monitorama

"Dashboards have gotten a bad rap recently, but they're one of the more useful tools out there if well constructed (cf. @gphat's talk)" --Homin. We can find out what metrics matter by how often they're seen/dashboarded. #monitorama

There's some useful data on which service is implicated in an alert, which team it goes to, and what metric it's on. And so on for all our alerts. And we might be able to find temporal leading indications/correlations. #monitorama

Homin says that we need to use ML on messy data, do unsupervised/semi-supervised training, plus our existing known nodes and relationships between them, and construct a knowledge graph. One concrete example: #monitorama

How can we do RCA with our imperfect, unlabeled data? [ed: my Allspaw introject is objecting] Our messy data does have lots of data on real user interactions, and patterns of how components behave. #monitorama

