Issue 270
Plenty of great content this week, along with what appears to be a rising tide of conversations around LLM observability. The videos for Monitorama are already online for binging; if you like what you see, make sure to sign up for updates on next year’s event. 🧠🚀🍿
This issue is sponsored by:
ClickHouse is a real-time data warehouse and open-source database optimized for analytics. Combined with the OpenTelemetry integration, it is the perfect fit for SQL-based observability. When it comes to the four pillars of observability (metrics, events, logs, and traces), ClickHouse shines in its ability to ingest massive amounts of data and perform a range of analytical functions over it, while providing high compression rates with blazing-fast performance.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Videos from last week’s Monitorama conference are already online for viewing. The entire playlist is full of gems, but here are some of my personal favorites:
- The Ticking Timebomb of Observability Expectations
- Pugs, Poe’s and pipelines
- How we tricked engineers into utilizing distributed tracing
- The Hater’s Guide to Dealing with Generative AI
- The shoemaker’s children have no shoes - why SRE teams must help themselves
- Incident Management: Lessons from Emergency Services
Stop paying for luxury monitoring
Making the case that synthetic monitors are good enough for smaller businesses that may not have the budget for a commercial observability product. Woof.
Adriana Villela spoke with numerous folks from the OpenTelemetry community, both contributors and users, at the most recent KubeCon EU. This post includes a transcript of the conversations as well as the full video. Fun way to see what others in the community are up to and where they think we’re going next.
Enhancing Cloud Usage Forecasting, Monitoring & Optimizing
How Etsy’s cloud infrastructure practices have matured as a result of a dedicated FinOps practice and strong forecasting capabilities (something Etsy has been doing well for as long as anyone I can remember).
Designing an Observability pipeline for LLM Applications
Some considerations and best practices for monitoring LLM applications, along with a closing pitch for what appears to be an LLM observability tool based on OpenTelemetry and designed for AI developers.
groundcover is a cloud-native application monitoring solution that reinvents the domain with eBPF. Built for modern production environments, it enables teams to instantly monitor everything they build and run in the cloud without compromising on cost, granularity, or scale. Install for free or try the sandbox environment to see it in action! (SPONSORED)
Investigating Kafka Broker I/O When Using Tiered Storage
A reminder that technical debt can rise up and bite you at any time. Love a good Kafka debugging story.
Reducing cardinality load from node_systemd_unit_state
Handy tip for decreasing cardinality by dropping unwanted metrics.
Debugging with production neighbors
A look at the motivations, design constraints, and promise of Uber’s SLATE tool for E2E testing and debugging in production. Not something that every company needs (or can build) for themselves, but still an interesting case study for offering dynamic debugging capabilities.
Tools
“OpenLIT is an OpenTelemetry-native tool designed to help developers gain insights into the performance of their LLM applications in production.”
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor