Issue 272
Fun week full of stories on systems design, instrumentation, and problem solving with observability tooling. Enjoy! 🧠🙏😎
This issue is sponsored by:
ObservabilityCON 2024 registration is open. Whether you’re in an empire state of mind or an observability bind, join us for bagels and logs in New York City, September 24-25. Connect with Grafana Labs experts and preview LGTM Stack releases and solutions. Register now and join us in person.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Building a large-scale Observability Ecosystem
An insightful look at one company’s observability journey. This feels like a solid roadmap for anyone planning a similar transformation.
Managing Critical Alerts through PagerDuty’s Event Rules
Sometimes a little extra context is all it takes to get folks to care a bit more about an alert and take preventative action.
Love to see OpenTelemetry making inroads into front-end telemetry. Looks like much of this post was inspired by a recent talk at KubeCon EU.
Solving large logs with ClickHouse
A deep dive into what it took for one vendor to reduce query time for large logs in ClickHouse.
otel-tui: A TUI Tool for Viewing OpenTelemetry Traces
Fun new project for interacting with OTel traces inside the terminal. Love it!
Getting started with Grafana: best practices to design your first dashboard
Just like alerting, there is good and bad dashboard design. This post covers some of the basic necessities for any effective dashboard.
Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl to analyze, collect, process, and route all IT and security data, delivering the choice, control, and flexibility required to adapt to their ever-changing needs. (SPONSORED)
Fascinating look at the challenges of indexing time-series data at “Datadog Scale”.
Observability using OpenSearch + Grafana
A cautionary tale (with pointers) for anyone considering Grafana over Kibana with OpenSearch.
A Journey as an Incident Commander: The Unsung Hero of Crisis Management
Having worked for organizations where nobody really knew (or cared to know) how to IC effectively, this one hits close to home. I’d have loved this post to dive deeper into specifics, but it’s still a good reference to share with a team that questions the importance of this role (or their ability to perform it).
Monitoring Instance Metrics in a Golang program
Although I think the example here isn’t the best way to collect related machine metrics, it’s still a useful pattern for Go developers learning how to instrument their code.
Tools
“A terminal OpenTelemetry viewer inspired by otel-desktop-viewer”
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor