Issue 210
Some fun and interesting articles this week. Loving the emphasis on practical monitoring and troubleshooting practices. And flame graphs! 🔥📈🔔
This issue is sponsored by:
Regardless of where you are on your incident management maturity journey, there’s a right next step you can take. Learn about three areas of focus — roles, services, and retros — why they’re important, and how to improve at any level in "3 ways to improve your incident management program in 2023."
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Improving Istio Propagation Delay
I absolutely love debugging stories like this one from Airbnb working through some Istio performance issues in production. 🤩
OpenTelemetry and the future of monitoring and observability
Probably one of the more honest appraisals of OpenTelemetry’s strengths and areas for improvement.
This engineer not only created a Grafana dashboard for monitoring the OpenTelemetry collector, but also wrote up a page with diagrams explaining the flow and respective metrics. Great work!
Scaling Observability Reliably and Frugally at Magicpin
One company’s journey planning a new Observability platform, working through the usual considerations: build versus buy, open source versus commercial, and how to ensure it scales with their growth.
From Unstructured Logs to Observability
Unstructured logs continue to be a pillar of Observability for many companies, but they can be so much more. This post shows how a bit of planning can yield richer data using structured events.
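For anyone who wants a feel for the difference before reading the post, here is a minimal sketch (not taken from the linked article) of emitting a structured event with Python's standard `logging` module. The logger name `checkout`, the `fields` key passed through `extra`, and the event fields themselves are all hypothetical choices for illustration.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Anything passed as extra={"fields": {...}} becomes a top-level key.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def build_logger(stream=sys.stdout):
    """Return a logger that writes JSON events to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    # Unstructured equivalent: log.info("order 1234 took 87ms for user 42")
    log.info(
        "order_completed",
        extra={"fields": {"order_id": 1234, "duration_ms": 87, "user_id": 42}},
    )
```

Because each field is its own key, a log backend can filter or aggregate on `order_id` or `duration_ms` directly instead of parsing them out of a message string.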
Collecting job metrics using Prometheus PushGateway
A look at one of the more common use cases for push metrics in a Prometheus-monitored architecture.
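As a rough illustration of that use case (a sketch, not the linked article's code), a batch job can push its final metrics to a Pushgateway over plain HTTP using the text exposition format. The gateway address, job name, and metric names below are hypothetical, and the sketch assumes simple job names without slashes.

```python
import urllib.parse
import urllib.request


def render_metrics(metrics):
    """Render {name: value} as Prometheus text exposition lines."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


def push_url(gateway, job):
    """Build the Pushgateway URL for a job grouping key."""
    return f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"


def push(gateway, job, metrics):
    """PUT the metrics, replacing everything previously pushed for this job."""
    request = urllib.request.Request(
        push_url(gateway, job),
        data=render_metrics(metrics).encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Example call at the end of a batch job (assumes a gateway on localhost:9091):
# push("http://localhost:9091", "nightly_backup",
#      {"backup_duration_seconds": 42.5, "backup_last_success_unixtime": 1680000000})
```

In real code the official Prometheus client libraries are usually the better choice, since they handle metric types, labels, and escaping for you.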
I think most folks here are probably aware of this feature, but it’s a great thing to share with anyone who might be newer to Prometheus and Alertmanager.
AWS Troubleshooting: The Art of Finding the Needle in the Haystack
A broad set of practices for troubleshooting AWS services. Most of this is probably a refresher, but it’s a good reminder to make sure you have your processes and strategies up to date.
Grafana security release: Fixes for CVE-2023-1410
A medium-severity security fix affecting the use of Grafana’s Graphite data source.
Kubernetes CreateContainerConfigError and CreateContainerError
The folks at Sysdig take a closer look at a couple of common Kubernetes container errors and how you can monitor them effectively.
Tools
“Singed makes it easy to get a flamegraph anywhere in your code base.”
Events
Monitorama has announced their full agenda for this year’s event. Looks like an awesome collection of topics and speakers. Hope to see you there!
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor