Issue 242
Some deeply technical and fun stories from production this week. Hope you enjoy reading them as much as I did. If you’re a Go developer, make sure to check out the NilAway article too! ⛄🦃🏳️‍🌈
This issue is sponsored by:
Stop sampling & capture everything needed for o11y, security, analytics, and more. Axiom efficiently ingests, stores, and queries 100% of your app and infra telemetry with no sampling or cold storage required. Within seconds, know exactly what happened 3 minutes, 3 months, or 3 years ago.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
OpenTelemetry parameter that might ruin your flexibility
An excellent article on an underserved topic… the importance of choosing the right metrics temporality for your OpenTelemetry metrics aggregation and its impact on the long-term portability of your data.
Insights from building a scalable distributed tracing platform for adidas
Lessons learned adopting distributed tracing (and its effects on the rest of their observability stack) inside a Platform team at Adidas Group.
VictoriaMetrics: pushing metrics without Prometheus Pushgateway
Long-time readers know I’m a stubborn fan of the push (vs pull) model for metrics collection. This example demonstrates VictoriaMetrics’ native support for push, eliminating a potential extra hop in your data pipeline.
Improving Efficiency Of Goku Time Series Database at Pinterest
The start of a new series of posts from Pinterest engineering revisiting their in-house time-series database, looking back at some of the challenges they faced since its original design and how they’ve adapted it to meet their growing needs. TSDB geeks should love this one.
NilAway: Practical Nil Panic Detection for Go
We don’t typically cover programming tools like linters here, but this post from Uber Engineering provides a fascinating look into their approach to nil panic detection and mitigation.
How Fixing my Typo Improved Cribl Search Query Performance by 20x
Writing performant log queries is an underappreciated skill imho, and this story from a Cribl engineer proves that even the professionals can struggle to cobble together the right bits at times.
Kubernetes: Liveness and Readiness Probes — Best practices
A handy primer on Kubernetes probes and how to make the best use of their respective states.
OCI Cross-tenancy log management
Some policy considerations and examples to be aware of if you have to support cross-tenancy log collection in Oracle Cloud Infrastructure (OCI).
Tools
“NilAway is a static analysis tool that seeks to help developers avoid nil panics in production by catching them at compile time rather than runtime.”
Events
Monitorama PDX 2024 - Early Bird Tickets
Early Bird tickets are running out soon, so make sure to grab yours while you can before prices go up for General Admission seating. And don’t forget to submit your talk proposal before the CFP deadline!
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor