Issue 176
A ton of variety this week, with everything from high-level observability and incident management discussions to the introduction of a new OSS network monitoring tool. And all of the videos from Monitorama are now available… enjoy!
This issue is sponsored by:
Troubleshoot Kubernetes in a Snap with Sysdig Monitor Advisor
Sysdig Monitor is making it easier to find important details about your clusters, namespaces, and deployments with a new feature called Advisor. In this webinar, you will learn how Advisor can help you debug and solve difficult Kubernetes problems. Join us at 10am PT on Tuesday, July 26th to add this feature to your troubleshooting toolbox. Save your seat here.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Videos from Monitorama PDX 2022
The videos from last month’s Monitorama event were uploaded this week. There were so many great speakers, but make sure you don’t miss Sophia Russell and Adrian Cockcroft’s talks from the first day.
Kubernetes Observability — Watches And APIServer Latency Monitoring
A quick example for integrating Kubernetes APIServer latency monitoring into your toolbox.
Alerts, what are they good for?
This post touches on many of the aspects of alert design that we feel but perhaps don’t think about as much as we should. I wish it talked more about the impact of bad alerts on operators, but it’s still a great article.
Akvorado: Netflow/IPFIX collector and visualizer
As someone who got introduced to monitoring concepts and tools through network administration, I was excited to see this tweet about a new Netflow/IPFIX collector based on projects like Kafka and ClickHouse. I’ll be watching this one with a keen eye.
Improving the 5 Why RCA format
Why Razorpay engineers adopted the “5-Why” root cause investigative technique, and then adapted it based on their own learnings, allowing them to capture additional context and drive better analysis.
Kubernetes Practice — Logging with Logstash and FluentD by Sidecar Container
A very thorough tutorial for setting up Logstash and Fluentd in your Kubernetes pods with the sidecar pattern. Well done.
Robust storage backend for Jaeger and OpenTelemetry
Promscale is a durable and scalable Postgres-based storage backend for Jaeger that is much easier to set up and operate than Elasticsearch or Cassandra. It includes out-of-the-box dashboards and full SQL query capabilities to understand the performance and behaviors of your services. (SPONSORED)
Give your app monitoring wings through the Azure Anomaly detector API
If you’re looking to apply some automated anomaly detection to your data, here’s a simple introduction to leveraging Azure’s Anomaly Detector service.
Everything you need to know about Observability: A complete Guide
A broad look at Observability, how to distinguish it from Monitoring, some practical examples, and a number of high-level best practices to consider before starting your own observability journey. Share this one with your CIO.
Grafana v9.0.3, v8.5.9, v8.4.10, and v8.3.10 released with high severity security fix
Fixes for a high-severity CVE affecting Grafana alerting and OAuth capabilities were released this week.
Monitoring Your Platform From Multiple Locations
Considerations for monitoring your systems on a distributed and/or global scale. This article doesn’t address specific tools, but it covers some of the important decisions to make before you eventually pick one.
Tools
“OpenDCDiag is an open-source project designed to identify defects and bugs in CPUs. It consists of a set of tests built around a sophisticated CPU testing framework.”
“This program receives flows (currently Netflow/IPFIX), hydrates them with interface names (using SNMP), geo information (using MaxMind), and exports them to Kafka, then ClickHouse. It also exposes a web interface to browse the collected data.”
Job Opportunities
Postgres Solutions Engineer at pganalyze (US Remote)
Site Reliability Engineer at Litmus (US Remote)
Senior DevOps Engineer at Hotel Engine (US Remote)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor