This week brings us some great posts on OpenTelemetry use and troubleshooting, log and events transformation, and statistical analysis. Enjoy! 🌞🍩🪓

This issue is sponsored by:


ClickHouse is a real-time data warehouse and open-source database optimized for analytics. Combined with the OpenTelemetry integration, it is the perfect fit for SQL-based observability. When it comes to the four pillars of observability (metrics, events, logs, and traces), ClickHouse shines in its ability to ingest massive amounts of data and perform a range of analytical functions over it, all while providing high compression rates and blazing-fast performance.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Sicredi’s Path for OpenTelemetry Adoption

A relatively high-level look at how Sicredi adopted OpenTelemetry, unified it with their existing tools, rolled out the OTel Collector in stages, and approached developer adoption and the culture shift required for a successful outcome.

Tips for Troubleshooting the Target Allocator

Some excellent troubleshooting tips for anyone struggling with the OpenTelemetry Collector’s Target Allocator.

Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding

We don’t often get a chance to cover load shedding strategies, but it’s an interesting complement to observability and scaling discussions. Leave it to Netflix to support the topic with plenty of metrics viz.

Grafana Loki security update

Apparently there’s an issue where a specific configuration of Grafana Loki can “explode your AWS bill”. If you’re using Grafana’s Helm chart for Loki, make sure you check this one out and upgrade accordingly.

OpenTelemetry Best Practices #3: Data Prep and Cleansing

Some more OTel tips and best practices, with an emphasis on data transforms and filtering.

Modernizing Logging at Uber with CLP (Part II)

A detailed look at how Uber takes full advantage of the Compressed Log Processor (CLP) at scale and with enough retention and coverage to be viable.


Elastic is the leading search analytics company. We enable anyone to securely harness search-powered AI to find the answers they need in real time, using all their data, at scale. Elastic’s cloud-based solutions for search, security, and observability help businesses deliver on the promise of AI. (SPONSORED)



How to Determine API Slow Downs, Part 2

An example of identifying performance regressions using various statistical models. Using this as an input for alerting is left as an exercise for the reader.

AWS: Metric Filter vs Subscription Filter

A look at two popular AWS CloudWatch features for routing or transforming log data.

Web Performance Regression Detection (Part 3 of 3)

Third part in a series on performance regressions at Pinterest. This series skews heavily towards real-user monitoring, but it’s still an excellent collection of insights for anyone supporting user-facing sites and services.

Tools

y-scope/clp

“Compressed Log Processor (CLP) is a free log management tool capable of compressing text logs and searching the compressed logs without decompression.”

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor