This week’s issue is loaded with hands-on technical posts and topics. Lots of Prometheus coverage, a couple of OpenTelemetry guides, and looks at how Mastodon infrastructure is being scaled and monitored during the Great Twitter Migration™. 😅🌈🔔

This issue is sponsored by:

Chronosphere

Chronosphere recently announced that, in partnership with Julius Volz (co-founder of Prometheus), we will be donating PromLens to the Prometheus community, making the PromLens query builder open source for all. Read more in our co-founder and CEO’s blog.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

It’s been amazing to see the community continue to grow. We’d love to have you join us and share what you’ve been working on.

From The Community

Observability strategies to not overload engineering teams

An intentionally simple and approachable guide for helping your engineering teams adopt OpenTelemetry using auto instrumentation. 👏👏👏

The Growth of Hachyderm

ICYMI, the past couple of weeks have seen tremendous change to our social networks. Many folks have slowly transitioned from Twitter to Mastodon, a self-hosted, decentralized, federated social network running on open source software. This has sparked a renaissance in public discourse around running and debugging our own servers as they scale. Kris Nova, the admin for Hachyderm.io, has been especially public about this community as it grows. I’ve collected a few resources that underscore the importance of observability in a service growing as quickly as this one.

Are we monitoring our tools?

We place a lot of implicit trust in the tools we use. This article pushes back, with some recommendations for measuring their effectiveness and holding them accountable.

Balancing Velocity and Confidence in Experimentation

Some critical thinking about metrics, experimentation, and how these contribute to (or fight against) engineering velocity. This is a good one to share across your entire organization.

Monitoring Stream Processing in a Kubernetes-Native Environment

A look at how Grafana can be extended to handle stream processing observability for Kubernetes clusters. Very cool stuff.

How to prevent metrics explosion in Prometheus

An unexpected flood of new metrics can ruin anyone’s day. This post includes numerous tips for preventing unwanted metrics from some of the more popular Prometheus exporters.

Loop1

Do you have fragmented visibility/oversight, stymied IT?

SolarWinds Hybrid Cloud Observability is a comprehensive, integrated, and full-stack observability solution designed to integrate data from across the IT ecosystem, including network, servers, applications, data, and more. Try it for free today and take your observability to the next level. (SPONSORED)



Using PromQL Subqueries to Calculate Service Level Indicators

A useful example of when and how to leverage Prometheus’ subquery feature.

How to Monitor Kubernetes API Server

If you’re responsible for any Kubernetes clusters, you might want to check out this detailed look at the kube-apiserver and the metrics that matter.

Deploy Promtail as a Sidecar to your Main App

A pattern for sending your Kubernetes-hosted application logs to Loki using a Promtail sidecar.

OpenTelemetry for Python

If you’re already familiar with OpenTelemetry and understand its benefits, but don’t have any hands-on experience yet, check out this four-part series showing you how to add it to your Python apps.

Tools

numaproj/numaflow

Numaflow is a Kubernetes-native tool for running massively parallel stream processing.

Events

Monitorama PDX 2023 - June 26-28 (Portland, OR)

Monitorama is returning to Portland, OR next summer. The 2022 conference was a fantastic event and I look forward to seeing you all again in 2023.

Job Opportunities

Staff Cloud Software Engineer at Pantheon (US Remote)

Staff Site Reliability Engineer at Momentive (CA Remote)

Senior Site Reliability Engineer at Momentive (CA Remote)

Staff Site Reliability Engineer at Momentive (US Remote)

Senior Site Reliability Engineer at Momentive (US Remote)

Site Reliability Engineer at Open Systems (EU Remote)

See you next week!

– Jason (@obfuscurity), Monitoring Weekly Editor