Issue 098
My apologies for the super late issue! I’m moving from San Francisco to Portland, OR this week, so it’s been hectic. Back to our normal schedule next week. :)
This issue is sponsored by:
📈 Data-Driven Guide to Engineering Leadership
Ship faster because you know more, not because you’re rushing. Get actionable insights from 7 million commits and 85,000+ software engineers, to increase your team’s velocity. Free Guide
Latest Articles on monitoring.love
Real World DevOps: Observability in Mega-Scale Banking with Greg Parker
Ever thought hard about your company’s observability strategy and the challenges you’re facing? What if your company spanned 70 countries, had 90,000+ employees, and was a bank? My guest certainly thinks about this regularly. In this episode, I speak with Greg Parker, the head of the Enterprise Monitoring Services team at Standard Chartered Bank, about what it takes to design and implement a global monitoring strategy in a complex environment.
From The Community
Scaling Graphite to Millions of Metrics
From the article: “Currently our stack reliably handles over a million active metric keys at any given time across 17 million total metric keys.” Very nice.
The sheer amount of cool stuff in here is overwhelming. Huge props to the Grafana team for this release–well done!
Not all metrics are infrastructure-related or deep in the code–many of the most important ones are higher-level. This article talks about what makes a good (business-level) metric.
Lighthouse & AWS Lambda: parallel web perf testing on a budget
There historically hasn’t really been a great way to test web page performance internally, at scale, and with solid performance. I love this solution.
There are a lot of newsletters I follow, and Dieter’s Security Newsletter is one of the great ones. Highly recommended.
Resilience Engineering and Error Budgets
From the article: “I’m not a fan of error budgets. I’ve never seen them implemented particularly well up close, though I know lots of folks who say it works for them. I’m not ready to declare bankruptcy on the practice, though I’d like to highlight some of my concerns with respect to human factors, safety, and resilience engineering.”
Monitoring AWS ECS: Part 1, Part 2
The folks at Datadog are back with another great series on monitoring AWS Elastic Container Service.
Extending Vector with eBPF to inspect host and container performance
Netflix’s on-host performance analysis tool gets some neat updates.
Sunsetting Bosun at Stack Overflow and Call for New Bosun Maintainer(s)
The wonderful folks at StackExchange are sunsetting Bosun internally, so they’re looking for a new maintainer.
This issue is sponsored by:
Monitor What Matters Most and Diagnose Anomalies in a Matter of Seconds
When it’s time to troubleshoot an issue, are you providing the right monitoring signals to your team? SignalFx APM helps by providing full distributed tracing, anomaly detection, and predictive analytics – all right out of the box.
Events
The CFP is now open for Datadog’s DashCon.
Spring Monitoring Meetup - March 6, 2019 - London, UK
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor