Don't be afraid to build a tool. Just don't become too attached to it.
A guide on when and how to build internal tools for development teams, emphasizing practicality over attachment.
Benjamin Cane shares insights on distributed systems, reliability patterns, performance testing, and engineering leadership, focusing on practical lessons for building resilient software.
40 articles from this blog
A guide for engineers on when to challenge technical decisions and when to accept and support them for team cohesion.
Compares Canary and Blue/Green deployment strategies, explaining their complexities, use cases, and when each is optimal for software releases.
A guide to recognizing and managing personal bias in technical decision-making, focusing on objective data and open-minded discussions.
Explains the value of Architecture Decision Records (ADRs) for documenting technical choices and fostering a collaborative engineering culture.
Explains how adding random jitter to scheduled tasks can prevent synchronized resource spikes and improve application performance.
Explains why stopping a listener immediately during app shutdown causes failed requests and details the correct graceful shutdown sequence.
Explains gRPC's persistent connection challenges during failover and offers solutions like HTTP/2-aware load balancers.
A developer discusses the dangers of assuming code won't change or be misunderstood, advocating for defensive programming practices.
Explores the compounding impact of shaving milliseconds off microservice latency in distributed systems, affecting throughput and scalability.
Explains the Store and Forward resiliency design pattern for handling service dependencies in tech systems like payments and telecom.
A strategy for building low-latency systems by deferring non-essential processing to an event-driven platform to optimize real-time performance.
Explores why coding is just one component of software engineering, highlighting system design, architecture, and the role of AI tools.
Explains how avoiding cross-region calls in microservices improves performance and resilience, and discusses the complexities of designing for regional isolation.
Explains how kube-proxy uses iptables for load balancing in Kubernetes and the implications for gRPC/HTTP/2 traffic.
Explains operational flags, which are long-lived runtime controls for system resiliency, and contrasts them with temporary feature flags used for releases.
Explains why over-reliance on automatic retries can harm low-latency platforms and advocates for fundamental resiliency practices.
Explains how improper logging can severely impact microservice latency and offers solutions like adjusting log levels and using async logging.
Discusses the risks of running analytics on operational databases and offers solutions to separate workloads.
Discusses the critical importance of configuring timeouts, retries, and connection pools in distributed systems to prevent minor oversights from amplifying failures.