Kubernetes List API performance and reliability
Analyzes performance and reliability challenges of Kubernetes List API calls at massive scale, explaining high-risk patterns and recent improvements.
Ahmet Alp Balkan is a software engineer specializing in large-scale Kubernetes infrastructure and cluster management systems. He has led compute platform teams at LinkedIn, Twitter, Google Cloud, and Microsoft, focusing on bare-metal fleets, controller development, and developer experience for cloud-native platforms.
8 articles from this blog
Analyzes performance and reliability challenges of Kubernetes List API calls at massive scale, explaining high-risk patterns and recent improvements.
A deep dive into the various mechanisms that can evict or terminate Pods in Kubernetes, explaining internal behaviors and control strategies.
A guide to designing reliable, production-grade Kubernetes controllers and avoiding common API design pitfalls.
Analysis of OpenAI's Kubernetes outage, focusing on API server overload and DNS service discovery issues in large-scale clusters.
Analysis of a Kubernetes node-feature-discovery upgrade incident, detailing scale issues and architectural changes that caused problems in large clusters.
A guide to common pitfalls and best practices when generating Kubernetes Custom Resource Definitions (CRDs) using controller-gen.
Explains why Kubernetes takes 60-90 seconds to propagate Secret and ConfigMap updates to mounted volumes, detailing the kubelet's sync mechanism.
Explains the unique behavior of file change notifications (inotify) on Kubernetes Secret and ConfigMap volumes and how to handle atomic updates.