Ahmet Alp Balkan 11/15/2024

Tale of a Kubernetes node-feature-discovery incident


A detailed post-mortem of a Kubernetes incident in which upgrading the node-feature-discovery (NFD) component caused severe scaling problems. The new version's architectural shift to NodeFeature custom resources consumed excessive etcd storage (~140 KB per node) in large production clusters and broke pod scheduling. The article covers the decision to roll back and draws lessons on evaluating off-the-shelf components for large-scale operations.
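To give a feel for why ~140 KB per node matters at scale, here is a rough back-of-envelope sketch in Python. It only multiplies the per-node figure quoted above by a node count. The node counts and the function name are hypothetical, chosen for illustration and not taken from the article, and the estimate ignores etcd revision history, so real usage could be higher.

```python
# Back-of-envelope estimate of etcd space used by NodeFeature objects.
# Assumption: one NodeFeature object of ~140 KB per node (figure from the summary).
# The node counts below are hypothetical examples, not numbers from the article.

KB = 1000
BYTES_PER_NODEFEATURE = 140 * KB


def nodefeature_footprint_bytes(node_count: int,
                                bytes_per_object: int = BYTES_PER_NODEFEATURE) -> int:
    """Return the approximate bytes stored for one NodeFeature object per node."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return node_count * bytes_per_object


if __name__ == "__main__":
    for nodes in (100, 1_000, 5_000):
        total = nodefeature_footprint_bytes(nodes)
        print(f"{nodes:>6} nodes -> ~{total / 1e6:.0f} MB of NodeFeature data")
```

The point of the sketch is that the cost grows linearly with cluster size, so a per-object size that looks harmless on a small cluster can become a significant share of etcd storage on a large one.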

