Ahmet Alp Balkan 11/15/2024

Tale of a Kubernetes node-feature-discovery incident

A detailed post-mortem of a Kubernetes incident in which upgrading the node-feature-discovery (NFD) component caused severe scalability problems. The new version's architectural shift to NodeFeature custom resources consumed excessive etcd storage (~140 KB per node) in large production clusters, which broke pod scheduling. The article covers the decision to roll back and offers lessons on evaluating off-the-shelf components for large-scale operations.
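To get a feel for why ~140 KB per node matters at scale, here is a rough back-of-the-envelope sketch. Only the 140 KB figure comes from the summary above; the node counts, the assumption of one NodeFeature object per node, the decimal-kilobyte unit, and the 2 GiB quota used for comparison are illustrative assumptions, not numbers from the incident.

```python
# Rough estimate of etcd space taken by NodeFeature objects.
# Assumptions (not from the article): one object per node, decimal KB,
# hypothetical node counts, and a hypothetical 2 GiB storage quota.

KB = 1000
PER_NODE_BYTES = 140 * KB  # "~140 KB per node" from the summary


def nodefeature_footprint_bytes(node_count: int,
                                per_node_bytes: int = PER_NODE_BYTES) -> int:
    """Return the estimated total bytes for node_count NodeFeature objects."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return node_count * per_node_bytes


def fraction_of_quota(node_count: int, quota_bytes: int) -> float:
    """Return the estimated footprint as a fraction of a given quota."""
    return nodefeature_footprint_bytes(node_count) / quota_bytes


if __name__ == "__main__":
    assumed_quota = 2 * 1024**3  # hypothetical quota, for illustration only
    for nodes in (100, 1_000, 5_000, 10_000):  # hypothetical cluster sizes
        total = nodefeature_footprint_bytes(nodes)
        share = fraction_of_quota(nodes, assumed_quota)
        print(f"{nodes:>6} nodes -> {total / 1e6:,.0f} MB ({share:.0%} of quota)")
```

The estimate counts only the current object size. It ignores revision history, other resources stored in etcd, and the cost of listing or watching large objects, so real pressure on the control plane would likely be higher.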
