Upgrading Kubernetes
This guide covers the process of upgrading your Kubernetes cluster to a newer version in Krutrim Kubernetes Service.
Overview
Upgrading a Kubernetes cluster in KKS is a two-phase process:
Control Plane Upgrade: Upgrades the Kubernetes control plane components
Node Group Upgrade: Upgrades worker nodes in each node group individually
Important: These are separate operations. Upgrading the cluster version only upgrades the control plane. You must upgrade each node group separately to complete the cluster upgrade.
How Kubernetes Version Upgrade Works
Phase 1: Control Plane Upgrade
When you upgrade the Kubernetes version:
Control Plane Upgrade (Automatic)
✓ API Server upgraded to new version
✓ Controller Manager upgraded
✓ Scheduler upgraded
✓ etcd compatibility verified
Worker Nodes: Still running OLD version
After control plane upgrade:
✅ Control plane runs the new Kubernetes version
⚠️ Worker nodes still run the old version
✅ Cluster remains operational (Kubernetes supports version skew)
⚠️ You must upgrade node groups to complete the process
Phase 2: Node Group Upgrade
After upgrading the control plane, you must upgrade each node group:
Rolling update ensures:
No downtime for properly configured workloads
Pods are rescheduled to healthy nodes
One node upgraded at a time
Cluster capacity maintained during upgrade
Prerequisites
Before upgrading your Kubernetes cluster:
Check Version Compatibility
✅ You can only upgrade to the next minor version (e.g., 1.27 → 1.28)
❌ Cannot skip versions (e.g., 1.27 → 1.29)
✅ Control plane must be upgraded before node groups
✅ Check available versions in Krutrim platform
Review Release Notes
Review Kubernetes release notes for the target version
Check for deprecated APIs or breaking changes
Verify your applications are compatible with the new version
Backup Critical Data
Backup any critical application data
Document current cluster configuration
Take note of current cluster state
Check Cluster Health
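A quick pre-upgrade health check with kubectl might look like the sketch below; the commands are illustrative, so adapt namespaces and selectors to your environment.

```bash
# Confirm all nodes are Ready and note their current kubelet version
kubectl get nodes -o wide

# Look for pods that are not Running or Completed
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded

# Check for recent warning events
kubectl get events --all-namespaces --field-selector=type=Warning
```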
Upgrading the Control Plane
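The control plane upgrade is initiated from the Krutrim platform by selecting the target Kubernetes version for the cluster. After the platform reports the upgrade as complete, you can verify it from kubectl, for example:

```bash
# The server version should now show the new Kubernetes release
kubectl version

# Nodes still report the old kubelet version until their node groups are upgraded
kubectl get nodes
```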
Upgrading Node Groups
Critical: Prepare for Node Group Upgrades
Before upgrading each node group, complete the following checks so that pods can be drained and rescheduled without disruption.
Ensure Pods Can Be Rescheduled
Common issues:
PDB with minAvailable: 100% will block draining
Not enough replicas to satisfy PDB during drain
Single-replica deployments without PDB
Solution example (adjust PDB):
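For instance, a PDB that always leaves room for one pod to be evicted might look like this (the name and labels are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # illustrative name
spec:
  maxUnavailable: 1         # allow one pod at a time to be evicted during a drain
  selector:
    matchLabels:
      app: my-app
```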
Move Critical Workloads (If Necessary)
For critical single-replica workloads or workloads that cannot tolerate disruption:
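One option, sketched below with illustrative names, is to temporarily add a replica for the duration of the upgrade so the workload survives each drain:

```bash
# Run a second replica while nodes are being replaced
kubectl scale deployment my-critical-app --replicas=2 -n my-namespace

# ...perform the node group upgrade...

# Scale back down once the upgrade has finished
kubectl scale deployment my-critical-app --replicas=1 -n my-namespace
```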
Check Node Drain Blockers
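The commands below are one way to surface common blockers before the platform starts draining nodes (the second command assumes jq is installed):

```bash
# PDBs showing 0 allowed disruptions will block a drain
kubectl get pdb --all-namespaces

# Bare pods (no ownerReferences) are deleted during a drain, not rescheduled
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(.metadata.ownerReferences == null) | "\(.metadata.namespace)/\(.metadata.name)"'
```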
Monitor Node Group Upgrade
During the upgrade, the platform performs a rolling update:
Example output:
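The exact status shown by the platform may differ; watching nodes with kubectl during a rolling node group upgrade typically shows a mix of old and new kubelet versions, roughly like this illustrative snapshot:

```bash
$ kubectl get nodes
NAME            STATUS                     ROLES    AGE   VERSION
node-pool-a-1   Ready                      <none>   30d   v1.27.6
node-pool-a-2   Ready,SchedulingDisabled   <none>   30d   v1.27.6
node-pool-a-3   Ready                      <none>   2m    v1.28.4
```

Here node-pool-a-2 is an old node that has been cordoned and is being drained, while node-pool-a-3 is its freshly created replacement running the new version.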
Per-node process:
New node with updated version is created
New node joins cluster and becomes Ready
Old node is cordoned (no new pods scheduled)
Old node is drained (pods evicted gracefully)
Old node is removed after successful drain
Process repeats for next node
Handle Stuck Node Upgrades
Symptoms:
Node group upgrade stuck in UPGRADING state
Old node stuck in "Draining" state
Node group upgrade not progressing
Cause: Old node cannot be drained due to:
PodDisruptionBudget blocking drain
Pods with emptyDir volumes
Bare pods (no controller)
Pods with local storage
Diagnosis:
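The following commands usually help pinpoint what is blocking the drain (node, pod, and namespace names are placeholders):

```bash
# Which pods are still running on the draining node?
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>

# Is a PodDisruptionBudget reporting zero allowed disruptions?
kubectl get pdb --all-namespaces

# Recent events often name the exact pod or PDB blocking eviction
kubectl get events --all-namespaces --sort-by=.lastTimestamp | tail -n 20
```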
Resolution options:
Option 1: Fix PodDisruptionBudget
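For example, relax the budget so at least one pod may be evicted (PDB name and namespace are illustrative):

```bash
# Inspect the current budget
kubectl get pdb my-app-pdb -n my-namespace -o yaml

# Replace minAvailable with a maxUnavailable of 1
kubectl patch pdb my-app-pdb -n my-namespace \
  --type merge -p '{"spec":{"minAvailable":null,"maxUnavailable":1}}'
```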
Option 2: Scale Up Application
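Adding replicas gives the PDB headroom to tolerate an eviction; the names below are illustrative:

```bash
kubectl scale deployment my-app --replicas=3 -n my-namespace
```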
Option 3: Delete Blocking Pods (Careful!)
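Only do this if you understand that bare pods are not recreated and data in emptyDir volumes is lost; the commands are a sketch with placeholder names:

```bash
# Delete a specific blocking pod
kubectl delete pod <pod-name> -n <namespace>

# Last resort: force the drain, removing bare pods and pods using emptyDir volumes
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
```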
Option 4: Contact Support
Best Practices for Smooth Upgrades
Do's
Always Upgrade Control Plane First
Control plane must be at the same version as, or a newer version than, the nodes
Node groups cannot be newer than control plane
Upgrade Node Groups One at a Time
Wait for each node group upgrade to complete
Verify workloads are healthy before proceeding
Maintain cluster stability
Prepare Your Workloads
Ensure multiple replicas for critical services
Configure appropriate PodDisruptionBudgets
Use Deployments/StatefulSets instead of bare pods
Example PDB:
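A minimal sketch, assuming a Deployment with three replicas labelled app: web-frontend:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb     # illustrative name
spec:
  minAvailable: 2            # with 3 replicas, one pod may be evicted at a time
  selector:
    matchLabels:
      app: web-frontend
```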
Test Node Drainability Before Upgrade
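A dry-run drain against one node from the group surfaces most blockers without evicting anything (node name is a placeholder):

```bash
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --dry-run=client
```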
Monitor During Upgrade
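For example, keep a watch running on nodes and on your critical namespaces while the rolling update proceeds (namespace name is illustrative):

```bash
# Watch nodes being added, cordoned, and removed
kubectl get nodes --watch

# Watch pods being rescheduled in a critical namespace
kubectl get pods -n my-namespace --watch
```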
Schedule Upgrades During Maintenance Windows
Plan upgrades during low-traffic periods
Notify users of potential brief disruptions
Have rollback plan ready
Upgrade Non-Production Clusters First
Test upgrade process in dev/staging
Identify potential issues before production
Validate application compatibility
Don'ts
Don't Skip Kubernetes Versions
❌ Cannot upgrade 1.27 → 1.29
✅ Must upgrade 1.27 → 1.28 → 1.29
Don't Upgrade Multiple Node Groups Simultaneously
Can cause cluster instability
Harder to troubleshoot issues
May exceed resource limits
Don't Ignore PodDisruptionBudgets
PDBs can block node draining
Review and adjust PDBs before upgrade
Ensure PDBs allow at least some disruption
Don't Use Bare Pods in Production
Bare pods are deleted during drain (not rescheduled)
Always use Deployments, StatefulSets, or DaemonSets
Controllers ensure pods are recreated
Don't Upgrade Without Testing
Test upgrade in non-production first
Verify application compatibility
Check for deprecated APIs
Don't Forget About Version Skew
Control plane and nodes can differ by 1 minor version
Don't leave nodes on old version indefinitely
Complete all node group upgrades within reasonable time
Don't Ignore Failed Drains
Investigate why drain failed
Fix underlying issue
Don't force drain without understanding impact
Troubleshooting Upgrade Issues
Version Skew Policy
Kubernetes supports running control plane and nodes at different versions (within limits):
Supported Version Skew:
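To see the current skew in your cluster, compare the control plane version against the kubelet version reported by each node; the kubelet must never be newer than the API server:

```bash
# Control plane (server) version
kubectl version

# Kubelet version per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
```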
Recommendations:
Upgrade control plane first
Upgrade all node groups within 1-2 weeks
Don't leave node groups more than 1 version behind
Complete upgrades before next version release
Rollback Considerations
Important: Kubernetes upgrades are typically one-way operations.
Control Plane Rollback:
Not typically supported
May require cluster restore from backup
Contact Krutrim support for assistance
Node Group Rollback:
Can create new node group with old version
Migrate workloads to old version node group
Remove upgraded node group
Prevention is Better:
Test upgrades in non-production first
Verify application compatibility
Have rollback plan documented
Take backups before upgrading
Post-Upgrade Tasks
After completing the upgrade:
Verify Cluster Health
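Typical post-upgrade checks with kubectl:

```bash
# All nodes should be Ready and report the new kubelet version
kubectl get nodes -o wide

# Core system pods should be Running
kubectl get pods -n kube-system

# Nothing should be stuck after the rolling updates
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded
```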
Update Documentation
Document the upgrade date and version
Note any issues encountered and resolutions
Update cluster documentation with new version
Update Client Tools
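kubectl is supported within one minor version (older or newer) of the API server, so update your local client to match the upgraded cluster and confirm the versions line up:

```bash
# Client and server versions should be within one minor version of each other
kubectl version
```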
Review Deprecated APIs
Check for deprecated API warnings
Update manifests to use newer APIs
Test applications thoroughly
Monitor Cluster
Monitor cluster performance
Watch for any unusual behavior
Check application metrics and logs
Additional Resources
Kubernetes Release Notes: https://kubernetes.io/releases/
Krutrim Documentation: Check platform docs for version upgrade procedures
Version Skew Policy: https://kubernetes.io/releases/version-skew-policy/
Related Guides
Managing Node Groups - Node group operations
Creating a Cluster - Initial cluster setup
Installing Add-ons - Managing cluster add-ons