Upgrading Kubernetes

This guide covers the process of upgrading your Kubernetes cluster to a newer version in Krutrim Kubernetes Service.

Overview

Upgrading a Kubernetes cluster in KKS is a two-phase process:

Control Plane Upgrade: Upgrades the Kubernetes control plane components
Node Group Upgrade: Upgrades worker nodes in each node group individually

Important: These are separate operations. Upgrading the cluster version only upgrades the control plane. You must upgrade each node group separately to complete the cluster upgrade.

How Kubernetes Version Upgrade Works

Phase 1: Control Plane Upgrade

When you upgrade the Kubernetes version:

Control Plane Upgrade (Automatic)

✓ API Server upgraded to new version
✓ Controller Manager upgraded
✓ Scheduler upgraded
✓ etcd compatibility verified

Worker Nodes: Still running OLD version

After control plane upgrade:

✅ Control plane runs the new Kubernetes version
⚠️ Worker nodes still run the old version
✅ Cluster remains operational (Kubernetes supports version skew)
⚠️ You must upgrade node groups to complete the process

Phase 2: Node Group Upgrade

After upgrading the control plane, you must upgrade each node group:

Node Group Upgrade (Rolling Update Process)

1. New node with updated version joins cluster
2. Wait for new node to become Ready
3. Old node is cordoned (no new pods scheduled)
4. Old node is drained (pods evicted)
5. Old node is removed from cluster
6. Repeat for next node...

Rolling update ensures:

No downtime for properly configured workloads
Pods are rescheduled to healthy nodes
One node upgraded at a time
Cluster capacity maintained during upgrade

Prerequisites

Before upgrading your Kubernetes cluster:

Check Version Compatibility

✅ You can only upgrade to the next minor version (e.g., 1.27 → 1.28)
❌ Cannot skip versions (e.g., 1.27 → 1.29)
✅ Control plane must be upgraded before node groups
✅ Check available versions in Krutrim platform

Review Release Notes

Review Kubernetes release notes for the target version
Check for deprecated APIs or breaking changes
Verify your applications are compatible with the new version

Backup Critical Data

Backup any critical application data
Document current cluster configuration
Take note of current cluster state

Check Cluster Health

# Check all nodes are Ready
kubectl get nodes

# Check all system pods are running
kubectl get pods -n kube-system

# Check critical workloads are healthy
kubectl get pods -A

Upgrading the Control Plane

Initiate Control Plane Upgrade

Upgrade the cluster's Kubernetes version through the Krutrim platform:

# Using Krutrim CLI (example)
krutrim cluster upgrade --cluster-id <cluster-id> --version 1.28.0

# Or via API
# Check Krutrim API documentation for specific endpoints

Monitor Control Plane Upgrade

# Check cluster status
# Cluster status will show UPGRADING during the process
# Wait for cluster to return to PROVISIONED state
# This typically takes 5-15 minutes

Verify Control Plane Upgrade

# Check API server version
kubectl version --short

# Output example:
# Client Version: v1.27.0
# Server Version: v1.28.0  ← Control plane upgraded

# Check node versions (still old version)
kubectl get nodes
# Nodes will still show v1.27.0

After control plane upgrade:

✅ Control plane is now running the new version
⚠️ Node groups still need to be upgraded
✅ Cluster is functional with version skew

Upgrading Node Groups

Critical: Prepare for Node Group Upgrades

Before upgrading each node group, ensure smooth operation.

Ensure Pods Can Be Rescheduled

# Check PodDisruptionBudgets (PDBs)
kubectl get pdb -A

# Review each PDB to ensure it allows disruptions
kubectl describe pdb <pdb-name> -n <namespace>

Common issues:

PDB with minAvailable: 100% will block draining
Not enough replicas to satisfy PDB during drain
Single-replica deployments without PDB

Solution example (adjust PDB):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1  # Allow draining as long as 1 pod remains
  selector:
    matchLabels:
      app: myapp

Move Critical Workloads (If Necessary)

For critical single-replica workloads or workloads that cannot tolerate disruption:

# Option 1: Scale up temporarily
kubectl scale deployment <deployment-name> --replicas=2 -n <namespace>

# Option 2: Migrate to a different node group
# Use node selectors or taints to move workloads
kubectl edit deployment <deployment-name> -n <namespace>

Check Node Drain Blockers

# Check for bare pods (pods without controller)
kubectl get pods -A --field-selector spec.nodeName=<node-name>

# Bare pods will be deleted and not rescheduled
# Convert to Deployment/StatefulSet/DaemonSet before upgrade

Upgrade Node Groups One by One

Important: Upgrade node groups one at a time to maintain cluster stability.

Recommended upgrade order:

Non-critical node groups first (development, testing)
General workload node groups (application nodes)
Critical node groups last (production, stateful workloads)

Upgrade Process for Each Node Group

# Using Krutrim CLI (example)
krutrim nodegroup upgrade \
  --cluster-id <cluster-id> \
  --nodegroup-id <nodegroup-id> \
  --version 1.28.0

# Or via API
# Check Krutrim API documentation for specific endpoints

Monitor Node Group Upgrade

During the upgrade, the platform performs a rolling update:

# Watch nodes being updated
kubectl get nodes -w

Example output:

NAME         STATUS   ROLES    AGE   VERSION
node-1-old   Ready    <none>   10d   v1.27.0
node-2-old   Ready    <none>   10d   v1.27.0
node-3-new   Ready    <none>   1m    v1.28.0  ← New node joins
node-1-old   Ready,SchedulingDisabled  10d  v1.27.0  ← Old node cordoned
node-1-old   NotReady,SchedulingDisabled  10d  v1.27.0  ← Draining
# node-1-old removed
node-4-new   Ready    <none>   1m    v1.28.0  ← Next new node joins

Per-node process:

New node with updated version is created
New node joins cluster and becomes Ready
Old node is cordoned (no new pods scheduled)
Old node is drained (pods evicted gracefully)
Old node is removed after successful drain
Process repeats for next node

Handle Stuck Node Upgrades

Symptoms:

Node group upgrade stuck in UPGRADING state
Old node stuck in "Draining" state
Node group upgrade not progressing

Cause: Old node cannot be drained due to:

PodDisruptionBudget blocking drain
Pods with emptyDir volumes
Bare pods (no controller)
Pods with local storage

Diagnosis:

# Check which pods are blocking drain
kubectl get pods -A --field-selector spec.nodeName=<stuck-node-name>

# Check PodDisruptionBudgets
kubectl get pdb -A

# Check for drain events
kubectl get events --field-selector involvedObject.name=<stuck-node-name>

Resolution options:

Option 1: Fix PodDisruptionBudget

# Temporarily adjust PDB to allow draining
kubectl edit pdb <pdb-name> -n <namespace>
# Change minAvailable or maxUnavailable to allow disruption

Option 2: Scale Up Application

# Add more replicas to satisfy PDB during drain
kubectl scale deployment <deployment-name> --replicas=3 -n <namespace>

Option 3: Delete Blocking Pods (Careful!)

# For bare pods or stuck pods (understand impact first!)
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force

Option 4: Contact Support

# If issue persists, contact Krutrim support with:
# - Cluster ID
# - Node group ID
# - Stuck node name
# - Output of: kubectl get pods -A --field-selector spec.nodeName=<node-name>

Verify Node Group Upgrade

After each node group upgrade completes:

# Check all nodes in the node group are updated
kubectl get nodes -l nodegroup=<nodegroup-name>

# Verify node versions
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

# Check all pods are running
kubectl get pods -A -o wide

# Verify workloads are healthy
kubectl get deployments -A
kubectl get statefulsets -A

Repeat for Remaining Node Groups

Repeat the previous steps for each remaining node group until all node groups are upgraded.

Best Practices for Smooth Upgrades

Do's

Always Upgrade Control Plane First
- Control plane must be at the same or newer version than nodes
- Node groups cannot be newer than control plane
Upgrade Node Groups One at a Time
- Wait for each node group upgrade to complete
- Verify workloads are healthy before proceeding
- Maintain cluster stability
Prepare Your Workloads
- Ensure multiple replicas for critical services
- Configure appropriate PodDisruptionBudgets
- Use Deployments/StatefulSets instead of bare pods

Example PDB:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp

Test Node Drainability Before Upgrade

# Test if a node can be drained (dry run)
kubectl drain <node-name> --dry-run=client --ignore-daemonsets

Monitor During Upgrade

# Watch nodes
kubectl get nodes -w

# Watch pods being rescheduled
kubectl get pods -A -w

# Check events
kubectl get events -A --watch

Schedule Upgrades During Maintenance Windows
- Plan upgrades during low-traffic periods
- Notify users of potential brief disruptions
- Have rollback plan ready
Upgrade Non-Production Clusters First
- Test upgrade process in dev/staging
- Identify potential issues before production
- Validate application compatibility

Don'ts

Don't Skip Kubernetes Versions
- ❌ Cannot upgrade 1.27 → 1.29
- ✅ Must upgrade 1.27 → 1.28 → 1.29
Don't Upgrade Multiple Node Groups Simultaneously
- Can cause cluster instability
- Harder to troubleshoot issues
- May exceed resource limits
Don't Ignore PodDisruptionBudgets
- PDBs can block node draining
- Review and adjust PDBs before upgrade
- Ensure PDBs allow at least some disruption
Don't Use Bare Pods in Production
- Bare pods are deleted during drain (not rescheduled)
- Always use Deployments, StatefulSets, or DaemonSets
- Controllers ensure pods are recreated
Don't Upgrade Without Testing
- Test upgrade in non-production first
- Verify application compatibility
- Check for deprecated APIs
Don't Forget About Version Skew
- Control plane and nodes can differ by 1 minor version
- Don't leave nodes on old version indefinitely
- Complete all node group upgrades within reasonable time
Don't Ignore Failed Drains
- Investigate why drain failed
- Fix underlying issue
- Don't force drain without understanding impact

Troubleshooting Upgrade Issues

Control Plane Upgrade Stuck

Symptoms:

Cluster stuck in UPGRADING state
Control plane upgrade not completing

Solution:

Check cluster status in Krutrim platform
Review error messages
Contact Krutrim support with cluster ID

Node Group Upgrade Not Starting

Symptoms:

Node group remains in current version
No new nodes being created

Possible Causes:

Control plane not upgraded yet
Invalid target version
Insufficient quotas

Solution:

# Verify control plane is upgraded
kubectl version --short

# Check node group status in Krutrim platform

# Verify target version is valid

# Check OpenStack quotas for instance creation

Pods Failing After Upgrade

Symptoms:

Pods in CrashLoopBackOff after upgrade
Services not working correctly

Possible Causes:

Application incompatible with new Kubernetes version
Deprecated APIs removed
Configuration issues

Solution:

# Check pod logs
kubectl logs <pod-name> -n <namespace>

# Check pod events
kubectl describe pod <pod-name> -n <namespace>

# Review Kubernetes deprecation notices

# Check release notes for breaking changes

# Roll back if necessary (may require cluster restore)

Node Stuck in NotReady After Upgrade

Symptoms:

New node stuck in NotReady state
Node not joining cluster properly

Solution:

# Check node conditions
kubectl describe node <node-name>

# Check kubelet logs on the node
# (requires node access)

# Check CNI pods
kubectl get pods -n kube-system -l k8s-app=cilium

# Contact Krutrim support if issue persists

Version Skew Policy

Kubernetes supports running control plane and nodes at different versions (within limits):

Supported Version Skew:

Control Plane: v1.28.x
Node Groups:   v1.27.x or v1.28.x  ✅ Supported (1 minor version difference)
Node Groups:   v1.26.x             ❌ Not supported (2 minor versions)

Recommendations:

Upgrade control plane first
Upgrade all node groups within 1-2 weeks
Don't leave node groups more than 1 version behind
Complete upgrades before next version release

Rollback Considerations

Important: Kubernetes upgrades are typically one-way operations.

Control Plane Rollback:

Not typically supported
May require cluster restore from backup
Contact Krutrim support for assistance

Node Group Rollback:

Can create new node group with old version
Migrate workloads to old version node group
Remove upgraded node group

Prevention is Better:

Test upgrades in non-production first
Verify application compatibility
Have rollback plan documented
Take backups before upgrading

Post-Upgrade Tasks

After completing the upgrade:

Verify Cluster Health

# Check all nodes are running new version
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

# Check all pods are running
kubectl get pods -A

# Check system components
kubectl get pods -n kube-system

# Check critical workloads
kubectl get deployments -A
kubectl get statefulsets -A

Update Documentation

Document the upgrade date and version
Note any issues encountered and resolutions
Update cluster documentation with new version

Update Client Tools

# Update kubectl to match cluster version
# Download from: https://kubernetes.io/docs/tasks/tools/

# Verify kubectl version
kubectl version --client

Review Deprecated APIs

Check for deprecated API warnings
Update manifests to use newer APIs
Test applications thoroughly

Monitor Cluster

Monitor cluster performance
Watch for any unusual behavior
Check application metrics and logs

Additional Resources

Kubernetes Release Notes: https://kubernetes.io/releases/
Krutrim Documentation: Check platform docs for version upgrade procedures
Version Skew Policy: https://kubernetes.io/releases/version-skew-policy/

Managing Node Groups - Node group operations
Creating a Cluster - Initial cluster setup
Installing Add-ons - Managing cluster add-ons

Last updated 2 months ago

Was this helpful?

hashtagOverview

hashtagHow Kubernetes Version Upgrade Works

hashtagPhase 1: Control Plane Upgrade

hashtagPhase 2: Node Group Upgrade

hashtagPrerequisites

hashtagCheck Version Compatibility

hashtagReview Release Notes

hashtagBackup Critical Data

hashtagCheck Cluster Health

hashtagUpgrading the Control Plane

hashtagInitiate Control Plane Upgrade

hashtagMonitor Control Plane Upgrade

hashtagVerify Control Plane Upgrade

hashtagUpgrading Node Groups

hashtagCritical: Prepare for Node Group Upgrades

hashtagEnsure Pods Can Be Rescheduled

hashtagMove Critical Workloads (If Necessary)

hashtagCheck Node Drain Blockers

hashtagUpgrade Node Groups One by One

hashtagUpgrade Process for Each Node Group

hashtagMonitor Node Group Upgrade

hashtagHandle Stuck Node Upgrades

hashtagVerify Node Group Upgrade

hashtagRepeat for Remaining Node Groups

hashtagBest Practices for Smooth Upgrades

hashtagTroubleshooting Upgrade Issues

hashtagVersion Skew Policy

hashtagRollback Considerations

hashtagPost-Upgrade Tasks

hashtagVerify Cluster Health

hashtagUpdate Documentation

hashtagUpdate Client Tools

hashtagReview Deprecated APIs

hashtagMonitor Cluster

hashtagAdditional Resources

hashtagRelated Guides

Overview

How Kubernetes Version Upgrade Works

Phase 1: Control Plane Upgrade

Phase 2: Node Group Upgrade

Prerequisites

Check Version Compatibility

Review Release Notes

Backup Critical Data

Check Cluster Health

Upgrading the Control Plane

Initiate Control Plane Upgrade

Monitor Control Plane Upgrade

Verify Control Plane Upgrade

Upgrading Node Groups

Critical: Prepare for Node Group Upgrades

Ensure Pods Can Be Rescheduled

Move Critical Workloads (If Necessary)

Check Node Drain Blockers

Upgrade Node Groups One by One

Upgrade Process for Each Node Group

Monitor Node Group Upgrade

Handle Stuck Node Upgrades

Verify Node Group Upgrade

Repeat for Remaining Node Groups

Best Practices for Smooth Upgrades

Troubleshooting Upgrade Issues

Version Skew Policy

Rollback Considerations

Post-Upgrade Tasks

Verify Cluster Health

Update Documentation

Update Client Tools

Review Deprecated APIs

Monitor Cluster

Additional Resources

Related Guides