
# Upgrading Kubernetes

This guide covers how to upgrade your Kubernetes cluster to a newer version in Krutrim Kubernetes Service (KKS).

## Overview

Upgrading a Kubernetes cluster in KKS is a two-phase process:

* Control Plane Upgrade: Upgrades the Kubernetes control plane components
* Node Group Upgrade: Upgrades worker nodes in each node group individually

Important: These are separate operations. Upgrading the cluster version only upgrades the control plane. You must upgrade each node group separately to complete the cluster upgrade.

## How Kubernetes Version Upgrade Works

### Phase 1: Control Plane Upgrade

When you upgrade the Kubernetes version:

```
Control Plane Upgrade (Automatic)

✓ API Server upgraded to new version
✓ Controller Manager upgraded
✓ Scheduler upgraded
✓ etcd compatibility verified

Worker Nodes: Still running OLD version
```

After control plane upgrade:

* ✅ Control plane runs the new Kubernetes version
* ⚠️ Worker nodes still run the old version
* ✅ Cluster remains operational (Kubernetes supports version skew)
* ⚠️ You must upgrade node groups to complete the process

### Phase 2: Node Group Upgrade

After upgrading the control plane, you must upgrade each node group:

```
Node Group Upgrade (Rolling Update Process)

1. New node with updated version joins cluster
2. Wait for new node to become Ready
3. Old node is cordoned (no new pods scheduled)
4. Old node is drained (pods evicted)
5. Old node is removed from cluster
6. Repeat for next node...
```

Rolling update ensures:

* No downtime for properly configured workloads
* Pods are rescheduled to healthy nodes
* One node upgraded at a time
* Cluster capacity maintained during upgrade

## Prerequisites

Before upgrading your Kubernetes cluster:

### Check Version Compatibility

* ✅ You can only upgrade to the next minor version (e.g., 1.27 → 1.28)
* ❌ Cannot skip versions (e.g., 1.27 → 1.29)
* ✅ Control plane must be upgraded before node groups
* ✅ Check available versions in Krutrim platform

### Review Release Notes

* Review Kubernetes release notes for the target version
* Check for deprecated APIs or breaking changes
* Verify your applications are compatible with the new version
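One way to look for deprecated API usage is the upstream API server metric `apiserver_requested_deprecated_apis`. The sketch below assumes your account is allowed to read the API server's `/metrics` endpoint, which this guide does not confirm for KKS; `deprecated_api_requests` is a hypothetical helper name.

```bash
# Hypothetical helper: keeps only the metric lines that record requests
# to deprecated APIs, dropping comments and all other metrics.
deprecated_api_requests() {
  grep '^apiserver_requested_deprecated_apis{'
}

# Usage (needs cluster access and permission to read /metrics):
# kubectl get --raw /metrics | deprecated_api_requests
```

Each matching line carries `group`, `version`, `resource`, and `removed_release` labels, which identify the manifests or clients to update before the target version removes the API.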

### Backup Critical Data

* Backup any critical application data
* Document current cluster configuration
* Record the current cluster state (node versions, running workloads) so you can compare it after the upgrade

### Check Cluster Health

```bash
# Check all nodes are Ready
kubectl get nodes

# Check all system pods are running
kubectl get pods -n kube-system

# Check critical workloads are healthy
kubectl get pods -A
```
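On larger clusters it can be easier to list only the nodes that need attention. This is a small sketch that filters the output of `kubectl get nodes --no-headers`; `not_ready_nodes` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints every node whose STATUS column is anything other than "Ready"
# (for example NotReady or Ready,SchedulingDisabled).
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 " (" $2 ")" }'
}

# Usage (needs cluster access):
# kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reports `Ready` and is schedulable.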

## Upgrading the Control Plane

{% stepper %}
{% step %}

### Initiate Control Plane Upgrade

Upgrade the cluster's Kubernetes version through the Krutrim platform:

```bash
# Using Krutrim CLI (example)
krutrim cluster upgrade --cluster-id <cluster-id> --version 1.28.0

# Or via API
# Check Krutrim API documentation for specific endpoints
```

{% endstep %}

{% step %}

### Monitor Control Plane Upgrade

```bash
# Check cluster status
# Cluster status will show UPGRADING during the process
# Wait for cluster to return to PROVISIONED state
# This typically takes 5-15 minutes
```
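If you prefer to wait from a script, a generic polling loop is one option. The sketch below does not read the KKS cluster status; as a stand-in it assumes that the API server's `/version` endpoint reporting the target minor version is a good enough signal. `wait_until` and `server_reports_minor` are hypothetical helper names.

```bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it succeeds.
# Returns 1 if it has not succeeded once TIMEOUT_SECONDS have elapsed.
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Hypothetical check: succeeds once the API server reports the given minor version.
# Failed requests while the API server restarts simply count as "not yet".
server_reports_minor() {
  kubectl get --raw /version 2>/dev/null | grep -q "\"minor\": *\"$1"
}

# Usage (needs cluster access): poll every 30 seconds for up to 15 minutes
# wait_until 900 30 server_reports_minor 28 && echo "Control plane reports 1.28"
```

The 15-minute timeout mirrors the upper end of the typical duration mentioned above; still confirm the final `PROVISIONED` state in the Krutrim platform.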

{% endstep %}

{% step %}

### Verify Control Plane Upgrade

```bash
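# Check API server version
# (kubectl v1.28+ removed the --short flag and prints this summary by default;
#  on older clients, add --short for the same compact output)
kubectl version

# Output example (exact format depends on your kubectl version):
# Client Version: v1.27.0
# Server Version: v1.28.0  ← Control plane upgraded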

# Check node versions (still old version)
kubectl get nodes
# Nodes will still show v1.27.0
```

After control plane upgrade:

* ✅ Control plane is now running the new version
* ⚠️ Node groups still need to be upgraded
* ✅ Cluster is functional with version skew

{% endstep %}
{% endstepper %}

## Upgrading Node Groups

### Critical: Prepare for Node Group Upgrades

Before upgrading each node group, ensure smooth operation.

#### Ensure Pods Can Be Rescheduled

```bash
# Check PodDisruptionBudgets (PDBs)
kubectl get pdb -A

# Review each PDB to ensure it allows disruptions
kubectl describe pdb <pdb-name> -n <namespace>
```

Common issues:

* PDB with `minAvailable: 100%` will block draining
* Not enough replicas to satisfy PDB during drain
* Single-replica deployments without a PDB (the drain succeeds, but the workload is down until its pod restarts on another node)

Solution example (adjust PDB):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1  # Allow draining as long as 1 pod remains
  selector:
    matchLabels:
      app: myapp
```
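Whether a PDB blocks a drain depends on how many healthy pods exist right now: a PDB with `minAvailable: 1` on a single-replica workload allows zero disruptions. Kubernetes reports this number in the PDB's `status.disruptionsAllowed` field. The sketch below lists PDBs that currently allow no disruption; `blocking_pdbs` is a hypothetical helper name.

```bash
# Hypothetical helper: reads "NAMESPACE NAME ALLOWED" lines on stdin and
# prints namespace/name for every PDB that currently allows 0 disruptions.
blocking_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}

# Usage (needs cluster access):
# kubectl get pdb -A --no-headers \
#   -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
#   | blocking_pdbs
```

Any PDB printed here will stall the drain of a node that hosts one of its pods until more replicas become healthy or the budget is relaxed.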

#### Move Critical Workloads (If Necessary)

For critical single-replica workloads or workloads that cannot tolerate disruption:

```bash
# Option 1: Scale up temporarily
kubectl scale deployment <deployment-name> --replicas=2 -n <namespace>

# Option 2: Migrate to a different node group
# Use node selectors or taints to move workloads
kubectl edit deployment <deployment-name> -n <namespace>
```

#### Check Node Drain Blockers

```bash
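# Check for bare pods (pods without controller)
# The OWNER column shows <none> for bare pods
kubectl get pods -A --field-selector spec.nodeName=<node-name> \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind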

# Bare pods will be deleted and not rescheduled
# Convert to Deployment/StatefulSet/DaemonSet before upgrade
```
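To find bare pods across a whole node group rather than reading the list by eye, the owner of each pod can be filtered in a script. This is a sketch; `bare_pods` is a hypothetical helper name, and it relies on `kubectl` printing `<none>` when a pod has no `ownerReferences`.

```bash
# Hypothetical helper: reads "NAMESPACE NAME OWNER_KIND" lines on stdin and
# prints namespace/name for pods that have no owning controller.
bare_pods() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Usage (needs cluster access):
# kubectl get pods -A --no-headers \
#   -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind \
#   | bare_pods
```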

{% stepper %}
{% step %}

### Upgrade Node Groups One by One

Important: Upgrade node groups one at a time to maintain cluster stability.

Recommended upgrade order:

* Non-critical node groups first (development, testing)
* General workload node groups (application nodes)
* Critical node groups last (production, stateful workloads)

{% endstep %}

{% step %}

### Upgrade Process for Each Node Group

```bash
# Using Krutrim CLI (example)
krutrim nodegroup upgrade \
  --cluster-id <cluster-id> \
  --nodegroup-id <nodegroup-id> \
  --version 1.28.0

# Or via API
# Check Krutrim API documentation for specific endpoints
```

{% endstep %}

{% step %}

### Monitor Node Group Upgrade

During the upgrade, the platform performs a rolling update:

```bash
# Watch nodes being updated
kubectl get nodes -w
```

Example output:

```
NAME         STATUS                        ROLES    AGE   VERSION
node-1-old   Ready                         <none>   10d   v1.27.0
node-2-old   Ready                         <none>   10d   v1.27.0
node-3-new   Ready                         <none>   1m    v1.28.0  ← New node joins
node-1-old   Ready,SchedulingDisabled      <none>   10d   v1.27.0  ← Old node cordoned, drain starts
node-1-old   NotReady,SchedulingDisabled   <none>   10d   v1.27.0  ← Old node drained and shutting down
# node-1-old removed
node-4-new   Ready                         <none>   1m    v1.28.0  ← Next new node joins
```
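A per-version node count gives a quick view of how far the rolling update has progressed. This is a minimal sketch that summarizes `kubectl get nodes --no-headers` output; `version_counts` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints "<kubelet version> <node count>" for each version, sorted.
version_counts() {
  awk '{ count[$NF]++ } END { for (v in count) print v, count[v] }' | sort
}

# Usage (needs cluster access):
# kubectl get nodes --no-headers | version_counts
```

The upgrade of a node group is complete when only the target version remains for its nodes.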

Per-node process:

1. New node with updated version is created
2. New node joins cluster and becomes Ready
3. Old node is cordoned (no new pods scheduled)
4. Old node is drained (pods evicted gracefully)
5. Old node is removed after successful drain
6. Process repeats for next node

{% endstep %}

{% step %}

### Handle Stuck Node Upgrades

Symptoms:

* Node group upgrade stuck in UPGRADING state
* Old node stuck in "Draining" state
* Node group upgrade not progressing

Cause: Old node cannot be drained due to:

* PodDisruptionBudget blocking drain
* Pods with `emptyDir` volumes
* Bare pods (no controller)
* Pods with local storage

Diagnosis:

```bash
# Check which pods are blocking drain
kubectl get pods -A --field-selector spec.nodeName=<stuck-node-name>

# Check PodDisruptionBudgets
kubectl get pdb -A

# Check for drain events (node events may not be in your current namespace, so search all namespaces)
kubectl get events -A --field-selector involvedObject.name=<stuck-node-name>
```

Resolution options:

Option 1: Fix PodDisruptionBudget

```bash
# Temporarily adjust PDB to allow draining
kubectl edit pdb <pdb-name> -n <namespace>
# Change minAvailable or maxUnavailable to allow disruption
```

Option 2: Scale Up Application

```bash
# Add more replicas to satisfy PDB during drain
kubectl scale deployment <deployment-name> --replicas=3 -n <namespace>
```

Option 3: Delete Blocking Pods (Careful!)

```bash
# For bare pods or stuck pods (understand impact first!)
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```

Option 4: Contact Support

```bash
# If issue persists, contact Krutrim support with:
# - Cluster ID
# - Node group ID
# - Stuck node name
# - Output of: kubectl get pods -A --field-selector spec.nodeName=<node-name>
```

{% endstep %}

{% step %}

### Verify Node Group Upgrade

After each node group upgrade completes:

```bash
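# Check all nodes in the node group are updated
# (the `nodegroup` label key is an example; confirm the actual key with: kubectl get nodes --show-labels)
kubectl get nodes -l nodegroup=<nodegroup-name>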

# Verify node versions
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

# Check all pods are running
kubectl get pods -A -o wide

# Verify workloads are healthy
kubectl get deployments -A
kubectl get statefulsets -A
```
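The version check can also act as a gate in a script before moving on to the next node group. The sketch below fails when any listed node is not on the target version; `all_nodes_at` is a hypothetical helper name.

```bash
# Hypothetical helper: reads "NAME VERSION" lines on stdin, prints every node
# whose version does not start with TARGET, and exits non-zero if any exist.
all_nodes_at() {
  awk -v target="$1" 'index($2, target) != 1 { print $1, $2; behind = 1 } END { exit behind }'
}

# Usage (needs cluster access):
# kubectl get nodes --no-headers \
#   -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion \
#   | all_nodes_at v1.28. && echo "All nodes are on v1.28"
```

Passing the version with a trailing dot (`v1.28.`) matches any patch release of that minor version.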

{% endstep %}

{% step %}

### Repeat for Remaining Node Groups

Repeat the previous steps for each remaining node group until all node groups are upgraded.
{% endstep %}
{% endstepper %}

## Best Practices for Smooth Upgrades

Do's

* Always Upgrade Control Plane First
  * Control plane must be at the same or newer version than nodes
  * Node groups cannot be newer than control plane
* Upgrade Node Groups One at a Time
  * Wait for each node group upgrade to complete
  * Verify workloads are healthy before proceeding
  * Maintain cluster stability
* Prepare Your Workloads
  * Ensure multiple replicas for critical services
  * Configure appropriate PodDisruptionBudgets
  * Use Deployments/StatefulSets instead of bare pods

Example PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
```

* Test Node Drainability Before Upgrade

```bash
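# Test if a node can be drained (dry run)
# A client-side dry run reports bare pods and emptyDir volumes that would block the drain,
# but it sends no eviction requests, so PDB limits are not evaluated
kubectl drain <node-name> --dry-run=client --ignore-daemonsets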
```

* Monitor During Upgrade

```bash
# Watch nodes
kubectl get nodes -w

# Watch pods being rescheduled
kubectl get pods -A -w

# Check events
kubectl get events -A --watch
```

* Schedule Upgrades During Maintenance Windows
  * Plan upgrades during low-traffic periods
  * Notify users of potential brief disruptions
  * Have rollback plan ready
* Upgrade Non-Production Clusters First
  * Test upgrade process in dev/staging
  * Identify potential issues before production
  * Validate application compatibility

Don'ts

* Don't Skip Kubernetes Versions
  * ❌ Cannot upgrade 1.27 → 1.29
  * ✅ Must upgrade 1.27 → 1.28 → 1.29
* Don't Upgrade Multiple Node Groups Simultaneously
  * Can cause cluster instability
  * Harder to troubleshoot issues
  * May exceed resource limits
* Don't Ignore PodDisruptionBudgets
  * PDBs can block node draining
  * Review and adjust PDBs before upgrade
  * Ensure PDBs allow at least some disruption
* Don't Use Bare Pods in Production
  * Bare pods are deleted during drain (not rescheduled)
  * Always use Deployments, StatefulSets, or DaemonSets
  * Controllers ensure pods are recreated
* Don't Upgrade Without Testing
  * Test upgrade in non-production first
  * Verify application compatibility
  * Check for deprecated APIs
* Don't Forget About Version Skew
  * Control plane and nodes can differ by 1 minor version
  * Don't leave nodes on old version indefinitely
  * Complete all node group upgrades within reasonable time
* Don't Ignore Failed Drains
  * Investigate why drain failed
  * Fix underlying issue
  * Don't force drain without understanding impact

## Troubleshooting Upgrade Issues

<details>

<summary>Control Plane Upgrade Stuck</summary>

Symptoms:

* Cluster stuck in UPGRADING state
* Control plane upgrade not completing

Solution:

* Check cluster status in Krutrim platform
* Review error messages
* Contact Krutrim support with cluster ID

</details>

<details>

<summary>Node Group Upgrade Not Starting</summary>

Symptoms:

* Node group remains in current version
* No new nodes being created

Possible Causes:

* Control plane not upgraded yet
* Invalid target version
* Insufficient quotas

Solution:

```bash
# Verify control plane is upgraded
# (on kubectl older than v1.28, add --short for compact output; the flag was removed in v1.28)
kubectl version

# Check node group status in Krutrim platform

# Verify target version is valid

# Check OpenStack quotas for instance creation
```

</details>

<details>

<summary>Pods Failing After Upgrade</summary>

Symptoms:

* Pods in CrashLoopBackOff after upgrade
* Services not working correctly

Possible Causes:

* Application incompatible with new Kubernetes version
* Deprecated APIs removed
* Configuration issues

Solution:

```bash
# Check pod logs
kubectl logs <pod-name> -n <namespace>

# Check pod events
kubectl describe pod <pod-name> -n <namespace>

# Review Kubernetes deprecation notices

# Check release notes for breaking changes

# Roll back if necessary (may require cluster restore)
```

</details>

<details>

<summary>Node Stuck in NotReady After Upgrade</summary>

Symptoms:

* New node stuck in NotReady state
* Node not joining cluster properly

Solution:

```bash
# Check node conditions
kubectl describe node <node-name>

# Check kubelet logs on the node
# (requires node access)

# Check CNI pods
kubectl get pods -n kube-system -l k8s-app=cilium

# Contact Krutrim support if issue persists
```

</details>

## Version Skew Policy

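Kubernetes supports running the control plane and nodes at different versions, within limits. For KKS node groups, this guide treats one minor version behind the control plane as the limit, which is stricter than the upstream kubelet skew policy.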

Supported Version Skew:

```
Control Plane: v1.28.x
Node Groups:   v1.27.x or v1.28.x  ✅ Supported (1 minor version difference)
Node Groups:   v1.26.x             ❌ Not supported (2 minor versions)
```
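The skew rule shown above can be expressed as a small check. This sketch encodes the one-minor-version limit used in this guide, not the wider upstream policy; `skew_ok` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds when NODE_VERSION is on the same minor version
# as CONTROL_PLANE_VERSION or exactly one minor version behind it.
# Nodes newer than the control plane are rejected.
skew_ok() {
  local cp="${1#v}"
  local node="${2#v}"
  local cp_major="${cp%%.*}"
  local cp_rest="${cp#*.}"
  local cp_minor="${cp_rest%%.*}"
  local node_major="${node%%.*}"
  local node_rest="${node#*.}"
  local node_minor="${node_rest%%.*}"
  local diff=$((cp_minor - node_minor))
  [ "$cp_major" -eq "$node_major" ] && [ "$diff" -ge 0 ] && [ "$diff" -le 1 ]
}

skew_ok v1.28.0 v1.27.5 && echo "v1.27.5 nodes under a v1.28.0 control plane: supported"
skew_ok v1.28.0 v1.26.9 || echo "v1.26.9 nodes under a v1.28.0 control plane: not supported"
```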

Recommendations:

* Upgrade control plane first
* Upgrade all node groups within 1-2 weeks
* Don't leave node groups more than 1 version behind
* Complete upgrades before next version release

## Rollback Considerations

Important: Kubernetes upgrades are typically one-way operations.

Control Plane Rollback:

* Not typically supported
* May require cluster restore from backup
* Contact Krutrim support for assistance

Node Group Rollback:

* Can create new node group with old version
* Migrate workloads to old version node group
* Remove upgraded node group

Prevention is Better:

* Test upgrades in non-production first
* Verify application compatibility
* Have rollback plan documented
* Take backups before upgrading

## Post-Upgrade Tasks

After completing the upgrade:

### Verify Cluster Health

```bash
# Check all nodes are running new version
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

# Check all pods are running
kubectl get pods -A

# Check system components
kubectl get pods -n kube-system

# Check critical workloads
kubectl get deployments -A
kubectl get statefulsets -A
```

### Update Documentation

* Document the upgrade date and version
* Note any issues encountered and resolutions
* Update cluster documentation with new version

### Update Client Tools

```bash
# Update kubectl to match cluster version
# Download from: https://kubernetes.io/docs/tasks/tools/

# Verify kubectl version
kubectl version --client
```

### Review Deprecated APIs

* Check for deprecated API warnings
* Update manifests to use newer APIs
* Test applications thoroughly

### Monitor Cluster

* Monitor cluster performance
* Watch for any unusual behavior
* Check application metrics and logs

## Additional Resources

* Kubernetes Release Notes: <https://kubernetes.io/releases/>
* Krutrim Documentation: Check platform docs for version upgrade procedures
* Version Skew Policy: <https://kubernetes.io/releases/version-skew-policy/>

## Related Guides

* [Managing Node Groups](https://docs.cloud.olakrutrim.com/basics/core-infrastructure/krutrim-kubernetes-system/managing-nodegroups) - Node group operations
* [Creating a Cluster](https://docs.cloud.olakrutrim.com/basics/core-infrastructure/krutrim-kubernetes-system/creating-cluster) - Initial cluster setup
* [Installing Add-ons](https://docs.cloud.olakrutrim.com/basics/core-infrastructure/krutrim-kubernetes-system/installing-addons) - Managing cluster add-ons

