Managing Node Groups
Node groups are collections of worker nodes that run your containerized applications. This guide covers creating, scaling, and managing node groups in your Kubernetes cluster.
What is a Node Group?
A node group is a set of Kubernetes worker nodes with identical configuration:
Instance Type: CPU, memory, and other hardware specifications
Disk Size: Storage capacity for each node
Scaling: Minimum, maximum, and desired node count
Labels: Key-value pairs for workload scheduling
Taints: Restrictions on which pods can be scheduled
Network: Subnet configuration
Why Node Groups Matter
Node groups allow you to:
Separate Workloads: System services vs. applications
Optimize Resources: Different instance types for different needs
Control Costs: Scale and size appropriately
Isolate Workloads: Use taints and labels for pod placement
Critical: Untainted Node Groups
Always create at least 1–2 nodes without taints before creating any specialized node groups.
Essential cluster components require untainted nodes:
CoreDNS: DNS resolution for services and pods
Cilium (or your CNI): Pod networking
Without untainted nodes:
Essential add-ons cannot be scheduled
Cluster will not function properly
Pods cannot start or communicate
DNS resolution will fail
Recommended First Node Group:
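As a rough illustration (the request format, field names, and flavor shown are examples only, not an exact API schema), a safe first node group looks like this:

```yaml
# Illustrative first node group; field names and flavor are examples, not an exact API schema.
name: general-nodes        # descriptive, lowercase, hyphenated
flavor: 2vcpu-4gb          # pick a flavor available in your account
disk_size_gb: 80
scaling:
  min_size: 2
  max_size: 4
  desired_size: 2
labels:
  role: general
taints: []                 # no taints, so CoreDNS and the CNI can schedule here
```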
Creating a Node Group
Node Group Configuration
When creating a node group, you'll need to configure the following settings.
Basic Settings
Node Group Name
Choose a descriptive name.
Naming Rules:
Maximum 100 characters
Lowercase alphanumeric characters, hyphens (-), and dots (.)
Must start and end with alphanumeric characters
No consecutive dots (..) or hyphens (--)
Examples:
✅ Good: general-nodes, app-workers, gpu-nodes-prod
❌ Bad: ng1, nodes, test
Instance Type (Flavor)
Select the compute resources for your nodes.
Available Instance Types:
Choosing Instance Type:
Disk Configuration
Configure root disk size for each node.
Disk Size Examples:
Minimum: 50 GB
Default: 80 GB (if not specified)
Recommended:
General nodes: 80-100 GB
Application nodes: 100-200 GB
Image-heavy workloads: 200+ GB
What uses disk space? Container images, container and pod logs, ephemeral storage (such as emptyDir volumes), and the kubelet and container runtime working directories.
Planning Disk Size: allow headroom above your expected image, log, and temporary-data usage; disk size cannot be changed after creation, so a larger disk later means creating a new node group.
Scaling Configuration
Define how your node group scales.
Min Size: Minimum number of nodes (always running)
Cannot be less than 0
Should be at least 1 for production
Can be 0 for dev/test environments (but node group creation needs at least 1)
Max Size: Maximum number of nodes (limit for scaling)
Must be >= Min Size
Set based on maximum expected load
Consider account quota limits
Desired Size: Target number of nodes (current goal)
Must be between Min Size and Max Size
Can be adjusted later
Cluster autoscaler can modify this
Scaling Examples:
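For example (field names are illustrative), a node group that normally runs 3 nodes, never drops below 2, and can grow to 10 under load could be configured as:

```yaml
# Illustrative scaling block; field names depend on your provider's API.
scaling:
  min_size: 2        # nodes that are always running
  max_size: 10       # hard ceiling for manual scaling and the autoscaler
  desired_size: 3    # current target; must satisfy min_size <= desired_size <= max_size
```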
Validation Rules: Min Size ≤ Desired Size ≤ Max Size, and Max Size must also fit within your account quota.
Network Configuration
Subnet Selection
Subnet KRN (Required):
Select the subnet where node network interfaces will be created
Must be in the same VPC as the cluster
Ensure sufficient IP addresses are available
IP Address Planning: every node consumes at least one IP address from the subnet, so size the subnet for the combined maximum node count of all node groups that share it.
Labels Configuration (Optional)
Labels are key-value pairs used for pod scheduling.
Common Label Patterns:
Usage Example:
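For example, if a node group carries the label role: app-workers (an illustrative label, not one the platform requires), a Deployment can be pinned to those nodes with a nodeSelector:

```yaml
# Example Deployment pinned to nodes labeled role: app-workers (illustrative label).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        role: app-workers   # pods schedule only onto nodes carrying this label
      containers:
        - name: web
          image: nginx:1.27
```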
Taints Configuration (Optional)
Taints restrict which pods can be scheduled on nodes. Pods must have matching tolerations.
Taint Structure:
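A taint consists of a key, an optional value, and an effect, commonly written as key=value:Effect. For example, the taint dedicated=gpu:NoSchedule (key and value chosen for illustration) appears in a node's spec as:

```yaml
# Taint as it appears in a node's spec; the key and value are illustrative.
taints:
  - key: dedicated
    value: gpu
    effect: NoSchedule   # one of NoSchedule, PreferNoSchedule, NoExecute
```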
Taint Effects:
NoSchedule:
Hard requirement
Pods without toleration will NOT be scheduled
Existing pods not affected
PreferNoSchedule:
Soft requirement
System tries to avoid scheduling
Will schedule if no other option
NoExecute:
Evicts running pods without toleration
Prevents new pods from being scheduled
Use with caution!
Common Taint Scenarios:
Scenario: Dedicated GPU Nodes
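One common setup (the label, taint key, and value here are illustrative) is to taint the GPU node group with dedicated=gpu:NoSchedule and give GPU workloads a matching toleration plus a nodeSelector:

```yaml
# GPU workload that tolerates the illustrative dedicated=gpu:NoSchedule taint.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                    # hypothetical name
spec:
  nodeSelector:
    accelerator: gpu               # label assumed to be set on the GPU node group
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/gpu-trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # requires the GPU device plugin on the nodes
```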
Scenario: High-Priority Production Nodes
Scenario: General Purpose Nodes (NO TAINTS!)
DO NOT add taints to your first node group!
❌ If all nodes are tainted, system pods cannot schedule and the cluster becomes non-functional.
✅ Ensure at least 1–2 nodes without taints so system pods can schedule and the cluster remains functional.
Remote Access Configuration (Optional)
Enable SSH access to nodes for debugging:
SSH Key Selection:
Choose from your existing SSH keys
Required for SSH access to nodes
Recommended for troubleshooting
Security Groups:
Select security groups for SSH access
Restrict SSH access to specific IPs/networks
Follow security best practices
When to Enable:
✅ Development/testing environments
✅ Troubleshooting scenarios
⚠️ Production (only if necessary with strict security)
Node Repair Configuration (Optional)
Automatic node health monitoring and repair:
Node Repair Configuration:
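A minimal sketch, assuming the node group exposes a simple on/off flag (the actual field names depend on your provider's API):

```yaml
# Hypothetical node repair setting; the field names are illustrative.
node_repair:
  enabled: true    # unhealthy nodes are detected and replaced automatically
```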
What it does:
Monitors node health
Detects failed or unhealthy nodes
Automatically replaces unhealthy nodes
Helps maintain cluster availability
When to enable:
✅ Production clusters (recommended)
✅ Critical workloads
✅ Long-running clusters
When to disable:
Development environments
Short-lived clusters
Manual node management preference
Creating the Node Group
After configuring all settings:
Name and instance type
Scaling configuration
Labels and taints
Network settings
Verification Checklist: at least one node group in the cluster has no taints (for CoreDNS and the CNI), the subnet has sufficient free IP addresses, and Min Size ≤ Desired Size ≤ Max Size.
Submit the node group configuration to begin creation.
Node Group Lifecycle
Creation Process
CREATING → SCALINGUP → RUNNING
Timeline:
Initial setup: 1–2 minutes
Node provisioning: 3–5 minutes per node
Kubernetes join: 1–2 minutes per node
Total: 5–10 minutes for 2–3 nodes
What's happening: virtual machine instances are provisioned, bootstrapped with the node image, and joined to the Kubernetes control plane.
Monitoring Creation: track the node group state (CREATING → SCALINGUP → RUNNING), and run kubectl get nodes to confirm that new nodes register and reach Ready.
Node Group States
CREATING: Initial setup in progress
SCALINGUP: Adding nodes
RUNNING: Operational and healthy
SCALINGDOWN: Removing nodes
UPDATING: Configuration or version update
FAILED: Operation failed (check error message)
PENDING_DELETE: Deletion initiated
DELETING: Removal in progress
Scaling Node Groups
Manual Scaling
Update desired size to scale your node group:
Configuration Update:
Modify the Desired Size parameter
Submit the configuration change
Scaling Up (Desired > Current): new nodes are provisioned, bootstrapped, and joined to the cluster.
Scaling Down (Desired < Current): excess nodes are cordoned, drained, and removed.
Important: Nodes are drained before removal. Ensure remaining nodes have capacity for the rescheduled pods and that PodDisruptionBudgets do not block eviction.
Automatic Scaling
If the Cluster Autoscaler add-on is installed:
How it works:
Scale Up: Pods cannot be scheduled → Add nodes
Scale Down: Nodes underutilized → Remove nodes
Configuration:
Autoscaler respects min/max size
Can modify desired size automatically
Checks for unschedulable pods
Monitors node utilization
Scaling Behavior:
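As an illustration of scale-up behavior, a Deployment whose total resource requests exceed the node group's current capacity leaves some pods Pending; the autoscaler detects the unschedulable pods and raises the desired size, up to the max size:

```yaml
# Illustrative Deployment: large CPU requests leave pods Pending and trigger scale-up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-workers              # hypothetical name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-workers
  template:
    metadata:
      labels:
        app: batch-workers
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: "2"             # 10 replicas x 2 CPU quickly exceeds a small node group
              memory: 1Gi
```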
Updating Node Groups
Update Configuration
Update scaling or repair settings for your node group.
Updatable Settings:
✅ Min size
✅ Max size
✅ Desired size
✅ Node repair configuration
What cannot be updated:
❌ Instance type (create new node group)
❌ Disk size (create new node group)
❌ Labels (recreate nodes)
❌ Taints (recreate nodes)
❌ Subnet (create new node group)
Update Kubernetes Version
Keep the node group's Kubernetes version aligned with the cluster version.
Rolling update process: nodes are updated one at a time; each node is cordoned and drained, updated (or replaced) with the target version, and returned to service before the next node is processed.
Timeline: ~5–10 minutes per node
Important Considerations: ensure workloads have enough replicas and non-blocking PodDisruptionBudgets so pods can be rescheduled during the update, and schedule updates outside peak traffic windows.
Node Group Best Practices
✅ Do's
Create Untainted Nodes First
Start with nodes that have NO taints
Ensure essential components can schedule
Wait for nodes to be Ready before creating tainted nodes
Separate Workloads
System components: Dedicated untainted nodes
Applications: Separate node groups by purpose
Specialized: GPU, high-memory, etc.
Plan Capacity
Set appropriate min/max for each node group
Consider peak load in max size
Allow headroom for updates
Use Meaningful Labels
Label nodes by purpose, environment, team
Document label schema
Use labels for pod scheduling
Configure Node Repair
Enable for production node groups
Improves reliability
Reduces manual intervention
Right-Size Instances
Match instance type to workload
Don't over-provision
Monitor and adjust
❌ Don'ts
Don't Taint All Nodes
Always have untainted nodes for essential components
Cilium and CoreDNS need untainted nodes
Don't Under-Size Nodes
Use at least 2 vCPU and 4 GB of memory (e.g., a 2vcpu-4gb flavor) for general workloads
Cluster components need resources
Don't Forget Disk Space
Plan for images, logs, temp storage
Monitor disk usage
Increase if nodes run out of space
Don't Set Min = Max
Allow scaling flexibility
Use autoscaler for efficiency
Set Min = Max only if a truly fixed size is required
Don't Block Draining
Avoid aggressive PodDisruptionBudgets
Plan for node updates
Allow graceful termination
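For example, a PodDisruptionBudget that always allows at least one pod to be evicted lets node drains proceed during scale-down and updates (names here are illustrative):

```yaml
# PDB that permits draining: at most one pod of this app is unavailable at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                    # hypothetical name
spec:
  maxUnavailable: 1                # maxUnavailable: 0 would block node drains entirely
  selector:
    matchLabels:
      app: web
```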
Common Node Group Patterns
Pattern: Standard Three-Tier
Node Group Planning Example:
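A sketch of the three-tier layout (names, flavors, and field names are illustrative, not an exact API schema):

```yaml
# Illustrative three-tier node group plan.
node_groups:
  - name: system-nodes             # untainted; hosts CoreDNS, the CNI, and other add-ons
    flavor: 2vcpu-4gb
    scaling: { min_size: 2, max_size: 3, desired_size: 2 }
    taints: []
  - name: app-workers              # general application workloads
    flavor: 4vcpu-8gb              # example flavor name
    scaling: { min_size: 2, max_size: 10, desired_size: 3 }
    labels: { role: app }
  - name: gpu-nodes-prod           # specialized workloads behind a taint
    flavor: 8vcpu-32gb-gpu         # example flavor name
    scaling: { min_size: 1, max_size: 4, desired_size: 1 }
    labels: { accelerator: gpu }
    taints:
      - { key: dedicated, value: gpu, effect: NoSchedule }
```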
Pattern: Environment Separation
Troubleshooting Node Groups
Additional Resources
Installing Add-ons - Install CNI and other essential add-ons
Creating Cluster Guide - Cluster setup process
Troubleshooting Guide - Common issues and solutions