Part 3 - RKE2 Zero To Hero: Mastering RKE2 Configuration - From YAML to CLI

Welcome back to our RKE2: Zero to Hero series! If you've been following along, you've successfully set up your first single-node RKE2 cluster in Part 1 and scaled it to a multi-node powerhouse in Part 2. Now it's time to roll up our sleeves and dive into the fascinating world of RKE2 configuration. Think of this as the moment when you graduate from using a basic toolkit to wielding a Swiss Army knife with precision and finesse.

In this third installment, we'll transform you from someone who can follow installation scripts to a configuration wizard who can bend RKE2 to their will. By the time we're done, you'll understand how to customize every aspect of your cluster, from network settings to storage backends, and everything in between. Fair warning: after mastering these concepts, you might find yourself offering to "optimize" your colleagues' clusters at every opportunity (they'll thank you later, probably).

Understanding RKE2's Configuration Hierarchy

Before we start tweaking knobs and adjusting settings, let's understand how RKE2 prioritizes configuration sources. RKE2 follows a specific order of precedence that would make even the most diplomatic negotiator proud:

  1. Command-line arguments (highest priority)
  2. Environment variables (middle priority)
  3. Configuration file (lowest priority, but most practical)

This hierarchy means that CLI arguments override environment variables, which in turn override configuration file settings. While this flexibility is powerful, the configuration file approach is generally recommended for production environments because it provides persistence and easier management.
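The precedence order can be illustrated with a short sketch. This is not RKE2 source code, just a model of the lookup described above; the `RKE2_`-prefixed environment variable naming mirrors how flags map to variables:

```python
# Illustrative sketch (not RKE2 source code): resolving one key across
# the three configuration sources, highest priority first.
def resolve(key, cli_args, env_vars, config_file):
    """Effective value for a key: CLI args > env vars > config file."""
    env_key = "RKE2_" + key.upper().replace("-", "_")
    if key in cli_args:
        return cli_args[key]
    if env_key in env_vars:
        return env_vars[env_key]
    return config_file.get(key)

cli = {"debug": "true"}
env = {"RKE2_DEBUG": "false", "RKE2_WRITE_KUBECONFIG_MODE": "0600"}
cfg = {"debug": "false", "write-kubeconfig-mode": "0644", "cni": "canal"}

print(resolve("debug", cli, env, cfg))                  # true  (CLI wins)
print(resolve("write-kubeconfig-mode", cli, env, cfg))  # 0600  (env beats file)
print(resolve("cni", cli, env, cfg))                    # canal (file fallback)
```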

The Heart of RKE2: The config.yaml File

The primary configuration file lives at /etc/rancher/rke2/config.yaml and serves as the central nervous system of your RKE2 cluster. Unlike some configuration files that seem designed by committee, RKE2's YAML structure is refreshingly straightforward and human-readable.

Creating Your Configuration File

First things first: RKE2 doesn't create this file automatically, so you'll need to create it manually:

sudo mkdir -p /etc/rancher/rke2
sudo touch /etc/rancher/rke2/config.yaml
sudo chmod 600 /etc/rancher/rke2/config.yaml

The restrictive permissions are intentional, since this file can contain sensitive values such as the cluster join token.

Basic Configuration Structure

Here's a foundational example that demonstrates the YAML structure:

# /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: "0644"
tls-san:
  - "foo.local"
  - "192.168.1.100"
node-label:
  - "environment=production"
  - "role=control-plane"
debug: true

Notice how repeatable CLI arguments become YAML lists, and boolean flags translate to true or false values. This direct mapping makes it easy to convert between CLI and YAML configurations.
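That mapping is mechanical enough to sketch in a few lines. The helper below is hypothetical (RKE2 does its own parsing internally); it only demonstrates the rule that repeated flags collect into a list and a bare flag becomes a boolean:

```python
# Hypothetical helper illustrating the flag-to-YAML mapping: repeated
# flags become lists, bare boolean flags become true.
def flags_to_config(argv):
    config = {}
    i = 0
    while i < len(argv):
        key = argv[i].lstrip("-")
        has_value = i + 1 < len(argv) and not argv[i + 1].startswith("--")
        value = argv[i + 1] if has_value else True
        if key in config:  # repeatable flag -> YAML list
            if not isinstance(config[key], list):
                config[key] = [config[key]]
            config[key].append(value)
        else:
            config[key] = value
        i += 2 if has_value else 1
    return config

cfg = flags_to_config([
    "--tls-san", "foo.local", "--tls-san", "192.168.1.100", "--debug",
])
print(cfg)
# {'tls-san': ['foo.local', '192.168.1.100'], 'debug': True}
```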

Advanced Configuration Options

RKE2 offers an extensive array of configuration options that can be overwhelming at first glance. Let's break down the most commonly used categories:

Server Configuration Options

For server nodes (control plane), you have access to numerous configuration parameters:

# Cluster networking
cluster-cidr: "10.42.0.0/16"
service-cidr: "10.43.0.0/16"
cluster-dns: "10.43.0.10"

# API server configuration
kube-apiserver-arg:
  - "audit-log-maxage=30"
  - "audit-log-maxbackup=3"

# etcd configuration
etcd-arg:
  - "auto-compaction-retention=1h"
  - "max-request-bytes=33554432"

# TLS and security
tls-san:
  - "my-cluster.example.com"
  - "192.168.1.100"

Agent Configuration Options

Agent nodes have their own set of configuration parameters focused on workload execution:

# Node configuration
node-name: "worker-01"
node-label:
  - "node-type=compute"
  - "storage=ssd"
node-taint:
  - "gpu=true:NoSchedule"

# Kubelet configuration
kubelet-arg:
  - "max-pods=110"
  - "cluster-dns=10.43.0.10"

# Container runtime settings
container-runtime-endpoint: "unix:///run/containerd/containerd.sock"

Working with Multiple Configuration Files

One of RKE2's more sophisticated features is support for multiple configuration files. This allows you to organize configurations logically and manage complex setups more effectively:

# Main configuration
/etc/rancher/rke2/config.yaml

# Additional configurations (loaded alphabetically)
/etc/rancher/rke2/config.yaml.d/networking.yaml
/etc/rancher/rke2/config.yaml.d/storage.yaml
/etc/rancher/rke2/config.yaml.d/security.yaml

When using multiple files, the last value found for a given key takes precedence. However, you can append values instead of replacing them by adding a + suffix to the key:

# config.yaml
node-label:
  - "environment=production"

# config.yaml.d/additional.yaml
node-label+:
  - "team=platform"
  - "cost-center=engineering"

CLI Arguments and Environment Variables

While the configuration file is the recommended approach, understanding CLI arguments and environment variables provides valuable flexibility. Every configuration file parameter has a corresponding CLI flag, and often an environment variable equivalent.

CLI Argument Examples

# Starting RKE2 with CLI arguments
rke2 server \
  --write-kubeconfig-mode "0644" \
  --tls-san "foo.local" \
  --node-label "environment=production" \
  --debug

Environment Variable Usage

# Environment variables follow the pattern RKE2_<FLAG> (flag name
# upper-cased, dashes replaced by underscores)
RKE2_WRITE_KUBECONFIG_MODE="0644"
RKE2_DEBUG="true"
RKE2_CONFIG_FILE="/custom/path/config.yaml"

# Variables exported in an interactive shell are not seen by systemd;
# put them in the unit's environment file (e.g. /etc/default/rke2-server
# or /etc/sysconfig/rke2-server) and then restart the service
sudo systemctl restart rke2-server

Environment variables are particularly useful in containerized deployments or when using configuration management tools.

Disabling Default Components

One of RKE2's strengths is its ability to run with minimal components when needed. By default, RKE2 includes several packaged components that you might not need in every scenario:

  • rke2-canal (CNI)
  • rke2-coredns (DNS)
  • rke2-ingress-nginx (Ingress Controller)
  • rke2-metrics-server (Metrics)
  • rke2-snapshot-controller (Volume Snapshots)

Disabling Components

To disable unwanted components, use the disable configuration option:

# /etc/rancher/rke2/config.yaml
disable:
  - rke2-ingress-nginx
  - rke2-metrics-server

This approach is particularly valuable for security-conscious environments where minimizing the attack surface is paramount. Remember that disabling core components like CoreDNS or the CNI will require you to provide alternatives.

The .skip File Method

Alternatively, you can create .skip files for specific manifests:

# Prevent metrics-server from being installed
sudo touch /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml.skip

Note that .skip files only prevent installation; they won't remove already deployed components.

Custom Networking Configuration

Networking is where RKE2 really shines, offering multiple CNI options and extensive customization capabilities. RKE2 bundles four primary CNI plugins: Canal (default), Cilium, Calico, and Flannel.

Choosing Your CNI

To specify a different CNI, use the cni parameter:

# Using Calico instead of Canal
cni: calico

Advanced Networking Options

For production environments, you'll often need more sophisticated networking configurations:

# Custom cluster networking
cluster-cidr: "172.16.0.0/16"
service-cidr: "172.17.0.0/16"
cluster-dns: "172.17.0.10"

# Note: unlike K3s, RKE2 has no flannel-backend or flannel-iface keys;
# Canal's flannel options are tuned through a HelmChartConfig manifest

# Disable kube-proxy (only when your CNI, e.g. Cilium, replaces it)
disable-kube-proxy: true
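Canal options such as the flannel backend and interface are passed to the packaged Helm chart through a HelmChartConfig manifest dropped into the server manifests directory. A hedged sketch follows; the flannel.backend and flannel.iface value names are assumptions based on common rke2-canal chart values, so verify them against your chart version before relying on them:

```yaml
# /var/lib/rancher/rke2/server/manifests/rke2-canal-config.yaml
# Sketch: tune the packaged Canal chart via HelmChartConfig
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-canal
  namespace: kube-system
spec:
  valuesContent: |-
    flannel:
      backend: "wireguard"   # assumed chart value; default is vxlan
      iface: "eth1"          # bind flannel to a specific interface
```

RKE2 applies manifests in that directory automatically on startup, so no extra `helm` invocation is needed.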

Multi-CNI with Multus

For advanced use cases requiring multiple network interfaces, RKE2 supports Multus as a secondary CNI:

# Enable Multus alongside a primary CNI by listing Multus first
cni:
  - multus
  - canal

Storage Backend Configuration

While RKE2 doesn't include a built-in storage solution, it provides several options for integrating with storage backends. The configuration depends on your chosen storage solution.

Local Storage Configuration

For development or specific use cases, you might want to configure local storage paths:

# Custom data directory
data-dir: "/opt/rke2/data"

# kubelet configuration for local storage
kubelet-arg:
  - "root-dir=/opt/rke2/kubelet"

External Storage Integration

For production environments, you'll typically integrate with external storage systems through CSI drivers:

# Disable default storage components to use custom CSI
disable:
  - rke2-snapshot-controller
  - rke2-snapshot-controller-crd

# Custom flexVolume plugin directory (passed through to the kubelet)
kubelet-arg:
  - "volume-plugin-dir=/opt/libexec/kubernetes/kubelet-plugins/volume/exec"

Advanced etcd Configuration

etcd is the backbone of your Kubernetes cluster, and RKE2 provides extensive options for customizing its behavior. These configurations are crucial for performance and security in production environments:

# etcd performance tuning
etcd-arg:
  - "auto-compaction-retention=1h"
  - "max-request-bytes=33554432"
  - "quota-backend-bytes=8589934592"  # 8GB
  - "heartbeat-interval=100"
  - "election-timeout=1000"

# etcd security configuration (merge into a single etcd-arg list if
# these live in the same file, since duplicate YAML keys collide)
etcd-arg:
  - "cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
  - "tls-min-version=TLS1.2"

etcd Backup Configuration

Automated backups are essential for production clusters:

etcd-snapshot-schedule-cron: "0 */12 * * *"  # Every 12 hours
etcd-snapshot-retention: 5
etcd-snapshot-dir: "/var/lib/rancher/rke2/server/db/snapshots"
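It's worth doing the back-of-envelope math on how far back those settings let you restore. Plain arithmetic, not an RKE2 API:

```python
# With a snapshot every 12 hours and a retention of 5, the oldest
# snapshot on disk is at most 60 hours (2.5 days) old.
interval_hours = 12
retention = 5
coverage_hours = retention * interval_hours
print(coverage_hours)        # 60
print(coverage_hours / 24)   # 2.5
```

If you need a longer restore window, raise etcd-snapshot-retention rather than slowing the schedule, so recent restore points stay close together.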

Security Hardening Through Configuration

RKE2 is designed with security in mind, but additional hardening is often required for production environments. Here are key security configurations:

Pod Security Standards

# Enable Pod Security Standards and harden the API server (keep one
# kube-apiserver-arg list per file, since duplicate YAML keys collide)
kube-apiserver-arg:
  - "admission-control-config-file=/etc/rancher/rke2/pss.yaml"
  - "enable-admission-plugins=NodeRestriction,PodSecurity"
  - "anonymous-auth=false"
# Note: the legacy insecure-port flag was removed in Kubernetes 1.24;
# do not pass it to current API servers

Network Security

# Restrict kubelet API access
kubelet-arg:
  - "anonymous-auth=false"
  - "authorization-mode=Webhook"

# Enable network policy enforcement (Canal, Calico, and Cilium all
# enforce Kubernetes NetworkPolicy)
cni: calico

Configuration Management Best Practices

Managing RKE2 configurations effectively requires following established best practices:

Version Control

Always store your configuration files in version control:

# Create a dedicated repository for cluster configs
mkdir rke2-configs
cd rke2-configs
git init

# Organize by environment
mkdir -p {production,staging,development}/config.yaml.d

Environment-Specific Configurations

Use the multiple configuration files feature to manage environment differences:

# base/config.yaml - Common configuration
write-kubeconfig-mode: "0644"
debug: false

# production/config.yaml.d/production.yaml - Production overrides
etcd-snapshot-schedule-cron: "0 */6 * * *"
node-label+:
  - "environment=production"

# staging/config.yaml.d/staging.yaml - Staging overrides  
debug: true
node-label+:
  - "environment=staging"

Validation and Testing

Before applying configuration changes, validate them in a test environment. Note that RKE2 does not provide a --dry-run flag, so the practical approach is to lint the YAML itself and then exercise the change on a staging node:

# Check YAML syntax (requires PyYAML on the host)
python3 -c 'import yaml, sys; yaml.safe_load(open(sys.argv[1]))' /etc/rancher/rke2/config.yaml

# Apply on a staging node and watch the service come up cleanly
sudo systemctl restart rke2-server
journalctl -u rke2-server -f

Troubleshooting Configuration Issues

Even with careful planning, configuration issues can arise. Here's a systematic approach to troubleshooting:

Common Configuration Problems

  1. Syntax errors in YAML files
  2. Conflicting parameters between files
  3. Permission issues with configuration files
  4. Network connectivity problems after CNI changes

Debugging Techniques

Enable debug logging to get detailed information about configuration processing:

debug: true

Check the RKE2 service logs for configuration-related errors:

# View recent logs
journalctl -u rke2-server --since "10 minutes ago"

# Follow logs in real-time
journalctl -u rke2-server -f

Configuration Validation

RKE2 has no standalone validation command; malformed YAML or unknown keys surface as errors when the service starts, so restart and inspect the logs immediately:

# Restart, then scan startup output for configuration errors
sudo systemctl restart rke2-server
journalctl -u rke2-server --since "2 minutes ago" | grep -iE "error|fatal"

Performance Optimization Through Configuration

Proper configuration can significantly impact cluster performance. Here are key areas to focus on:

Resource Allocation

# Optimize kubelet resource management
kubelet-arg:
  - "max-pods=110"
  - "kube-reserved=cpu=200m,memory=512Mi"
  - "system-reserved=cpu=200m,memory=512Mi"

Network Performance

# Optimize networking for high throughput: with Canal, the host-gw
# flannel backend (set via HelmChartConfig) avoids VXLAN encapsulation
# overhead on a flat L2 network
disable-network-policy: true  # Only if you don't need network policies

etcd Performance

# Tune etcd for better performance
etcd-arg:
  - "heartbeat-interval=100"
  - "election-timeout=1000"
  - "max-snapshots=3"
  - "max-wals=3"

Preparing for Production Workloads

As you prepare to move from configuration experimentation to production deployment, consider these final recommendations:

High Availability Configuration

# Configure for HA deployment
tls-san:
  - "api.k8s.example.com"
  - "192.168.1.100"
  - "192.168.1.101"
  - "192.168.1.102"

# etcd configuration for HA
etcd-expose-metrics: true
etcd-snapshot-retention: 10

Monitoring and Observability

# Enable metrics collection
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"

Backup and Recovery Planning

# Automated backup configuration
etcd-snapshot-schedule-cron: "0 2 * * *"  # Daily at 2 AM
etcd-snapshot-retention: 14
etcd-s3: true  # required to actually enable S3 uploads
etcd-s3-endpoint: "s3.amazonaws.com"
etcd-s3-bucket: "my-cluster-backups"

Looking Ahead

You've now mastered the art of RKE2 configuration, transforming from someone who follows installation guides to a configuration expert who can tailor clusters to meet specific requirements. You understand how to use YAML files, CLI arguments, and environment variables effectively, how to disable unnecessary components, and how to optimize networking and storage configurations.

With these configuration skills under your belt, you're ready for the next phase of your RKE2 journey. In Part 4, "Deploying Applications: From Hello World to Production," we'll put your beautifully configured cluster to work by deploying real applications. You'll learn how to create deployment manifests, expose services, implement Ingress controllers, and use Helm charts to streamline application deployment. The theoretical knowledge from the first three parts will finally come together as you start running actual workloads on your cluster.

Your cluster is configured, optimized, and ready for action. Time to make it earn its keep by running some applications that will actually justify all this careful configuration work you've done.