Kubespray: Zero to Hero - Part 3: The Big Leap – Automated Kubernetes Deployment with Kubespray

With your infrastructure properly prepared from Part 1 and your certificate authority established in Part 2, you're ready to deploy a production-grade Kubernetes cluster using Kubespray's powerful automation capabilities. This part will guide you through inventory configuration, playbook execution, troubleshooting strategies, and comprehensive verification procedures.
Understanding Kubespray Architecture
Kubespray leverages Ansible as its core automation engine, providing a declarative approach to cluster management through Infrastructure as Code principles. The system utilizes a combination of Ansible playbooks and kubeadm under the hood, offering exceptional flexibility in component selection and configuration.
The default Kubespray installation provides a comprehensive stack including etcd for cluster state storage, containerd as the container runtime, Calico for pod networking, and CoreDNS for in-cluster DNS, with optional add-ons such as an NGINX ingress controller. This composable architecture allows you to customize virtually every aspect of your deployment while maintaining production-ready defaults.
Inventory Configuration and Management
Creating Your Inventory Structure
Begin by copying the sample inventory and establishing your cluster-specific configuration:
cp -rfp inventory/sample inventory/mycluster
The inventory structure follows Ansible's recommended layout, organizing configuration files into logical groups. Your primary inventory file resides at inventory/mycluster/hosts.yaml, while group-specific variables are managed in inventory/mycluster/group_vars/.
Automated Inventory Generation
Kubespray provides a powerful inventory builder script that automatically generates the hosts.yaml file from IP addresses:
declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
This script intelligently distributes nodes across control plane, etcd, and worker roles based on your cluster size and requirements.
Manual Inventory Configuration
For precise control over node roles and configurations, you can manually configure the inventory file:
all:
  children:
    kube_control_plane:
      hosts:
        node1:
          ansible_host: 10.10.1.3
          ansible_user: ubuntu
    kube_node:
      hosts:
        node2:
          ansible_host: 10.10.1.4
          ansible_user: ubuntu
        node3:
          ansible_host: 10.10.1.5
          ansible_user: ubuntu
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
This configuration defines node roles, SSH connection parameters, and cluster topology.
Advanced Configuration Management
Group Variables Structure
Kubespray's configuration flexibility shines through its group variables system. The inventory/mycluster/group_vars/ directory contains several key files:
- all/all.yml: Global cluster settings
- k8s_cluster/k8s-cluster.yml: Kubernetes-specific configuration
- etcd.yml: etcd cluster parameters
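Optional add-ons are toggled in k8s_cluster/addons.yml. As a minimal sketch, assuming the variable names from the sample inventory shipped with Kubespray (verify them against your release), the following enables Helm and the metrics-server:
# inventory/mycluster/group_vars/k8s_cluster/addons.yml
helm_enabled: true            # install Helm on the cluster
metrics_server_enabled: true  # required for kubectl top later in this guide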
Essential Configuration Parameters
Configure critical cluster parameters in your group variables:
# inventory/mycluster/group_vars/all/all.yml
cluster_name: mycluster
kube_version: v1.29.5
kube_network_plugin: calico
container_manager: containerd
etcd_deployment_type: host
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_proxy_mode: iptables
kube_apiserver_bind_address: 0.0.0.0
kube_controller_manager_bind_address: 0.0.0.0
kube_scheduler_bind_address: 0.0.0.0
These parameters directly influence cluster behavior, security posture, and operational characteristics.
Network Plugin Configuration
Kubespray supports multiple Container Network Interface (CNI) plugins. For production deployments, consider these options:
- Calico: Default choice offering robust network policies and excellent performance
- Flannel: Lightweight option suitable for simpler network requirements
- Cilium: Advanced option providing enhanced security and observability features
Configure your chosen CNI plugin in the k8s-cluster.yml file.
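As a brief illustration, the plugin is selected with a single variable, and plugin-specific tuning lives in the matching k8s-net-*.yml file; the file and variable names below assume the sample inventory layout, so check them against your Kubespray version:
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_network_plugin: calico   # alternatives: flannel, cilium
# Calico-specific options live in k8s_cluster/k8s-net-calico.yml, for example:
# calico_network_backend: vxlan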
Executing the Deployment
Pre-deployment Validation
Before initiating the deployment, verify your configuration and connectivity:
# Test SSH connectivity to all nodes
ansible -i inventory/mycluster/hosts.yaml -m ping all
# Validate inventory structure
ansible-inventory -i inventory/mycluster/hosts.yaml --graph --vars
Successful connectivity tests are crucial for deployment success.
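It is also worth confirming that privilege escalation works before the long-running playbook starts. A minimal check using Ansible's command module (every host should answer with root):
# Confirm Ansible can escalate to root on every node
ansible -i inventory/mycluster/hosts.yaml -m command -a "whoami" --become all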
The Deployment Command
Execute the main deployment playbook with appropriate privileges:
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
This command initiates the comprehensive deployment process, which typically takes 15-30 minutes depending on your cluster size and network conditions.
Understanding the Deployment Process
The cluster.yml playbook orchestrates multiple phases:
- Preparation Phase: Fact gathering and prerequisite validation
- Container Engine Installation: Deployment of containerd or docker
- etcd Cluster Setup: Distributed key-value store configuration
- Kubernetes Node Preparation: Installation of kubelet, kubeadm, and kubectl
- Control Plane Initialization: Master node setup and configuration
- Worker Node Integration: Joining workers to the cluster
- Network Plugin Deployment: CNI installation and configuration
- Application Layer Setup: Installation of core cluster applications
Each phase includes comprehensive error handling and idempotency checks.
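Because the playbooks are tagged, individual phases can often be re-run in isolation after a failure. The tag names below are illustrative and may differ between Kubespray releases, so treat this as a sketch:
# Re-run only selected phases (tag names vary by release; check the upstream Ansible docs)
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml --tags etcd,network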
Troubleshooting Common Issues
SSH and Connectivity Problems
Authentication failures often stem from incorrect SSH key configuration or user permissions. Verify that your SSH keys are properly distributed and the specified user has sudo privileges without password prompts.
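A quick way to address both problems, assuming the ubuntu user from the inventory above:
# Distribute your SSH public key to each node
ssh-copy-id ubuntu@10.10.1.3
# Confirm passwordless sudo works on the node
ssh ubuntu@10.10.1.3 "sudo -n true && echo sudo OK"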
Inventory Configuration Errors
Common inventory issues include:
- Missing or incorrect ansible_host parameters
- Inconsistent user configurations across nodes
- Improper group membership assignments
YAML formatting mistakes, particularly indentation errors in hosts.yaml, are a frequently reported cause of hosts or groups not being recognized; validate the file with ansible-inventory before running the playbooks.
Network Configuration Issues
Network-related failures frequently occur due to:
- Firewall rules blocking required ports
- Incorrect network plugin configuration
- DNS resolution problems
Ensure your infrastructure meets the networking requirements established in Part 1.
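As a rough sketch for nodes running firewalld, the standard control plane ports look like this (adjust for your CNI, topology, and distribution):
# Control plane node: API server (6443), etcd (2379-2380), kubelet (10250),
# controller-manager (10257), scheduler (10259)
sudo firewall-cmd --permanent --add-port=6443/tcp --add-port=2379-2380/tcp \
  --add-port=10250/tcp --add-port=10257/tcp --add-port=10259/tcp
sudo firewall-cmd --reload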
Container Runtime Problems
Container runtime issues may manifest as:
- Docker daemon connectivity problems
- Registry access failures
- Image pull timeouts
Verify that your nodes can reach the configured container registries and that the container runtime is properly configured.
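On an affected node, a few quick checks (assuming containerd and the crictl CLI, which Kubespray installs by default):
# Confirm the runtime is running and the CRI endpoint responds
sudo systemctl status containerd
sudo crictl info
# Confirm images can be pulled from your registry
sudo crictl pull busybox:latest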
Resource Constraints
Monitor system resources during deployment. Insufficient memory, CPU, or disk space can cause deployment failures. Kubernetes control plane components require adequate resources to function properly.
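Ad-hoc Ansible commands make it easy to spot-check every node at once, for example:
# Spot-check memory, root filesystem space, and CPU count on all nodes
ansible -i inventory/mycluster/hosts.yaml -m command -a "free -m" all
ansible -i inventory/mycluster/hosts.yaml -m command -a "df -h /" all
ansible -i inventory/mycluster/hosts.yaml -m command -a "nproc" all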
Deployment Verification
Cluster Status Verification
After successful deployment, verify cluster functionality:
# Check node status
kubectl get nodes -o wide
# Verify system pods
kubectl get pods --all-namespaces
# Check cluster information
kubectl cluster-info
All nodes should report "Ready" status, and system pods should be running without errors.
Component Health Checks
Verify individual cluster components:
# Check control plane component health (componentstatuses is deprecated since Kubernetes 1.19; the kube-system pods are the authoritative source)
kubectl get componentstatuses
# Verify kube-system pods
kubectl get pods -n kube-system
# Test DNS resolution
kubectl run test-pod --image=busybox --restart=Never -- nslookup kubernetes.default
# Inspect the lookup result once the pod completes
kubectl logs test-pod
These commands validate that core cluster services are operational.
Network Connectivity Testing
Test network functionality across the cluster:
# Deploy a test application
kubectl create deployment nginx --image=nginx
# Expose the application
kubectl expose deployment nginx --port=80 --type=ClusterIP
# Test connectivity from another pod
kubectl run test-client --image=busybox --restart=Never -it -- wget -qO- nginx
Successful connectivity indicates proper network plugin configuration.
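Clean up the test resources once you are satisfied:
# Remove the objects created during verification
kubectl delete deployment nginx
kubectl delete service nginx
kubectl delete pod test-client test-pod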
Certificate Verification
Validate that your custom certificates from Part 2 are properly integrated:
# Check certificate details
kubectl config view --raw -o json | jq -r '.clusters[0].cluster."certificate-authority-data"' | base64 -d | openssl x509 -text -noout
# Verify API server certificate
echo | openssl s_client -connect <control-plane-ip>:6443 2>/dev/null | openssl x509 -noout -text
Performance Baseline Testing
Establish performance baselines for your cluster:
# Load-test the cluster with a dedicated tool such as clusterloader2 from the kubernetes/perf-tests repository
# (for example the density test config at https://raw.githubusercontent.com/kubernetes/perf-tests/master/clusterloader2/testing/density/config.yaml,
# which is run with the clusterloader2 binary rather than applied with kubectl)
# Monitor resource utilization (requires the metrics-server add-on)
kubectl top nodes
kubectl top pods --all-namespaces
Advanced Deployment Scenarios
High Availability Configuration
For production environments, configure multiple control plane nodes:
# inventory/mycluster/hosts.yaml
all:
  children:
    kube_control_plane:
      hosts:
        master1:
          ansible_host: 10.10.1.3
        master2:
          ansible_host: 10.10.1.4
        master3:
          ansible_host: 10.10.1.5
    etcd:
      hosts:
        master1:
        master2:
        master3:
This configuration provides fault tolerance and eliminates single points of failure.
Load Balancer Integration
Configure external load balancers for API server high availability:
# inventory/mycluster/group_vars/all/all.yml
loadbalancer_apiserver:
  address: 10.10.1.100
  port: 8443
This setup distributes API server traffic across multiple control plane nodes.
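Kubespray does not provision the load balancer itself, so deploy HAProxy, keepalived, or a cloud load balancer separately and point it at port 6443 on each control plane node. A simple smoke test once it is in place (assumes anonymous access to /healthz has not been disabled):
# The API server should answer through the load balancer endpoint
curl -k https://10.10.1.100:8443/healthz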
Custom etcd Configuration
Kubespray implements an incremental etcd cluster configuration approach for stability during scaling operations. This method adds nodes one at a time to maintain quorum throughout the process.
Post-Deployment Considerations
Accessing Your Cluster
Configure kubectl access from your management node:
# Copy the kubeconfig from the first control plane node (admin.conf is root-owned, so copy it with sudo or adjust its permissions first)
mkdir -p ~/.kube
scp user@master1:/etc/kubernetes/admin.conf ~/.kube/config
# Verify connectivity
kubectl get nodes
Scaling Considerations
Plan for future scaling requirements. Kubespray supports adding nodes through the scale.yml playbook:
# Add new nodes to inventory
# Run scale playbook
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root scale.yml
However, be aware of known limitations with scaling operations when using the --limit flag.
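The companion remove-node.yml playbook handles the reverse operation; the node name below is purely illustrative:
# Drain and remove a node from the cluster
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root remove-node.yml -e node=node3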
Security Hardening Preparation
With your cluster operational, you're ready to implement the security hardening measures covered in Part 4. Your deployment provides the foundation for implementing RBAC, network policies, and additional security controls.
Monitoring and Maintenance
Cluster Health Monitoring
Implement continuous health monitoring:
# Check cluster events
kubectl get events --sort-by='.lastTimestamp'
# Monitor node conditions
kubectl describe nodes | grep -i condition
# Verify persistent volumes
kubectl get pv,pvc --all-namespaces
Log Analysis
Configure centralized logging for troubleshooting:
# Check kubelet logs
journalctl -u kubelet -f
# Examine pod logs
kubectl logs -n kube-system <pod-name>
# Review container runtime logs
journalctl -u containerd -f
Backup Preparation
Establish backup procedures for critical cluster components:
# Create an etcd snapshot (with etcd_deployment_type: host, run etcdctl directly on an etcd node;
# the certificate paths below follow Kubespray's default layout and may need adjusting)
sudo etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-master1.pem --key=/etc/ssl/etcd/ssl/admin-master1-key.pem \
  snapshot save /tmp/etcd-snapshot.db
# Back up cluster configuration (common resource types only; not a substitute for etcd snapshots)
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml
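You can sanity-check the snapshot before archiving it; on etcd 3.5+ the same check is also available via etcdutl:
# Verify the snapshot is readable and report its size and revision
sudo etcdctl snapshot status /tmp/etcd-snapshot.db --write-out=table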
Your Kubernetes cluster is now operational and ready for production workloads. The automation provided by Kubespray has eliminated the complexity of manual cluster setup while providing the flexibility needed for enterprise deployments. In Part 4, we'll focus on hardening this foundation through comprehensive security measures, and Part 5 will cover ongoing management and scaling strategies to ensure your cluster remains robust and efficient as your requirements evolve.