Kubespray: Zero to Hero - Part 3: The Big Leap – Automated Kubernetes Deployment with Kubespray

With your infrastructure properly prepared from Part 1 and your certificate authority established in Part 2, you're ready to deploy a production-grade Kubernetes cluster using Kubespray's powerful automation capabilities. This part will guide you through inventory configuration, playbook execution, troubleshooting strategies, and comprehensive verification procedures.
Understanding Kubespray Architecture
Kubespray leverages Ansible as its core automation engine, providing a declarative approach to cluster management through Infrastructure as Code principles. The system utilizes a combination of Ansible playbooks and kubeadm under the hood, offering exceptional flexibility in component selection and configuration.
The default Kubespray installation provides a comprehensive stack including etcd for cluster state storage, containerd as the container runtime, Calico for pod networking, and CoreDNS for in-cluster DNS, with optional add-ons such as an NGINX ingress controller. This composable architecture allows you to customize virtually every aspect of your deployment while maintaining production-ready defaults.
Inventory Configuration and Management
Creating Your Inventory Structure
Begin by copying the sample inventory and establishing your cluster-specific configuration:
cp -rfp inventory/sample inventory/mycluster
The inventory structure follows Ansible's recommended layout, organizing configuration files into logical groups. Your primary inventory file resides at inventory/mycluster/hosts.yaml, while group-specific variables are managed in inventory/mycluster/group_vars/.
Automated Inventory Generation
Kubespray provides a powerful inventory builder script that automatically generates the hosts.yaml file from IP addresses:
declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
This script intelligently distributes nodes across control plane, etcd, and worker roles based on your cluster size and requirements.
Manual Inventory Configuration
For precise control over node roles and configurations, you can manually configure the inventory file:
all:
  children:
    kube_control_plane:
      hosts:
        node1:
          ansible_host: 10.10.1.3
          ansible_user: ubuntu
    kube_node:
      hosts:
        node2:
          ansible_host: 10.10.1.4
          ansible_user: ubuntu
        node3:
          ansible_host: 10.10.1.5
          ansible_user: ubuntu
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
This configuration defines node roles, SSH connection parameters, and cluster topology.
Advanced Configuration Management
Group Variables Structure
Kubespray's configuration flexibility shines through its group variables system. The inventory/mycluster/group_vars/ directory contains several key files:
- all/all.yml: Global cluster settings
- k8s_cluster/k8s-cluster.yml: Kubernetes-specific configuration
- etcd.yml: etcd cluster parameters
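Optional add-ons are toggled in k8s_cluster/addons.yml. As a minimal sketch, assuming the variable names from the sample inventory shipped with Kubespray (verify them against your release), the following enables Helm and the metrics-server:
# inventory/mycluster/group_vars/k8s_cluster/addons.yml
helm_enabled: true            # install Helm on the cluster
metrics_server_enabled: true  # required for kubectl top later in this guide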
Essential Configuration Parameters
Configure critical cluster parameters in your group variables:
# inventory/mycluster/group_vars/all/all.yml
cluster_name: mycluster
kube_version: v1.29.5
kube_network_plugin: calico
container_manager: containerd
etcd_deployment_type: host
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_proxy_mode: iptables
kube_apiserver_bind_address: 0.0.0.0
kube_controller_manager_bind_address: 0.0.0.0
kube_scheduler_bind_address: 0.0.0.0
These parameters directly influence cluster behavior, security posture, and operational characteristics.
Network Plugin Configuration
Kubespray supports multiple Container Network Interface (CNI) plugins. For production deployments, consider these options:
- Calico: Default choice offering robust network policies and excellent performance
- Flannel: Lightweight option suitable for simpler network requirements
- Cilium: Advanced option providing enhanced security and observability features
Configure your chosen CNI plugin in the k8s-cluster.yml file.
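As a brief illustration, the plugin is selected with a single variable, and plugin-specific tuning lives in the matching k8s-net-*.yml file; the file and variable names below assume the sample inventory layout, so check them against your Kubespray version:
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_network_plugin: calico   # alternatives: flannel, cilium
# Calico-specific options live in k8s_cluster/k8s-net-calico.yml, for example:
# calico_network_backend: vxlan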
Executing the Deployment
Pre-deployment Validation
Before initiating the deployment, verify your configuration and connectivity:
# Test SSH connectivity to all nodes
ansible -i inventory/mycluster/hosts.yaml -m ping all
# Validate inventory structure
ansible-inventory -i inventory/mycluster/hosts.yaml --graph --vars
Successful connectivity tests are crucial for deployment success.
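It is also worth confirming that privilege escalation works before the long-running playbook starts. A minimal check using Ansible's command module (every host should answer with root):
# Confirm Ansible can escalate to root on every node
ansible -i inventory/mycluster/hosts.yaml -m command -a "whoami" --become all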
The Deployment Command
Execute the main deployment playbook with appropriate privileges:
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
This command initiates the comprehensive deployment process, which typically takes 15-30 minutes depending on your cluster size and network conditions.
Understanding the Deployment Process
The cluster.yml playbook orchestrates multiple phases:
- Preparation Phase: Fact gathering and prerequisite validation
- Container Engine Installation: Deployment of containerd or docker
- etcd Cluster Setup: Distributed key-value store configuration
- Kubernetes Node Preparation: Installation of kubelet, kubeadm, and kubectl
- Control Plane Initialization: Master node setup and configuration
- Worker Node Integration: Joining workers to the cluster
- Network Plugin Deployment: CNI installation and configuration
- Application Layer Setup: Installation of core cluster applications
Each phase includes comprehensive error handling and idempotency checks.
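Because the playbooks are tagged, individual phases can often be re-run in isolation after a failure. The tag names below are illustrative and may differ between Kubespray releases, so treat this as a sketch:
# Re-run only selected phases (tag names vary by release; check the upstream Ansible docs)
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml --tags etcd,network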
Troubleshooting Common Issues
SSH and Connectivity Problems
Authentication failures often stem from incorrect SSH key configuration or user permissions. Verify that your SSH keys are properly distributed and the specified user has sudo privileges without password prompts.
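A quick way to address both problems, assuming the ubuntu user from the inventory above:
# Distribute your SSH public key to each node
ssh-copy-id ubuntu@10.10.1.3
# Confirm passwordless sudo works on the node
ssh ubuntu@10.10.1.3 "sudo -n true && echo sudo OK"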
Inventory Configuration Errors
Common inventory issues include:
- Missing or incorrect ansible_host parameters
- Inconsistent user configurations across nodes
- Improper group membership assignments
YAML formatting mistakes, particularly indentation errors in hosts.yaml, are a frequently reported cause of hosts or groups not being recognized; validate the file with ansible-inventory before running the playbooks.
Network Configuration Issues
Network-related failures frequently occur due to:
- Firewall rules blocking required ports
- Incorrect network plugin configuration
- DNS resolution problems
Ensure your infrastructure meets the networking requirements established in Part 1.
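As a rough sketch for nodes running firewalld, the standard control plane ports look like this (adjust for your CNI, topology, and distribution):
# Control plane node: API server (6443), etcd (2379-2380), kubelet (10250),
# controller-manager (10257), scheduler (10259)
sudo firewall-cmd --permanent --add-port=6443/tcp --add-port=2379-2380/tcp \
  --add-port=10250/tcp --add-port=10257/tcp --add-port=10259/tcp
sudo firewall-cmd --reload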
Container Runtime Problems
Container runtime issues may manifest as:
- Docker daemon connectivity problems
- Registry access failures
- Image pull timeouts
Verify that your nodes can reach the configured container registries and that the container runtime is properly configured.
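On an affected node, a few quick checks (assuming containerd and the crictl CLI, which Kubespray installs by default):
# Confirm the runtime is running and the CRI endpoint responds
sudo systemctl status containerd
sudo crictl info
# Confirm images can be pulled from your registry
sudo crictl pull busybox:latest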
Resource Constraints
Monitor system resources during deployment. Insufficient memory, CPU, or disk space can cause deployment failures. Kubernetes control plane components require adequate resources to function properly.
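Ad-hoc Ansible commands make it easy to spot-check every node at once, for example:
# Spot-check memory, root filesystem space, and CPU count on all nodes
ansible -i inventory/mycluster/hosts.yaml -m command -a "free -m" all
ansible -i inventory/mycluster/hosts.yaml -m command -a "df -h /" all
ansible -i inventory/mycluster/hosts.yaml -m command -a "nproc" all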
Deployment Verification
Cluster Status Verification
After successful deployment, verify cluster functionality:
# Check node status
kubectl get nodes -o wide
# Verify system pods
kubectl get pods --all-namespaces
# Check cluster information
kubectl cluster-info
All nodes should report "Ready" status, and system pods should be running without errors.
Component Health Checks
Verify individual cluster components:
# Check control plane component health (componentstatuses is deprecated since Kubernetes 1.19; the kube-system pods are the authoritative source)
kubectl get componentstatuses
# Verify kube-system pods
kubectl get pods -n kube-system
# Test DNS resolution
kubectl run test-pod --image=busybox --restart=Never -- nslookup kubernetes.default
# Inspect the lookup result once the pod completes
kubectl logs test-pod
These commands validate that core cluster services are operational.
Network Connectivity Testing
Test network functionality across the cluster:
# Deploy a test application
kubectl create deployment nginx --image=nginx
# Expose the application
kubectl expose deployment nginx --port=80 --type=ClusterIP
# Test connectivity from another pod
kubectl run test-client --image=busybox --restart=Never -it -- wget -qO- nginx
Successful connectivity indicates proper network plugin configuration.
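Clean up the test resources once you are satisfied:
# Remove the objects created during verification
kubectl delete deployment nginx
kubectl delete service nginx
kubectl delete pod test-client test-pod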
Certificate Verification
Validate that your custom certificates from Part 2 are properly integrated:
# Check certificate details
kubectl config view --raw -o json | jq -r '.clusters[0].cluster."certificate-authority-data"' | base64 -d | openssl x509 -text -noout
# Verify API server certificate
echo | openssl s_client -connect <control-plane-ip>:6443 2>/dev/null | openssl x509 -noout -text
Performance Baseline Testing
Establish performance baselines for your cluster:
# Load-test the cluster with a dedicated tool such as clusterloader2 from the kubernetes/perf-tests repository
# (for example the density test config at https://raw.githubusercontent.com/kubernetes/perf-tests/master/clusterloader2/testing/density/config.yaml,
# which is run with the clusterloader2 binary rather than applied with kubectl)
# Monitor resource utilization (requires the metrics-server add-on)
kubectl top nodes
kubectl top pods --all-namespaces
Advanced Deployment Scenarios
High Availability Configuration
For production environments, configure multiple control plane nodes:
# inventory/mycluster/hosts.yaml
all:
  children:
    kube_control_plane:
      hosts:
        master1:
          ansible_host: 10.10.1.3
        master2:
          ansible_host: 10.10.1.4
        master3:
          ansible_host: 10.10.1.5
    etcd:
      hosts:
        master1:
        master2:
        master3:
This configuration provides fault tolerance and eliminates single points of failure.
Load Balancer Integration
Configure external load balancers for API server high availability:
# inventory/mycluster/group_vars/all/all.yml
loadbalancer_apiserver:
  address: 10.10.1.100
  port: 8443
This setup distributes API server traffic across multiple control plane nodes.
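Kubespray does not provision the load balancer itself, so deploy HAProxy, keepalived, or a cloud load balancer separately and point it at port 6443 on each control plane node. A simple smoke test once it is in place (assumes anonymous access to /healthz has not been disabled):
# The API server should answer through the load balancer endpoint
curl -k https://10.10.1.100:8443/healthz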
Custom etcd Configuration
Kubespray implements an incremental etcd cluster configuration approach for stability during scaling operations. This method adds nodes one at a time to maintain quorum throughout the process.
Post-Deployment Considerations
Accessing Your Cluster
Configure kubectl access from your management node:
# Copy the kubeconfig from the first control plane node (admin.conf is root-owned, so copy it with sudo or adjust its permissions first)
mkdir -p ~/.kube
scp user@master1:/etc/kubernetes/admin.conf ~/.kube/config
# Verify connectivity
kubectl get nodes
Scaling Considerations
Plan for future scaling requirements. Kubespray supports adding nodes through the scale.yml playbook:
# Add new nodes to inventory
# Run scale playbook
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root scale.yml
However, be aware of known limitations with scaling operations when using the --limit flag.
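The companion remove-node.yml playbook handles the reverse operation; the node name below is purely illustrative:
# Drain and remove a node from the cluster
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root remove-node.yml -e node=node3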
Security Hardening Preparation
With your cluster operational, you're ready to implement the security hardening measures covered in Part 4. Your deployment provides the foundation for implementing RBAC, network policies, and additional security controls.
Monitoring and Maintenance
Cluster Health Monitoring
Implement continuous health monitoring:
# Check cluster events
kubectl get events --sort-by='.lastTimestamp'
# Monitor node conditions
kubectl describe nodes | grep -i condition
# Verify persistent volumes
kubectl get pv,pvc --all-namespaces
Log Analysis
Configure centralized logging for troubleshooting:
# Check kubelet logs
journalctl -u kubelet -f
# Examine pod logs
kubectl logs -n kube-system <pod-name>
# Review container runtime logs
journalctl -u containerd -f
Backup Preparation
Establish backup procedures for critical cluster components:
# Create an etcd snapshot (with etcd_deployment_type: host, run etcdctl directly on an etcd node;
# the certificate paths below follow Kubespray's default layout and may need adjusting)
sudo etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-master1.pem --key=/etc/ssl/etcd/ssl/admin-master1-key.pem \
  snapshot save /tmp/etcd-snapshot.db
# Back up cluster configuration (common resource types only; not a substitute for etcd snapshots)
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml
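You can sanity-check the snapshot before archiving it; on etcd 3.5+ the same check is also available via etcdutl:
# Verify the snapshot is readable and report its size and revision
sudo etcdctl snapshot status /tmp/etcd-snapshot.db --write-out=table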
Your Kubernetes cluster is now operational and ready for production workloads. The automation provided by Kubespray has eliminated the complexity of manual cluster setup while providing the flexibility needed for enterprise deployments. In Part 4, we'll focus on hardening this foundation through comprehensive security measures, and Part 5 will cover ongoing management and scaling strategies to ensure your cluster remains robust and efficient as your requirements evolve.