Ansible AWX on a single-node Kubernetes cluster on Rocky Linux
This guide should work with all versions of Rocky Linux.
Run all commands as root.
Deploy VM
- Deploy Rocky Linux ISO with 4 cores, 12Gi of memory, 50Gi of storage
- Configure an A record for the machine and a CNAME record for AWX on your DNS server
- During installation, configure the FQDN as the hostname and set the static IP address
- Also configure the NTP server in Anaconda
- (optional) Install system helper tools for ongoing troubleshooting
dnf install dnf-utils setroubleshoot-server
- Do all updates with
dnf update
- Disable swap and remove the swap entry from /etc/fstab (a sketch follows after this list)
swapoff -a
- Disable the firewall; there is no working setup with firewalld enabled so far
systemctl disable --now firewalld
- Reboot
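A minimal sketch for the swap removal, assuming the default fstab layout (the sed pattern simply comments out any line containing a swap entry):
swapoff -a
sed -i '/\sswap\s/ s/^/#/' /etc/fstab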
Install Kubernetes
- Enable kernel modules:
modprobe br_netfilter
modprobe overlay
cat <<EOF | tee /etc/modules-load.d/k8s_kernel_modules.conf
overlay
br_netfilter
EOF
- Configure sysctl:
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.ipv4.ip_forward=1
cat <<EOF | tee /etc/sysctl.d/01-k8s.conf
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.ip_forward=1
EOF
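The sysctl -w calls above already apply the values to the running system; to (re)load everything persisted under /etc/sysctl.d without a reboot you can additionally run:
sysctl --system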
- Add the cri-o repo:
cat <<EOF | tee /etc/yum.repos.d/cri-o.repo
[cri-o]
name=CRI-O
baseurl=https://download.opensuse.org/repositories/isv:/cri-o:/stable:/v<stable-version>/rpm/
enabled=1
gpgcheck=1
gpgkey=https://download.opensuse.org/repositories/isv:/cri-o:/stable:/v<stable-version>/rpm/repodata/repomd.xml.key
EOF
- Add the kubernetes repo (make sure you use the same version as for cri-o):
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v<stable-version>/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v<stable-version>/rpm/repodata/repomd.xml.key
EOF
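Before installing, it is worth checking that both repos resolve and actually offer the packages; <stable-version> stands for whichever release stream you picked:
dnf makecache
dnf list --showduplicates kubeadm cri-o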
- Install cri-o and Kubernetes
dnf install container-selinux cri-o cri-tools kubeadm kubectl kubelet
- Enable the cri-o service
systemctl enable --now crio
- Create the kubeconfig with:
cat <<EOF | tee ~/kubeconfig.yml
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v<installed-kubeadm-version>
networking:
  podSubnet: 10.16.0.0/16
  serviceSubnet: 10.96.0.0/12
EOF
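The kubernetesVersion field has to match the kubeadm version you just installed; one way to look it up:
kubeadm version -o short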
- Start the kubelet
systemctl enable --now kubelet.service
- Pre-download the images
kubeadm config images pull --config kubeconfig.yml
- Init the Kubernetes cluster
kubeadm init --skip-token-print --config kubeconfig.yml
- Copy the running config with:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
- Remove the node taints for the single-node-cluster:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
- Make sure all pods are running, except the two coredns pods, which stay pending until the CNI is installed
- Run
systemctl edit kubelet.service
and write the following into the empty section between the comments
[Service]
CPUAccounting=true
MemoryAccounting=true
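To confirm the drop-in was stored correctly, you can print the merged unit, which should now show the override file with the two accounting settings:
systemctl cat kubelet.service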
- Restart the kubelet
systemctl restart kubelet.service
- Download the sources of all needed system services:
curl -LO https://github.com/antrea-io/antrea/releases/download/v<latest-version>/antrea.yml
curl -LO https://raw.githubusercontent.com/rancher/local-path-provisioner/v<latest-version>/deploy/local-path-storage.yaml
curl -LO https://projectcontour.io/quickstart/contour.yaml
- Install antrea
kubectl apply -f antrea.yml
- Make sure that all antrea and coredns pods are running
watch kubectl get po -n kube-system
- Create the local-path-storage folder
mkdir -p /opt/local-path-provisioner
- Install the local-path-storage provider
kubectl apply -f local-path-storage.yaml
- Make sure the pod is running and the storageclass got created
watch kubectl get sc
- Make this provisioner the default storage class
kubectl patch sc local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
- Install contour
kubectl apply -f contour.yaml
- Make sure that all contour pods are running
watch kubectl get po -n projectcontour
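Optionally, a quick smoke test of the default storage class before moving on; the name test-pvc is made up, and because local-path uses WaitForFirstConsumer the claim stays Pending until a pod mounts it, which is expected:
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc test-pvc
kubectl delete pvc test-pvc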
Get certificates
cert-manager / ACME
To use cert-manager for getting certificates signed by an ACME-based CA like step-ca, you need to configure a cluster-wide issuer.
The example configuration uses a step-ca.
- Download the source to install cert-manager:
curl -LO https://github.com/cert-manager/cert-manager/releases/download/v<latest-version>/cert-manager.yaml
- Install cert-manager
kubectl apply -f cert-manager.yaml
- Make sure that all pods are running
watch kubectl get po -n cert-manager
- Gather your certificate chain (especially the root CA) and convert it to a single base64 string with no trailing newline; this is used for the caBundle field (see the sketch at the end of this sub-section)
- Create the ClusterIssuer config file:
cat <<EOF | tee ~/cert-issuer.yml
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: <ca-name>
  namespace: cert-manager
spec:
  acme:
    email: <email-address>
    privateKeySecretRef:
      name: <ca-name>
    server: https://<ca-fqdn>/acme/<provisioner-name>/directory
    caBundle: <base64-encoded-ca-chain>
    solvers:
      - http01:
          ingress:
            class: contour
EOF
- And apply the config
kubectl apply -f cert-issuer.yml
- You should now be able to verify the config with
kubectl get ciss <ca-name>
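A sketch for producing the caBundle value, assuming your chain sits in a file called ca-chain.pem (the name is just an example); -w0 disables line wrapping so no newlines end up in the string:
base64 -w0 ca-chain.pem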
Manually signed certificate
- Either create a new self-signed certificate with
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -out ingress-tls.crt -keyout ingress-tls.key -subj "/CN=<awx-fqdn>/O=awx-ingress-tls"
or copy a CA-signed one to the machine (ingress-tls.crt and ingress-tls.key must contain only the server certificate and key, without a trailing empty line)
- Import the certificate into Kubernetes
kubectl create secret tls awx-ingress-tls --key ingress-tls.key --cert ingress-tls.crt
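To double-check the import, the secret should be of type kubernetes.io/tls and contain the tls.crt and tls.key keys:
kubectl describe secret awx-ingress-tls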
Install AWX
- Install missing packages
dnf install git
- Download the latest kustomize version from the GitHub releases page
- Unpack it, make it executable, and move it to /usr/local/bin:
curl -LO https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv<latest-version>/kustomize_v<latest-version>_linux_amd64.tar.gz
tar xzvf kustomize_v<latest-version>_linux_amd64.tar.gz
chmod +x kustomize
chown root: kustomize
mv kustomize /usr/local/bin
- Download the issuing certificates and import them:
curl -O http://<crl-fqdn>/rootca.crt
kubectl create secret generic awx-custom-certs --from-file=bundle-ca.crt=/root/rootca.crt
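Before (or after) creating the secret you can confirm that the downloaded file really is the expected root CA:
openssl x509 -in /root/rootca.crt -noout -subject -issuer -dates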
- Create the Kustomize config file:
cat <<EOF | tee ~/kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Find the latest tag here: https://github.com/ansible/awx-operator/releases
  - github.com/ansible/awx-operator/config/default?ref=<latest-version>
  - awx.yml
# Set the image tags to match the git version from above
images:
  - name: quay.io/ansible/awx-operator
    newTag: <latest-version>
# Specify a custom namespace in which to install AWX
namespace: default
EOF
- Create the AWX config file for Kubernetes:
cat <<EOF | tee ~/awx.yml
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-admin-password
  namespace: default
stringData:
  password: <password>
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  ingress_type: Ingress
  ingress_annotations: |
    cert-manager.io/cluster-issuer: <ca-name>
    ingress.kubernetes.io/force-ssl-redirect: "true"
    kubernetes.io/tls-acme: "true"
  ingress_hosts:
    - hostname: <awx-fqdn>
      tls_secret: <secret-name-used-by-cert-manager>
  ingress_controller: contour
  web_resource_requirements:
    requests:
      cpu: 400m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
  task_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  bundle_cacert_secret: awx-custom-certs
EOF
In case you are using a manually signed certificate that you already imported as shown above, use the following AWX configuration instead:
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  ingress_type: Ingress
  ingress_hosts:
    - hostname: <awx-fqdn>
      tls_secret: <secret-name-used-for-tls-secret>
  ingress_controller: contour
  web_resource_requirements:
  ...
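Before installing, you can render what kustomize is going to apply and sanity-check the operator version and namespace (this requires awx.yml from above to exist, since it is referenced as a resource):
kustomize build . | grep -E 'image:|namespace:' | sort -u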
- Install the awx-operator (run this step twice; wait in between until the awx-operator is ready)
kubectl apply -k .
- Make sure the operator is running
watch kubectl get po
- Make sure the pods got deployed with
kubectl logs -f deployments/awx-operator-controller-manager -c awx-manager
- And
watch kubectl get ing,po,svc,pvc
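Once everything is up, log in as admin with the password from the secret above; should you ever need to read it back, the secret follows the <awx-name>-admin-password naming convention:
kubectl get secret awx-admin-password -o jsonpath='{.data.password}' | base64 --decode ; echo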
Upgrade components
Kubernetes Upgrade
Warning
Upgrading a single-node Kubernetes cluster is always playing with fire; make sure you take backups/snapshots before the operation!
Always lift the cluster version before you lift the kubelet version!
Kubernetes cluster version updates
This is only possible if the kubeadm version has progressed; you can only upgrade the cluster to versions lower than or equal to the installed kubeadm version.
- Update the kubeadm version
dnf update kubeadm
- Check for updates
kubeadm upgrade plan
- The plan tells you what can be upgraded (it may list options that do not work with your installed kubeadm version)
- Upgrade to the wanted version with
kubeadm upgrade apply v<cluster-version>
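After the control plane runs the new version, the kubelet (and kubectl) may follow, in line with the warning above; a sketch:
dnf update kubelet kubectl
systemctl daemon-reload
systemctl restart kubelet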
OS updates
- Stop the kubelet to make sure etcd does not get corrupted
systemctl stop kubelet
- Run any update operation, like
dnf update
- Start the kubelet again
systemctl start kubelet
- Check with
journalctl -f
whether the containers come up normally again and whether the CNI brings the network back into a working state
- If a restart is needed, stop the kubelet again and restart (sometimes only a restart gets the system working again)
Project repo path switch
The Kubernetes project moved from its home at Google to a community-owned location:
https://kubernetes.io/blog/2023/10/10/cri-o-community-package-infrastructure/
This means two things:
- The repos for both Kubernetes and cri-o moved to a different location; the exact paths are in the guide above
- If you are using cri-o you will have to do a tricky switch: uninstall cri-o and reinstall it, as there are file/dependency conflicts between the old and the new cri-o version (the whole runtime is now merged into the cri-o package):
systemctl stop kubelet
dnf remove containers-common
dnf module reset container-tools
dnf install cri-o
systemctl enable --now crio
systemctl start kubelet
And there was a second repo switch for cri-o too: github.com/cri-o/packaging
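After the switch it is worth confirming that the runtime answers and the node comes back to Ready:
crictl version
kubectl get nodes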
Ansible AWX Upgrade
- Change the version in the kustomization.yaml file
- Rerun
kustomize build . | kubectl apply -f -
kubectl logs -f deployments/awx-operator-controller-manager -c awx-manager
watch kubectl get ing,po,svc,pvc
Antrea Upgrade
Upgrade across at most 4 minor versions at a time
mv antrea.yml antrea.yml.1
curl -LO https://github.com/vmware-tanzu/antrea/releases/download/v<latest-version>/antrea.yml
kubectl apply -f antrea.yml
watch kubectl get po -n kube-system
Contour Upgrade
https://projectcontour.io/resources/upgrading/
mv contour.yaml contour.yaml.1
curl -LO https://projectcontour.io/quickstart/contour.yaml
kubectl delete namespace projectcontour
kubectl apply -f contour.yaml
watch kubectl get po -n projectcontour
Troubleshooting
CRI-O Problems
You might run into an ImageInspectError; since 1.34 CRI-O no longer allows ambiguous (short) image names.
If you still need them, make sure to set short_name_mode = "disabled" in the crio.conf.
Antrea Problems
kubectl logs -n kube-system antrea-agent-<key> -c antrea-agent
Contour Problems
kubectl logs -n projectcontour deployment/contour --all-containers -f
Local Path Provisioner is unable to create PVs, or applications using a PVC can't write
Most likely SELinux is misbehaving. You may have to set SELinux to permissive mode overall... which is sad.
https://fedoramagazine.org/kubernetes-with-cri-o-on-fedora-linux-39/
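If you do go the permissive route, a sketch covering both the running system and persistence across reboots:
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config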