AKS - Node OS options and reasons why you care.

Kubernetes itself runs on Linux by default, and the control plane is purely Linux; however, we can add Windows node pools. So why should we care about the node OS?
By default, when you create an AKS cluster, you get the standard Ubuntu image as the node pool OS (Ubuntu 22.04 for Kubernetes 1.25 and later, at the time of writing), but there is another option available: Azure Linux.

Azure Linux is an operating system image built by Microsoft to provide an optimized experience for running container workloads on AKS. Microsoft fully maintains the image and keeps it updated as new upstream patches are released.

Choosing Azure Linux as the main operating system for AKS deployments offers several compelling advantages. Azure Linux is a streamlined, security-focused OS that integrates seamlessly with Azure, ensuring optimal performance and stability for your cloud-native applications.

It is explicitly engineered for Azure, meaning it's optimized for cloud performance and scalability. It's designed to work efficiently with Azure services, reducing compatibility issues and ensuring your applications run smoothly. This tight integration can lead to better resource utilization, lower latency, and a smoother user experience.

Setting the fluff aside, there are real benefits to choosing Azure Linux over the default Ubuntu image.

The key benefits are:


You get "what you pay for," meaning the image contains only the packages required for your workloads to run, no more, no less. It has around 500 packages, which may sound like a lot, but the default Ubuntu node image has over 700. That means a smaller attack surface and less disk space used on the host. Combined with ephemeral OS disks on the node pool, this also opens up more VM SKU options, because a smaller image fits more easily in the VM cache or temporary disk that an ephemeral OS disk lives on.
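
To put those package counts in perspective, here is the back-of-the-envelope arithmetic. It uses only the approximate figures quoted above (about 500 for Azure Linux, and 700 for Ubuntu, which the text gives as a lower bound), so the result is a rough floor rather than a measured number; the constant and function names are purely illustrative.

```python
# Approximate package counts quoted above; 700 is a lower bound for Ubuntu ("over 700").
AZURE_LINUX_PACKAGES = 500
UBUNTU_PACKAGES_MIN = 700


def relative_reduction(smaller: int, larger: int) -> float:
    """Fraction of packages dropped when moving from `larger` to `smaller`."""
    return (larger - smaller) / larger


if __name__ == "__main__":
    dropped = UBUNTU_PACKAGES_MIN - AZURE_LINUX_PACKAGES
    share = relative_reduction(AZURE_LINUX_PACKAGES, UBUNTU_PACKAGES_MIN)
    # Prints: At least ~200 fewer packages, a reduction of at least ~29%
    print(f"At least ~{dropped} fewer packages, a reduction of at least ~{share:.0%}")
```

Because the Ubuntu figure is "over 700", any larger real count only widens the gap.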

Secure by default, with the supply chain handled.

The base idea of the image is to be as secure as possible by default. Following that principle, together with cloud-specific optimizations and a hardened kernel, means you rely less on security patches and on maintaining extra packages. For example, all the clusters I manage have Kured installed so that the nodes reboot daily and get their patches applied. Kured runs as a DaemonSet that monitors each node for the reboot-required file and, when the file appears, cordons, drains, and reboots that node. This file appears every day, so all my clusters reboot every day. Gone are the days when Linux systems had 1+ years of uptime.
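
The Kured behaviour described above boils down to a simple per-node decision: is the sentinel file there, and may this node take the cluster-wide lock? The sketch below is a hypothetical Python model of that logic, not Kured's actual Go implementation. The `/var/run/reboot-required` path is Kured's default sentinel (configurable with `--reboot-sentinel`); `ClusterLock`, `check_node`, and the returned action strings are illustrative stand-ins for Kured's lock annotation and its real node operations.

```python
from pathlib import Path
from typing import List, Optional
import tempfile

# Kured's default sentinel path; the real tool lets you change it with --reboot-sentinel.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


class ClusterLock:
    """Hypothetical stand-in for Kured's cluster-wide lock (an annotation on its DaemonSet)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def acquire(self, node: str) -> bool:
        # Only one node may hold the lock, so only one node reboots at a time.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def release(self, node: str) -> None:
        # Called once the node is back up (Kured also uncordons it at this point).
        if self.holder == node:
            self.holder = None


def check_node(node: str, lock: ClusterLock, sentinel: Path = DEFAULT_SENTINEL) -> List[str]:
    """One polling pass for a single node; returns the actions it would take."""
    if not sentinel.exists():
        return []  # no pending reboot, nothing to do
    if not lock.acquire(node):
        return ["wait"]  # another node holds the lock; retry on the next pass
    # Kured cordons and drains the node so pods are evicted before the host restarts.
    return ["cordon", "drain", "reboot"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Both simulated nodes share one sentinel file for brevity; on a real
        # cluster each node has its own.
        sentinel = Path(tmp) / "reboot-required"
        lock = ClusterLock()
        print(check_node("node-a", lock, sentinel))  # []
        sentinel.touch()  # simulate a patch that requires a reboot
        print(check_node("node-a", lock, sentinel))  # ['cordon', 'drain', 'reboot']
        print(check_node("node-b", lock, sentinel))  # ['wait']
```

The lock is what keeps a daily patch cycle from taking the whole node pool down at once: nodes queue up and reboot one after another.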

Supply chain risk is also reduced because the Microsoft teams that build this OS also build, test, and sign the image using packages maintained in-house, rather than taking an Ubuntu image and throwing a battery of tests at it.

Would this prevent all supply chain attacks? No, this is not a utopia, but it is the closest you can get. I'm not trying to drink the Kool-Aid, but the attack surface is greatly reduced when fewer vendors are involved.


You might ask whether this will break your existing deployments. The short answer is that it shouldn't: the image was engineered to preserve compatibility, so your workloads should only benefit from it, not suffer.

This blog runs on Kubernetes with the Azure Linux image. I run many systems inside the cluster to aid my development and testing, and Dapr, KEDA, VPA, and Nginx are not affected. Workloads written in PowerShell, Python, C#, and Rust are unaffected as well.

I used my own cluster as a dogfooding environment before I started migrating the production clusters I manage, and so far it's been good. I waited a while before writing this so as not to fall into the fanboy camp :)

So what are the caveats?

There are some caveats if you do a lot of ML/AI work: at the time of writing, this container host image doesn't support some of the latest and greatest GPU VM SKUs, such as the A100 series.

Migrating to Azure Linux is pretty straightforward.

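# The process requires a new node pool: create an Azure Linux pool, cordon and drain the old pool's nodes, then delete the old pool.
# Assumes kubectl already points at the cluster (e.g. via az aks get-credentials).

$CLUSTER_NAME = "<cluster-name>"
$RESOURCE_GROUP = "<resource-group>"
$NODEPOOL_NAME = "<nodepool-name>"      # new Azure Linux pool to create
$REMOVE_NODEPOOL = "<nodepool-name>"    # existing pool to remove

# Create the new Azure Linux node pool (System mode so it can host system pods)
az aks nodepool add --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --name $NODEPOOL_NAME --mode System --os-sku AzureLinux

# Cordon and drain the old pool's nodes; AKS labels each node with agentpool=<pool-name>
kubectl cordon --selector "agentpool=$REMOVE_NODEPOOL"
kubectl drain --selector "agentpool=$REMOVE_NODEPOOL" --ignore-daemonsets --delete-emptydir-data

# Remove the old node pool
az aks nodepool delete --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --name $REMOVE_NODEPOOL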
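
Once the old pool is gone, a quick sanity check is to confirm that no node still reports an Ubuntu image. The sketch below assumes you have saved the output of `kubectl get nodes -o json` to a file (the file and script names in the usage comment are hypothetical) and reads each node's `status.nodeInfo.osImage` field, the same value `kubectl get nodes -o wide` shows in its OS-IMAGE column. It only flags nodes whose image string mentions Ubuntu; it deliberately makes no assumption about the exact string Azure Linux nodes report, so compare against your own output.

```python
import json
import sys


def nodes_still_on_ubuntu(node_list: dict) -> list[str]:
    """Return names of nodes whose reported OS image mentions Ubuntu.

    `node_list` is the parsed output of `kubectl get nodes -o json` (a NodeList).
    """
    leftovers = []
    for node in node_list.get("items", []):
        os_image = node.get("status", {}).get("nodeInfo", {}).get("osImage", "")
        if "ubuntu" in os_image.lower():
            leftovers.append(node["metadata"]["name"])
    return leftovers


if __name__ == "__main__":
    # Usage (hypothetical names): kubectl get nodes -o json > nodes.json
    #                             python check_nodes.py nodes.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            remaining = nodes_still_on_ubuntu(json.load(f))
        print("All nodes migrated" if not remaining else f"Still on Ubuntu: {remaining}")
```

An empty result means every remaining node reports a non-Ubuntu image, which is what you expect after the old pool has been deleted.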

And that's it! I, for one, will advocate for using Azure Linux as long as I don't see issues, and if I see any, I will write about it.

Have a good one!