Dapr on AKS - microservices without headaches
This article examines Dapr, the Distributed Application Runtime, and what I've learned from two years of running it in production on Azure Kubernetes Service. If you want to simplify your microservice plumbing and reduce manual work, read on to see if Dapr fits your needs.
It started when Microsoft launched a new open-source project around Kubernetes and microservices (as covered by TechCrunch). I remember seeing the launch and telling myself, "This is a cool thing. I hope it sticks and doesn't tank." Two years later, it was accepted into the CNCF incubator, which gave me more confidence in the product. Fast-forward to 2024, and we see Dapr being used by many companies.
So, before I go on, this is Dapr:
Dapr, or Distributed Application Runtime, is a portable runtime that makes it easy for any developer to build resilient distributed applications that run across cloud and edge. It provides integrated APIs for communication, state, and workflow for building production-ready applications. It leverages industry best practices for security, resiliency, and observability, increasing developer productivity by between 20 and 40 per cent.
Cloud Native Computing Foundation Announces Dapr Graduation | CNCF
Going further, Dapr is a runtime that abstracts common microservice patterns into reusable components. Instead of writing custom code for service discovery, messaging, or state persistence, you leverage Dapr's sidecar architecture to handle these tasks. This means your microservices can remain lean, focusing on business logic while Dapr handles the hard work.
Anyway, I had always wanted to try Dapr in production, but I never got around to it; there were always other projects that needed to be worked on. Then, during an idea crunch, I thought of building a small platform to provide APIs to different automations so we wouldn't have to rewrite the same PowerShell function in Python, C#, or Go every time. That process added a lot of toil and wasn't efficient: change one function in one place, and you need to adapt the same function in every language and everywhere it's referenced. It wasn't scalable. The idea was to build a secure platform that scales enough to be used everywhere, and that's when it hit me: this is an excellent scenario for Dapr, as it provides every building block I need without reinventing the wheel.
That was two years ago, and I've since been running a big-ish platform backed by Dapr, which serves over one hundred automation systems, with 20-30 more on the way. As Dapr supports multiple languages, we don't have a language barrier. The main plugins are written in Python, Go, and C#, with some PowerShell, which Dapr doesn't fully support. Still, nothing stops you from using the pub/sub capability in combination with Azure Functions.
Now, this article is not about Dapr per se but about running Dapr in Kubernetes, and I will talk about what I've learned during these two years running it. While Dapr is cloud-agnostic, running it in a managed environment like AKS brings enormous benefits. Being cloud-agnostic, it can run locally, on EKS, GKE, a VM, or wherever you like.
Let's start with some theory so we share a common language; then we can dive into my experiences.
API Building Blocks:
Dapr's features are exposed as APIs. For instance, the state API allows you to save and retrieve state, while the pub/sub API lets you publish events to a message broker. This design makes it easy to switch implementations without changing application code.
Component Model:
Dapr uses components to configure integrations. You define components as YAML files and deploy them to your cluster. For example, you might have a component for the Redis state store or an Azure Service Bus pub/sub.
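As a sketch, a Redis state store component might look like this (the component name, Redis address, and secret reference are placeholders for your environment):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore          # the name your app references in state API calls
  namespace: default
spec:
  type: state.redis
  version: v1
  metadata:
  - name: redisHost
    value: redis-master.default.svc.cluster.local:6379  # placeholder address
  - name: redisPassword
    secretKeyRef:           # pull the password from a Kubernetes secret
      name: redis
      key: redis-password
```

Swapping Redis for another store is just a matter of changing `type` and the metadata entries; the application code calling the state API stays the same.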
Sidecar Injection:
Dapr can automatically inject the sidecar into your application pods via admission controllers when running in Kubernetes. You only need to annotate your deployments, and Dapr does the rest.
Observability and Resiliency:
With built-in support for metrics and distributed tracing, Dapr makes monitoring and debugging complex microservice interactions easier. It also includes features like retries, circuit breakers, and timeouts for building resilient services.
Now that we understand the purpose of Dapr, the next question is why you should run it in Kubernetes.
Managed Control Plane:
The cloud provider manages the Kubernetes control plane for you, ensuring high availability and automatic patching. This means you don't have to worry about control-plane nodes or etcd backups.
Scalability:
With Kubernetes, you can quickly scale your applications horizontally by adding more nodes. Integrated autoscaling features help your applications handle variable workloads without manual intervention.
Simplified Upgrades:
Kubernetes upgrades are handled by the cloud provider with minimal downtime. You can update your cluster with the latest security patches and features without significant manual effort.
If you haven't installed the Dapr CLI, you can use a simple shell script. On a Unix-based system:
wget -q https://raw.githubusercontent.com/dapr/cli/master/install/install.sh -O - | /bin/bash
After installation, verify that the CLI is available:
dapr --version
Once your CLI is ready, initialize Dapr:
dapr init --kubernetes
This command deploys the Dapr control plane (the operator, sidecar injector, Sentry certificate authority, and placement service) into the dapr-system namespace of your cluster.
The output should confirm that Dapr has been successfully initialized. You can then verify the deployment:
kubectl get pods -n dapr-system
You should see several pods running, such as dapr-operator, dapr-sidecar-injector, dapr-sentry, and dapr-placement-server.
These are the core components that enable Dapr's runtime in your cluster.
You can also check the health of the control plane with the Dapr CLI:

dapr status -k
The default installation of Dapr works for most scenarios. However, since Dapr's installation uses Helm charts under the hood, advanced users can override certain values if they need to tweak settings (for example, changing resource limits for the sidecar injector). You could export the default values, modify them, and then upgrade the deployment:
helm repo add dapr https://dapr.github.io/helm-charts
helm show values dapr/dapr > dapr-values.yaml
Edit dapr-values.yaml as needed, then:
helm upgrade dapr dapr/dapr -f dapr-values.yaml -n dapr-system
Now that Dapr is installed, let's deploy a sample application. In this section, we'll walk through deploying a microservice that leverages Dapr for state management and service invocation.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dapr-demo-app
  labels:
    app: dapr-demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dapr-demo-app
  template:
    metadata:
      labels:
        app: dapr-demo-app
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "dapr-demo-app"
        dapr.io/app-port: "3000"
    spec:
      containers:
      - name: demo
        image: myregistry.azurecr.io/dapr-demo-app:latest
        ports:
        - containerPort: 3000
```
Key points in this manifest: the dapr.io/enabled annotation triggers automatic sidecar injection, dapr.io/app-id sets the unique identifier other services use for invocation, and dapr.io/app-port must match the port your container actually listens on (3000 here).
Dapr's service invocation API supports both HTTP and gRPC. To secure these communications, Dapr enables mutual TLS between sidecars by default (with certificates issued by its Sentry service), and you can layer access control policies on top to restrict which applications are allowed to call each other.
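One way to lock down service invocation is an access control policy in a Dapr Configuration. A minimal sketch, with illustrative app IDs, that denies invocation unless explicitly allowed:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: appconfig
spec:
  accessControl:
    defaultAction: deny          # block invocation unless explicitly allowed
    trustDomain: "public"
    policies:
    - appId: dapr-demo-app       # illustrative caller app ID
      defaultAction: allow
```

Pods opt into a configuration like this via the dapr.io/config annotation on their pod template.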
Avoid hardcoding secrets in your application code or configuration. Dapr provides a secrets management building block that can pull secrets at runtime from stores such as Kubernetes secrets or Azure Key Vault.
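For example, a secret store component backed by Azure Key Vault could look like this minimal sketch (the vault name and client ID are placeholders for your own resources):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: azurekeyvault
spec:
  type: secretstores.azure.keyvault
  version: v1
  metadata:
  - name: vaultName
    value: "my-vault"                               # placeholder vault name
  - name: azureClientId
    value: "00000000-0000-0000-0000-000000000000"   # placeholder identity client ID
```

Your services then fetch secrets through Dapr's secrets API instead of embedding them in configuration files.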
Scalability is one of the key benefits of running microservices in AKS. When combined with Dapr, you get a platform that not only scales your application containers but also manages the communication overhead between services. In this section, we'll explore strategies for scaling and performance optimization.
AKS supports Horizontal Pod Autoscaling, and Dapr-enabled applications are no exception. HPA monitors CPU and memory usage (or custom metrics) and adjusts the number of replicas dynamically.
For example, to scale your Dapr-enabled application based on CPU usage, you might create an HPA resource:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dapr-demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dapr-demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
With HPA in place, your application will automatically scale out under load and scale back in during idle periods, optimizing resource usage and cost.
After running it in production for over two years, I've found very few issues with the technology itself; most problems were configuration issues on my side. For example, I incorrectly set up the retry logic in the resiliency configuration, which considerably slowed down the applications. At first, you think you want retries to fire as fast as possible within a reasonable limit; however, that "reasonable limit" should be properly tested, as it might cause problems elsewhere.
My mistake was applying the default retry policies, leaving them alone, and going to production. That was a big mistake; don't do that. My thought process was that the defaults should be good enough, but when I hammered the staging cluster with the default policies applied, I quickly discovered how wrong they were. My advice: start with the defaults, load-test the system to find the sweet spot, and keep validating even after going into production. This is where E2E tests help a lot, because you shouldn't have to redo this manually every time you happen to remember it.
```yaml
# Sample resiliency config using Dapr's Resiliency CRD
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliencyconfig
spec:
  policies:
    retries:
      demoRetry:
        policy: constant
        duration: 5s              # wait between attempts
        maxRetries: 3
    circuitBreakers:
      demoCB:
        maxRequests: 1
        timeout: 30s              # how long the breaker stays open before recovery
        trip: consecutiveFailures >= 5
  targets:
    apps:
      dapr-demo-app:              # app ID the policies apply to
        retry: demoRetry
        circuitBreaker: demoCB
```
The second problem I encountered was when a deployment in staging failed disastrously. The sidecar wasn't injected, and it took me some time to figure out why. Had I read the documentation more carefully, I would have known that the certificate used for mTLS had expired.
This happened in staging because I hadn't bothered to set up a certificate in Key Vault, sync it into the cluster, and configure Dapr to use it. Even then, rotating the certificate still requires a manual process, but it's one less thing to do when it happens. The other issue was that monitoring wasn't set up to alert before the certificate expired. Dapr Sentry starts announcing the upcoming expiry 30 days in advance, so a simple alert rule would have warned me, saving me the 2-3 hours I spent figuring out what had happened.
The log looks something like this:
{"instance":"dapr-sentry-68cbf79bb9-gdqdv","level":"warning","msg":"Dapr root certificate expiration warning: certificate expires in 2 days and 15 hours","scope":"dapr.sentry","time":"2024-04-01T23:43:35.931825236Z","type":"log","ver":"1.6.0"}
The log entry you're interested in:
"Dapr root certificate expiration warning: certificate expires in 2 days and 15 hours"
Setting up an Azure Monitor alert that looks for a Container Insights log entry matching the Dapr root certificate expiration warning would have solved the problem. Also, once the certificate fully expires, it's dead: in my experience, you need to reinstall Dapr from scratch.
The third and last problem was with the AKS Dapr extension (Install the Dapr extension for Azure Kubernetes Service (AKS) and Arc-enabled Kubernetes - Azure Kubernetes Service | Microsoft Learn). It sounds good on paper, but it's not what you need. It simplifies upgrade management, but you don't want automatic updates, and when the certificate fails, you cannot easily renew it: Dapr needs to fetch the Helm chart, and guess what? The Microsoft build doesn't exist in a public repository, so you'll get an error like this:
certificate rotation failed: chart "dapr" version "1.13.1-msft.3" not found in https://dapr.github.io/helm-charts repository
My advice here is to rip off the band-aid and install the official version; don't use the extension unless you don't value your time :)
Ultimately, I do not regret running Dapr in production, starting the process, and learning to use it. When I first started working with it, I had doubts that it might add more complexity than needed, but it paid off. So many things depend on it now that replacing what it gives me out of the box with custom solutions would take over six months.
So, to simplify, there are a few key things to keep in mind when you think of Dapr: test your resiliency policies under load rather than trusting the defaults, monitor the mTLS root certificate so it never expires on you, and install the official Helm chart instead of the AKS extension.
For the rest of the information, I suggest checking out https://dapr.io/ and starting from there; trust me, you will love it.
That being said, have a good one!