I remember seeing the launch, covered at the time by TechCrunch as a new open-source project from Microsoft around Kubernetes and microservices, and I told myself, "This is a cool thing. I hope it sticks and doesn't tank." Two years later, it was accepted into the CNCF incubator, which gave me more confidence in the product. Fast-forward to 2024, and Dapr is used by many companies.

So, before I go on, this is Dapr:

Dapr, or Distributed Application Runtime, is a portable runtime that makes it easy for any developer to build resilient distributed applications that run across cloud and edge. It provides integrated APIs for communication, state, and workflow for building production-ready applications. It leverages industry best practices for security, resiliency, and observability, increasing developer productivity by between 20 and 40 per cent.

— Cloud Native Computing Foundation Announces Dapr Graduation | CNCF

Going further, Dapr is a runtime that abstracts common microservice patterns into reusable components. Instead of writing custom code for service discovery, messaging, or state persistence, you leverage Dapr's sidecar architecture to handle these tasks. This means your microservices can remain lean, focusing on business logic while Dapr handles the hard work.

Anyway, I had always wanted to try Dapr in production, but I never got around to it because there were always other projects to work on. Then, during an idea crunch, I was thinking of building a small platform that exposes APIs to different automations so we wouldn't have to rewrite the same PowerShell function in Python, C#, or Go every time. That process added a lot of toil and wasn't efficient: change one function in one place, and you have to adapt the same function in every language and everywhere it's referenced. It was neither scalable nor efficient. The idea was to build a secure platform that scales well enough to use everywhere, and that's when it hit me: this is an excellent scenario for Dapr, as it provides every building block I need without reinventing the wheel.

That was two years ago, and I've now been running a big-ish platform backed by Dapr that serves over one hundred automation systems, with 20-30 more on the way. Since Dapr supports multiple languages, we don't have a language barrier: the main plugins are written in Python, Go, and C#, with some PowerShell, which Dapr doesn't fully support. Still, nothing stops you from using the pub/sub building block in combination with Azure Functions.

Now, this article is not about Dapr per se but about running Dapr in Kubernetes, and I'll talk about what I've learned over these two years of running it. Dapr is cloud-agnostic, so it can run locally, on EKS, GKE, a VM, or wherever you like; still, running it in a managed environment like AKS brings enormous benefits.

Let's start with some theory so we have a common language; then we can dive into my experiences.

Dapr Concepts in Kubernetes

Dapr's features are exposed as APIs. For instance, the state API allows you to save and retrieve state, while the pub/sub API lets you publish events to a message broker. This design makes it easy to switch implementations without changing application code.

Component Model:

Dapr uses components to configure integrations. You define components as YAML files and deploy them to your cluster. For example, you might have a component for the Redis state store or an Azure Service Bus pub/sub.
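
To make that concrete, here's a minimal sketch of a Redis state store component; the Redis host and the Kubernetes secret it references are placeholders for whatever you run in your cluster:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
  namespace: default
spec:
  type: state.redis
  version: v1
  metadata:
  # Placeholder address; point this at your Redis instance
  - name: redisHost
    value: redis-master.default.svc.cluster.local:6379
  # Reads the password from a Kubernetes secret named "redis"
  - name: redisPassword
    secretKeyRef:
      name: redis
      key: redis-password

Once applied with kubectl, any Dapr-enabled app in the namespace can use the store under the name statestore without knowing it's Redis underneath.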

Sidecar Injection:

Dapr can automatically inject the sidecar into your application pods via admission controllers when running in Kubernetes. You only need to annotate your deployments, and Dapr does the rest.

Observability and Resiliency:

With built-in support for metrics and distributed tracing, Dapr makes monitoring and debugging complex microservice interactions easier. It also includes features like retries, circuit breakers, and timeouts for building resilient services.
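
For example, tracing is switched on through a Dapr Configuration resource; a minimal sketch (the Zipkin endpoint below is a placeholder for whatever collector you run) looks like this:

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: tracing
spec:
  tracing:
    # Sample every request; lower this in high-traffic clusters
    samplingRate: "1"
    zipkin:
      # Placeholder collector address
      endpointAddress: "http://zipkin.default.svc.cluster.local:9411/api/v2/spans"

Applications opt into it with the dapr.io/config: "tracing" annotation on their pod template.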

Why Run Dapr on AKS?

Now that we understand the purpose of Dapr, the next question is why you should run it in Kubernetes.

The Benefits of a Managed Kubernetes Environment

Managed Control Plane:

The cloud manages the Kubernetes control plane for you, ensuring high availability and automatic patching. This means you don't have to worry about the control plane nodes or etcd backups.

Scalability:

With Kubernetes, you can quickly scale your applications horizontally by adding more nodes. Integrated autoscaling features help your applications handle variable workloads without manual intervention.

Simplified Upgrades:

Kubernetes upgrades are handled by the cloud provider with minimal downtime. You can update your cluster with the latest security patches and features without significant manual effort.

Installing Dapr

If you haven't installed the Dapr CLI, you can use a simple shell script. On a Unix-based system:

wget -q https://raw.githubusercontent.com/dapr/cli/master/install/install.sh -O - | /bin/bash

After installation, verify that the CLI is available:

dapr --version

Once your CLI is ready, initialize Dapr:

dapr init --kubernetes

This command does the following:

  • Creates a new namespace called dapr-system (if it doesn't already exist).
  • Deploys Dapr control plane components including the sidecar injector, placement service, and dashboard.
  • Registers necessary CRDs so that you can define Dapr components for state stores, pub/sub, and more.

The output should confirm that Dapr has been successfully initialized. You can then verify the deployment:

kubectl get pods -n dapr-system

You should see several pods running, such as:

  • dapr-placement-<hash>
  • dapr-sidecar-injector-<hash>
  • dapr-dashboard-<hash>
  • dapr-operator-<hash>

These are the core components that enable Dapr's runtime in your cluster.

You can also check the health of the Dapr control plane with the CLI:

dapr status -k

The default installation of Dapr is suitable for most scenarios. However, you can modify the Helm chart values if you need to tweak settings (for example, changing resource limits for the sidecar injector). Under the hood, Dapr's installation uses Helm charts, and advanced users may wish to override certain values. For instance, you could export the default values, modify them, and then upgrade the deployment (this assumes the Dapr Helm repo, https://dapr.github.io/helm-charts, has been added):

helm show values dapr/dapr > dapr-values.yaml

Edit dapr-values.yaml as needed, then:

helm upgrade dapr dapr/dapr -f dapr-values.yaml -n dapr-system
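
As a rough sketch of what such an override might look like (the exact key names can differ between chart versions, so treat these as assumptions and check them against your exported dapr-values.yaml):

# dapr-values.yaml (excerpt, illustrative only)
dapr_sidecar_injector:
  resources:
    limits:
      cpu: 200m
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 128Mi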

Now that Dapr is installed, let's deploy a sample application. In this section, we'll walk through deploying a microservice that leverages Dapr for state management and service invocation.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dapr-demo-app
  labels:
    app: dapr-demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dapr-demo-app
  template:
    metadata:
      labels:
        app: dapr-demo-app
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "dapr-demo-app"
        dapr.io/app-port: "3000"
    spec:
      containers:
      - name: demo
        image: myregistry.azurecr.io/dapr-demo-app:latest
        ports:
        - containerPort: 3000

Key points in this manifest:

    • dapr.io/enabled: "true" signals that the Dapr sidecar should be injected.
    • dapr.io/app-id gives the service a unique identifier that other services can use for invocation.
    • dapr.io/app-port indicates which port your application listens on.

Secure Communication Between Services

Dapr's service invocation API supports both HTTP and gRPC. To secure these communications:

  • mTLS (Mutual TLS): Dapr supports mTLS for securing service-to-service communication. When enabled, every service invocation is encrypted and authenticated. In AKS, this ensures that only authorized services can communicate with each other.
  • Access Control: Dapr's sidecars can enforce access control policies, ensuring that only services with the correct credentials or permissions can invoke certain APIs; a sketch of such a policy follows below.
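
As a sketch of what an access control policy can look like (the app ID and trust domain below are placeholders), you define it in a Dapr Configuration resource that the callee's sidecar loads via its dapr.io/config annotation:

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: appconfig
spec:
  accessControl:
    # Deny every caller unless explicitly allowed below
    defaultAction: deny
    trustDomain: "public"
    policies:
    # Allow only the demo app from the default namespace to call this service
    - appId: dapr-demo-app
      defaultAction: allow
      trustDomain: "public"
      namespace: "default"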

Secrets Management

Avoid hardcoding secrets in your application code or configuration. Dapr provides a secrets management building block:

  • External Secret Stores: Use Azure Key Vault or another supported secret store to manage your sensitive information.
  • Dapr Secrets API: Your application can retrieve secrets through the Dapr sidecar using a standardized API, which abstracts away the details of the underlying secret store; a sample Key Vault component is sketched below.
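
Here's a minimal sketch of an Azure Key Vault secret store component, assuming a managed identity with access to the vault (the vault name and client ID are placeholders):

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: azurekeyvault
spec:
  type: secretstores.azure.keyvault
  version: v1
  metadata:
  # Placeholder vault name
  - name: vaultName
    value: "my-keyvault"
  # Client ID of the managed identity used to authenticate (placeholder)
  - name: azureClientId
    value: "00000000-0000-0000-0000-000000000000"

The application then fetches secrets through the sidecar, e.g. GET http://localhost:3500/v1.0/secrets/azurekeyvault/<secret-name>, instead of talking to Key Vault directly.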

Scaling and Performance Optimization

Scalability is one of the key benefits of running microservices in AKS. When combined with Dapr, you get a platform that not only scales your application containers but also manages the communication overhead between services. In this section, we'll explore strategies for scaling and performance optimization.

Horizontal Pod Autoscaling (HPA) with Dapr

AKS supports Horizontal Pod Autoscaling, and Dapr-enabled applications are no exception. HPA monitors CPU and memory usage (or custom metrics) and adjusts the number of replicas dynamically.

For example, to scale your Dapr-enabled application based on CPU usage, you might create an HPA resource:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dapr-demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dapr-demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

With HPA in place, your application will automatically scale out under load and scale back in during idle periods, optimizing resource usage and cost.
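
One related note: HPA's utilization math counts every container in the pod, including the Dapr sidecar, so it helps to give the sidecar explicit requests and limits via annotations on the pod template. The values here are placeholders to tune for your workload:

      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "dapr-demo-app"
        dapr.io/app-port: "3000"
        # Explicit sidecar sizing so autoscaling behaves predictably
        dapr.io/sidecar-cpu-request: "100m"
        dapr.io/sidecar-memory-request: "128Mi"
        dapr.io/sidecar-cpu-limit: "300m"
        dapr.io/sidecar-memory-limit: "256Mi"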

Before getting into my production experience, two quick reminders of why this architecture pays off:

  • Consistent Abstraction: Dapr's building blocks mean you write against a consistent API regardless of which underlying technology you use (e.g., for state management or pub/sub).
  • Focus on Business Logic: Developers can concentrate on core functionality, knowing that Dapr handles communication, resiliency, and observability.

Experiences running Dapr in production

Having run it in production for over two years, I've found very few issues with the technology itself; most of the problems were configuration mistakes on my side. I ran into trouble with the middleware and retry configuration, where I set up the retry logic incorrectly, which considerably slowed down the applications. At first, you think you want retries to fire as fast as possible within a reasonable limit; however, that "reasonable limit" should be properly tested, as it might cause problems elsewhere.

My actual mistake was taking the default retry policies, leaving them alone, and going to production. That was a big mistake; don't do that. My thought process was that the defaults should be good enough. However, when I hammered the staging cluster with the default policies applied, I quickly discovered how wrong they were. My advice is to start with the defaults, hammer the system to find the sweet spot, and keep validating even after going to production. This is where E2E tests help a lot, because you shouldn't rely on doing this only when you happen to remember it.

# sample resiliency config (Dapr expresses this as a Resiliency resource)
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliencyconfig
spec:
  policies:
    timeouts:
      general: 5s
    retries:
      retryPolicy:
        policy: constant
        duration: 5s
        maxRetries: 3
    circuitBreakers:
      cbPolicy:
        # Trip after 5 consecutive failures, stay open for 30s
        maxRequests: 1
        timeout: 30s
        trip: consecutiveFailures >= 5
  targets:
    apps:
      dapr-demo-app:
        timeout: general
        retry: retryPolicy
        circuitBreaker: cbPolicy

The second problem I encountered was when a deployment to staging failed disastrously. The sidecar wasn't being injected, and it took me some time to figure out why. Had I read the documentation more carefully, I would have known that the certificate used for mTLS had expired.

This happened in staging because I hadn't bothered to provision a certificate in Key Vault, sync it into the cluster, and configure Dapr to use it. Even with that in place, rotating the certificate is still a manual process, but it's one less thing to do when it happens. The other half of the problem was that monitoring wasn't set up to alert before the certificate expired. Dapr Sentry starts warning 30 days in advance that the certificate is about to expire, so a simple alert rule would have told me this was coming and saved me 2-3 hours of figuring out what happened.

The log looks something like this:

{"instance":"dapr-sentry-68cbf79bb9-gdqdv","level":"warning","msg":"Dapr root certificate expiration warning: certificate expires in 2 days and 15 hours","scope":"dapr.sentry","time":"2024-04-01T23:43:35.931825236Z","type":"log","ver":"1.6.0"}

The log entry you're interested in:


"Dapr root certificate expiration warning: certificate expires in 2 days and 15 hours"

Setting up an Azure Monitor alert that looks for a Container Insights log entry about the Dapr root certificate expiring would have solved the problem. Also, when the root certificate dies, it dies: in my experience, you have to reinstall Dapr from scratch.

The third and last problem was that I used the Dapr extension for AKS (Install the Dapr extension for Azure Kubernetes Service (AKS) and Arc-enabled Kubernetes - Azure Kubernetes Service | Microsoft Learn), which sounds good on paper; however, it's not what you need. It simplifies upgrade management, but you don't want automatic updates, and when certificate rotation fails, you can't easily renew it because Dapr needs to check the Helm chart, and guess what? The MSFT version doesn't exist in a public repository, so you'll get an error like this:

certificate rotation failed: chart "dapr" version "1.13.1-msft.3" not found in https://dapr.github.io/helm-charts repository

My advice here is to rip off the band-aid and install the official version; don't use the extension unless you don't value your time :)

Ending notes on Dapr and running it in AKS or elsewhere

Ultimately, I do not regret running Dapr in production, starting the process, and learning to use it. When I first started working with it, I had doubts that it might add more complexity than needed, but it paid off. So many things now depend on it that replacing what it provides out of the box with custom solutions would take me over six months.

So, to simplify, there are a few key things to consider when you think of Dapr:

  • Learning Curve: Although Dapr simplifies many things, there's still a learning curve to understand its components and how they interact with your application.
  • Debugging Complexity: When something goes wrong, the extra layer introduced by Dapr (i.e., the sidecar) can sometimes obscure the root cause. Good logging and tracing become essential.
  • Resource Overhead: Running sidecars for every application instance adds some overhead regarding resource usage, which could be noticeable in smaller clusters.
  • Configuration Management: Dapr's component configurations (for state stores, pub/sub, etc.) must be managed carefully. Incorrect settings may lead to unexpected behaviour.

For the rest, I suggest checking out https://dapr.io/ and starting from there; trust me, you will love it.

That being said, have a good one!