Recently I started working a lot more with Kubernetes, and I began migrating more and more workloads towards it.
My latest challenge was to migrate an Azure bot running in an Azure App Service to Kubernetes, and the problem was that simply building a container with the bot and starting it in Kubernetes was not enough.
Azure App Services do a lot of magic behind the scenes to make a web application work. For example, they expose Python applications through Gunicorn whether they are Flask or aiohttp apps.
So you would think that slapping an NGINX ingress route in front of the container would solve the problem without any code changes, but the reality is that it's not that simple.
Let's take the Python Bot Framework bot sample as an example.
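For reference, the sample's app.py is built on aiohttp. Here is a condensed sketch of what it looks like (identifiers such as MyBot follow the standard sample layout; the real file has configuration and error handling I'm omitting):

```python
# Condensed sketch of the aiohttp-based Bot Framework sample app.py.
from aiohttp import web
from botbuilder.core import BotFrameworkAdapter, BotFrameworkAdapterSettings
from botbuilder.schema import Activity

from bot import MyBot  # the sample's bot class (placeholder name)

SETTINGS = BotFrameworkAdapterSettings(app_id="", app_password="")
ADAPTER = BotFrameworkAdapter(SETTINGS)
BOT = MyBot()

async def messages(req: web.Request) -> web.Response:
    # Deserialize the incoming activity and hand it to the adapter.
    body = await req.json()
    activity = Activity().deserialize(body)
    auth_header = req.headers.get("Authorization", "")
    await ADAPTER.process_activity(activity, auth_header, BOT.on_turn)
    return web.Response(status=201)

APP = web.Application()
APP.router.add_post("/api/messages", messages)

if __name__ == "__main__":
    web.run_app(APP, host="localhost", port=3978)
```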
This example works perfectly locally with the Bot Framework Emulator and in an App Service Plan, but if we build a container with the same sample using the following dockerfile:
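A naive dockerfile of the kind I mean would look roughly like this (a minimal sketch, simply running the sample as-is):

```dockerfile
# Naive first attempt: run the aiohttp sample directly, as you would locally.
FROM python:3.8.3
COPY . /app/bot
WORKDIR /app/bot
RUN pip install -r requirements.txt
EXPOSE 3978
CMD ["python", "app.py"]
```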
We will find out very fast that the container fails with a boatload of HTTP 404 or HTTP 500 errors, or just plainly does not work at all. The number of red herrings it's going to throw at you will make your head hurt.
The solution is quite simple once you figure it out, but nerve-wracking while debugging.
First of all, we need to convert the app.py file from aiohttp to Flask and run Gunicorn on top.
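A sketch of the converted app.py (hedged: MyBot and the route names follow the aiohttp sample above; your bot class and settings will differ):

```python
# Sketch of app.py converted from aiohttp to Flask.
import asyncio
from flask import Flask, Response, request
from botbuilder.core import BotFrameworkAdapter, BotFrameworkAdapterSettings
from botbuilder.schema import Activity

from bot import MyBot  # the sample's bot class (placeholder name)

SETTINGS = BotFrameworkAdapterSettings(app_id="", app_password="")
ADAPTER = BotFrameworkAdapter(SETTINGS)
BOT = MyBot()

# Flask views are synchronous but the adapter is async:
# keep one event loop around and drive the coroutines with it.
LOOP = asyncio.get_event_loop()

APP = Flask(__name__)

@APP.route("/api/messages", methods=["POST"])
def messages() -> Response:
    activity = Activity().deserialize(request.json)
    auth_header = request.headers.get("Authorization", "")
    LOOP.run_until_complete(
        ADAPTER.process_activity(activity, auth_header, BOT.on_turn)
    )
    return Response(status=201)

if __name__ == "__main__":
    APP.run(host="0.0.0.0", port=3978)
```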
The above shows roughly what you need to change to make it a Flask app. For Bot Framework you will have to create an async event loop for it to keep running, hence the import of asyncio and the Loop = asyncio.get_event_loop() call.
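In isolation, that loop pattern looks like this (a stdlib-only sketch; handle_activity is a stand-in for the adapter's process_activity coroutine, and I use new_event_loop here so the snippet runs on any Python version):

```python
import asyncio

# One loop created at import time and reused by every request.
LOOP = asyncio.new_event_loop()

async def handle_activity(text: str) -> str:
    # Stand-in for adapter.process_activity(...) in the real bot.
    await asyncio.sleep(0)
    return f"echo: {text}"

def messages(text: str) -> str:
    # What a synchronous Flask route does with the shared loop:
    # block until the coroutine completes, then return its result.
    return LOOP.run_until_complete(handle_activity(text))
```

The point is that a synchronous view function can still drive the async Bot Framework machinery, as long as one loop stays alive for the lifetime of the process.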
Once we have converted app.py to Flask, we need to adjust the dockerfile to run Gunicorn. So first add Gunicorn and Flask to your requirements.txt file, with whatever versions you want to run, then create the following dockerfile:
FROM python:3.8.3
RUN apt-get clean \
&& apt-get -y update
WORKDIR /app/bot
COPY requirements.txt /app/bot
RUN pip install --upgrade pip
RUN pip install wheel
RUN pip install -r requirements.txt
COPY . /app/bot
RUN mkdir logs
EXPOSE 8000
CMD ["gunicorn", "app:APP", "--bind", "0.0.0.0:8000", "--access-logfile", "gunicorn-access.log", "--error-logfile", "gunicorn-error.log"]
As you can see, instead of running python app.py we are running Gunicorn and pointing it at the Flask app, which in our example is called APP.
Now we can ship it to Kubernetes and expose it with your favorite ingress controller, or just attach an IP to it.
Gunicorn by default listens on port 8000, so we're going to create the pod and service to expose containerPort 8000.
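A minimal manifest for this would look something like the following (a sketch: the names, image, and IP are placeholders; the loadBalancerIP must be a public IP you have already created in Azure):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-bot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-bot
  template:
    metadata:
      labels:
        app: python-bot
    spec:
      containers:
        - name: python-bot
          image: myregistry.azurecr.io/python-bot:latest  # placeholder image
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: python-bot
spec:
  type: LoadBalancer
  loadBalancerIP: 20.0.0.1   # placeholder: pre-created Azure public IP
  selector:
    app: python-bot
  ports:
    - port: 80
      targetPort: 8000
```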
The yaml above shows how to create everything and attach an Azure IP to it, so you can take the Bot Emulator and test it.
One key takeaway from this example is that the Bot Emulator needs to run with ngrok; otherwise the bot will try to send its responses to localhost, which will break it.
Once you've managed to get the bot running in a Kubernetes cluster, all that remains is to optimize it a bit for larger scale. The example above will not scale well; it just showcases some hoops you have to jump through to get it working.
That being said, have a good one.