Serverless anywhere with Kubernetes

KEDA (Kubernetes-based Event-Driven Autoscaling) is an open-source project built by Microsoft in collaboration with Red Hat, which provides event-driven autoscaling to containers running on AKS (Azure Kubernetes Service), EKS (Elastic Kubernetes Service), GKE (Google Kubernetes Engine), or on-premises Kubernetes clusters 😉

KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. Most of the time, we scale systems (manually or automatically) based on some metric crossing a threshold.

For example: if CPU > 60% for 5 minutes, scale our app service out to a second instance. By the time the trigger has fired and the scale-out has completed, the burst of traffic/events has often already passed. KEDA, on the other hand, exposes rich event data like Azure Storage Queue length to the horizontal pod autoscaler so that it can manage the scale-out for us. Once one or more pods have been deployed to meet the event demand, events (or messages) can be consumed directly from the source, which in our example is the Azure queue.

Getting started with KEDA

Prerequisites:

  • Azure Subscription
  • Azure Container Registry
  • AKS Cluster

To follow this example without installing anything locally, you can load up the Azure Cloud Shell, or you can install the Azure CLI, kubectl, helm, and the Azure Functions Core Tools.

I will do everything in the cloud, and I'll monitor the cluster and run other commands from my workstation with Lens | Kubernetes IDE.

Now open up your Azure Cloud Shell, roll up your sleeves and let’s get started:

Be aware that there are <> placeholders in the code. Do not copy-paste blindly; adjust the code for your deployment.
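The exact install snippet isn’t reproduced here, but a minimal sketch of a Helm-based KEDA install (Helm 3 syntax; chart repo per the KEDA project, namespace is your choice) looks like this:

```bash
# Add the KEDA chart repository and install KEDA into its own namespace
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
```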

The next step after you’re done installing KEDA on your AKS cluster is to instantiate your function:
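The function setup isn’t shown here either; with the Azure Functions Core Tools, scaffolding and deploying a queue-triggered function would look roughly like this (app name, runtime, and registry are placeholders):

```bash
# Scaffold a containerized function app with a queue-storage trigger
func init queue-consumer --docker --worker-runtime dotnet
cd queue-consumer
func new --name QueueWorker --template "Azure Queue Storage trigger"

# Build and push the image, then generate and apply the Kubernetes
# manifests (Deployment + KEDA scaling rules)
func kubernetes deploy --name queue-consumer --registry <ACR_NAME>.azurecr.io
```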

After everything is up and running, I run the PowerShell script below to flood the queue with messages.
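The script itself isn’t reproduced here; a minimal sketch with the Az.Storage PowerShell module (account, key, and queue name are placeholders; the CloudQueueMessage type name varies slightly between module versions) would be:

```powershell
# Flood the storage queue with test messages
$ctx   = New-AzStorageContext -StorageAccountName "<ACCOUNT>" -StorageAccountKey "<KEY>"
$queue = Get-AzStorageQueue -Name "<QUEUE>" -Context $ctx

for ($i = 1; $i -le 1000; $i++) {
    $msg = [Microsoft.Azure.Storage.Queue.CloudQueueMessage]::new("Test message $i")
    $queue.CloudQueue.AddMessageAsync($msg) | Out-Null
}
```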

Once the PowerShell script starts running, we get the first messages in the queue.

With kubectl in watch mode, I start observing the fireworks happening.
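Something as simple as this will do:

```bash
# Watch pods appear as KEDA scales the deployment out from zero
kubectl get pods --watch
```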

Lens | Kubernetes IDE

Looks cool, but can you expand more on this? How can I actually use this solution?

This solution basically exists to make the Azure Functions product available everywhere. At this point in time, you can run Azure Functions wherever you want for free. You are not obligated to run functions in Azure; you can run them in Google or AWS if you want. Want on-premises? Go crazy, it works as long as you’re running Kubernetes.

The main idea is to bring Functions as a Service (FaaS) to any system, and that’s very cool. While preparing for this post, I took a running system that uses Bot Framework, Service Bus and Azure Functions and ported the Azure Functions part to KEDA. Zero downtime, no difference, it just works. The chatbot adds messages to a Service Bus queue and then the function triggers. I ran them in parallel for five minutes, then I took down the Azure-hosted function.

The main point of using KEDA is to be cloud-agnostic. You know Kubernetes, you know Azure Functions, you don’t need to change anything. Hell, if you’re in one of those regulated environments where the cloud is a big no-no, then you can just as well integrate KEDA into your clusters and set up triggers with whatever messaging system you have, or even just set up plain HTTP triggers and be done with it. Once the regulatory part relaxes and the company starts to dip its toes in the water, you can just as well copy-paste the code into Azure 🙂

That being said, have a good one!

Container security in the Cloud. A state of uncertainty.

Thinking that you managed to escape security? Guess again.

In this post, I will focus on containers with Azure Kubernetes in mind, but these examples apply to all the worlds out there: EKS, GKE, and on-premises.

Containers gained popularity because they made publishing and updating applications a straightforward process. You can go from build to release very fast with little overhead, whether it’s the cloud or on-premises, but there’s still more to do.

Container adoption has skyrocketed in recent years, to the point where we see enterprises in banking, the public sector, and healthcare looking into containers or already deploying them. The main difference is that these companies have a more extensive list of security requirements, and those requirements need to be applied to everything. Containers included.

Orchestrator security

This part will focus on orchestrator security: what you can do to harden your orchestrator against malicious intent.

Keep your Kubernetes cluster as updated as possible.

This might sound obvious to some of you, but it’s better to say it. Kubernetes maintains release branches for the most recent three minor versions; run anything older and you’re out of support. Keep that in mind when you’re doing cluster maintenance.

Patch your master and worker nodes

Say what? Yup, you read that correctly: if you’re using a managed offering like Azure Kubernetes Service or Elastic Kubernetes Service, then you have to patch your worker nodes yourself. If you’re running KOTS, AKS Engine or any other flavor of Kubernetes in VMs (be it cloud or on-premises), then you have to do patch management for the master nodes as well.

A solution for this problem is to install a DaemonSet called Kured (Kubernetes Reboot Daemon), which reboots nodes automatically when they require it. When the package manager installs updates that require a reboot, it creates a file called reboot-required in /var/run/, and Kured watches for that file. Once it sees that a node requires a reboot, it cordons and drains the node, reboots it, and uncordons it afterwards.
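Installing it is a single manifest; the release URL below is illustrative, so check the Kured releases page for the current version:

```bash
# Deploy the Kured DaemonSet into kube-system
kubectl apply -f https://github.com/weaveworks/kured/releases/download/1.2.0/kured-1.2.0-dockerhub.yaml
```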

Use namespaces

Using namespaces doesn’t add a big layer of protection, but it surely adds a layer of segregation between pods. Using namespaces instead of throwing everything into the default namespace adds a bit towards security and reduces kubectl get pods clutter.
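Creating and using one is trivial:

```bash
# Create a namespace per team or app and deploy into it explicitly
kubectl create namespace payments
kubectl apply -f deploy.yaml --namespace payments
kubectl get pods --namespace payments
```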

Enable RBAC and give permissions as strictly as possible

If you already have Kubernetes clusters deployed, then tough luck; you have to dismantle and redeploy them. You wanted bleeding edge? There you go 🙂 Jokes aside, having RBAC enabled is a major win for security because you’re no longer giving everyone full cluster-admin access on the Kubernetes cluster. This is a good time to polish your ARM or Terraform skills and start treating your clusters as cattle, not pets.
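Once RBAC is on, scope permissions as tightly as you can. As a minimal sketch (names and namespace are placeholders), a read-only role bound to a single user looks like this:

```yaml
# A namespace-scoped role that can only read pods...
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# ...bound to one specific user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
- kind: User
  name: jane@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```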

Leverage AAD integration with AKS

This recommendation goes hand in hand with RBAC-enabled clusters. This feature works in Azure Kubernetes clusters and, again, it’s an on-create feature. By integrating your AKS cluster with Azure Active Directory, you get very clear control over who is accessing which resources, and you also get the added benefit of AAD protection, MFA support and everything else that Microsoft adds to it 🙂

Get rid of Helm 2 when Helm 3 releases

Yes, yes, I know that Helm 3 is not here yet, but listen up: once it goes GA, migrate towards it as fast as possible and don’t look back. Helm 2 is not a bad package manager, but that damned tiller pod should die in a fire. As you may know, tiller almost never gets initialized with TLS enabled or any other type of security. You get a prompt telling you that you should, but nobody does it. Tiller is basically the Helm server. With Helm 3, tiller goes in the corner to die off. Good riddance.

Use Network Policy for your pods

By default, all deployed pods can talk to each other; it doesn’t matter whether they are segregated into different namespaces or not. This is the beauty of container orchestrators: they give you all the infrastructure you need and you just publish your code. The downside is that if one container gets compromised, all the other pods are left vulnerable.
This is the same concept as in VM security: you need only one machine to be vulnerable, because after that you can do as much lateral movement as you want.

For this to work, you need to have a CNI configured. In Azure, for AKS, you need to deploy the cluster with advanced networking enabled, or in the CLI add the flag --network-plugin azure.
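With that in place, a default-deny ingress policy per namespace is a sane starting point; a minimal sketch (namespace is a placeholder):

```yaml
# Block all incoming traffic to every pod in the namespace
# unless another policy explicitly allows it
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```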

Take a look at AppArmor and Secure Computing (SecComp)

Security modules like AppArmor and Secure Computing (seccomp) are additional levels of security that turn Kubernetes environments into hostile territory for attackers. In AKS, AppArmor is enabled by default, but you can create profiles that restrict pods even further. With an AppArmor profile, you can block actions such as read, write, execute or mounting filesystems, which limits a malicious actor’s possibilities even more.

If you want to leverage AppArmor profiles, SSH to your AKS node, create a file named deny-write.profile, and paste in the following code:
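The profile isn’t reproduced here; the canonical deny-write example from the Kubernetes documentation looks like this, and you load it on the node with apparmor_parser:

```
#include <tunables/global>

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  file,

  # Deny all file writes.
  deny /** w,
}
```

```bash
# Load the profile into the kernel on the node
sudo apparmor_parser deny-write.profile
```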

After that, from your Cloud Shell or local machine, create a YAML file with the code below and try to apply it.
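A pod that references the profile via the AppArmor annotation (the profile name must match the one loaded on the node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
  annotations:
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: hello
    image: busybox
    command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]
```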

The pod will start correctly, but if you try to do anything funky involving file writes, it’s not going to work.

Secure Computing (seccomp) is another security module that exists on the AKS cluster nodes and is enabled by default. With seccomp, you can specify filters for what a pod cannot do and work from there. An example is below.

Create a file named prevent-chmod in /var/lib/kubelet/seccomp/ on the node; it will be picked up immediately:
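A sketch of the profile in the older kubelet JSON format (newer kubelets use a slightly different schema with a "names" array):

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "name": "chmod",
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
```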

After that, from your Cloud Shell or local machine, apply the following YAML file.
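A pod that references the profile via the seccomp annotation; the chmod inside should fail with “Operation not permitted”:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: chmod-prevented
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/prevent-chmod
spec:
  restartPolicy: Never
  containers:
  - name: chmod
    image: alpine
    command: ["chmod", "777", "/etc/hostname"]
```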

That should be enough for orchestrator security; next on the list are containers.

Container security

This part will focus on container security: what you should do to keep your containers as secure as possible without causing your application to die a horrible death.

Use Distroless or use lightweight images e.g., alpine

If you’ve never heard of Distroless, it’s a family of container images built by Google that don’t contain an operating system’s usual userland. System-specific programs like package managers, shells, networking utilities and so on do not exist in a Distroless image. Using an image like this dramatically reduces the attack surface: fewer image vulnerabilities, true immutability, and so on. I suggest you give them a try 🙂
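The typical pattern is a multi-stage build: compile in a full image, ship on Distroless. A sketch (module layout and image tags are illustrative):

```dockerfile
# Build stage: full toolchain
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Runtime stage: no shell, no package manager, non-root user
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
```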

Remove package managers and network utilities

If Distroless is too hardcore for you, then start with Alpine images and remove the package managers, network utilities and shell access. Just by doing this, you’re going to get a more secure container. Name of the game? Reduce the attack surface. Go.

Remove file system modification utilities (chmod, chown)

For an attacker, it’s not enough to gain access to the container; they need to actually run shell scripts and execute other types of commands. If the attacker doesn’t have chmod or chown, their life is hell. If you combine this with the previous recommendation, you’re golden.
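A sketch covering this and the previous recommendation in one Dockerfile (paths per the standard Alpine layout; the binaries are simply deleted):

```dockerfile
FROM alpine:3.19
COPY app /app

# Remove the package manager, common network utilities, and the
# file-permission tools so an attacker who gets in has nothing to work with
RUN rm -rf /sbin/apk /etc/apk /lib/apk /usr/share/apk \
    && rm -f /usr/bin/wget /usr/bin/nc \
    && rm -f /bin/chmod /bin/chown

ENTRYPOINT ["/app"]
```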

Scan your containers images for vulnerabilities (Aqua Security, Twistlock, etc.)

This is a no-brainer for production systems. If you’re scanning VMs for malicious intent, then you should be scanning your containers as well. Containers, like VMs, are meant to run applications. Containers are in fact smaller than VMs, but that doesn’t mean you should just trust your gut feeling and go to production with them. Get a container scanning solution and implement it in your environment. The more information you have about your containers, the better.

Keep an eye out for ephemeral containers in Kubernetes – bind a container to an existing pod for debugging purposes

This is an awesome feature that’s come out in alpha with Kubernetes 1.16. Can’t wait for it to be available in Azure 🙂

Basically, ephemeral containers allow the user to attach a container to an existing pod in order to debug it. If you use ephemeral containers in tandem with Distroless containers, you’re in a win-win scenario: you gain the benefit of security while retaining the debugging capabilities, albeit in a separate container.
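Once it lands, the workflow with a recent kubectl looks roughly like this (pod and container names are placeholders):

```bash
# Attach a throwaway busybox container to a running pod for debugging
kubectl debug -it mypod --image=busybox --target=myapp
```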

By the way, a pod is not equal to a container. A pod can have multiple containers under it, but it’s not a frequent practice.

Enforce user-mode containers

You don’t need root access inside the container. By default, all containers run as root, and this introduces a few security concerns: file system permissions do not apply to the root user and, cherry on top, the root user can read and write files on the file system, change things inside the container, create reverse shells, and so on.

In Kubernetes, you can enforce containers to run in user-mode by using the pod & container security context.
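A minimal sketch of enforcing a non-root user at both the pod and container level:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: user-mode-demo
spec:
  securityContext:
    runAsUser: 1000        # pod-wide default user
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "id && sleep 1h"]
    securityContext:
      runAsNonRoot: true             # refuse to start if the image wants root
      allowPrivilegeEscalation: false
```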

Enforce the filesystem to be in read-only mode

By default, the filesystem of a given container is read/write enabled, but why should we be writing files in a container? Cache files or temporary files are OK, but you lose everything when the container dies, so, to be honest, you shouldn’t need to write or import anything in a container. Dev/test doesn’t apply here.

Set the filesystem to read-only mode and be safer. If you need to write temp files, fine: mount a /tmp folder. Example below:
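A minimal sketch:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "touch /tmp/scratch && sleep 1h"]
    securityContext:
      readOnlyRootFilesystem: true   # everything except the mounts below
    volumeMounts:
    - name: tmp
      mountPath: /tmp                # writable scratch space
  volumes:
  - name: tmp
    emptyDir: {}
```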

Conclusion time

If you want a TL;DR, here it is: containers require security controls around them, and they need to be built with security in mind. There’s no easy way around it; we need to be security conscious, otherwise bad things will happen.

If you’re going to implement any of the recommendations above, please keep in mind that everything needs to be tested before deploying to production.

That’s kind of it. I hope the recommendations above are helpful and, as always, have a good one!

AKS – Working with GPUs

I’ve discussed AKS before, but recently I have been doing a lot of production deployments of AKS, and the most recent one involved NVIDIA GPUs.

This blog post will take you through my learnings from a deployment of this type, because boy, some things are not as simple as they look.

The first problems come right after deploying the cluster. Most of the time, if not always, the NVIDIA driver doesn’t get installed and you cannot deploy any type of GPU-constrained resource. The solution is basically to install the NVIDIA device plugin as a DaemonSet and go from there, but that also depends on the AKS version.

For example, if your AKS is running version 1.10 or 1.11, then the NVIDIA device plugin must be 1.10 or 1.11 as well; use whatever matches your version, located here.
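The manifest isn’t reproduced here; an abridged sketch based on the upstream nvidia/k8s-device-plugin DaemonSet (match the image tag to your Kubernetes version) looks like this:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: nvidia-device-plugin-ctr
        image: nvidia/k8s-device-plugin:1.11   # match your AKS version
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
```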

The code snip from above creates a DaemonSet that runs the NVIDIA device plugin on every node provisioned in your cluster. So for three nodes, you will have three NVIDIA pods.

The problem appears when you upgrade your cluster. You go to Azure and upgrade the cluster, and guess what: you forgot to update the YAML file, and everything that relies on those GPUs dies on you.

The best example I can give is the TensorFlow Serving container, which crashed with a very “informative” error saying the NVIDIA version was wrong.

Another problem that appears is monitoring. How can I monitor GPU usage? What tools should I use?

Here you have a good solution which can be deployed via Helm: if you do a helm search for prometheus-operator, you will find the best solution to monitor your cluster and your GPU 🙂
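With Helm 2 syntax, that boils down to (release and namespace names are your choice):

```bash
helm search prometheus-operator    # `helm search repo ...` on Helm 3
helm install stable/prometheus-operator --name monitoring --namespace monitoring
```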

The prometheus-operator chart comes with Prometheus, Grafana and Alertmanager, but out of the box you will not get the GPU metrics required for monitoring, because of an error in the Helm chart which scrapes the cAdvisor metrics over HTTPS. The solution is to set the exporter’s HTTPS option to false.
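The workaround is a single value override; the exact key below is per the chart of that era, so verify it against your chart version:

```bash
helm upgrade monitoring stable/prometheus-operator \
  --set kubelet.serviceMonitor.https=false
```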

Then import the dashboard required to monitor your GPUs, which you can find here: https://grafana.com/dashboards/8769/revisions, and set it up as a ConfigMap.

In most cases, you will want to monitor your cluster from outside, and for that you will need to install/upgrade the prometheus-operator chart with the grafana.ingress.enabled value set to true and grafana.ingress.hosts={domain.tld}.
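Roughly like this (the hostname is a placeholder):

```bash
helm upgrade monitoring stable/prometheus-operator \
  --set grafana.ingress.enabled=true \
  --set grafana.ingress.hosts="{grafana.domain.tld}"
```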

Next in line, you have to deploy the actual containers that use the GPU. As a rule, a container cannot use part of a GPU, only a whole GPU, so tread carefully when you’re sizing your cluster, because you can only scale horizontally as of now.

When you’re defining the pod, add the following snip to the container spec:
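The snip in question is a GPU resource limit:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # whole GPUs only; no fractional GPUs here
```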

The end result would look like this deployment:
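A minimal sketch of such a deployment (image and names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-worker
  template:
    metadata:
      labels:
        app: gpu-worker
    spec:
      containers:
      - name: gpu-worker
        image: tensorflow/tensorflow:latest-gpu
        resources:
          limits:
            nvidia.com/gpu: 1
```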

What happens if everything blows up and nothing is working?

In some rare cases, the NVIDIA driver may blow up your worker nodes. Yes, that happened to me, and I needed to solve it.

The manifestation looks like this: the ingress controller works randomly, cluster resources show as evicted, the NVIDIA device plugin restarts frequently, and your GPU containers are stuck in Pending.

The way to fix it is to first delete the pods in Evicted/Error status by running this command:
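One common form of that one-liner (adjust the grep pattern to taste):

```bash
kubectl get pods --all-namespaces | grep -E 'Evicted|Error' | \
  awk '{print "kubectl delete pod " $2 " -n " $1}' | sh
```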

And then restart all the worker nodes from Azure. You can find them in the resource group called MC_<ClusterRG>_<ClusterName>_<Region>.

That being said, it’s fun times to run AKS in production 🙂

Signing out.

Integrating Azure Container Instances in AKS

In a previous blog post, I talked about how excellent the managed Kubernetes service in Azure is, and in another blog post I spoke about Azure Container Instances. In this blog post, we will be combining them so that we get the best of both worlds.

We know that we can use ACI for some simple scenarios like task automation, CI/CD agents like VSTS agents (Windows or Linux), simple web servers and so on, but it’s yet another thing we need to manage. Even though ACI has almost no strings attached, e.g. no VM management, custom resource sizing and fast startup, we may still want to control everything from a single pane of glass.

ACI doesn’t provide auto-scaling, rolling upgrades, load balancing or affinity/anti-affinity; that’s the work of a container orchestrator. So if we want the best of both worlds, we need the ACI connector.

The ACI Connector is a virtual kubelet that gets installed on your AKS cluster; from there, you can deploy containers to ACI by merely referencing the virtual node.

If you’re interested in the project, you can take a look here.

To install the ACI Connector, we need to cover some prerequisites.
The first thing we need to do is create a service principal for the ACI connector. You can follow this document here on how to do it.

When you’ve created the SPN, grant it contributor rights on your AKS Resource Group and then continue with the setup.
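Roughly like this (names and scope are placeholders):

```bash
# Create the SPN, then grant it Contributor on the AKS resource group
az ad sp create-for-rbac --name aci-connector --skip-assignment
az role assignment create \
  --assignee <APP_ID> \
  --role Contributor \
  --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<AKS_RG>
```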

I won’t be covering the Windows Subsystem for Linux or any other bash system as those have different prerequisites. What I will cover in this blog post is how to get started using the Azure Cloud Shell.

So pop open an Azure Cloud Shell and (assuming you already have an AKS cluster) get the credentials.
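Which is just:

```bash
# Merge the cluster credentials into ~/.kube/config
az aks get-credentials --resource-group <RG_NAME> --name <CLUSTER_NAME>
```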

After that, you will need to install helm and upgrade tiller. For that, you will run the following.
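Which, with Helm 2, is just:

```bash
# Initialize Helm (installs tiller into the cluster), then bring tiller
# up to the client version
helm init
helm init --upgrade
```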

The reason you need to initialize Helm and upgrade tiller is not entirely clear to me, but I believe Helm and tiller should be installed and upgraded to the latest version every time.

Once those are installed, you’re ready to install the ACI connector as a virtual kubelet. The Azure CLI installs the connector using a Helm chart. Type in the command below using the SPN you created.
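A sketch of that preview-era command (check az aks install-connector --help for the exact flags; the SPN values are the ones you created earlier):

```bash
az aks install-connector \
  --resource-group <RG_NAME> \
  --name <CLUSTER_NAME> \
  --connector-name aciconnector \
  --os-type Both \
  --service-principal <APP_ID> \
  --client-secret <SPN_PASSWORD>
```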

As you can see in the command above, I typed Both for --os-type. ACI supports Windows and Linux containers, so there’s no reason not to get both 🙂

After the install, you can query the Kubernetes cluster for the ACI Connector.
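The connector shows up as an extra node:

```bash
kubectl get nodes
# look for nodes named virtual-kubelet-<connector-name>-linux / -windows
```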

Now that the kubelet is installed, all you need to do is run kubectl create -f with your YAML file, and you’re done 🙂

If you want to target the ACI connector with the YAML file, you need to reference a nodeName of virtual-kubelet-ACICONNECTORNAME-linux or -windows.
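A minimal sketch of such a manifest (connector name is a placeholder; the toleration matches the taint the connector places on its virtual node, so verify it against your install):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: aci-nginx
spec:
  nodeName: virtual-kubelet-aciconnector-linux
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: azure.com/aci
    effect: NoSchedule
```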

Run the example above and the AKS cluster will provision a container instance for you.

What you should know

The ACI connector allows the Kubernetes cluster to connect to Azure and provision Container Instances for you. Bear in mind that it doesn’t provision the containers in the same VNET as the Kubernetes cluster, so it’s best suited for burst processing and those types of workloads. This is, let’s say, an alpha concept that is being built upon, and new ways of using it are being presented every day. People have asked me what the purpose of this thing is if they cannot connect to it directly, but the fact is that you cannot expect too much from a preview product. I have given suggestions on how to improve it, and I suggest you do too.

Well that’s it for today. As always have a good one!

Deploying your containers to Kubernetes using VSTS

Visual Studio Team Services or VSTS is Microsoft’s cloud offering that provides a complete set of tools and services that ease the life of small teams or enterprises when they are developing software.

I don’t want to get into a VSTS introduction in this blog post, but what we need to know about VSTS is that it’s the most integrated CI/CD system with Azure. The beautiful part is that Microsoft has a marketplace with lots of excellent add-ons that extend the functionality of VSTS.

Creating a CI/CD pipeline in VSTS to deploy containers to Kubernetes is quite easy. I will show in this blog post a straightforward pipeline design to build the container and deploy it to the AKS cluster.

The prerequisites are the following:

  • VSTS tenant and project – create one for free here with a Microsoft account that has access to the Azure subscription
  • VSTS task installed – the Replace Tokens task
  • AKS cluster
  • Azure Container Registry

Before we even start building the VSTS pipeline, we need to get some connection prerequisites out of the way. To deploy containers to the Kubernetes cluster, we need to have a working connection with it.

Open a Cloud Shell in Azure and type in:
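That is, the usual credentials command:

```bash
az aks get-credentials --resource-group <RG_NAME> --name <CLUSTER_NAME>
```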

It will tell you that the current context is located in “/home/NAME/.kube/config.”

Now open the /home/NAME/.kube/config with nano or cat and paste everything from there in a notepad. You need that wall of text to establish the connection to the cluster using VSTS.

Let’s go to VSTS where we will create a service endpoint to our Kubernetes cluster.
At the project dashboard, press on the wheel icon and then on Services.

Press on the New Service Endpoint and select Kubernetes.

Paste the details from the .kube/config file into the kubeconfig box, and put the cluster URL (https://aksdns) in the server URL field.

Create a repository and add the following files and contents to it:
*I know it would be easier to clone from my GitHub repo, but when I’m learning, I like copying and pasting stuff into VSCode, analysing it, and then uploading it.
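The file contents aren’t reproduced here; given the build tasks below (an nginxdemo image, ACR_DNS and BUILD_ID variables, and double-underscore tokens), a plausible sketch of the two key files would be:

```dockerfile
# Dockerfile -- a static page served by nginx
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
```

```yaml
# deploy.yaml -- __ACR_DNS__ and __BUILD_ID__ get substituted by the
# Replace Tokens task at build time
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginxdemo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginxdemo
  template:
    metadata:
      labels:
        app: nginxdemo
    spec:
      containers:
      - name: nginxdemo
        image: __ACR_DNS__/nginxdemo:__BUILD_ID__
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginxdemo
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: nginxdemo
```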

You will need to add an “index.html” with whatever you want it to be written in it. I went with “One does not simply push changes to containers. Said no one ever.”

You’re done building the repository; now it’s time to setup the build definition.

Go to Build and Releases, press on New, and select the new Git repository that you just created.

At the template screen press on Container and press Apply.


On the new screen, go to variables and create two new variables

ACR_DNS with the value of your ACR registry link in the form of name.azurecr.io
BUILD_ID with the value $(Build.BuildId)

Now go back to the task pane by pressing the plus (+) on the Phase 1 task, and add the Replace Tokens task and the Publish Artifacts task. The result should look like the screenshot below.

For each task fill in the following:

Build an Image
Container Registry Type = Azure Container Registry
Azure Subscription = Your Subscription
Azure Container Registry = Select what you created
Action = Build an Image
Docker File = **\Dockerfile
Use Default Build Context = Checked
ImageName = nginxdemo
Qualify Image Name = Checked
Additional Image Tags = $(Build.BuildId)

Push an Image
Container Registry Type = Azure Container Registry
Azure Subscription = Your Subscription
Azure Container Registry = Select what you created
Action = Push an image
ImageName = nginxdemo
Qualify Image Name = Checked
Additional Image Tags = $(Build.BuildId)

Replace Tokens
Target Files = **/*.yaml
Files Encoding = auto
Advanced
Token Prefix = __ (Double Underscore)
Token Suffix = __ (Double Underscore)

Publish Artifact
Path to publish = deploy.yaml
Artifact name = deploy
Artifact publish location = Visual Studio Team Service/TFS

Now go to Triggers, select Continuous integration, and check “Enable continuous integration”. Then press the arrow on Save & queue and press Save.

The build has been defined; now we need to create a release.

Go to Build and Releases and press on Release

Press on the cross and then on the “Create release definition”

In the New Release Definition pane, select the “Deploy to Kubernetes Cluster” template and press Apply.

Now that the template is pre-populated to deploy to the Kubernetes Cluster, you need to add an artifact, select the Build Definition and add it.

Now it’s time to enable Continuous deployment so press on the lightning bolt that’s located in the upper right corner of the artifact and enable the CD trigger.

Now go to the Tasks tab located near the Pipeline and modify the kubectl apply command.
kubectl apply
Kubernetes Service Connection = Select the K8 connection that you created
Command = Apply
Use Configuration files = Checked
Configuration File = press on the three dots and reference the deploy.yaml or copy what is below.
$(System.DefaultWorkingDirectory)/K8Demo/deploy/deploy.yaml

Now press Save, queue a new build, and wait for the container to get deployed. When it’s done, just type kubectl get services in the Azure Cloud Shell and the IP will pop up.

Final Thoughts

So you finished configuring the CI/CD pipeline and deployed your first container to an AKS cluster. This might seem complicated at first, but once you do it a couple of times you will be a pro at it, and the problems you face will be about how to make it more modular. I do similar things at clients most of the time when I’m automating application deployments for cloud-ready or legacy applications. This type of CI/CD deployment is quite easy to set up; when you want to automate a full-blown microservices infrastructure, you will have a lot more tasks to handle. My most significant CI/CD pipeline consisted of 150 tasks that were needed to automate a legacy application.

What I would consider a best practice for CI/CD pipelines in VSTS, or any other CI/CD tool, is to never hard-code parameters into tasks and to make use of variables/variable groups. Tasks like the “Replace Tokens” one permit you to reference those variables, so when one changes or is created dynamically, the values just get filled into the code. This is very useful when your release pipeline deploys to more than one environment, and you can have global variables and environment-specific variables.

Well, I hope this was useful.

Until next time!
