Kubernetes managed clusters in Azure

Coming from the infrastructure world, I had a bit of a hard time wrapping my head around how you manage containers when they get out of control. When you’re playing around with one or two containers, that’s not a big deal, but when you get into the hundreds, that’s where the problems start. As an infrastructure guy, I always ask the nasty questions:

Where do I keep them?
How do I secure them?
How do I update them?
How do I protect myself from the 2 AM calls?

Containers are immutable images that work everywhere, but when you’re building a very complex application that runs on containers, you’re asking yourself “where do I put them?”. The answer to that question is a container orchestrator, but which one? You search, and you find out that there are several. If your operations are mostly in the cloud, you look for container orchestrators in the marketplace, where you will find the Azure Container Service (ACS), which provides you with deployment options for Docker Swarm, DC/OS and Kubernetes. The question that arises at that moment is “Which one should I pick?”

ACS just provides you with a consistent way of deploying those container orchestrators, but in an IaaS fashion. You will still have to do patch and security management. Kubernetes is considered a first-tier product in Azure, and it’s the most integrated orchestrator in Azure. When you deploy containers in a Kubernetes cluster, you don’t have to allocate IPs or provision disks. The system calls Azure’s APIs and does that for you, out of the box, without any extra work.

With all that in mind, Microsoft brought forth a new offering in preview called Azure Container Service (AKS) that builds a highly available Kubernetes cluster from scratch, one that you don’t have to manage entirely. The only parts under your management are the agent nodes where your containers will sit. When you need to scale out, you just tell the system that you want to scale out, and it will do that by itself. Think of DSC (Desired State Configuration) or ARM (Azure Resource Manager) templates: you declare what you want, and the system proceeds to do it.

Creating an AKS

Before you start creating an AKS cluster you need to create a service principal in your Azure Active Directory tenant and generate an SSH private key.

Creating an Azure Service Principal is just as easy as creating an SSH key. You can do that by following this article here.
I generate SSH keys with PuTTY, and you can do that by following this article here.

After you create the Service Principal, grant it Contributor rights on the subscription; otherwise, it will not be able to deploy disks, file shares or IPs in its resource group. For production scenarios, you create the SPN, grant it Contributor access on the subscription, and after deploying AKS, you use RBAC to scope its Contributor access down to the AKS resource group. We have to do this workaround because the resource group does not exist before the deployment, so there is nothing to grant permissions on.

Save the Application ID, secret and SSH private key in a text file because we will use them later.

You have two simple options for creating an AKS cluster: the Portal or the CLI.

From the Azure marketplace, you search for AKS and the Azure Container Service (AKS) preview will show up. Click on it and let’s follow the steps.

In the first phase, we have to give the cluster a name and a DNS prefix (if we want one), choose the Kubernetes version (preferably the latest one), select the subscription, create a resource group and pick a location.

In the next phase, we take the Service Principal and SSH key we generated and paste them into the corresponding fields. The node count is the number of agent nodes we will have available. This is not a hardcoded number, so if we want to scale out later, we can do that without an issue. You can see here that you are not asked to specify the number of master nodes. This is the part that’s managed by Azure.

Once you’re done and the deployment finishes, you will have two new resource groups in your subscription: the resource group you referenced, in my case AKS-RG, and a resource group named after the RG, cluster name and location, in my case MC_AKS-RG_lfa-aks_westeurope.
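
The name of that second resource group follows a predictable pattern. Here is a minimal Python sketch, assuming the MC_<resource group>_<cluster name>_<location> pattern seen above always holds (the function name is made up for illustration):

```python
def node_resource_group(resource_group: str, cluster_name: str, location: str) -> str:
    """Build the name of the auto-created resource group that holds the agent nodes.

    Assumes the MC_<resource group>_<cluster name>_<location> pattern seen in this post.
    """
    return f"MC_{resource_group}_{cluster_name}_{location}"


# Values taken from the deployment described above.
print(node_resource_group("AKS-RG", "lfa-aks", "westeurope"))
```

This comes in handy when you script against the agent node resources and need to find that resource group without looking it up in the portal.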

The CLI way is much simpler. You pop up a Cloud Shell, or you go to shell.azure.com, and run the az aks create command with the --generate-ssh-keys option.

This will quickly create an AKS cluster for you and give you the SSH keys.

So which one is simpler? Clearly the CLI way, but do remember that we don’t always have access to everything in an Azure subscription. If we do not have access to the Azure Active Directory tenant, then we won’t be able to create that Service Principal, and somebody with the right permissions will have to create it and give us its details.

I have a cluster, now what?

When I first started playing around with AKS, I tried the hard way of installing all the required tools so that I could manage it, and to be honest, I got bored fast. If you want to do this on your machine, then for starters you need the Azure CLI installed and connected to the subscription, and after that you will need kubectl for cluster management and helm for package management. Once you’re done with that, you can start working with it. I found that the best way around everything is either to use shell.azure.com or to configure Cloud Shell in VS Code.

In the CLI you can type az aks get-credentials -n clustername -g RGName and it will save the credentials that will be used to connect to the cluster in the current context.

Once all that’s done, you can leverage kubectl to play around with the cluster

Useful commands:

Creating a container is pretty simple. I create a deployment with kubectl create -f followed by the path to the YAML file.

Then I type in kubectl get service --watch and wait for Azure to provision a public IP for the service I just created. This process can take a few seconds or a few minutes; this is the part where you depend on Azure 🙂

After the deployment is done, you will get a public IP address that you can use to access the service.

Scaling up the deployment is straightforward. You run kubectl scale with the --replicas option and the deployment name, for example kubectl scale --replicas=3 deployment/<deployment name>.


If you want to use the autoscaler, you need to have CPU requests and limits defined in the YAML file.

Once your YAML file contains the requests and limits for the service, you can enable autoscaling for it with the kubectl autoscale command.
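
To make the requests and limits part concrete, here is a hedged sketch that builds a Deployment manifest in Python and prints it as JSON, which kubectl accepts just like YAML. The name my-app, the nginx image and the CPU values are placeholders and not taken from my cluster, and the apps/v1 apiVersion is an assumption that may need adjusting on older clusters.

```python
import json


def deployment_manifest(name: str, image: str, cpu_request: str, cpu_limit: str) -> dict:
    """Return a Deployment manifest whose container declares CPU requests and limits."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumption: older clusters may need another version
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # The autoscaler compares CPU usage against the request.
                            "resources": {
                                "requests": {"cpu": cpu_request},
                                "limits": {"cpu": cpu_limit},
                            },
                        }
                    ]
                },
            },
        },
    }


# "my-app" and "nginx" are placeholder values, not taken from the post.
manifest = deployment_manifest("my-app", "nginx", cpu_request="250m", cpu_limit="500m")
print(json.dumps(manifest, indent=2))
```

You could save the output to a file, pass it to kubectl create -f, and then run something like kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=5.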

Scaling out the cluster

The procedure for scaling out the cluster is similar to pod scaling. You run the Azure CLI command az aks scale with the new node count, and that’s it.

Upgrading the cluster

Upgrading the cluster is just as simple as scaling out, but because this is a preview offering, you might run into some issues, as I have. I for one couldn’t manage to upgrade any AKS cluster from the CLI due to problems in either the Azure CLI or the AKS offering. This is not a big problem at the moment because it’s in preview, but be warned that this is not production ready, and if you deploy a business-critical application on the cluster, you might have problems.

The upgrade process is pretty simple: you first run az aks get-upgrades to find out which versions are available, and then you run az aks upgrade with the version you want.

My thoughts

The AKS offering is pretty solid from what I’ve seen while playing around with it, especially since deploying a cluster manually end-to-end is not a pleasant experience. ACS and AKS allow you to deploy container orchestrators in a snap and just get on with your life. My little gripe with AKS is that the agent nodes are on standard VMs and not VMSS (Virtual Machine Scale Sets), and I don’t quite understand why they chose this way of doing things. Service Fabric runs on VMSS and DC/OS runs on VMSS, so I don’t see why Kubernetes would be a problem. Time will tell regarding this one.

There are some limitations at the moment, mostly related to the availability of the offering and to public IP limits. You might not be able to create an AKS cluster, and if that happens, you just try again. From a services standpoint, you’re limited to 10 IPs because of the basic load balancer limitation.

From a pricing standpoint, I must say that it’s the best thing you can get. You pay just for the VMs. You’re not paying for anything on top of them, which is a big plus when compared to the other cloud providers that bill you for the service as well. What you need to know when it comes to billing is that when you create an AKS cluster, Azure provisions three Kubernetes master VM nodes which you will not see, but you will pay for them.

We will see how AKS will grow, but from what I’m seeing, it’s going in the right direction.

As always, have a good one!

Azure IaaS – SLAs – Single VMs, Availability Sets and Availability Zones

You have some options in Azure when you want a financially backed SLA for your VM deployments. When you can go with a distributed model, you can get 99.95%; when you can’t, you have the option of getting a 99.9% SLA by using Premium Disks. But what if I want more?

If you want more, then it’s going to cost you more but before we jump into solutions, let’s understand what the numbers mean and why we should care.

You have probably heard of the N nines SLA: three nines, four nines, five, six. To explain what that means, down below we have an excellent table which shows what those numbers mean in actual downtime.
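
If you want to work out the numbers yourself, the conversion is simple arithmetic. The sketch below assumes a non-leap year of 365 days; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year, an assumption made for simplicity


def downtime_hours_per_year(sla_percent: float) -> float:
    """Maximum downtime per year, in hours, that still meets the given SLA."""
    return (100 - sla_percent) / 100 * HOURS_PER_YEAR


# Print the allowed downtime for the usual "nines".
for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
    hours = downtime_hours_per_year(sla)
    print(f"{sla}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```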

In Azure, for IaaS deployments, we have the option of getting a 99.9% or a 99.95% SLA. 99.9% translates into an acceptable downtime of around 8.76 hours per year, while 99.95% translates into around 4.38 hours per year. Now, does this mean that we will have 4 or 8 hours of downtime for all of our IaaS deployments? Of course not, but it might happen, and that’s why you need to take all the necessary precautions so that your business-critical application stays online all the time. We didn’t have the option of receiving a financially backed SLA for single VMs until recently, so this is a big plus.

Recently, at Ignite, Microsoft announced the public preview of Availability Zones, which boost the SLA number to 99.99%, lowering the downtime to around 52 minutes in a year. But what are they exactly?

Availability Zones are physically separate datacenters within a single region. All regions start with three zones, but during this preview you might not be able to deploy services to all of them. If we’re talking about West Europe, then this region has three datacenters that are physically separated for all intents and purposes. For Microsoft to financially back a 99.99% SLA, all the datacenters in a region have different power, network, and cooling providers, so that if something happens to one of those providers, you won’t have a full region downtime. They are also 30 km apart from each other, so they are protected from physical faults as well.

Along with Availability Zones, they also released zone-aware SKUs for some services, like the Standard Load Balancer and Standard Public IP. At the time of writing, we can deploy VMs, VMSS, Managed Disks and IPs in an Availability Zone, while SQL DB, Cosmos DB, Web Apps and Application Gateway already span three zones.

If you want to benefit from the four-nines SLA, then you either deploy directly into Availability Zones or you redeploy your existing VMs into them.

Reference Architecture:

As you can see from the above diagram, you need to use services that span zones, and after that, you need to deploy them in pairs just as you would do with Availability Sets. You clone your deployments, implement them in different zones, and you benefit from the 99.99% SLA.

*Preview Service: You have no guaranteed SLA while this service is in preview. Once it goes GA, you will receive a financially backed SLA.

Achieving the SLA

We have a couple of SLA numbers in our heads; now let’s understand how to obtain them.

99.9% SLA – Single VMs – All your single VM deployments have to be backed by premium storage. That means that both the OS and data disks have to be premium SSDs. We cannot mix and match and still qualify for the financially backed SLA. The best candidates for single VMs are the ones running relational databases or systems that cannot run in a distributed model. I wouldn’t recommend running web servers in a single VM; you have App Services for that.

99.95% SLA – Availability Sets – All your distributed systems should run in Availability Sets to benefit from the 99.95% SLA, and unlike single VM deployments, it doesn’t matter if you’re running Standard or Premium storage on them. AV Sets work nicely for web servers or other types of applications that are stateless or keep their state somewhere else. If your application has to keep its state on the actual VM, then your options are limited to the Load Balancer, which can be set to use sticky sessions, but you will have problems in the long run. For stateful applications, it’s best to keep their state in a Redis Cache, a database or Azure Files shares. This type of deployment works very well for most apps out there.

99.99% SLA – Availability Zones – This is the strongest SLA you can get at this time for your IaaS VMs. Availability Zones are similar in concept to the Availability Set deployment; you need to be aware of what candidates you’re deploying to the zones from an application standpoint and also from a financial standpoint. I’m saying financial because you need to use zone-spanning services like the Standard SKU for the Public Load Balancer and Public IP. The Standard Load Balancer is not free like the Basic one: you pay for the number of load balancing rules you have, and you also pay for the data processed by it.

Financially backed SLA

Now that we have a basic understanding of SLAs, we have to understand what financially backed means for any cloud provider. When they say that the SLAs are financially backed, they mean that if something on the provider’s side causes an SLA breach, they will reimburse the running costs of the VM for the period in which the downtime occurred.
The formula looks like this:

Multiple VMs in Availability Sets
Monthly Uptime % = (Maximum Available Minutes – Downtime) / Maximum Available Minutes × 100

Maximum Available Minutes – This is the total number of runtime minutes for two or more VMs in a month.
Downtime – This is the total number of minutes in which none of the VMs in the AV Set had connectivity.

This means that if the Monthly Uptime percentage is lower than 99.95%, you can ask Microsoft to grant you service credits.

Single VMs with Premium Storage

Monthly Uptime % = (Minutes in the Month – Downtime) / Minutes in the Month × 100

Minutes in the Month – Total number of minutes in a month.
Downtime – Total number of downtime minutes out of the Minutes in the Month.

This means that if the calculated Monthly Uptime percentage is lower than 99.9%, you can ask Microsoft to grant you service credits.
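
Both formulas have the same shape, so one small helper covers them. This is a minimal sketch with made-up function names and a hypothetical 30-day month with 50 minutes of downtime; it only does the arithmetic and says nothing about how Microsoft measures downtime.

```python
def monthly_uptime_percent(available_minutes: float, downtime_minutes: float) -> float:
    """Monthly Uptime % = (available - downtime) / available * 100."""
    return (available_minutes - downtime_minutes) / available_minutes * 100


def sla_breached(available_minutes: float, downtime_minutes: float, sla_percent: float) -> bool:
    """True when the calculated uptime falls below the SLA threshold."""
    return monthly_uptime_percent(available_minutes, downtime_minutes) < sla_percent


MINUTES_IN_MONTH = 30 * 24 * 60  # hypothetical 30-day month

# A single VM on premium storage with 50 minutes of downtime, checked against 99.9%.
uptime = monthly_uptime_percent(MINUTES_IN_MONTH, 50)
print(f"{uptime:.2f}% uptime, breach: {sla_breached(MINUTES_IN_MONTH, 50, 99.9)}")
```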

You might ask: how do I know that I had an SLA breach?

Well, you need to measure the uptime of your application. In the end, you might not care if one VM from your Availability Set is down for, say, 10 minutes, but you will care if somebody calls you when the website is down. You have multiple options out there to measure the availability of your application, like UptimeRobot, Monitis, Pingdom, etc. You can also do measurements in Azure with Azure Monitor, but that doesn’t give you application uptime, so you need the best of both worlds to have an accurate view of the situation. I configure both because I want to know when something happens to a VM, and I also want to know if the application is up and healthy. The reason is that if you’re using, say, VMs and PaaS services, you need to know which one caused the downtime and whether it was a human error. Microsoft will not pay for your mistakes, so you need to have self-healing systems in place to avoid human error. There are a lot of configuration management systems out there, like DSC, Chef or Puppet, which make sure that your configuration stays the way you declared it. Azure, for example, has Desired State Configuration integrated into it, which gives you the ability to enforce states on VMs based on a configuration manifest.

That being said, gaining a financially backed SLA in Azure is not rocket science. I hope you obtained some useful information from this post 🙂

Have a good one!

Post-Event Conferinta de Cloud – Bucharest

Conferinta de Cloud

On the 23rd of November, we held the first premium cloud conference in Bucharest, and we had a blast. The event lasted one full day, with two tracks (Business and Technical) covering Cloud, GDPR, Security and other great stuff. Over 170 cloud-hungry people participated in the event.

You can find the event photos here:
https://www.facebook.com/search/str/conferinta+de+cloud/photos-keyword

My session was on Azure Site Recovery & Backup in Microsoft Azure

Description:

The need for protecting your data against disaster is not a new concept. It doesn’t matter if we’re talking about SMBs or large enterprises, because data is something that they all have in common and they all have to protect it so that their business can run smoothly. Traditional backup/disaster recovery solutions have the disadvantage that they require an upfront investment for the initial implementation and ongoing allocated resources (people, money, time) to maintain and test them. When you want to implement a backup solution in a company, you have to buy the server, install the solution and after that maintain it so that it works all the time. When it comes to disaster recovery, we’re talking about a different scale: you have to build a separate standby data center that has to be synchronised with the main one, and hope you never get to test your failover solution.
In this session we will talk about the Recovery Services in Azure and how they can help us implement a backup and disaster recovery plan with minimal upfront costs and a pay-as-you-go model. By using Azure as our backup and disaster recovery solution, we do not need to buy new servers to do backups or build data centers for disaster recovery.

I hope you were at the event and had as much fun as I did. 🙂

Using Azure App Service for WordPress – What you need to know

I think it’s pretty clear that we all know WordPress and have something to love or hate about it. I use it for blogging, some friends are using it for small eCommerce sites, and I’ve seen companies use it for massive operations.

The problem with any website is that it needs to run on a web server for it to be available to the world and that brings up other issues.

When it comes to where to host it, there are a lot of options for hosting/deploying WordPress out there:
1. The possibility of hosting our blog on a shared WHM.
2. The choice of renting a VM and setting up a WHM / cPanel environment to host it.
3. The option of paying for a “SaaS” like WordPress solution
4. Azure App Services

Today we will obviously talk about Windows / Linux App Services and what you need to know.

When it comes to App Services, things might seem pretty clear. I provision an App Service, create a Web Site and deploy my WordPress into it. Simple, no? Not really.

You have two flavours of App Services:
1. Windows
2. Linux (I wrote a blog post regarding Linux ones here)

You have to choose one of them because you cannot switch from one to the other without redeploying your solution. The best thing you can do, in my opinion, is to go with the Linux offering, because Apache or Nginx work much better with WordPress than IIS.

You chose an App Service flavour, what now?

Windows

Windows App Services run on IIS with PHP / Python / Java extensions. Here is what you need to know.

When you first create the App Service, you need to modify the application settings so that you get the best performance out of it.

Modify PHP version from 5.6 to 7.2 – You will get a significant performance boost just by modifying the PHP version.
Change the Platform to 64-bit – We are in 2017, let’s run everything on 64 bit shall we? 🙂
Set Always On to On – By default, web applications turn off if there’s no traffic on the website. When somebody then initiates an HTTP connection, the site has a cold start, and the first viewer has to wait until the instance boots up. From a cost management standpoint, you’re not saving any money by having this option off, so turning it on will keep the website active even when you don’t have any traffic.
Set ARR Affinity to Off – WordPress is not a distributed application, and it’s quite hard to make it one. Turning off ARR Affinity disables the session affinity feature in IIS and speeds up the loading time.

If you need to modify the PHP configuration of the Web Application, then you need to go into Kudu and add a “.user.ini” file to the site/wwwroot folder.

The most common settings for WordPress are the following:

Windows App Services persist local storage by leveraging Azure Files shares over SMB. So be aware of this “limitation” because Azure Files is slow (500 IOPS / 60MB/s)

Linux

Linux App Services are based on containers. You have the option of creating an App Service with pre-built binaries, or you can just bring your container from a container registry (Docker Hub / Azure Container Registry)

The prebuilt containers have the following Runtime stacks:
Node.js
PHP
.NET Core
Ruby

The ones referenced above are a starting point. I prefer creating my own container because I have more control over the binaries that are inside the container, and I like NGINX more than Apache.

The Azure marketplace has a WordPress image that allows you to do a “one-click” deployment into which you can just import your current WordPress instance. This works nicely for migrations because you just need to move the content, database and other settings. For this kind of job, there are multiple plugins in the WordPress marketplace which allow you to do these types of migrations. The plugin that works best for me is: All in One WP Migration

If you create the instance using the one-click deployment, then most of the Application Settings are pre-populated, and you don’t really need to do anything. But if you’re like me and like creating your own container with your own stack, then this is what you need to take into consideration.

Get the modified WP-Config file from here: WP-Config for App Services

Build your container image as you wish and then create the App Service for Linux and set the following Application settings:

Application Settings:

WEBSITES_ENABLE_APP_SERVICE_STORAGE = True
DATABASE_HOST
DATABASE_NAME
DATABASE_PASSWORD
DATABASE_USERNAME

Connection String:
defaultConnection = mysql connection string

The WEBSITES_ENABLE_APP_SERVICE_STORAGE setting is crucial for WordPress sites (or any other site that requires persistence) because it tells the App Service to mount the /home directory on Azure Files shares for persistence and scalability. Containers are stateless/immutable, which means that anything that changes inside one will be lost with the first restart.
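
App Service hands application settings to the app as environment variables, so a container can check at startup that nothing is missing. The sketch below shows the idea in Python; it is a hypothetical helper and not part of WordPress or of the WP-Config file mentioned above, and it only covers the application settings, not the connection string.

```python
import os

# The application settings listed above.
REQUIRED_SETTINGS = (
    "WEBSITES_ENABLE_APP_SERVICE_STORAGE",
    "DATABASE_HOST",
    "DATABASE_NAME",
    "DATABASE_PASSWORD",
    "DATABASE_USERNAME",
)


def missing_settings(env=None) -> list:
    """Return the required setting names that are absent or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Report what is missing in the current environment (empty list means all set).
print(missing_settings())
```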

General optimizations

WordPress works very nicely in VMs but when you’re deploying an instance in an Azure App Service things change a bit, and you need to do some optimisation for it to work great.

The tool that I use for checking and optimizing my WP blog is Google PageSpeed Insights which is great for desktop and mobile websites. It gives you suggestions on how to improve general performance, increase speed and have a lower time to first byte.

Here are some extensions I use and recommend for improving your WP instance. (TEST BEFORE YOU USE)

Caching is extremely important, so the extensions I recommend are:
WP Super Cache – Free
WP Rocket – Paid

If you want to leverage Azure Redis, then you can use:
Redis Cache

For Minifying your code you can use:
Merge + Minify + Refresh

For finding issues with your WP instance, I recommend provisioning an Application Insights instance and installing the WP extension: App Insights WordPress

Other, more advanced ways of optimizing your instance are to use a CDN and Blob storage. Media files are better served by a CDN than by your App instance, although this depends on the scenario, and your mileage may vary. If your WP instance is image heavy, then just offloading those images to Blob storage will greatly improve performance. Azure Blob Storage WP Plugin is something I have used for clients, and it works very well.

Have a good one!

Hosting a single container in Azure – Azure Container Instances

You’ve probably heard of containers and what you can do with them in some simple scenarios. Containers brought an exciting concept to application development and infrastructure management. Containerizing an application removes the ping-pong between Dev and Ops and the famous phrase “it works on my machine”. You get a Dockerfile or the actual container image from a public/private repository and just run it. If it worked in the development environment, then it will work correctly in the staging and production environments without any changes.

The problem with containers is that they need to be hosted in a container orchestration system like Docker Swarm, DC/OS or Kubernetes. These systems are not cheap to run and not easy to maintain. If you have a significant application that requires a container orchestration tool, then that’s a no-brainer, but what if you need to run one single container for one hour because you need something processed, and then you’re done? Well, until now you had no option other than running it on your machine or in a container orchestrator, but recently Azure introduced a public preview of Azure Container Instances, which allow you to run single containers with per-second billing.

Azure Container Instances

An Azure Container Instance is a single container that starts in seconds/minutes (depends if you’re using Linux or Windows) and you are billed by the second. You can pretty much call it a Container as a Service offering or CaaS 🙂

This concept is pretty sweet from multiple standpoints. I for one found some significant use cases for my needs. For example, when I’m doing workshops or training classes, I usually use VSTS to show off the possibilities of deploying applications to Azure. The problem I have is that the free time on the hosted agent is not enough for preparing my demos, and I usually spend some time setting up Windows and Linux agents. With ACI, I just create a Windows and a Linux container image with the agents and simply deploy them from an Azure Container Registry or Docker Hub.

Another use case I found is web application load testing. I can just spin up a couple of containers and do load tests on my web application, pay for a minute of usage and be done with it.

I just thought of two useful things that you can do with ACI, but that’s just the tip of the iceberg. At the moment they are in preview, which means MS is not done working on them and awesome stuff should appear soon 🙂
If you only have a 150$ Azure MSDN subscription, then you know that you have to do a lot of micro-management just to make that credit last when you’re doing presentations, workshops or training classes.

Getting started

Spinning up a container instance is extremely simple. You can either spin up an ACI by using the Azure Portal, or you can use the Azure CLI via the Cloud Shell to run some simple commands to provision your container.
If you’re using the Azure Portal, you go to new -> Search MarketPlace for Azure Container Instance -> Go through the steps where you reference a public or private registry, specify the amount of CPU and Memory you need and presto, DONE 🙂


The commands for doing it in the Azure CLI are like this:

Billing

This service looks great and sounds like a good idea for load testing, VSTS agents and other types of one-off things that you may need, but the billing is not straightforward. You have a flat fee for when you’re creating the container, and after that, you get billed by the second for the memory and CPU that you’re using.
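
To get a feel for how the three components add up, here is a rough sketch of the billing model described above. All rates in it are placeholders chosen for the example, not real Azure prices, and the function name and parameters are made up.

```python
def aci_cost(
    seconds: float,
    cpu_cores: float,
    memory_gb: float,
    create_fee: float,
    cpu_rate_per_core_second: float,
    memory_rate_per_gb_second: float,
) -> float:
    """Flat creation fee plus per-second charges for CPU and memory."""
    per_second = cpu_cores * cpu_rate_per_core_second + memory_gb * memory_rate_per_gb_second
    return create_fee + seconds * per_second


# One hour with 1 core and 1.5 GB of memory, using placeholder rates.
cost = aci_cost(
    seconds=3600,
    cpu_cores=1,
    memory_gb=1.5,
    create_fee=0.01,
    cpu_rate_per_core_second=0.00001,
    memory_rate_per_gb_second=0.00002,
)
print(f"Estimated cost: {cost:.3f}")
```

Plug in the current rates from the pricing page before trusting any number that comes out of it.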

I won’t reference pricing on this one because prices change but what I can say is that if you leave one running for a day, you will pay around 6 EUR which is not much 🙂

In my opinion, ACI is a great addition to the Azure services, and I’m waiting to find out what Azure will bring next 🙂

That being said, have a good one!
