Security is not something to kid about and when it comes to cloud, you have to be very through when you’re deploying your cloud infrastructure. Which means that you are still required to do defense in depth, use anti-malware systems, configure extended monitoring, logging and reporting mechanisms. When you’re going to the cloud, you have to be aware of the Shared Responsibility Matrix which applies to any cloud provider.
As you can see in the image above, you as a customer still have a responsibility to secure your cloud environment. So those skills you’ve developed while working on-premises will still be of value in the cloud.
The subject for today’s topic is managing Network Security Groups using a feature in Azure, called Application Security Groups.
What are they?
Application Security Groups are a mechanism to group virtual machines that reside in the same virtual network and apply Network Security Groups to them.
The way you deployed NSGs in Azure subscription was that you would assign them to a network interface or a subnet and then configure them in a granular manner, based on the deployment type. The reality was that it’s utopic to do this cleanly and it gets messy after a couple of months. So ASGs came to the rescue where it helped you group a set of VMs based on roles like Web, DB, Middleware etc. and apply NSG Allow / Deny rules on them.
By using an ASG, you simply your management overhead by just adding the VMs that you create in those groups and automatically you get the security policies applied from your NSG.
Creating / using Application Security Groups is easy. Go to the Azure Portal -> Create a resource -> Type in Application Security Group and press create.
Or you can simply use Powershell
#PS Example for creating Application Security Groups.
As you can see, it’s pretty easy to secure your VMs, and considering that it can become a pain to manage the NSGs even for simple deployments. I’m not even talking about very complex ARM deployments which deploy tens of VMs and link them together 🙂
You have some options in Azure when you want to have a financially backed SLA for your VM deployments. When you can go into a distributed model, you can get 99.95, when you can’t then you have the option of getting 99.9% SLA when you’re using Premium Disks. But what if I want more?
If you want more, then it’s going to cost you more but before we jump into solutions, let’s understand what the numbers mean and why we should care.
You probably heard of the N nines SLA; three nines, four nines, five, six. To explain what that means, down below we have an excellent table which illustrates to us what those numbers mean in actual downtime.
In Azure for IaaS deployment, we have to option of gaining a 99.9% and 99.95% SLA. 99.9% translates into an acceptable downtime of 8.45 hours per year while 99.95% translates in around 4.22 hours per year. Now does this mean that we will have 4 or 8 hours of downtime for all of our IaaS deployments? Of course not but it might happen, that’s why you need to take all the necessary precautions so that your business critical application stays online all the time. We didn’t have the option of receiving a financially backed SLA for single VMs until recently so this is a big plus.
Recently Microsoft announced to ignite the public preview of Availability Zones which boost the SLA number to 99.99%, lowering the downtime to around 52 minutes in a year. But what are they exactly?
Availability Zones are the actual datacenter in a single region. All regions start with three zones but you during this preview, you might not be able to deploy services to all of them. If we’re talking about West Europe, then this region has three data centers that are physically separated in all terms and purposes. In order for Microsoft to financially back you for 99.99% SLA all the datacenters in a region have different power, network, and cooling providers so that if something happens to said provider then you won’t have a full region downtime and they are also 30 KM apart from each other, so they are protected from physical faults as well.
With Availability Zones, they also released Zone aware SKUs for some services like the Standard Load Balancer and Standard Public IP. At the time of writing we have the possibility of deploying VMs, VMSS, Managed Disks and IPs in an Availability Zone and SQL DB, Cosmos DB, Web Apps and Application Gateway already span three zones.
If you want to benefit from the four nine SLA, then you either deploy directly into availability zones or you redeploy your VMs.
As you can see from the above diagram, you need to use services that span zones, and after that, you need to deploy them in pairs just as you would do with Availability Sets. You clone your deployments, implement them in different zones, and you benefit from the 99.99% SLA.
*Preview Service: You have no guaranteed SLA while this service is in a preview. Once it goes GA, you will receive a financially backed SLA.
Achieving the SLA.
We have a couple of SLA numbers in our head, let’s now understand how to obtain them.
99.9% SLA- Single VMs – All your single VM deployments have to be backed by premium storage. That means that both the OS and Data disks have to be SSDs. We cannot mix and match and still qualify for the financially backed SLA. The best candidates for single VMs are the ones running relational databases or systems that cannot run in a distributed model. I wouldn’t recommend running web servers in single VM; you have App Services for that.
99.95% SLA – Availability Sets – All your distributed systems should run in Availability Sets to benefit from the 99.95% SLA and compared to single VM deployments, it doesn’t matter if you’re running Standard or Premium storage on them. AV Sets work nicely for Web Servers or other types of applications that are stateless or keep their state somewhere else. If your application has to keep its state on the actual VM, then your options are limited to the Load Balancer which can be set to have Sticky Sessions, but you will have problems in the long run. For stateful applications, it’s best to keep their state in a Redis Cache, Database or Azure Files Shares. This type of deployment works very well for most apps out there.
99.99% SLA – Availability Zones – This is the strongest SLA you can get at this time for your IaaS VMs. Availability Zones are similar in concept to the Availability Set deployment; you need to be aware of what candidates you’re deploying to the zones from an application standpoint and also from a financial standpoint. I’m saying financial because you need to use zone spanning services like the Standard SKU for the Public Load Balancer and Public IP. The standard Load Balancer is not free as the basic one, you pay for the number of load balancing rules you have, and you also pay for the data processed by it.
Financially backed SLA
Now that we have a basic understanding of SLAs, we have to understand what financially backed means regarding any cloud provider. When they say that the SLAs are financially supported, they mean that if something on the provider’s side causes an SLA breach, they will reimburse the running costs of the VM when the downtime occurred.
The formula looks like this:
Multiple VMs in Availability Sets
Monthly Uptime % = (Maximum Available Minutes-Downtime) / Maximum Available Minutes X 100
Maximum Available Minutes – This is the total number of runtime minutes for two or more VMs in a month.
Downtime – This is the total number of minutes where there was no connectivity on any of the VMs the AV Set.
This means that if the Monthly Uptime percentage is lower than 99.95%, you can ask Microsoft to grant you service credits.
Single VMs with Premium Storage
Monthly Uptime % = (Minutes in the Month – Downtime) / Minutes in the Month X 100
Minutes in a Month – Total number of minutes in a month.
Downtime – Total number of downtime minutes from the Minutes in a Month metric.
This means that if the calculated Monthly Uptime percentage is lower than 99.9%, then you can Microsoft to grant you service credits.
You might ask; How do I know that I had an SLA breach?
Well, you need to measure the uptime of your application. In the end, you might not care if one VM from your Availability set is down for say 10 minutes, but you will care if somebody calls you when the Website is down. You have multiple options out there to measure the availability of your application like UptimeRobot, Monitis, Pingdom, etc. You also have the possibility of doing measurements in Azure with Azure Monitor, but you’re not getting application uptime, so you need the best of both worlds to have an accurate view of the situation. I configure both because I want to know when something happens to a VM, and I also want to know if the application is up and healthy. The reason is that if you’re using say VMs and PaaS services, you need to know which one caused the downtime and if it was a human error. Microsoft will not pay for your mistakes, so you need to have self-healing systems in place to avoid human error. There are a lot of Configuration Management systems out there, systems like DSC / Chef / Puppet which ensure you that your configuration didn’t fail. Azure has Desired State Configuration integrated into it for example which grants you the ability to enforce states on VMs based on a configuration manifest.
That being said, gaining a financially backed SLA in Azure is not rocket science. I hope you obtained some useful information from this post 🙂
A friend of mine recently started working with Azure and loved it once he got the hang of it. I encouraged him to start using PowerShell to automate various Azure operations but it didn’t quite stick with him on the first try. He started automating Azure operations using the Azure CLI and while it’s not a bad tool, it’s quite lacking in features compared to PowerShell and I’m pretty sure that it will not be maintained much longer since Microsoft open sourced PowerShell and gave the Linux / Mac community a taste. The funny part of this story is that he’s a Windows user, uses Windows 10 and yet he’s still using Azure CLI.
Where am I going with this? While giving him some tips on how to deploy some production / staging environments in Azure, I saw how he was automating resource creation in Azure using the CLI. He basically created a golden image in a storage account and with 35 lines of Azure CLI code, he was provisioning the environments. That made me cringe and motivated me to teach him how to do it using ARM templates and custom scripts. (more…)
For those that you don’t know, Linux Integration Services or LIS in short, are a set of drivers that enable synthetic device support in Linux VMs that are running on Hyper-V. Plainly speaking, these drivers allow you have almost all the nice stuff you take for granted when you use Windows under Hyper-V and that’s mouse support (remember the days when you had to press CTRL + Left/Right arrow? Not anymore), dynamic memory, synthetic network adapters (not the legacy Intel 1000 which only works in 10mbps mode) and lots more feature where you can read about HERE
Now recently I got a chance to update LIS from 4.07 to 4.0.11 on some production CentOS servers that were updated to 6.7 a while back that were still running version 4.0.7 without any issues, even if CentOS 6.7 wasn’t officially supported by Microsoft.
So I did all the necessary steps for roll-back purposes and started the upgrade process which worked just fine until I rebooted the VMs and got a very nice cup of kernel panic. (more…)