Recently I started working a lot more with Kubernetes and I started migrating more and more workloads towards it. My latest challenge was to
Moving to the cloud is “easy”; Managing it is another ordeal.
I’m going to start with a disclaimer; This post focuses on achieving governance and security in Azure; This doesn’t mean that what I’m going to write here cannot apply to AWS or Google, they have other technologies that can help you achieve full governance;
You’ve heard of digital transformation, modern workplace, and the whole 9 yards. Easy? Nope.
You’ve been told that moving to the cloud will grant you all the security you want, all the control you wish, lower costs, everything is enabled out of the box, and it just works. The reality is that no cloud provider will ever do that for you out of the box.
Let’s see what marketing says that you get with the cloud:
- Cost Control
- Efficiency & Scalability
- Security & Compliance
- Exotic technologies
- Pay as you go -Usage-based payment model
Sounds good? Indeed it does, but there’s a catch, all of the above have a price. They require an organizational change, a change in mentality, and a lot of sweat. You cannot expect to get all of those things by just performing a lift & shift; Migrating your VMs to the cloud means that you’re shifting ALL your on-premises costs to the cloud.
What do I mean by ALL the costs?
Datacenters cost money -You have power, cooling, staffing costs. Then you have hardware costs, then you have hardware replacement costs, and then you have hardware refresh costs, and then you have capacity costs. Should I continue?
If you’re going IaaS, then you’re going to pay the cost of running IaaS, and most of the time, you’re going to pay more for running a virtual machine in the cloud than on-premises.
Why? You might ask. Well, because the cloud provider didn’t cut corners as you did. The cloud provider offers a financially backed SLA, which means that if your virtual machine goes does down, you get a cut from the bill. Now to provide you with that bill cut, the cloud provider has to ensure that the underlying infrastructure is very redundant. Who do you think is paying for that redundant infrastructure? You are.
Then why PaaS is much cheaper?
IaaS means Infrastructure-as-a-Service, which means that you’re getting an infrastructure where you can put your virtual machines on top of it, and you manage everything. The cloud provider cannot do anything with your virtual machines. They have no control over your virtual machines, and they don’t want any control.
PaaS or Platform-as-a-Service means that you’re getting a place where you can upload your application code and manage the application as is. No virtual machines involved, no OS management. Let’s call it minimal overhead. This means that the cloud provider has control of the system that’s running your code.
PaaS is much cheaper because you’re getting a shared pool of resources where you can deploy your application, and the cloud provider manages the underlying infrastructure. Your responsibility is the application and the controls associated with it (identity, app-level controls, and so on.). In a nutshell, by sharing VMs resources with other tenants, you’re sharing the redundant infrastructure costs with other tenants as well.
That’s the short version of IaaS vs. PaaS. The situation is much more complicated than sharing the underlying infrastructure costs, but you get the idea. Azure has some dedicated environments where you’re not getting reduced costs, but more control in those situations where you’re limited by ancient compliance and regulations controls that have not adapted yet adapted to the cloud.
You mentioned something of the responsibility of the cloud provider
The “cloud” works on a shared responsibility model. The picture below is the shared responsibility matrix for Microsoft services.
Shared responsibilities by service: Shared Responsibility by Service
What does this mean? This means that you’re still responsible for the deployment that you’re doing in the cloud, but you transfer a subset of problems to the cloud provider.
- No more hardware swaps
- No more hardware refresh
- No more dead servers
- No more waiting weeks for capacity -CPU, RAM, Storage
- No more dealing with vendors – Except the cloud vendor.
- No more datacenter associated costs -Power, Cooling, Staffing
These are the clear responsibilities that you transfer to the cloud provider, but unfortunately, the cloud doesn’t solve all the op-premises issues. You’re still going have to deal with the old issues and some new:
- User identities
- Vulnerable code
- Patch Management
- Security management
- Cost Management (this one is fun)
- Backup and Disaster Recovery
- Traditional approach induced issues (Cloud is cloud, and on-prem is on-prem -don’t mix and match)
- Inflexibility induced costs
- And more.
Depending on the cloud model you pick, you’re going to have more or fewer issues that I’ve outlined. You might have seen two issues that raised your eyebrows a bit.
Traditional approach induced issues and Inflexibility induced costs.
The cloud comes with a plethora of services that can replace your bloated infrastructure systems. You don’t need firewall appliances, proxies, multi-hop IDPs, MFA appliances, and so on. You don’t need cascading network devices to have segregation of networks, and you certainly don’t need VLANs.
Azure has services for almost all of those things; You have Azure Firewall for your firewall needs, you have Azure Active Directory for your IDP, you have Azure MFA builtin for MFA. You have VNET peering for network segregation; you have NSGs for stateful firewall-ing at a VM or subnet level. The list can go on.
By adopting an on-premises approach to the cloud, you will inevitably have more on your plate than you signed up. Keep it simple, adapt your technologies, and sleep well at night.
Second, Inflexibility induced costs; The cloud provides you an enormous amount of flexibility. You don’t need to overcommit capacity; you can scale up/down right or left as much as you want when you want. You can automate your workloads to be resource-efficient based on usage, e.g., Scale down your workloads during the weekends.
With PaaS, you can do all of that. With IaaS, up to a point, you cannot. If you’re adamant about IaaS, then you’re going to pay the price. You don’t need a VM to run a web application, you have App Services for that, you don’t need an NGFW to manage your network, you have Azure Firewall for that, and you absolutely don’t need a file server cluster to manage files, you have Azure Storage for that.
Don’t get me wrong; I understand that there are situations where the application is so old that you cannot adapt it to the cloud. I’ve seen a lot of applications fail horribly just because you’ve put them on different VMs if you want to benefit from the cloud and not pay hand over fist.
What does everything translate to?
Moving to the cloud is not a walk in the park; it’s a lengthy and complicated project. The complexity is directly proportionate with the amount of luggage you’re bringing.
What can I do to achieve a successful digital transformation?
Digital transformation comes in the cloud has three parts:
The cloud comes with some challenges:
- Identity & Data
- Data Classification, Labeling, and Protection
- Multi-geo deployments, and GDPR
- OPEX vs CAPEX
- Different payment possibilities -PAYG, CSP, EA, MCA
- Resource Locks
- Resource Tagging
- Resource Auditing
All of those challenges can be overcome with understanding and proper planning. Not all deployments are cloud worthy; There’s no such thing as shutting down existing data centers and moving everything in the cloud. Hybrid is the way to go with existing deployments.
For example, if you already have Active Directory then moving to the cloud is only done via the Hybrid Approach where you synchronize your identities to Azure Active Directory and gain the benefits from both worlds.
Applying governance principles to Azure subscriptions
Everybody starts from scratch. You create one subscription and start moving workloads in it.
That’s wrong at many levels because you lose control of your cloud environment and starting asking questions like:
- Who created that resource?
- How much does that application cost?
- Who is the owner of that environment?
- Who is the owner of that resource group?
- Why does our cloud cost so much?
- What caused that security breach?
- Where is our data?
- Who can see that data?
- Application is down, what changed?
The list can go on for ages if you don’t do proper governance. Here’s a list of good practices you can use to improve your governance state in Azure:
- Management groups
- Multiple subscriptions
- Infrastructure as Code
The last one is a bit tricky to implement for existing services but it can be done. You can leverage Azure Resource Manager templates or Terraform to get a desired state configuration in your cloud environment.
Let’s start off with the most important in achieving governance.
Azure Tags are key-value pairs that allow you to organize your resources logically. Plain and simple, they don’t add any technical value to the proposition but in the big picture, tags can give you a lot of insight into the question storm that can popup. You can tag resources, resource groups, and subscriptions and after that, you can group them together in views. Tags go hand in hand with the Cost Management feature of Azure where you can generate a report of how much did that specific tag cost.
Azure Policies allow you to enforce a set of rules or effects over your resources that exist in a specific subscription. With Azure Policies, you can deploy the Log Analytics agent to all the resources that support it and keep that retain that state. Something changed and it’s not compliant anymore? The system automatically redeploys the policy.
This service allows you to do keep your defined governance state across all resources, regardless of state. If the policy cannot apply then it will show that resource as noncompliant. An example of starting off with Azure Policies is to setup resource tagging by owner and cost center. You allow deployments only if they have the owner and cost-center tags. If they don’t, the deployment fails with an error.
Management groups and multiple subscriptions go hand in hand. Having one subscription for everything just complicates everything but having multiple subscriptions without any type of management is worse. So the Management Groups offering that’s available in Azure can help you group together multiple subscriptions based on cost centers, departments, applications, you name it.
Management groups allow you to set up RBAC, Tagging, Policies at scale. You group all your subscriptions under a management group and they inherit all the settings from above. Even more, you can nest Management Groups together so you can apply settings granularly.
Let’s say that you need to keep data inside the European Union and want to minimize the risk of having data outside it. You can set up a policy on a top tier management group to only allow deployments in West Europe and North Europe regions. This setting would propagate down the stack to everything and from that point, nobody can deploy resources outside the specified regions.
This pretty much covers the basics; Is it enough? Not even close. Achieving full governance is a long-running task and you have to keep going to maintain it.
Microsoft published a framework to help businesses get a sense of how to approach this situation and it’s called the Cloud Adoption Framework for Azure which is a good starting point and you should use it as a guideline.
What about security? What are my options?
When it comes to security, you have multiple possibilities for a secure infrastructure as long as you leverage it. Azure puts forth a lot of capabilities to properly secure your cloud environment and you don’t have to install or configure over-complicated systems.
The image above doesn’t go into much detail when it comes to what security offerings we should use but it tells us to leverage the security systems that make sense and leverage the intelligence provided by the cloud.
The list of security services in Azure is pretty exhaustive so I will just mention the absolute necessary ones:
- Azure Security Center
- Azure Sentinel
- Azure Privileged Identity Management
I suggest checking the link above to see all the services that are available so you can get a sense of how you can handle your current or future cloud deployment (the list is old and some services do not show up there). I won’t cover all the services in this post as not all of them are mandatory for a successful deployment.
Let’s start with the mandatory security offerings
Azure Security Center is the be-all and end-all of security monitoring in your Azure environment. It offers you a birds-eye view of your security state based on a set of policies and recommendations that come out of the box. It comes in two tiers, Free and Standard. While the free tier works up to a point for non-critical, non-production subscriptions, the standard tier is what is the best to enable for critical, production subscriptions.
Out of the box, the Standard tier offers Just-In-Time feature access for virtual machines where you can block management ports by default and automatically allow access to them by a request in the portal or CLI and Adaptive Application Controls which allows you to specify what applications should be running on the virtual machines in scope.
Azure Security Center also has the possibility of throwing alerts and even gives you the possibility of automating them. The alerting system pulls data from the Intelligent Security Graph which is a huge benefit out of the box. This means that anybody that gets attacked in the Microsoft Cloud (Azure, Office 365), all the data of that attack is going back in the graph and you can get alerted if anything similar happens to your workloads.
Azure Sentinel is Microsoft’s approach to a cloud-native SIEM and SOAR. It’s a query-based, AI & ML powered system that collects data from Office 365, Azure AD, Azure ATP, Microsoft Cloud App Security, WDATP, Azure Security Center and third party systems. It’s connector based so you need to enable those connectors for the systems that you want to monitor.
Sentinel runs off a Log Analytics workspace and it’s RBAC capable. The recommendation here would be to have a centralized monitoring sentinel workspace where you set up dashboards and alerting rules. Being a query-based system like Splunk, you have the possibility of converting Splunk type queries to Azure Sentinel queries and not start entirely from scratch.
Azure Privileged Identity management is a system based on Azure Active Directory which as the name suggests; The system manages identities, privileged ones. Historically, most attacks happen from inside meaning that having more access than necessary can be a cause of concern.
PIM works on Azure Active Directory and on Azure Resources. You have the possibility to convert the permanent roles that are assigned in your organization to eligible roles and from there nobody has any more access unless granted (via self-service or approval). On the AAD side, it applies to all AAD roles and on the Azure side, it applies to Management Groups, subscriptions, resource groups or resources and it supports custom RBAC roles as well.
PIM can co-exist with standing privileges so your rollout can be slow with a determined scope.
This has been a very long post which might have been a good candidate for multiple parts. I for one don’t like doing that as I lose my trail of thought and go off the trail I started on.
The main takeaway of this post is that Digital Transformation, Modern Workplace, and all the other buzzwords cannot be implemented without planning, openness and time. Emphasis on time, a project of this scale can take 18-24 months for a large enterprise. It takes that much because there are many procedures and policies that need to change for it to be a success and not a crash & burn.
My recommendation would be to start small, identity the environment, set up auditing policies, tag everything and then move to lockdown.
That being said, have a good one!