Moving to the cloud is “easy”; managing it is another ordeal.
I’m going to start with a disclaimer: this post focuses on achieving governance and security in Azure. That doesn’t mean that what I write here cannot apply to AWS or Google; they have their own technologies that can help you achieve full governance.
You’ve heard of digital transformation, modern workplace, and the whole 9 yards. Easy? Nope.
You’ve been told that moving to the cloud will grant you all the security you want, all the control you wish, lower costs, everything is enabled out of the box, and it just works. The reality is that no cloud provider will ever do that for you out of the box.
Let’s see what marketing says that you get with the cloud:
Efficiency & Scalability
Security & Compliance
Pay as you go - usage-based payment model
Sounds good? Indeed it does, but there’s a catch: all of the above have a price. They require an organizational change, a change in mentality, and a lot of sweat. You cannot expect to get all of those things by just performing a lift & shift. Migrating your VMs to the cloud as they are means that you’re shifting ALL your on-premises costs to the cloud.
What do I mean by ALL the costs?
Datacenters cost money: you have power, cooling, and staffing costs. Then you have hardware costs, then you have hardware replacement costs, and then you have hardware refresh costs, and then you have capacity costs. Should I continue?
If you’re going IaaS, then you’re going to pay the cost of running IaaS, and most of the time, you’re going to pay more for running a virtual machine in the cloud than on-premises.
Why, you might ask? Well, because the cloud provider didn’t cut corners as you did. The cloud provider offers a financially backed SLA, which means that if your virtual machine goes down, you get a cut from the bill. Now, to provide you with that bill cut, the cloud provider has to ensure that the underlying infrastructure is very redundant. Who do you think is paying for that redundant infrastructure? You are.
Then why is PaaS much cheaper?
IaaS means Infrastructure-as-a-Service, which means that you’re getting an infrastructure where you can put your virtual machines on top of it, and you manage everything. The cloud provider cannot do anything with your virtual machines. They have no control over your virtual machines, and they don’t want any control.
PaaS or Platform-as-a-Service means that you’re getting a place where you can upload your application code and manage the application as is. No virtual machines involved, no OS management. Let’s call it minimal overhead. This means that the cloud provider has control of the system that’s running your code.
PaaS is much cheaper because you’re getting a shared pool of resources where you can deploy your application, and the cloud provider manages the underlying infrastructure. Your responsibility is the application and the controls associated with it (identity, app-level controls, and so on). In a nutshell, by sharing VM resources with other tenants, you’re sharing the redundant infrastructure costs with other tenants as well.
That’s the short version of IaaS vs. PaaS. The situation is much more complicated than sharing the underlying infrastructure costs, but you get the idea. Azure has some dedicated environments where you’re not getting reduced costs but more control, for those situations where you’re limited by ancient compliance and regulatory controls that have not yet adapted to the cloud.
What does this mean? This means that you’re still responsible for the deployment that you’re doing in the cloud, but you transfer a subset of problems to the cloud provider.
No more hardware swaps
No more hardware refresh
No more dead servers
No more waiting weeks for capacity - CPU, RAM, storage
No more dealing with vendors - except the cloud vendor
No more datacenter-associated costs - power, cooling, staffing
These are the clear responsibilities that you transfer to the cloud provider, but unfortunately, the cloud doesn’t solve all the on-premises issues. You’re still going to have to deal with the old issues and some new ones:
Cost Management (this one is fun)
Backup and Disaster Recovery
Traditional approach induced issues (cloud is cloud, and on-prem is on-prem - don’t mix and match)
Inflexibility induced costs
Depending on the cloud model you pick, you’re going to have more or fewer of the issues I’ve outlined. Two of them might have raised your eyebrows a bit: traditional approach induced issues and inflexibility induced costs.
The cloud comes with a plethora of services that can replace your bloated infrastructure systems. You don’t need firewall appliances, proxies, multi-hop IDPs, MFA appliances, and so on. You don’t need cascading network devices to have segregation of networks, and you certainly don’t need VLANs.
Azure has services for almost all of those things. You have Azure Firewall for your firewall needs, Azure Active Directory for your IDP, and Azure MFA built in for MFA. You have VNET peering for network segregation, and you have NSGs for stateful firewalling at a VM or subnet level. The list can go on.
By adopting an on-premises approach to the cloud, you will inevitably have more on your plate than you signed up for. Keep it simple, adapt your technologies, and sleep well at night.
Second, inflexibility induced costs. The cloud provides you with an enormous amount of flexibility. You don’t need to overcommit capacity; you can scale up, down, in, or out as much as you want, when you want. You can automate your workloads to be resource-efficient based on usage, e.g., scale down your workloads during the weekends.
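To make the weekend example concrete, here is a minimal sketch of a schedule-based scaling rule. The instance counts and the function name are made up for illustration; in practice you would feed a rule like this into whatever automation you use for scaling.

```python
from datetime import datetime

# Hypothetical instance counts; tune them to your own workload.
WEEKDAY_INSTANCES = 4
WEEKEND_INSTANCES = 1


def desired_instances(now: datetime) -> int:
    """Return how many instances to run at the given moment."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    is_weekend = now.weekday() >= 5
    return WEEKEND_INSTANCES if is_weekend else WEEKDAY_INSTANCES
```

Running one instance instead of four for two days out of seven is the kind of saving you simply cannot get from capacity you bought upfront.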
With PaaS, you can do all of that. With IaaS, you can only do it up to a point. If you’re adamant about IaaS, then you’re going to pay the price. You don’t need a VM to run a web application; you have App Services for that. You don’t need an NGFW to manage your network; you have Azure Firewall for that. And you absolutely don’t need a file server cluster to manage files; you have Azure Storage for that.
Don’t get me wrong; I understand that there are situations where the application is so old that you cannot adapt it to the cloud. I’ve seen a lot of applications fail horribly just because they were put on different VMs. But adapting is what it takes if you want to benefit from the cloud and not pay hand over fist.
What does everything translate to?
Moving to the cloud is not a walk in the park; it’s a lengthy and complicated project. The complexity is directly proportional to the amount of baggage you’re bringing.
What can I do to achieve a successful digital transformation?
Digital transformation in the cloud has three parts:
The cloud comes with some challenges:
Identity & Data
Data Classification, Labeling, and Protection
Multi-geo deployments, and GDPR
OPEX vs CAPEX
Different payment possibilities - PAYG, CSP, EA, MCA
All of those challenges can be overcome with understanding and proper planning. Not all deployments are cloud-worthy, and there’s no such thing as shutting down existing data centers and moving everything to the cloud. Hybrid is the way to go with existing deployments.
For example, if you already have Active Directory, then moving to the cloud is only done via the hybrid approach, where you synchronize your identities to Azure Active Directory and gain the benefits of both worlds.
Applying governance principles to Azure subscriptions
Everybody starts from scratch. You create one subscription and start moving workloads in it.
That’s wrong on many levels because you lose control of your cloud environment and start asking questions like:
Who created that resource?
How much does that application cost?
Who is the owner of that environment?
Who is the owner of that resource group?
Why does our cloud cost so much?
What caused that security breach?
Where is our data?
Who can see that data?
Application is down, what changed?
The list can go on for ages if you don’t do proper governance. Here’s a list of good practices you can use to improve your governance state in Azure:
Infrastructure as Code
The last one is a bit tricky to implement for existing services but it can be done. You can leverage Azure Resource Manager templates or Terraform to get a desired state configuration in your cloud environment.
Let’s start off with the most important one for achieving governance.
Azure Tags are key-value pairs that allow you to organize your resources logically. Plain and simple, they don’t add any technical value to the proposition, but in the big picture, tags can give you a lot of insight into the storm of questions that can pop up. You can tag resources, resource groups, and subscriptions, and after that, you can group them together in views. Tags go hand in hand with the Cost Management feature of Azure, where you can generate a report of how much a specific tag cost.
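As a rough illustration of why tags matter for cost questions, the sketch below groups a handful of billing rows by a tag. The resource names, tag values, and amounts are all invented; the real report comes from Cost Management, this only shows the idea.

```python
from collections import defaultdict

# Hypothetical billing rows: (resource name, tags, monthly cost).
resources = [
    ("vm-web-01", {"owner": "alice", "cost-center": "cc-100"}, 120.0),
    ("sql-orders", {"owner": "bob", "cost-center": "cc-200"}, 310.5),
    ("st-logs", {"owner": "alice", "cost-center": "cc-100"}, 14.5),
    ("vm-legacy", {}, 95.0),  # untagged: nobody owns this cost
]


def cost_by_tag(rows, tag_name):
    """Sum cost per value of tag_name; untagged resources land in 'untagged'."""
    totals = defaultdict(float)
    for _name, tags, cost in rows:
        totals[tags.get(tag_name, "untagged")] += cost
    return dict(totals)
```

Calling `cost_by_tag(resources, "cost-center")` answers "how much does that application cost?", and the `untagged` bucket shows exactly how much spend nobody can explain.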
Azure Policies allow you to enforce a set of rules or effects over the resources that exist in a specific subscription. With Azure Policies, you can deploy the Log Analytics agent to all the resources that support it and retain that state. Something changed and it’s not compliant anymore? The system automatically redeploys the policy.
This service allows you to keep your defined governance state across all resources, regardless of state. If the policy cannot apply, then it will show that resource as noncompliant. An example of starting off with Azure Policies is to set up resource tagging by owner and cost center. You allow deployments only if they have the owner and cost-center tags. If they don’t, the deployment fails with an error.
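The tagging rule described above boils down to a simple check at deployment time. The sketch below is not Azure Policy itself, just a plain simulation of the decision; the function name and return shape are my own.

```python
REQUIRED_TAGS = ("owner", "cost-center")


def evaluate_deployment(tags: dict) -> tuple:
    """Return (allowed, missing_tags) for a requested deployment."""
    # A deployment is denied as soon as one required tag is absent.
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    return (len(missing) == 0, missing)
```

A deployment with both tags passes, while one with only `owner` comes back as `(False, ["cost-center"])`, which is the equivalent of the deployment failing with an error.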
Management groups and multiple subscriptions go hand in hand. Having one subscription for everything just complicates everything, but having multiple subscriptions without any type of management is worse. So the Management Groups offering that’s available in Azure can help you group together multiple subscriptions based on cost centers, departments, applications, you name it.
Management groups allow you to set up RBAC, Tagging, Policies at scale. You group all your subscriptions under a management group and they inherit all the settings from above. Even more, you can nest Management Groups together so you can apply settings granularly.
Let’s say that you need to keep data inside the European Union and want to minimize the risk of having data outside it. You can set up a policy on a top tier management group to only allow deployments in West Europe and North Europe regions. This setting would propagate down the stack to everything and from that point, nobody can deploy resources outside the specified regions.
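To show how a setting on a top-tier management group reaches everything below it, here is a small simulation. The hierarchy and names are hypothetical, and the model assumes that every restriction found on the way up to the root must be satisfied, which is a simplification of how the real policy engine evaluates assignments.

```python
# Hypothetical hierarchy: child -> parent (None marks the root).
PARENT = {
    "root-mg": None,
    "europe-mg": "root-mg",
    "sub-prod": "europe-mg",
    "sub-dev": "europe-mg",
}

# Allowed-locations assignments; scopes without an entry add no restriction.
ALLOWED_LOCATIONS = {
    "root-mg": {"westeurope", "northeurope"},
}


def effective_locations(scope):
    """Intersect every allowed-locations set from scope up to the root."""
    allowed = None  # None means "no restriction seen yet"
    while scope is not None:
        assigned = ALLOWED_LOCATIONS.get(scope)
        if assigned is not None:
            allowed = assigned if allowed is None else allowed & assigned
        scope = PARENT[scope]
    return allowed


def can_deploy(scope, location):
    """True when the location is permitted for the given scope."""
    allowed = effective_locations(scope)
    return allowed is None or location in allowed
```

With this setup, `can_deploy("sub-dev", "eastus")` is false even though nothing was configured on `sub-dev` itself.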
This pretty much covers the basics. Is it enough? Not even close. Achieving full governance is a long-running task and you have to keep going to maintain it.
Microsoft published a framework to help businesses get a sense of how to approach this situation. It’s called the Cloud Adoption Framework for Azure; it is a good starting point, and you should use it as a guideline.
What about security? What are my options?
When it comes to security, you have multiple possibilities for a secure infrastructure as long as you leverage them. Azure puts forth a lot of capabilities to properly secure your cloud environment, and you don’t have to install or configure over-complicated systems.
The image above doesn’t go into much detail when it comes to what security offerings we should use but it tells us to leverage the security systems that make sense and leverage the intelligence provided by the cloud.
The list of security services in Azure is pretty extensive, so I will just mention the absolutely necessary ones:
I suggest checking the link above to see all the services that are available so you can get a sense of how you can handle your current or future cloud deployment (the list is old and some services do not show up there). I won’t cover all the services in this post as not all of them are mandatory for a successful deployment.
Let’s start with the mandatory security offerings
Azure Security Center is the be-all and end-all of security monitoring in your Azure environment. It offers you a bird’s-eye view of your security state based on a set of policies and recommendations that come out of the box. It comes in two tiers, Free and Standard. While the Free tier works up to a point for non-critical, non-production subscriptions, the Standard tier is the one to enable for critical, production subscriptions.
Out of the box, the Standard tier offers two features worth calling out. Just-In-Time access for virtual machines lets you block management ports by default and open them on request from the portal or CLI. Adaptive Application Controls lets you specify which applications should be running on the virtual machines in scope.
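The Just-In-Time idea is easy to picture as a port that stays closed unless a request opened it for a limited window. This is only a toy model; the class name and the default window length are assumptions, not how Security Center is implemented.

```python
from datetime import datetime, timedelta


class JitPort:
    """A management port that is closed unless a request has opened it."""

    def __init__(self, port: int):
        self.port = port
        self.open_until = None  # closed by default

    def request_access(self, now: datetime, hours: int = 3) -> None:
        # Hypothetical default window; the real one is set by your JIT policy.
        self.open_until = now + timedelta(hours=hours)

    def is_open(self, now: datetime) -> bool:
        return self.open_until is not None and now < self.open_until
```

The point of the model is the last line: access expires on its own, so nobody has to remember to close the port again.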
Azure Security Center can also raise alerts and even gives you the possibility of automating them. The alerting system pulls data from the Intelligent Security Graph, which is a huge benefit out of the box. This means that when anybody gets attacked in the Microsoft Cloud (Azure, Office 365), the data about that attack goes back into the graph, and you can get alerted if anything similar happens to your workloads.
Azure Sentinel is Microsoft’s approach to a cloud-native SIEM and SOAR. It’s a query-based, AI & ML powered system that collects data from Office 365, Azure AD, Azure ATP, Microsoft Cloud App Security, WDATP, Azure Security Center and third party systems. It’s connector based so you need to enable those connectors for the systems that you want to monitor.
Sentinel runs off a Log Analytics workspace and it’s RBAC capable. The recommendation here would be to have a centralized Sentinel monitoring workspace where you set up dashboards and alerting rules. Being a query-based system like Splunk, it lets you convert Splunk-type queries to Azure Sentinel queries rather than start entirely from scratch.
Azure Privileged Identity Management (PIM) is a system based on Azure Active Directory which, as the name suggests, manages identities, specifically privileged ones. Historically, most attacks happen from the inside, meaning that having more access than necessary can be a cause for concern.
PIM works on Azure Active Directory and on Azure Resources. You have the possibility to convert the permanent roles that are assigned in your organization to eligible roles and from there nobody has any more access unless granted (via self-service or approval). On the AAD side, it applies to all AAD roles and on the Azure side, it applies to Management Groups, subscriptions, resource groups or resources and it supports custom RBAC roles as well.
PIM can co-exist with standing privileges so your rollout can be slow with a determined scope.
This has been a very long post which might have been a good candidate for multiple parts. I for one don’t like doing that, as I lose my train of thought and go off the trail I started on.
The main takeaway of this post is that Digital Transformation, Modern Workplace, and all the other buzzwords cannot be implemented without planning, openness, and time. Emphasis on time: a project of this scale can take 18-24 months for a large enterprise. It takes that long because there are many procedures and policies that need to change for it to be a success and not a crash & burn.
My recommendation would be to start small: identify the environment, set up auditing policies, tag everything, and then move to lockdown.
When they first came as an offering a long time ago, Azure SQL Databases always had a mystery attached to them regarding the performance tier you should use.
Database Throughput Unit or DTU is a way to describe the relative capacity of a performance SKU of Basic, Standard, and Premium databases. DTUs are based on a measure of CPU, memory, I/O reads, and writes. When you want to increase the “power” of a database, you just increase the number of DTUs that are allocated to that database. A 100 DTU DB is much more powerful than a 50 DTU one.
Table of DTU SKUs (columns: Basic, Standard, Premium)
Development and production | Development and production | Development and production
Low, Medium, High
IO throughput (approximate): 1-5 IOPS per DTU | 1-5 IOPS per DTU | 25 IOPS per DTU
IO latency (approximate): 5 ms (read), 10 ms (write) | 5 ms (read), 10 ms (write) | 2 ms (read/write)
You can say that the DTU model is a great solution for people who want a preconfigured pool of resources out of the box for their workloads. The problem that can appear with the DTU model is that when you hit the DTU limit, you get throttled, which results in query timeouts or slowdowns; the solution is to increase the number of DTUs.
The concept behind DTUs is that when you need more resources for a database, you increase its number of DTUs. One issue is that you cannot scale CPU, storage, and RAM individually.
The vCore model takes a more classical approach, closer to on-premises workloads: it allows you to specify the number of cores, RAM, and I/O. So compared to the DTU model, where CPU, RAM, and I/O increase together automatically, in the vCore model you can scale them individually, which gives you a lot of flexibility.
Scaling up and down in the vCore model is done on two planes with different CPU specs based on generation or VM model:
CPU plane – Generation specific
Storage plane – Generation specific -> vCore specific
Gen 4
CPU: Intel E5-2673 v3 (Haswell) 2.4 GHz processors; provision up to 24 vCores (1 vCore = 1 physical core)
Memory: 7 GB per vCore; provision up to 168 GB
Gen 5
CPU, provisioned compute: Intel E5-2673 v4 (Broadwell) 2.3 GHz and Intel SP-8160 (Skylake)* processors; provision up to 80 vCores (1 vCore = 1 hyper-thread)
CPU, serverless compute: Intel E5-2673 v4 (Broadwell) 2.3 GHz and Intel SP-8160 (Skylake)* processors; auto-scale up to 16 vCores (1 vCore = 1 hyper-thread)
Memory, provisioned compute: 5.1 GB per vCore; provision up to 408 GB
Memory, serverless compute: auto-scale up to 24 GB per vCore; auto-scale up to 48 GB max
Fsv2
CPU: Intel Xeon Platinum 8168 (SkyLake) processors, featuring a sustained all-core turbo clock speed of 3.4 GHz and a maximum single-core turbo clock speed of 3.7 GHz; provision 72 vCores (1 vCore = 1 hyper-thread)
As you can see, depending on the generation, you will get a specific CPU model per generation or VM series and the RAM allocation is done per vCore.
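Since RAM follows the vCore count, the memory you get is simple arithmetic. The sketch below uses the provisioned-compute figures quoted above for Gen 4 (7 GB per vCore, up to 24 vCores) and Gen 5 (5.1 GB per vCore, up to 80 vCores). It assumes any count between 1 and the maximum is valid, which is a simplification; the real service only offers specific sizes.

```python
# Figures quoted above: (GB of memory per vCore, maximum vCores).
GENERATIONS = {
    "Gen4": (7.0, 24),
    "Gen5": (5.1, 80),
}


def memory_gb(generation: str, vcores: int) -> float:
    """Memory that comes with a provisioned vCore count."""
    gb_per_vcore, max_vcores = GENERATIONS[generation]
    if not 1 <= vcores <= max_vcores:
        raise ValueError(f"{generation} supports 1 to {max_vcores} vCores")
    # Round to one decimal to hide floating-point noise.
    return round(gb_per_vcore * vcores, 1)
```

At the top of each range this reproduces the 168 GB and 408 GB limits from the specs.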
Choosing a generation can look complicated but you’re not locked into a choice. So if you decide post-deployment that a different generation or VM type works better for you, then that option is available.
DTU vs vCores
Now that we understand the difference between DTUs and vCores, let’s try and compare them.
DTU model: Basic, Standard, Premium
vCore model: Gen 4, Gen 5, Fsv2, M
Compute, Memory + Storage
DTU model: DTU + Backup Storage
vCore model: vCore, Storage, Backup Storage + Logs Storage
As you can see from the table, there’s a hefty difference in specs between the DTU and vCore models. After carefully analyzing the options available, you might be inclined to go directly with the vCore model rather than the DTU one, but the difference is in the details.
One question that you might have is “How many DTUs are equivalent to a vCore?” I can safely say that a generic mapping would be:
100 DTUs Standard = 1 vCore – General Purpose
125 DTUs Premium = 1 vCore – Business Critical
8000 DTUs -> 80 vCores, but the maximum amount of DTUs per SQL DB is 4000 🙂
Anything less than 100 DTUs would mean that you’re using less than a vCPU, more like a shared core but testing would be required to find the sweet spot.
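The generic mapping above can be written down as a tiny helper. Treat the result as a starting point for testing and not as a sizing guarantee; the function and dictionary names are mine.

```python
# Rule-of-thumb ratios from the mapping above (DTUs per vCore).
DTUS_PER_VCORE = {
    "Standard": 100,  # maps to General Purpose
    "Premium": 125,   # maps to Business Critical
}


def estimate_vcores(dtus: int, tier: str) -> float:
    """Rough vCore equivalent of a DTU allocation."""
    return dtus / DTUS_PER_VCORE[tier]
```

A result below 1.0, such as `estimate_vcores(50, "Standard")`, is the "less than a vCore" case where you are effectively on a shared core.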
Another benefit of the vCore model is that you can reserve capacity in advance for 1 or 3 years and get a better price. Plus, if you already have an on-premises SQL license with Software Assurance, you can activate the Hybrid Benefit checkbox and get even more bang for your buck, as you would with an Azure VM.
So should you move to vCores?
The answer to this question is “it depends”. The vCore model looks more appealing from a traditional approach perspective, but the real cost-benefit starts showing from 400 DTUs and up. If your workloads use less than 400 DTUs (roughly 4 vCores), then I would stick with the DTU model, and when the time comes, I would just press a button in the portal and migrate to the vCore model.
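If you want that rule of thumb in a script that reviews your databases, it is a one-liner. The 400 DTU threshold is the break-even point suggested in this post, not an official number, so adjust it to your own pricing.

```python
VCORE_THRESHOLD_DTUS = 400  # break-even point suggested in this post


def recommend_model(dtus: int) -> str:
    """Suggest a purchasing model from the current DTU allocation."""
    return "vCore" if dtus >= VCORE_THRESHOLD_DTUS else "DTU"
```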
Besides the tiers I mentioned above, there are two other tiers called Serverless and Hyperscale, which bring benefits in some use cases but not in all of them.
In the end, what I can say is that DTUs are not yet ready to be replaced by vCore but I’m expecting this as the next step. Until we get a clear alternative to DTUs, they are here to stay 🙂
KEDA (Kubernetes-based Event-Driven Autoscaling) is an open-source project built by Microsoft in collaboration with Red Hat. It provides event-driven autoscaling to containers running on AKS (Azure Kubernetes Service), EKS (Elastic Kubernetes Service), GKE (Google Kubernetes Engine), or on-premises Kubernetes clusters 😉
KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. Most of the time, we scale systems (manually or automatically) using some metrics that get triggered.
For example, if CPU > 60% for 5 minutes, scale our app service out to a second instance. By the time we’ve raised the trigger and completed the scale-out, the burst of traffic/events has passed. KEDA, on the other hand, exposes rich event data, like the length of an Azure queue, to the horizontal pod autoscaler so that it can manage the scale-out for us. Once one or more pods have been deployed to meet the event demand, events (or messages) can be consumed directly from the source, which in our example is the Azure queue.
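To get a feel for queue-driven scaling, here is a simplified model of the calculation: divide the queue length by how many messages one replica should handle, then clamp the result. The parameter names and defaults are assumptions for illustration. In the real system KEDA handles activation from and to zero itself and leaves the rest to the horizontal pod autoscaler, so this is a sketch of the idea and not of KEDA’s code.

```python
import math


def desired_replicas(queue_length: int, messages_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replica count needed to drain the queue, clamped to the allowed range."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

An empty queue gives zero replicas, which is the scale-to-zero behavior that a CPU-based trigger cannot give you, because a pod that is not running has no CPU to measure.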
Getting started with KEDA
Azure Container Registry
To follow this example without installing anything locally, you can load up the Azure Cloud Shell or you can install azcli, kubectl, helm and the Azure Functions SDK.
I will do everything in the cloud with monitoring and running other commands with Lens | Kubernetes IDE from my workstation.
Now open up your Azure Cloud Shell, roll up your sleeves, and let’s get started:
Pay attention to the <> brackets in the code. Do not copy-paste blindly; adjust the code for your deployment.
# Log in to your AKS cluster
az aks get-credentials -n <aksClusterName> -g <resourceGroupofSaidCluster>
Once the PowerShell script started running, we get the first messages in the queue
With kubectl in wait mode I start observing the fireworks happening
Looks good, but can you expand more on this? How can I actually use this solution?
This solution basically exists to have the Azure Functions product everywhere. At this point in time, you can run Azure Functions wherever you want for free. You are not obligated to run functions in Azure, you can run them in Google or AWS if you want. Want on-premises? Go crazy, it works as long as you’re running Kubernetes.
The main idea is to bring Functions as a Service (FaaS) on any system and that’s very cool. While preparing for this post, I took a running system that uses Bot Framework, Service Bus and Azure Functions and ported the Azure Function part to KEDA. Zero downtime, no difference, it just works. The chatbot adds messages in a service bus and then the function triggers. I ran them in parallel for five minutes then I took down the Azure Function one.
The main point of using KEDA is to be cloud-agnostic. You know Kubernetes, you know Azure Functions, you don’t need to change anything. Hell, if you’re in one of those regulated environments where the cloud is a big no-no, then you can just as well integrate KEDA in your clusters and set up triggers with whatever messaging system you have, or even just set up plain HTTP triggers and be done with it. Once the regulatory part relaxes and the company starts to dip its toes in the water, you can just as well copy-paste the code into Azure 🙂
Running a container in the cloud is quite easy these days, but how about multi-container web apps?
In this post, we will find out about the multi-container feature in Azure Web Apps and how we can leverage it.
Getting started with multi-container in Azure Web Apps is quite easy, as a matter of fact. The documentation is quite good as a starting point, but running it in production is where the real problems start.
Doing everything from the portal is an option, but my preferred method is to use the Azure Cloud Shell, or Visual Studio Code with the Azure module, where you load the Azure Cloud Shell.
Let’s start with the basics:
Load up the Azure Cloud Shell
Go to shell.azure.com, select the tenant where you want to load up the Cloud Shell
[Skip this step if not applicable] Select the Azure Subscription you’re going to use for the multi-container app deployment
az account set --subscription "<SubscriptionName>"
Git Clone the sample code from the Azure Samples repo
In the Azure Cloud Shell, run git clone https://github.com/Azure-Samples/multicontainerwordpress
Create a Resource Group where the App Service and App Service Plan will sit
az group create --name Multi-Container-App-RG --location "westeurope"
Create an App Service Plan
az appservice plan create --name MultiContainerServicePlan --resource-group Multi-Container-App-RG --sku S1 --is-linux
These are the basic steps you have to take in order to get a multi-container web app up and running. The last step uses --multicontainer-config-type compose with a docker-compose file that has the following code inside it
The docker-compose file is written in YAML (YAML Ain’t Markup Language), and you can see in there that two services are referenced, db and WordPress. This means that the docker-compose file is telling the App Service to run two containers in parallel. Below is a short explanation of the docker-compose settings used
services: The block where you define the containers that will run. The format is services: servicename: <settings>
image: The container image that the App Service will pull. It can come from any container registry, public or private
volumes: Where the container will write its files. By default, files are written inside the container, but you can specify an external “folder” to write them to
restart: The container restart policy. If the container crashes, you want Docker to restart it. The restart policies available at the time of writing this article are restart: "no", restart: always, restart: on-failure, and restart: unless-stopped
environment: Instantiates or uses one or more environment variables. App Service configuration fields are set up as key-value pairs inside a container
depends_on: This container will not start unless the containers it depends on are already running
ports: Which port the container listens on and how it is published. The Docker Compose short syntax is HOST_PORT:CONTAINER_PORT, so 8000:80 means that the process inside the container listens on port 80 and is published as port 8000 on the host
The table above explains the relevant setting blocks we’re going to need when using the multi-container feature in Azure App Services. The most important setting of them all is ports. Why, you may ask? Well, in App Services you cannot override the default 80/443 ports; you can only map them.
In the example above, the ports entry belongs to the WordPress container, which listens on port 80, and the “db” service is not exposed in any way. That entry tells the App Service how to route traffic to that specific container.
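The restart policies listed earlier can be read as a small decision function. This is a simplified reading of Docker’s behavior that ignores the finer points around manual stops and daemon restarts, and, as you will see further down, App Service does not always honor the policy anyway.

```python
def should_restart(policy: str, exit_code: int, stopped_by_user: bool = False) -> bool:
    """Decide whether a container that just exited should be started again."""
    if policy == "no":
        return False
    if policy == "always":
        return True
    if policy == "on-failure":
        # Only a non-zero exit code counts as a failure.
        return exit_code != 0
    if policy == "unless-stopped":
        return not stopped_by_user
    raise ValueError(f"unknown restart policy: {policy}")
```

The sample uses `always` for both containers, which is the sensible default for a web app and its database.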
How does the WordPress container know where to connect to the database?
If you look closely at the docker-compose file, you’re going to see an environment variable called WORDPRESS_DB_HOST: db:3306
Docker, Kubernetes, and other container runtimes/orchestrators offer a service discovery feature by default. It simply requires the alias, service name, or label, and it will automagically make things happen on the backend without you having to deal with any networking.
In the example above, having a service named “db” means that the container IP (whatever it may be) has the name “db” attached to it (like DNS). That allows us to tell the WordPress container: hey, the database you’re looking for is called “db”, and it’s Docker’s problem to tell you the IP address.
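A toy version of that lookup makes the idea clearer. The registry and the IP addresses below are invented; in reality the container runtime maintains this mapping for you through its internal DNS.

```python
# Hypothetical addresses handed out by the container runtime.
SERVICE_REGISTRY = {
    "db": "172.18.0.2",
    "wordpress": "172.18.0.3",
}


def resolve(endpoint: str) -> tuple:
    """Turn 'service:port' into (ip, port) by looking up the service name."""
    name, _, port = endpoint.partition(":")
    return SERVICE_REGISTRY[name], int(port)
```

So `resolve("db:3306")` hands back whatever IP the “db” container happens to have, and the WordPress configuration never needs to change when that IP does.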
Nailed the basics. How should I run this in production?
First: at the time of writing this article, it’s a preview offering with no SLA, and any disaster is on you.
Concept, demo, POC environments are easy because you’re not exposing them to the real world. There are multiple problems that can appear with multi-container apps and I have not experienced them all. As long as you know your app well and you know that it can run without any problems wrapped in a single App Service then you should not have any problems.
I personally don’t recommend multi-containers in one service (App Service / Pod) because it goes against the one container -> one service design principle but there are cases where they fit together like a glove. Very rare cases.
My experience of running multi-container applications in App Services and Kubernetes (multi-container pods) is not that positive. If you’re not involved from the start in the dev/modernization process then it’s going to be a very rough ride.
Problems that you might hit when running the multi-container feature in App Services.
One of the containers has a memory leak and causes an OOM exception
This means that all the containers will get restarted regardless of the restart policy
One container crashed because of reasons
Everything gets restarted regardless of restart policy
The web service container is running but App Service is showing 404
Web Service container should be the first one in the Services stack in the docker-compose file
The web Service container is running but cannot access the backend container
The backend container is not starting or it needs to have its port exposed explicitly
Use expose docker-compose command
Cannot access anything; Kudu is not working; App Service is not responding;
Delete and restore the web app. No, I’m not kidding, it’s dead Jim.
Very slow download speed from Container Registry
You cannot speed it up, unfortunately.
If you’re not recommending them, then why are you using them?
You might go to the valley of “do as I say and not as I do”. Setting the joke aside, I very carefully analyze the possible issues that might arise with an application and do a cost/benefit analysis as well. In some cases, the benefits outweigh the risks and it’s worth the hassle.
That being said, this feature works quite well with new applications but not with recently modernized applications.
Ever heard of a jump server or bastion server? No? Well then, this post is for you.
Before we dive into what Azure Bastion is, we should understand how things work right now with regular jump servers or bastion hosts.
A jump server/bastion host is a virtual machine/server that sits inside a network with the purpose of allowing remote access to servers and services without adding public IPs to them and thereby exposing them to the internet.
Access to that jump server can be granted in numerous ways but the most common are:
Public Endpoint with Access Control authentication e.g., Cloudflare Access rules
Public Endpoint with a Just In Time solution
Remote Desktop Gateway access with AD Authentication
The list can go on. The idea is that the endpoint used to access the production network is as secure as possible, because it’s being exposed in one way or another.
Most of my deployments in Azure which have virtual machines have a jump server/bastion host configured. The setup looks something like this:
In some cases, there’s a need to have more than two sessions towards that VM so there’s a need for a different solution like a remote desktop gateway service.
The problem with these VMs is that you need to manage them, and if you’re doing an audit, they get added to the scope of the audit and you need to explain the whole process of managing those VMs, starting from regular patch management to security management, risk management, and incident management. These things do not help with your mental health (I’ve been through multiple audits like this). The solution to this problem is a managed offering where you don’t do any of those things; you just use it and that’s it. From here we segue towards Azure Bastion.
What is Azure Bastion? (Preview)
Azure Bastion is Microsoft’s answer to jump servers/bastion hosts with a PaaS offering that you deploy in a VNET from the marketplace. Simple as that.
The beauty of this solution is that you get “direct” connectivity to the VMs that you want to log in to using RDP / SSH, no double hop or any of that nonsense.
With a Jump-Server-as-a-Service, you don’t need to do any more patch management, security management, scaling, and all the other things that are associated with something like this. Plus, in case of an audit, you can just say that it’s managed by the cloud provider and you’re done with it 🙂
What you need to do when you want to log in to a VM securely using Bastion is to go to the VM blade, press Connect, and select Bastion. From there, you input your credentials and a new tab pops up.
Deploying the Bastion:
The experience from the portal is pretty good. You need to go to the marketplace, type in “bastion” and select create.
From there you will encounter the blade from below:
From there you walk through the steps and press Create. At this point in time, you cannot specify a different subnet than AzureBastionSubnet, and if you create one yourself, it has to have exactly that name, otherwise it won’t work. This is a small inconvenience, and a button or a quick-create option will probably appear in the future. For this, I quickly cooked up an ARM template that you can adapt to your needs 🙂
After the deployment is finished, go to the VM, press on connect, select Bastion, input your credentials and press connect.
Connecting to the VM:
That’s it basically; It’s quite simple to deploy and use the Azure Bastion offering.
Some issues with the current offering but on the roadmap:
VNET Peering is not supported
No Seamless SSO
No AAD Integration -> No MFA challenge and such
No native client support -> need browser access to the portal
Are they deal breakers? Yes and no. I would have loved out of the box support for VNET peering but fingers crossed:)
That being said, have a good one; the ARM template is below 🙂
ARM Template for 1-click deployment for Bastion services