Azure Private Link is a relatively new service that allows users to connect to a PaaS service privately, using a private IP address.
You might say that this feature has existed for ages with Service Endpoints, which inject the PaaS service into a virtual network and allow private connectivity to the service without exposing it to the internet. The main difference between the two is that Service Endpoints only inject the service into the virtual network and cut off internet access, while Private Link attaches a private IP address from your virtual network to the service, so you can access it using that RFC1918 address.
As I mentioned, before Private Link we had Azure Service Endpoints, which injected the PaaS service into the delegated subnet, after which the Azure gateway cut off any internet traffic to it.
The problem with service endpoints is that they require a cloud-native setup to work properly. That means you can’t use third-party NVAs to funnel the 0.0.0.0/0 traffic through them for inspection, because if you do, it breaks the system and you lose access to your storage account, database, or other Service Endpoint enabled systems. Azure Firewall, for example, supports this scenario.
The real problem is that Azure Firewall came late and lacked some critical features that most companies require. It’s getting there, but it’s going to take some more time. All modesty aside, I managed to convince a bank CISO that service endpoints were enough for their SQL Databases, and this was for an e-banking system. The problem with the cloud is that people do not understand it, and most security people, if not all, will block anything they do not understand 100%. But I digress.
To address the storm of complaints regarding service endpoints, and to satisfy the general public’s requests to attach private IPs to PaaS services, Microsoft came up with Private Link. It attaches an RFC1918 IP address to a supported PaaS service and makes that service available only in the VNET where the IP exists.
This is a much more understandable system than service endpoints. You simply tell whoever may be concerned that the system doesn’t have a public IP address, do a demo, and you’re done.
Private Link support is available (some of it in preview) for services such as:
Azure Storage: Blob (blob, blob_secondary) and Data Lake File System Gen2 (dfs, dfs_secondary)
Azure Cosmos DB (Sql, MongoDB, Cassandra, Gremlin, Table)
Azure Database for PostgreSQL (Single server)
Azure Database for MySQL
Azure Database for MariaDB
Azure Key Vault
Azure Kubernetes Service (Kubernetes API)
Azure Container Registry
Azure App Configuration
Azure Event Hubs
Azure Service Bus
Azure Event Grid
As you can see, a lot of services are getting Private Link support. Microsoft knows this is a must, because the blocker to adoption was always the publicly exposed part of the service. Now that you can actually attach a private IP to the service, that “security” concern goes away. Don’t get me wrong, I’m a strong believer in security, but security should add guardrails, not roadblocks.
Sounds good! How do I get started?
First of all, be aware that Private Link support for some PaaS services may still be in preview, and breaking changes may occur. Next, validate that you do not use Service Endpoints in the VNET where you will be adding the Private Link connection.
At the time of writing this article, there’s a huge capacity issue in Azure, so I cannot create bespoke scripts to automate this cycle end to end 🙁
The example above shows how to connect an Azure VM to an Azure SQL Database using the private IP. Note that the guide above references an Azure Private DNS Zone that is linked to the VNET. Azure SQL Databases cannot be accessed via the private IP directly, and this limitation is by design.
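Because SQL must be reached by name, a quick sanity check is to resolve the server’s FQDN from inside the VNET and confirm that it lands on a private address. The sketch below assumes IPv4 and that your VNET uses RFC1918 address space. The script name and the `<server>` placeholder are hypothetical.

```python
import ipaddress
import socket
import sys

# RFC1918 private ranges; a Private Endpoint NIC normally gets its address
# from a VNET that is carved out of one of these ranges (assumption).
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def check_private_resolution(fqdn: str) -> bool:
    """Resolve fqdn (IPv4 only) and report whether it maps to a private address."""
    address = socket.gethostbyname(fqdn)
    private = is_rfc1918(address)
    label = "private" if private else "PUBLIC"
    print(f"{fqdn} -> {address} ({label})")
    return private


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_dns.py <server>.database.windows.net")
    else:
        check_private_resolution(sys.argv[1])
```

If the Private DNS Zone (privatelink.database.windows.net for SQL) is linked correctly, the public name follows the privatelink CNAME and the script reports a private address. A public result usually means the zone link or the DNS settings are missing.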
How do I go full-fledged production?
For starters, here’s a reference architecture that shows how you would integrate Private Link in your hybrid cloud environment.
Implementing this in production has to be done on a system-by-system basis. You need to know the limitations of each system and work around them. For SQL Databases, you need to call them using the Private DNS CNAME, and storage accounts work the same way. Access from on-premises can be implemented with a simple DNS forwarder. I say simple, but it can be a hassle. Been there, done that 🙂
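The DNS forwarder approach usually relies on conditional forwarding. The on-premises DNS server sends queries for the public service zones to a forwarder inside the VNET. That forwarder asks Azure DNS (168.63.129.16) and therefore sees the records of the linked Private DNS Zone. The sketch below models only the forwarder-selection logic. The forwarder address 10.0.0.4, the zone list, and the default resolver name are hypothetical.

```python
# Hypothetical conditional-forwarding table on an on-premises DNS server.
# Public service zones go to a DNS forwarder VM inside the VNET (assumed at
# 10.0.0.4), which relays them to Azure DNS and so resolves the privatelink records.
CONDITIONAL_FORWARDERS = {
    "database.windows.net": "10.0.0.4",
    "blob.core.windows.net": "10.0.0.4",
}
DEFAULT_UPSTREAM = "on-prem-resolver"  # placeholder for your regular resolver


def pick_upstream(query_name: str) -> str:
    """Return the upstream for a query, using the longest matching zone suffix."""
    name = query_name.rstrip(".").lower()
    best_zone = None
    for zone in CONDITIONAL_FORWARDERS:
        if name == zone or name.endswith("." + zone):
            if best_zone is None or len(zone) > len(best_zone):
                best_zone = zone
    return CONDITIONAL_FORWARDERS[best_zone] if best_zone else DEFAULT_UPSTREAM


if __name__ == "__main__":
    for query in ("mysqlserver.database.windows.net", "www.example.com"):
        print(query, "->", pick_upstream(query))
```

Matching on the longest suffix mirrors how DNS servers pick the most specific conditional forwarder. Any name that doesn’t match keeps using the normal resolver path.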
I would say start small, always in the development environment and then move to production. Never start in production unless you want a new job.
Moving to the cloud is “easy”; managing it is another ordeal.
I’m going to start with a disclaimer: this post focuses on achieving governance and security in Azure. This doesn’t mean that what I write here cannot apply to AWS or Google; they have other technologies that can help you achieve full governance.
You’ve heard of digital transformation, modern workplace, and the whole 9 yards. Easy? Nope.
You’ve been told that moving to the cloud will grant you all the security you want, all the control you wish, lower costs, everything is enabled out of the box, and it just works. The reality is that no cloud provider will ever do that for you out of the box.
Let’s see what marketing says you get with the cloud:
Efficiency & Scalability
Security & Compliance
Pay as you go (usage-based payment model)
Sounds good? Indeed it does, but there’s a catch: all of the above come at a price. They require an organizational change, a change in mentality, and a lot of sweat. You cannot expect to get all of those things by just performing a lift & shift; migrating your VMs to the cloud means that you’re shifting ALL your on-premises costs to the cloud.
What do I mean by ALL the costs?
Datacenters cost money: you have power, cooling, and staffing costs. Then you have hardware costs, then hardware replacement costs, then hardware refresh costs, and then capacity costs. Should I continue?
If you’re going IaaS, then you’re going to pay the cost of running IaaS, and most of the time, you’re going to pay more for running a virtual machine in the cloud than on-premises.
Why? you might ask. Well, because the cloud provider didn’t cut corners as you did. The cloud provider offers a financially backed SLA, which means that if your virtual machine goes down, you get a cut from the bill. To provide you with that bill cut, the cloud provider has to ensure that the underlying infrastructure is highly redundant. Who do you think is paying for that redundant infrastructure? You are.
Then why is PaaS much cheaper?
IaaS means Infrastructure-as-a-Service, which means that you’re getting an infrastructure on which you run your virtual machines, and you manage everything. The cloud provider cannot do anything with your virtual machines. They have no control over your virtual machines, and they don’t want any control.
PaaS or Platform-as-a-Service means that you’re getting a place where you can upload your application code and manage the application as is. No virtual machines involved, no OS management. Let’s call it minimal overhead. This means that the cloud provider has control of the system that’s running your code.
PaaS is much cheaper because you’re getting a shared pool of resources where you can deploy your application, and the cloud provider manages the underlying infrastructure. Your responsibility is the application and the controls associated with it (identity, app-level controls, and so on). In a nutshell, by sharing VM resources with other tenants, you’re sharing the redundant infrastructure costs with other tenants as well.
That’s the short version of IaaS vs. PaaS. The situation is much more complicated than sharing the underlying infrastructure costs, but you get the idea. Azure also has dedicated environments where you don’t get reduced costs but do get more control. These are meant for situations where you’re limited by ancient compliance and regulatory controls that have not yet adapted to the cloud.
What does this mean? This means that you’re still responsible for the deployment that you’re doing in the cloud, but you transfer a subset of problems to the cloud provider.
No more hardware swaps
No more hardware refresh
No more dead servers
No more waiting weeks for capacity (CPU, RAM, storage)
No more dealing with vendors (except the cloud vendor)
No more datacenter associated costs (power, cooling, staffing)
These are the clear responsibilities that you transfer to the cloud provider, but unfortunately, the cloud doesn’t solve all the on-premises issues. You’re still going to have to deal with the old issues, plus some new ones:
Cost Management (this one is fun)
Backup and Disaster Recovery
Traditional approach induced issues (cloud is cloud and on-prem is on-prem; don’t mix and match)
Inflexibility induced costs
Depending on the cloud model you pick, you’re going to have more or fewer of the issues I’ve outlined. Two of them might have raised your eyebrows a bit:
Traditional approach induced issues and Inflexibility induced costs.
First, traditional approach induced issues. The cloud comes with a plethora of services that can replace your bloated infrastructure systems. You don’t need firewall appliances, proxies, multi-hop IDPs, MFA appliances, and so on. You don’t need cascading network devices to segregate networks, and you certainly don’t need VLANs.
Azure has services for almost all of those things: Azure Firewall for your firewall needs, Azure Active Directory for your IDP, and Azure MFA built in for MFA. You have VNET peering for network segregation and NSGs for stateful firewalling at the VM or subnet level. The list can go on.
By bringing an on-premises approach to the cloud, you will inevitably have more on your plate than you signed up for. Keep it simple, adapt your technologies, and sleep well at night.
Second, inflexibility induced costs. The cloud provides you with an enormous amount of flexibility. You don’t need to overcommit capacity; you can scale up, down, in, or out as much as you want, whenever you want. You can automate your workloads to be resource-efficient based on usage, e.g., scale down your workloads during the weekends.
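To put a number on the weekend scale-down idea, here is a small worked sketch. It assumes a hypothetical workload that needs 4 instances on weekdays and 1 on weekends, and it assumes cost scales linearly with instance-hours. Real pricing depends on the service and SKU.

```python
from datetime import datetime

# Hypothetical schedule: 4 instances Monday-Friday, 1 instance on the weekend
WEEKDAY_INSTANCES = 4
WEEKEND_INSTANCES = 1


def desired_instances(now: datetime) -> int:
    """datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6."""
    return WEEKEND_INSTANCES if now.weekday() >= 5 else WEEKDAY_INSTANCES


def weekly_instance_hours(weekday_count: int, weekend_count: int) -> int:
    """5 weekdays and 2 weekend days, 24 hours each."""
    return 5 * 24 * weekday_count + 2 * 24 * weekend_count


always_on = weekly_instance_hours(WEEKDAY_INSTANCES, WEEKDAY_INSTANCES)
scheduled = weekly_instance_hours(WEEKDAY_INSTANCES, WEEKEND_INSTANCES)
savings = 1 - scheduled / always_on
print(f"{always_on} vs {scheduled} instance-hours/week, {savings:.1%} saved")
```

Running it prints `672 vs 528 instance-hours/week, 21.4% saved`. This is roughly a fifth of the compute bill for this made-up profile, and you get it without touching the application.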
With PaaS, you can do all of that. With IaaS, you can only do it up to a point. If you’re adamant about IaaS, then you’re going to pay the price. You don’t need a VM to run a web application; you have App Services for that. You don’t need an NGFW to manage your network; you have Azure Firewall for that. And you absolutely don’t need a file server cluster to manage files; you have Azure Storage for that.
Don’t get me wrong; I understand that there are situations where the application is so old that you cannot adapt it to the cloud. I’ve seen a lot of applications fail horribly just because they were split across different VMs. Still, if you want to benefit from the cloud and not pay hand over fist, you have to adapt.
What does everything translate to?
Moving to the cloud is not a walk in the park; it’s a lengthy and complicated project. The complexity is directly proportional to the amount of luggage you’re bringing.
What can I do to achieve a successful digital transformation?
Digital transformation in the cloud has three parts:
The cloud comes with some challenges:
Identity & Data
Data Classification, Labeling, and Protection
Multi-geo deployments and GDPR
OPEX vs CAPEX
Different payment options (PAYG, CSP, EA, MCA)
All of those challenges can be overcome with understanding and proper planning. Not all deployments are cloud-worthy, and there’s no such thing as shutting down existing datacenters and moving everything into the cloud. Hybrid is the way to go with existing deployments.
For example, if you already have Active Directory, then moving to the cloud is done via the hybrid approach, where you synchronize your identities to Azure Active Directory and gain the benefits of both worlds.
Applying governance principles to Azure subscriptions
Everybody starts from scratch. You create one subscription and start moving workloads into it.
That’s wrong on many levels, because you lose control of your cloud environment and start asking questions like:
Who created that resource?
How much does that application cost?
Who is the owner of that environment?
Who is the owner of that resource group?
Why does our cloud cost so much?
What caused that security breach?
Where is our data?
Who can see that data?
Application is down, what changed?
The list can go on for ages if you don’t do proper governance. Here’s a list of good practices you can use to improve your governance state in Azure:
Infrastructure as Code
The last one is a bit tricky to implement for existing services but it can be done. You can leverage Azure Resource Manager templates or Terraform to get a desired state configuration in your cloud environment.
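The core of the “desired state” idea is a diff between what the code declares and what is actually deployed. Tools like Terraform show this diff as a plan. The sketch below is a toy version of that comparison, not the algorithm either tool uses. All resource names and properties are hypothetical. Note also that ARM templates only delete resources missing from the template when you deploy in Complete mode; the default Incremental mode leaves them alone.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare desired vs. actual resources (name -> properties), plan-style."""
    to_create = sorted(desired.keys() - actual.keys())
    to_delete = sorted(actual.keys() - desired.keys())
    to_update = sorted(
        name for name in desired.keys() & actual.keys() if desired[name] != actual[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical resources: what the template declares vs. what is deployed
desired = {
    "vnet-hub": {"location": "westeurope", "addressSpace": "10.0.0.0/16"},
    "st-logs01": {"location": "westeurope", "sku": "Standard_LRS"},
    "kv-app01": {"location": "westeurope"},
}
actual = {
    "vnet-hub": {"location": "westeurope", "addressSpace": "10.0.0.0/16"},
    "st-logs01": {"location": "westeurope", "sku": "Standard_GRS"},  # drifted
    "vm-test-07": {"location": "eastus"},  # created by hand, not in code
}

print(plan(desired, actual))
```

The output flags `kv-app01` to create, `st-logs01` to update (its SKU drifted), and `vm-test-07` as something that exists only outside your code. Those are exactly the governance questions listed above.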
Let’s start with the most important piece in achieving governance.
Azure Tags are key-value pairs that allow you to organize your resources logically. Plain and simple, they don’t add any technical value on their own, but in the big picture, tags can give you a lot of insight into the storm of questions that can pop up. You can tag resources, resource groups, and subscriptions, and then group them together in views. Tags go hand in hand with Azure Cost Management, where you can generate a report of how much the resources carrying a specific tag cost.
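Conceptually, a cost-by-tag report is just a group-by over tagged cost records. The sketch below uses hypothetical cost rows and tag names, not the Cost Management export format. It shows why untagged resources quickly become the biggest unanswered question.

```python
from collections import defaultdict

# Hypothetical monthly cost rows with the tags on each resource
rows = [
    {"resource": "vm-web-01", "cost": 120.0, "tags": {"costCenter": "CC100", "owner": "alice"}},
    {"resource": "sql-app-01", "cost": 300.0, "tags": {"costCenter": "CC100", "owner": "bob"}},
    {"resource": "st-logs-01", "cost": 15.5, "tags": {"costCenter": "CC200"}},
    {"resource": "vm-test-07", "cost": 42.0, "tags": {}},
]


def cost_by_tag(rows, tag_name):
    """Sum cost per value of tag_name; resources without the tag land in '(untagged)'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["tags"].get(tag_name, "(untagged)")] += row["cost"]
    return dict(totals)


print(cost_by_tag(rows, "costCenter"))
```

The result is `{'CC100': 420.0, 'CC200': 15.5, '(untagged)': 42.0}`. That `(untagged)` bucket is the one nobody can explain, which is why enforcing tags matters.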
Azure Policies allow you to enforce a set of rules and effects over the resources in a specific scope, such as a subscription. With Azure Policies, you can deploy the Log Analytics agent to all the resources that support it and keep that state. Something changed and it’s not compliant anymore? The system automatically reapplies the policy.
This service allows you to keep your defined governance state across all resources. If the policy cannot apply, the resource shows up as noncompliant. A good way to start with Azure Policies is to set up resource tagging by owner and cost center: you allow deployments only if they carry the owner and cost-center tags. If they don’t, the deployment fails with an error.
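In Azure this rule is a policy definition with a `deny` effect (the built-in “Require a tag on resources” definition is a common starting point). The Python below only models the decision that effect makes. It is not the policy engine. `REQUIRED_TAGS`, `PolicyDenied`, and the resource dicts are hypothetical names.

```python
REQUIRED_TAGS = ("owner", "cost-center")  # hypothetical tag names from the example above


class PolicyDenied(Exception):
    """Raised when a deployment violates the tagging rule (models a 'deny' effect)."""


def evaluate_deployment(resource: dict) -> None:
    # Azure tag names are case-insensitive, so compare them lower-cased
    tags = {key.lower(): value for key, value in (resource.get("tags") or {}).items()}
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise PolicyDenied(f"{resource['name']}: missing required tag(s) {', '.join(missing)}")


evaluate_deployment({"name": "vm-web-01", "tags": {"Owner": "alice", "cost-center": "CC100"}})
try:
    evaluate_deployment({"name": "vm-test-07", "tags": {"owner": "bob"}})
except PolicyDenied as err:
    print(f"Deployment failed: {err}")
```

The first deployment passes. The second one is rejected with `Deployment failed: vm-test-07: missing required tag(s) cost-center`, which is the behavior you want before a resource ever exists.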
Management groups and multiple subscriptions go hand in hand. Having one subscription for everything just complicates matters, but having multiple subscriptions without any type of management is worse. The Management Groups offering in Azure can help you group multiple subscriptions based on cost centers, departments, applications, you name it.
Management groups allow you to set up RBAC, tagging, and policies at scale. You group your subscriptions under a management group, and they inherit the settings from above. Even more, you can nest management groups so you can apply settings granularly.
Let’s say that you need to keep data inside the European Union and want to minimize the risk of having data outside it. You can set up a policy on a top tier management group to only allow deployments in West Europe and North Europe regions. This setting would propagate down the stack to everything and from that point, nobody can deploy resources outside the specified regions.
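To see why a rule at the top propagates everywhere, think of every scope as walking up its parent chain. Every inherited “Allowed locations” restriction must be satisfied, so the effective set is the intersection of all of them. The sketch below is a toy model of that inheritance under that assumption. The management group and subscription names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Scope:
    """A management group or subscription with an optional 'Allowed locations' rule."""
    name: str
    parent: Optional["Scope"] = None
    allowed_locations: Optional[FrozenSet[str]] = None  # None = no rule at this level

    def effective_allowed_locations(self) -> Optional[FrozenSet[str]]:
        # Walk up to the root; every inherited rule must pass, so intersect them
        result = None
        node = self
        while node is not None:
            if node.allowed_locations is not None:
                if result is None:
                    result = node.allowed_locations
                else:
                    result = result & node.allowed_locations
            node = node.parent
        return result

    def can_deploy(self, location: str) -> bool:
        allowed = self.effective_allowed_locations()
        return allowed is None or location in allowed


root = Scope("contoso-root", allowed_locations=frozenset({"westeurope", "northeurope"}))
finance = Scope("finance", parent=root)
finance_prod = Scope("finance-prod-subscription", parent=finance)

print(finance_prod.can_deploy("westeurope"))
print(finance_prod.can_deploy("eastus"))
```

The subscription never had a rule of its own, yet it can only deploy to West Europe and North Europe. A child scope can narrow the list further, but it can never widen it.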
This pretty much covers the basics. Is it enough? Not even close. Achieving full governance is a long-running task, and you have to keep at it to maintain it.
Microsoft published a framework to help businesses get a sense of how to approach this situation and it’s called the Cloud Adoption Framework for Azure which is a good starting point and you should use it as a guideline.
What about security? What are my options?
When it comes to security, you have multiple options for a secure infrastructure, as long as you leverage them. Azure puts forth a lot of capabilities to properly secure your cloud environment, and you don’t have to install or configure over-complicated systems.
The image above doesn’t go into much detail when it comes to what security offerings we should use but it tells us to leverage the security systems that make sense and leverage the intelligence provided by the cloud.
The list of security services in Azure is pretty extensive, so I will just mention the absolutely necessary ones:
I suggest checking the link above to see all the services that are available so you can get a sense of how you can handle your current or future cloud deployment (the list is old and some services do not show up there). I won’t cover all the services in this post as not all of them are mandatory for a successful deployment.
Let’s start with the mandatory security offerings
Azure Security Center is the be-all and end-all of security monitoring in your Azure environment. It offers you a bird’s-eye view of your security state, based on a set of policies and recommendations that come out of the box. It comes in two tiers, Free and Standard. While the Free tier works up to a point for non-critical, non-production subscriptions, the Standard tier is the one to enable for critical, production subscriptions.
Out of the box, the Standard tier offers Just-In-Time (JIT) VM access, where management ports are blocked by default and access is granted on request through the portal or CLI. It also offers Adaptive Application Controls, which allow you to specify which applications should be running on the virtual machines in scope.
Azure Security Center can also raise alerts and lets you automate the response to them. The alerting system pulls data from the Intelligent Security Graph, which is a huge benefit out of the box. This means that when anybody gets attacked in the Microsoft Cloud (Azure, Office 365), the data from that attack goes back into the graph, and you can get alerted if anything similar happens to your workloads.
Azure Sentinel is Microsoft’s take on a cloud-native SIEM and SOAR. It’s a query-based, AI & ML powered system that collects data from Office 365, Azure AD, Azure ATP, Microsoft Cloud App Security, WDATP, Azure Security Center, and third-party systems. It’s connector-based, so you need to enable the connectors for the systems you want to monitor.
Sentinel runs on top of a Log Analytics workspace, and it’s RBAC capable. The recommendation here would be to have a centralized Sentinel monitoring workspace where you set up dashboards and alerting rules. Because it’s a query-based system like Splunk, you can convert Splunk-style queries into Azure Sentinel (KQL) queries instead of starting entirely from scratch.
Azure Privileged Identity Management (PIM) is a service built on Azure Active Directory that, as the name suggests, manages privileged identities. Historically, most attacks happen from the inside, meaning that having more access than necessary is a cause for concern.
PIM works on Azure Active Directory and on Azure resources. You can convert the permanent roles assigned in your organization to eligible roles, and from then on, nobody has elevated access unless it’s granted (via self-service or approval). On the AAD side, it applies to all AAD roles. On the Azure side, it applies to management groups, subscriptions, resource groups, or resources, and it supports custom RBAC roles as well.
PIM can co-exist with standing privileges, so you can roll it out gradually, one defined scope at a time.
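The difference between a permanent assignment and an eligible one is easiest to see as an access check. The toy model below keeps standing privileges alongside eligible roles that must be activated, optionally with approval, for a limited time. It is only a sketch of the concept, not the PIM API. `RoleStore` and all user and role pairings are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (user, role)


@dataclass
class RoleStore:
    """Toy model of PIM: standing (permanent) assignments next to eligible ones."""
    permanent: Set[Key] = field(default_factory=set)
    eligible: Dict[Key, bool] = field(default_factory=dict)  # value: approval required?
    activations: Dict[Key, datetime] = field(default_factory=dict)  # value: expiry time

    def activate(self, user: str, role: str, now: datetime, hours: int,
                 approved_by: Optional[str] = None) -> None:
        key = (user, role)
        if key not in self.eligible:
            raise PermissionError(f"{user} is not eligible for {role}")
        if self.eligible[key] and approved_by is None:
            raise PermissionError(f"activating {role} for {user} requires approval")
        # Activation is time-bound; access disappears again when it expires
        self.activations[key] = now + timedelta(hours=hours)

    def has_access(self, user: str, role: str, now: datetime) -> bool:
        key = (user, role)
        if key in self.permanent:
            return True  # standing privilege, not yet converted to eligible
        expiry = self.activations.get(key)
        return expiry is not None and now < expiry


store = RoleStore()
store.permanent.add(("legacy-admin", "Owner"))  # kept during a gradual rollout
store.eligible[("alice", "Contributor")] = False  # self-service activation
store.eligible[("bob", "Global Administrator")] = True  # needs approval
t0 = datetime(2020, 1, 1, 9, 0)
store.activate("alice", "Contributor", t0, hours=2)
print(store.has_access("alice", "Contributor", t0 + timedelta(hours=1)))
```

The `legacy-admin` entry shows the co-existence point. Standing privileges keep working while you move people to eligible roles one scope at a time.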
This has been a very long post, which might have been a good candidate for multiple parts. I, for one, don’t like doing that, as I lose my train of thought and wander off the path I started on.
The main takeaway of this post is that Digital Transformation, Modern Workplace, and all the other buzzwords cannot be implemented without planning, openness, and time. Emphasis on time: a project of this scale can take 18 to 24 months for a large enterprise. It takes that long because many procedures and policies need to change for it to be a success and not a crash & burn.
My recommendation would be to start small: identify the environment, set up auditing policies, tag everything, and then move to lockdown.
When they first launched, a long time ago, Azure SQL Databases had a mystery attached to them: which performance tier should you use?
Database Transaction Unit, or DTU (originally called Database Throughput Unit), is a way to describe the relative capacity of a performance SKU of Basic, Standard, and Premium databases. DTUs are based on a blended measure of CPU, memory, I/O reads, and writes. When you want to increase the “power” of a database, you just increase the number of DTUs allocated to it. A 100 DTU database has roughly twice the resources of a 50 DTU one.
Table of DTU SKUs (Basic / Standard / Premium)
Target workload: Development and production / Development and production / Development and production
CPU (Standard): Low, Medium, High
IO throughput (approximate): 1-5 IOPS per DTU / 1-5 IOPS per DTU / 25 IOPS per DTU
IO latency (approximate): 5 ms (read), 10 ms (write) / 5 ms (read), 10 ms (write) / 2 ms (read/write)
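The IO throughput figures above scale with the DTU count: 1-5 IOPS per DTU for Basic and Standard, and 25 IOPS per DTU for Premium. A quick back-of-the-envelope calculation shows what a given size roughly buys you. Treat these as the approximations they are; this sketch is not a sizing tool.

```python
# Approximate IOPS-per-DTU figures from the DTU table: (low, high)
IOPS_PER_DTU = {
    "Basic": (1, 5),
    "Standard": (1, 5),
    "Premium": (25, 25),
}


def approx_iops(tier: str, dtus: int) -> tuple:
    """Return the approximate (low, high) IOPS range for a DTU tier and size."""
    low, high = IOPS_PER_DTU[tier]
    return dtus * low, dtus * high


print(approx_iops("Standard", 100))
print(approx_iops("Premium", 125))
```

A 100 DTU Standard database lands somewhere between 100 and 500 IOPS. A 125 DTU Premium one is around 3,125 IOPS, which is why IO-heavy workloads often end up on Premium even at modest DTU counts.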
You could say that the DTU model is a great solution for people who want a preconfigured pool of resources out of the box for their workloads. The problem with the DTU model is that when you hit the DTU limit, you get throttled. This results in query timeouts or slowdowns, and the solution is to increase the number of DTUs.
The concept behind DTUs is that when you need more resources for a database, you increase the number of DTUs. One issue is that you can’t scale CPU, storage, and RAM individually.
The vCore model takes a more classical approach, closer to what you know from on-premises workloads. It allows you to specify the number of cores, RAM, and I/O. In the DTU model, CPU, RAM, and I/O increase together. In the vCore model, you can adjust them individually, which gives you a lot of flexibility.
Scaling up and down in the vCore model happens on two planes, with CPU specs that depend on the hardware generation or VM series:
CPU plane: generation specific
Memory plane: generation specific, then vCore specific
Gen4
Compute: Intel E5-2673 v3 (Haswell) 2.4 GHz processors; provision up to 24 vCores (1 vCore = 1 physical core)
Memory: 7 GB per vCore; provision up to 168 GB
Gen5
Compute (provisioned): Intel E5-2673 v4 (Broadwell) 2.3 GHz and Intel SP-8160 (Skylake) processors; provision up to 80 vCores (1 vCore = 1 hyper-thread)
Compute (serverless): Intel E5-2673 v4 (Broadwell) 2.3 GHz and Intel SP-8160 (Skylake) processors; auto-scale up to 16 vCores (1 vCore = 1 hyper-thread)
Memory (provisioned): 5.1 GB per vCore; provision up to 408 GB
Memory (serverless): auto-scale up to 24 GB per vCore; auto-scale up to 48 GB max
Fsv2-series
Compute: Intel Xeon Platinum 8168 (SkyLake) processors, featuring a sustained all-core turbo clock speed of 3.4 GHz and a maximum single-core turbo clock speed of 3.7 GHz; provision 72 vCores (1 vCore = 1 hyper-thread)
As you can see, each generation or VM series comes with a specific CPU model, and RAM is allocated per vCore.
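Since RAM is allocated per vCore, memory is simple multiplication. The sketch below uses the provisioned figures listed above: 7 GB per vCore on Gen4 up to 24 vCores, and 5.1 GB per vCore on Gen5 up to 80 vCores. It only enforces the upper bound. Real SKUs come in fixed vCore sizes, and minimums vary by tier.

```python
# Per-generation provisioned figures: (GB of memory per vCore, max vCores)
HARDWARE = {
    "Gen4": (7.0, 24),
    "Gen5": (5.1, 80),
}


def memory_gb(generation: str, vcores: int) -> float:
    """Memory that comes with a given vCore count (upper bound checked only)."""
    gb_per_vcore, max_vcores = HARDWARE[generation]
    if not 0 < vcores <= max_vcores:
        raise ValueError(f"{generation} supports up to {max_vcores} vCores, got {vcores}")
    return round(vcores * gb_per_vcore, 1)


print(memory_gb("Gen5", 4))
print(memory_gb("Gen4", 24))
```

For example, 4 Gen5 vCores come with 20.4 GB, and the maximums line up with the table: 24 × 7 = 168 GB and 80 × 5.1 = 408 GB.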
Choosing a generation can look complicated, but you’re not locked into a choice. If you decide post-deployment that a different generation or VM type works better for you, that option is available.
DTU vs vCores
Now that we understand the difference between DTUs and vCores, let’s try and compare them.
Basic, Standard, Premium
Gen 4, Gen 5, Fsv2, M
Compute, Memory + Storage
DTU + Backup Storage
vCore, Storage, Backup Storage + Logs Storage
As you can see from the table, there’s a hefty difference in specs between the DTU and vCore models. After carefully analyzing the available options, you might be inclined to go directly with the vCore model rather than the DTU model, but the difference is in the details.
One question you might have is: “How many DTUs are equivalent to a vCore?” A generic rule of thumb would be:
100 DTUs Standard = 1 vCore – General Purpose
125 DTUs Premium = 1 vCore – Business Critical
8000 DTUs would map to 80 vCores, but the maximum number of DTUs per SQL database is 4000 🙂
Anything less than 100 DTUs means you’re using less than a vCPU, more like a shared core, but testing would be required to find the sweet spot.
Another benefit of the vCore model is that you can reserve capacity in advance for one or three years and get a better price. Plus, if you already have on-premises SQL Server licenses with Software Assurance, you can activate Azure Hybrid Benefit and get even more bang for your buck, just as you would with an Azure VM.
So should you move to vCores?
The answer to this question is: “it depends”. The vCore model looks more appealing from a traditional perspective, but the real cost benefit starts showing from 400 DTUs and up. If your workloads use less than 400 DTUs (roughly 4 vCores), I would stick with the DTU model. When the time comes, I would just press a button in the portal and migrate to the vCore model.
Besides the tiers I mentioned above, there are two other tiers called Serverless and Hyperscale, which are beneficial in some use cases but not all.
In the end, what I can say is that DTUs are not yet ready to be replaced by vCores, but I expect this to be the next step. Until we get a clear alternative to DTUs, they are here to stay 🙂
KEDA (Kubernetes-based Event-Driven Autoscaling) is an open-source project built by Microsoft in collaboration with Red Hat. It provides event-driven autoscaling to containers running on AKS (Azure Kubernetes Service), Amazon EKS (Elastic Kubernetes Service), GKE (Google Kubernetes Engine), or on-premises Kubernetes clusters 😉
KEDA allows for fine-grained autoscaling (including to and from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. Most of the time, we scale systems (manually or automatically) based on metrics that cross a threshold.
For example: if CPU > 60% for 5 minutes, scale our app service out to a second instance. By the time the trigger has fired and the scale-out has completed, the burst of traffic or events has often passed. KEDA, on the other hand, exposes rich event data, like the length of an Azure Storage queue, to the Horizontal Pod Autoscaler so that it can manage the scale-out for us. Once one or more pods have been deployed to meet the event demand, events (or messages) can be consumed directly from the source, which in our example is the Azure queue.
Getting started with KEDA
Azure Container Registry
To follow this example without installing anything locally, you can load up the Azure Cloud Shell, or you can install the Azure CLI, kubectl, Helm and Azure Functions Core Tools.
I will do everything in the cloud, while monitoring and running other commands from my workstation with Lens, the Kubernetes IDE.
Now open up your Azure Cloud Shell, roll up your sleeves, and let’s get started:
Note that the code contains <> placeholders. Do not copy-paste blindly; adjust the code for your deployment.
# Log in to your AKS cluster (merges its credentials into your kubeconfig)
az aks get-credentials -n <aksClusterName> -g <resourceGroupofSaidCluster>
Once the PowerShell script starts running, the first messages arrive in the queue.
With kubectl in watch mode, I start observing the fireworks.
Looks good, but can you expand on this? How can I actually use this solution?
This solution basically exists to bring the Azure Functions product everywhere. At this point in time, you can run Azure Functions wherever you want, for free. You are not obligated to run functions in Azure; you can run them in Google Cloud or AWS if you want. Want on-premises? Go crazy, it works as long as you’re running Kubernetes.
The main idea is to bring Functions as a Service (FaaS) to any system, and that’s very cool. While preparing for this post, I took a running system that uses Bot Framework, Service Bus and Azure Functions, and ported the Azure Functions part to KEDA. Zero downtime, no difference, it just works. The chatbot adds messages to Service Bus and then the function triggers. I ran both in parallel for five minutes, then took down the Azure-hosted function.
The main point of using KEDA is to be cloud-agnostic. You know Kubernetes, you know Azure Functions, you don’t need to change anything. Hell, if you’re in one of those regulated environments where the cloud is a big no-no, you can just as well integrate KEDA into your clusters and set up triggers with whatever messaging system you have, or even just set up plain HTTP triggers and be done with it. Once the regulatory part relaxes and the company starts to dip its toes in the water, you can just as well copy-paste the code into Azure 🙂
This month, Proximity Placement Groups were announced as a public preview offering, and this post is here to tell you about them.
For a long time, we’ve been using Availability Sets to bring our applications as close together as possible to ensure the lowest possible latency. However, this couldn’t always be achieved, because the cloud is shared with multiple customers and you’re deploying resources alongside other people. This means that if you’re looking to run a latency-sensitive application in Azure, Availability Sets or Availability Zones are not always the answer.
Proximity Placement Groups are here to remove that latency obstacle from your deployments. They introduce a new concept of co-location, where all your deployments in a PPG are constrained to the same datacenter and placed as close together as possible. This used to be easy to achieve with Availability Sets, but as Azure grows, datacenters grow apart and network latency increases.
Before you start using Proximity Placement Groups, note that lower latency comes at the cost of restricting your VM placement, and that can cause deployment issues. You will get more frequent deployment failures because you’re limiting where Azure can deploy your VMs: with lower latency comes less capacity.
Getting started with PPGs is not that simple for the GUI folks, because Portal support hasn’t been added yet and you have to fall back to good old ARM templates 🙂