Azure Labs – What is it and use cases

As a trainer, I always have a set of prerequisites when I’m about to deliver a training. Those prerequisites are usually sent weeks in advance, but most of the time the participants don’t have them installed. What I keep in my back pocket is an ARM template with two or three predefined images, which I mass deploy before a training and hand out to the participants so we avoid this hassle.

The reality is that this approach is complicated. My images are created with Packer within an Azure DevOps pipeline, and while it’s all fun and geeky to do everything yourself, you don’t always have time to update the packages, you forget VMs running, and so on.

I was stoked when Microsoft came out with a new feature in Azure called Lab Services which opened up the possibility of doing everything I just mentioned, in a simple, secure setting. 

This offering is similar to DevTest Labs, but it provides a new portal that lab creators and lab participants can open without much hassle.

So how can we use it? 

Creating a Lab Services account is pretty simple: you go to the Azure Portal, type in Lab Services and create it in a resource group.

After you have created the lab account, your next step is to assign the Lab Creator RBAC role to yourself and/or other people via the IAM blade, because even if you’re an owner, you will not be able to use the labs without it. Once that’s done, you can proceed to

At first glance, the lab portal is pretty simple. If it’s newly created, you will be prompted to create a new lab.

Step by step process

If you want to create a new lab, go to the new lab icon in the upper left corner, type in a name and set the maximum number of VMs per lab. Don’t worry, the number you set there is not permanent and you can change it later if required.

After you press save, you will be presented with the next screen, where you select the virtual machine image to use for your template. A number of images are presented in that list, but if you want to expand it, you have to go to the Lab Services resource in the Azure portal and select Marketplace images from the policies tab, where you can enable other images.

Once you select the image that you want and press next, you will be prompted with the next screen where you will input the username and password for the template VM and all the VMs that will be created after it.

After you press create, the template will be created and you’re going to have to wait a while for it to be completed 🙂

Next up is the configuration phase where you will connect to the VM, do your configuration and then complete the lab configuration.

Next screen is a review screen where you can either publish the lab or save for later.

The publishing phase takes a while so this is the time to get a donut or hit that Netflix show 🙂

What does it look like?

Once the lab is done, it will pop up in the main screen where if you’re a lab creator, you will have the option of customizing some settings for the lab like:

  • Re-configuring the template
  • Republishing the lab
  • Setting up an on/off schedule for the VMs
  • Configuring a restricted user list for the lab, or opening it to anyone who has the registration link.


One of the minor caveats of the solution is that the participants are required to log in using either an MSA or a work account. I call it minor because most of the time the participants have one of those, but there are times when you’re doing public hands-on labs, workshops and the like where you cannot expect all of the participants to have one.

The solution to this problem is Azure AD B2C. You create a B2C tenant, link it to your Azure subscription, create B2C accounts and add them to the lab. That’s the best solution out there for these kinds of cases: first, you don’t deal with e-mail accounts or any other PII, and second, you have complete control over the user accounts.

Another issue that I found is that if the same account is a Lab Creator or owner on multiple labs, the portal will not ask which lab you want, so I’m waiting for a fix on that.

As a final note, this is an excellent offering for me and I will be using it heavily for my training sessions and workshops.

Azure DevOps and “VSProfessional” licenses

This is something I encountered at a client and I figured that I should write it here because it took a while to find the solution and the only answer came via a support ticket to MS. 

A while back, when Azure DevOps was called VSTS or Visual Studio Online, you could link the tenant to your Azure subscription for billing purposes. This allowed you to purchase basic usage rights for the platform, and it even allowed you to purchase Visual Studio Professional licenses, which licensed the users’ VS Pro installations via the platform.

Azure view

The problem that I faced was that this customer was in exactly this position, and suddenly their VS Pro licenses started to expire and stop working. We tried to figure out what the problem was, but we hit a dead end and had to open a support ticket to get some assistance while we kept investigating in parallel.

We knew that Visual Studio monthly licenses were available in the marketplace, but we didn’t understand the correlation between the two.

On a hunch, we purchased a few VS Pro Monthly licenses for some users to test a theory. Luckily it worked, but we still didn’t have an answer as to why the issue existed.

The answer came from the support engineer on the MS side, who provided an awesome explanation of why the problem existed and how to fix it.

The problem was that licensing users via the Azure Portal was deprecated a while ago and MS didn’t have a solution for seamless migrations to the new licensing model, so they allowed it to work for existing customers while they removed the capability from the portal. 

The licenses that appeared on the billing invoice were called “VSPRO – Monthly”, which coincidentally matches the name of the VS Pro licenses from the marketplace. In reality, the licenses that you could get from the Azure Portal were “Professional” licenses tied to the old VSOnline model, which was allowed to keep working in parallel until it died by itself.

Basically, the old Professional license allowed you to run Visual Studio Professional and be a licensed user in VSTS / Azure DevOps. Because it was deprecated, updates and newer versions of Visual Studio (from 2017 through 2019) could no longer parse the licensing info assigned to the user’s work account, and the installations ended up in Extended Trial mode.

The solution was simply to purchase the licenses from the marketplace, assign them to all the “Professional” users and, after a day or two, remove the offering from the Azure Portal.

After doing the whole operation, everything licensed correctly and the issue was solved.

Signing out. Have a good one!

Azure – Application Security Groups

Security is not something to kid about, and when it comes to the cloud, you have to be very thorough when you’re deploying your infrastructure. That means you are still required to do defense in depth, use anti-malware systems, and configure extended monitoring, logging and reporting mechanisms. When you’re going to the cloud, you have to be aware of the Shared Responsibility Matrix, which applies to any cloud provider.

As you can see in the image above, you as a customer still have a responsibility to secure your cloud environment. So those skills you’ve developed while working on-premises will still be of value in the cloud.

Today’s topic is managing Network Security Groups using a feature in Azure called Application Security Groups.

What are they?

Application Security Groups are a mechanism to group virtual machines that reside in the same virtual network and apply Network Security Group rules to them.

The way you deployed NSGs in an Azure subscription was to assign them to a network interface or a subnet and then configure them in a granular manner, based on the deployment type. In reality, it’s utopian to expect this to stay clean, and it gets messy after a couple of months. ASGs came to the rescue by letting you group a set of VMs based on roles like Web, DB, Middleware etc. and apply NSG Allow / Deny rules to those groups.

By using an ASG, you simplify your management overhead: you just add the VMs that you create to those groups, and they automatically get the security policies applied from your NSG.
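To make the idea concrete, here is a minimal sketch in Python of how rules written against groups behave when a new VM joins a group. This is a simplified model for illustration only, not how Azure evaluates NSGs internally; the `Rule` class, the `is_allowed` function and the VM names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int    # lower number is evaluated first, as in an NSG
    source_asg: str  # "*" means any source
    dest_asg: str    # "*" means any destination
    port: int
    allow: bool


def is_allowed(rules, membership, src_vm, dst_vm, port):
    """Return the decision of the first rule (by priority) that matches."""
    src_groups = membership.get(src_vm, set())
    dst_groups = membership.get(dst_vm, set())
    for rule in sorted(rules, key=lambda r: r.priority):
        src_ok = rule.source_asg == "*" or rule.source_asg in src_groups
        dst_ok = rule.dest_asg == "*" or rule.dest_asg in dst_groups
        if src_ok and dst_ok and rule.port == port:
            return rule.allow
    return False  # nothing matched: treated as denied in this simplified model


# Hypothetical VMs grouped by role.
membership = {"web-01": {"Web"}, "web-02": {"Web"}, "db-01": {"DB"}}

# The rules mention roles only, never individual VMs.
rules = [
    Rule(100, "Web", "DB", 1433, True),   # web tier may reach the database
    Rule(200, "*", "DB", 1433, False),    # everyone else may not
]

# A new web server only needs to join the group; the rules stay untouched.
membership["web-03"] = {"Web"}

print(is_allowed(rules, membership, "web-03", "db-01", 1433))  # True
print(is_allowed(rules, membership, "db-01", "db-01", 1433))   # False
```

The point of the sketch is the last step: the rule set never changes when VMs come and go, only the group membership does.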

ASG Example – Source Ignite

Getting Started

Creating / using Application Security Groups is easy. Go to the Azure Portal -> Create a resource -> Type in Application Security Group and press create.

Or you can simply use PowerShell

After you’ve created the ASG, the next thing that you need to do is to assign it to some VMs, which can be done via the Portal or PS.

Next step is to add or modify your inbound / outbound rules to use those new ASGs you’ve created. Doing that is very simple and you can also do it via the portal or CLI.

As you can see, it’s pretty easy to secure your VMs this way, considering that managing NSGs can become a pain even for simple deployments. I’m not even talking about very complex ARM deployments which deploy tens of VMs and link them together 🙂

AKS – Working with GPUs

I’ve discussed AKS before, but recently I have been doing a lot of production deployments of AKS, and the most recent one was with NVIDIA GPUs.

This blog post will take you through what I learned from a deployment of this type, because boy, some things are not as simple as they look.

The first problems come after deploying the cluster. Most of the time, the NVIDIA driver doesn’t get installed and you cannot deploy any GPU-constrained resources. The solution is basically to install an NVIDIA daemon and go from there, but that also depends on the AKS version.

For example, if your AKS is running version 1.10 or 1.11 then the NVIDIA Daemon plugin must be 1.10 or 1.11 or anything that matches your version located here

The code snippet above creates a DaemonSet that installs the NVIDIA driver on all the nodes provisioned in your cluster. So for three nodes, you will have three NVIDIA pods.

The problem appears when you upgrade your cluster. You go to Azure, upgrade the cluster and, guess what, you forgot to update the YAML file and everything that relies on those GPUs dies on you.

The best example I can give is the TensorFlow Serving container which crashed with a very “informative” error that the Nvidia version was wrong.
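A small pre-upgrade check can catch this kind of mismatch before it bites. The following is a minimal sketch, assuming both versions are plain strings such as `1.11.5` for the cluster and `1.11` for the plugin; the function names are hypothetical and where you read the two versions from is up to you.

```python
def minor_version(version: str) -> str:
    """Reduce a version such as '1.11.5' or 'v1.11' to its 'major.minor' part."""
    parts = version.lstrip("v").split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unexpected version string: {version!r}")
    return f"{parts[0]}.{parts[1]}"


def plugin_matches_cluster(cluster_version: str, plugin_version: str) -> bool:
    """True when the plugin targets the same major.minor as the cluster."""
    return minor_version(cluster_version) == minor_version(plugin_version)


# Before the upgrade everything lines up ...
print(plugin_matches_cluster("1.11.5", "1.11"))  # True
# ... after upgrading the cluster, the old YAML no longer matches.
print(plugin_matches_cluster("1.12.4", "1.11"))  # False
```

Running something like this as a gate in the pipeline that performs the upgrade is one way to be reminded to update the YAML file as well.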

Another problem that appears is monitoring. How can I monitor GPU usage? What tools should I use?

Here you have a good solution which can be deployed via Helm. If you do a helm search for prometheus-operator, you will find the best solution to monitor your cluster and your GPU 🙂

The prometheus-operator chart comes with Prometheus, Grafana and Alertmanager, but out of the box you will not get the GPU metrics required for monitoring because of an error in the Helm chart, which scrapes the cAdvisor metrics over HTTPS. The solution is to set the exporter’s HTTPS option to false.

And import the dashboard required to monitor your GPUs which you can find here: and set it up as a configmap.

In most cases, you will want to monitor your cluster from outside and for that you will need to install / upgrade the prometheus-operator chart with the grafana.ingress.enabled value as true and grafana.ingress.hosts={domain.tld}

Next in line, you have to deploy your actual containers that use the GPU. As a rule, a container cannot use a part of a GPU, only a whole GPU, so tread carefully when you’re sizing your cluster, because you can only scale horizontally as of now.

When you’re defining the pod, add the following snippet to the container spec:

End result would look like this deployment:
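As a minimal sketch of such a manifest, the Python below builds a Deployment whose container asks for whole GPUs through the `nvidia.com/gpu` resource limit and prints it as JSON, which kubectl accepts just like YAML. The deployment name and the image are placeholders I made up for illustration, and `gpu_deployment` is a hypothetical helper.

```python
import json


def gpu_deployment(name: str, image: str, gpus: int = 1, replicas: int = 1) -> dict:
    """Build a Deployment manifest whose single container requests whole GPUs."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # GPUs are requested as whole units under resources.limits.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; replace them with your own.
    manifest = gpu_deployment("tf-serving", "example.azurecr.io/tf-serving:latest")
    print(json.dumps(manifest, indent=2))
```

Because each replica takes a whole GPU, the number of replicas you can schedule is bounded by the number of GPUs in the node pool.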

What happens if everything blows up and nothing is working?

In some rare cases, the NVIDIA driver may blow up your data nodes. Yes, that happened to me and I needed to solve it.

The manifestation looks like this: the ingress controller works randomly, cluster resources show as evicted, the NVIDIA device plugin pod restarts frequently and your GPU containers are stuck in pending.

The way to fix it is to first delete the pods in evicted / error status by running this command:
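The same cleanup can be sketched in Python. This is a minimal example that assumes the pod list was exported with `kubectl get pods --all-namespaces -o json`; it only covers evicted pods (phase `Failed` with reason `Evicted`), and the helper names and the sample data are hypothetical.

```python
import json


def evicted_pods(pod_list: dict) -> list:
    """Return (namespace, name) for every pod that was evicted."""
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod["metadata"]
            found.append((meta["namespace"], meta["name"]))
    return found


def delete_commands(pods) -> list:
    """Build one kubectl delete command (as an argument list) per pod."""
    return [["kubectl", "delete", "pod", name, "--namespace", ns] for ns, name in pods]


# Made-up sample in the shape of the kubectl JSON output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "default", "name": "gpu-worker-1"},
     "status": {"phase": "Failed", "reason": "Evicted"}},
    {"metadata": {"namespace": "default", "name": "gpu-worker-2"},
     "status": {"phase": "Running"}}
  ]
}
""")

for command in delete_commands(evicted_pods(sample)):
    print(" ".join(command))
```

Printing the commands first instead of running them right away gives you a chance to review what is about to be deleted.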

And then restart all the data nodes from Azure. You can find them in the resource group called MC_<ClusterRG>_<ClusterName>_<Region>

That being said, it’s fun times to run AKS in production 🙂

Signing out.

Azure Firewall – What is it and how to use it

There’s no shortage of solutions when it comes to NGFW in the cloud but they all come at a hefty price, steep learning curve and require continuous maintenance from the ops teams. We have solutions from Barracuda, Fortigate, Checkpoint, Cisco and so on but in the end, they are some Linux Virtual Machines that have some third party software on them with or without built-in HA. Azure Firewall is here to provide another solution that can solve some of these issues that come from NVAs deployed in the cloud…but not all of them.

Let’s start off with what Azure Firewall can do and what it cannot do at this moment:

Azure Firewall:

  • Is a stateful firewall as a service
  • Has built-in high availability
  • Can do FQDN filtering
  • Supports FQDN tags. At the time of writing, the supported tags are Windows Update, ASE and Azure Backup
  • Lets you add network traffic filtering rules
  • Has outbound SNAT support
  • Has inbound DNAT support
  • Lets you centrally create, enforce, and log application and network connectivity policies across Azure subscriptions and VNETs

Azure Firewall is NOT:

  • An Intrusion Prevention System (IPS)
  • An Intrusion Detection System (IDS)

If you compare Azure Firewall with any NGFW solution from the marketplace, you will see that it lacks a lot of features and might not appear to solve any of today’s issues, but stay a while and listen 🙂

Think of this. The current third-party firewalls started in the on-premises environment as physical appliances and then slowly evolved towards virtual appliances, so most (not all) of them have features that are useless in the cloud (and you pay for them). Another thing is that you have to manage them end to end and even back them up. They are not a managed service that you license from a provider and just consume; each one is a full-blown IaaS machine, and the list can go on.

What is Azure Firewall for?

Azure Firewall is a cloud-native stateful firewalling service that is not deployed as a VM. It’s a fully managed security service by Microsoft that scales automatically and requires no maintenance from the user (hence the fully managed part), and the only thing that you need to do is to configure it correctly.

At the time of writing this post, Azure Firewall blocks all inbound/outbound traffic by default, with the possibility to allow IP addresses, FQDNs or CIDR blocks. It deploys a UDR in the VNET it creates to redirect the 0/0 traffic through it, just like an NVA. It also plugs into Azure Monitor, and I suspect that it will plug into Traffic Analytics and ASC because that makes sense in the long term.
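The "deny everything, then allow what you list" behaviour is easy to picture with a few lines of Python. This is only a toy model of an allow list built on the standard `ipaddress` module, not a description of how Azure Firewall is implemented; the `AllowList` class and the addresses are made up for illustration.

```python
import ipaddress


class AllowList:
    """Toy default-deny filter: traffic passes only if it is explicitly listed."""

    def __init__(self, networks=(), fqdns=()):
        # A single IP address is stored as a /32 network, a CIDR block as given.
        self.networks = [ipaddress.ip_network(n) for n in networks]
        self.fqdns = {f.lower().rstrip(".") for f in fqdns}

    def permits_ip(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        return any(ip in net for net in self.networks)

    def permits_fqdn(self, name: str) -> bool:
        return name.lower().rstrip(".") in self.fqdns


# Hypothetical rules: one CIDR block, one single IP and one FQDN.
rules = AllowList(networks=["10.1.0.0/16", "192.168.5.4"], fqdns=["example.com"])

print(rules.permits_ip("10.1.2.3"))        # True, inside the CIDR block
print(rules.permits_ip("192.168.5.4"))     # True, the single allowed IP
print(rules.permits_ip("8.8.8.8"))         # False, not listed so denied
print(rules.permits_fqdn("Example.com."))  # True, matched case-insensitively
```

An empty `AllowList()` denies everything, which mirrors the starting state described above.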

Deploying an Azure Firewall is pretty simple and it doesn’t require too much configuration and a reference architecture looks something like this:

Azure Firewall Ref Architecture ; Source MS Docs

The best practices around Azure Firewall show that it should be configured in a hub & spoke architecture, where you deploy your core / shared services in the hub and have spokes that connect through it. The main reason for this is that the entry price is 780 EUR per scaling unit. The way I see it, Azure Firewall in combination with NSGs, App Gateway WAF and other services like DDoS Protection Standard would add more value to the enterprise client than anything else.

Ref Architecture; MS Ignite

Finally I would like to add that from my point of view, Azure Firewall is still a work in progress but a very welcome addition to the cloud security offering that Microsoft adds in Azure.

Resetting RHEL Root PW with Azure Serial Console

Oh Snap.

Did this problem ever happen to you? If yes, then you know that the way to solve this issue is to boot the distro into single user mode. But how do you do that in Azure? Well, Serial Console to the rescue!

Usually this is easily solvable using the Run Command or by using the Reset Password blade but in this case imagine that they don’t work. This is the case of the SAP deployment using the RHEL VMs. You cannot do anything if you’ve lost access and if the VM crashes it’s even worse.

Nope, No SYSRQ for you.

You need to get to GRUB so you can boot the VM in single user mode. The problem here is that the VM boots too fast for the serial console to connect and for you to press the ESC key at the magic moment.

So what can you do?

The solution to that problem is to stop the VM without de-allocating it. This means that the VM on the Hyper-V server in the backend is not deleted but preserved, so you can keep the serial console on standby and have a chance at that magic moment. How do you know that you’re in the right place? Check figs 1 and 2.

Fig.1 This is where you have to be.
Fig.2. If you’re here, repeat the first step.

Once you’ve gotten to the screens that the VM is starting, this is what you need to watch for and then mash the ESC button:

Once you’ve managed to enter GRUB, you’re home free to reset the password using the steps below:

  • Press e in the Serial Console to edit the first OS line.
  • Go to the kernel line which starts with linux16
  • Add rd.break to the end of the line, which will break the boot cycle. If SELinux is enabled, add rd.break enforcing=0
  • Exit GRUB and boot with the rd.break option applied by pressing Ctrl+X
  • During this boot, the VM will go into Emergency Mode, where you have to remount the system root as writable using the “mount -o remount,rw /sysroot” command.
  • This drops you into single user mode, where you type in chroot /sysroot to switch into the sysroot jail and then reset the password for the root user with passwd
  • Edit the sshd_config file “nano /etc/ssh/sshd_config” using your preferred editor and set PermitRootLogin yes to enable root access
  • Once you’re done, reboot the VM and you have root access.

GIF from Azure Docs – Grub editing representation

After you’re done resetting all the passwords and installing all the agents so you’re not confronted with this again, set PermitRootLogin no and you’re golden 🙂

Have a good one!

Azure Serial Console – What is it

For a long time, Azure has had a feature that lets users see what was happening while the VM was booting, which allowed them to do root cause analysis when a VM crashed and stopped booting, or for any other issue that could occur in the boot process. This feature is called boot diagnostics, and it takes screenshots of your VM console and captures the serial output so you can do your debugging.

The problem was that you had the information, you knew what happened and knew exactly what to do to fix the issue, but the only way you could apply any fix was to download the VHD, boot the VM in Hyper-V, apply the fix and then re-upload the VM back to Azure and continue. While you might say that you should have had backups and just done a simple restore, this is not always possible.

Microsoft came up with a solution to this problem with a feature called Azure Serial Console, which provides you with a text-based console via COM1 that allows you to run simple diagnostic operations or start a Bash / PowerShell session and get on with your work. The only thing you need for it to work is to have boot diagnostics enabled on the VM.

You might ask yourself why Microsoft took so much time to develop this while others already had it. The answer is security. Others were using NPAPI to tunnel the traffic to the VM, which has been deprecated in all the major browsers. The problem is that in a hyper-scale environment you share the underlying infrastructure with others, and a feature like this could be used to siphon data from one VM, or all of them for that matter. Microsoft solved this problem by developing a new, secure way to tunnel the COM1 traffic to the specific user interface via the Hyper-V VMBus, so that you have access to the VMs that you own and not to others.

How to use it? 

First of all, the VM must have boot diagnostics on. If it’s not enabled then Serial Console will not work:

Then you need to have contributor rights to the storage account (where you enabled the boot diagnostics) and the VM.

Whether it is a Linux or a Windows VM, simply go to the portal and click on Serial Console from the VM blade in the Support + Troubleshooting section

This pops up a screen which shows the dmesg output if it’s a Linux VM or some VM Health reports if it’s Windows. When you press the Enter key it will push you to a login screen where you will need to provide the admin credentials to login to the console. 

For a Windows machine, this process is a bit different, because Windows by default doesn’t send output to the COM1 port and Microsoft had to develop a Special Administrative Console (SAC for short).

The SAC allows you to do simple RCA steps, and if needed you can pop up a PowerShell console and rock on fixing it!

The login experience is the same as with a PuTTY session. The nice part is that you have the option of sending NMIs or SysRq commands 🙂

When you get to the SAC> prompt, you will need to perform the following to open up a PowerShell session:

If you type in help and / or ch -h, you will get a list of help items that will allow you to navigate through the console:

As you can see, in SAC you have some useful commands for Windows, and if you want to start a PS session, input the commands as shown below (typically cmd to create a command channel, ch -si 1 to switch to it, your credentials, and then powershell):

Cool, huh? This is a great step forward when it comes to debugging virtual machines because in the past, you only had one way to do that and I mentioned it above. Is it perfect? No, there are some quirks to it but it’s better than nothing 🙂

That being said, have a good one!

Azure Monitor – Monitoring your AZ cloud – Overview

Monitoring your subscription(s) is very important when it comes to the cloud, whether you’re looking at the performance, health or cost of the subscription.

In today’s blog we will go through Azure Monitor which allows you to monitor everything that you have deployed in your Azure Subscription at an infrastructure level and even application level by using Application Insights.

Azure Monitor can pull data from multiple sources, not only the Azure cloud. This data is then aggregated and can be queried using the Kusto query language, which can be complicated at first, but you get used to it. I highly recommend this Pluralsight course which can introduce you to Kusto –

What can Azure Monitor pull?

  • AAD tenant: Any data that’s generated in your Azure Active Directory tenant can be pulled into Azure Monitor.
  • Azure Resources: Data about the operation of an Azure resource.
  • Apps: This ties in with Application Insights which can pull data from any platform or programming language.
  • Guest OS: This type of data is generated by the monitoring agent that’s installed inside a VM. This is cloud-agnostic and can work in any cloud or on-premises
  • Subscription(s): Service Health Events, Planned Maintenance, etc.
  • Containers: AKS node, cluster, container events.
  • Security Events: Azure Security Center <3

Data in Azure Monitor is separated into two categories:

  • Metrics
  • Logs

Based on what you’re looking for, you can set up monitoring dashboards to check up performance metrics on your resources or you can set up other types of dashboards that query log data. For example I have a dashboard for monitoring my WordPress blog 🙂

In the screenshot above, you can see that I have a dashboard set up to monitor sessions and unique users (hence the privacy pop-up from the web site). But you don’t have to stop there. You need data points in a 30-day time frame? You can build a dashboard like that without a problem. Your imagination is the limit.

How can I query data?

You can query data by simply going to the Logs entry in the leftmost blade or by pressing on the Search Logs button in the Overview screen.

There you will be presented with a series of example queries which give you a starting point, and you also have a Query Explorer which gives you some useful general queries.

Those queries are populated from Log Analytics based on what imported solutions you have. If you don’t have anything then Log Management is the only one that will pop up.

That’s all for the overview. My next blog post on Azure Monitor will be more in-depth, based on each solution.

Stay tuned, and have a good one!

Azure VM Disk Swap

If you’re coming from on-premises, you know that before you do any changes on a virtual machine (updates, upgrades, configuration changes, etc.), you take a snapshot of the disk just in case of a failure. This is something we all ignored up to a point, until it bit us badly, and since then we’ve run snapshots for every change.

The problem in the cloud is that you do not have the same easy way to do VM snapshots and restores in case of an issue. This changed in Azure when they introduced the OS Disk swap feature which allows administrators to run snapshots of a virtual machine, do their thing and if anything goes wrong they just restore the checkpoint.

As I mentioned above, if anything goes wrong with a VM that you’re maintaining, then you have a few options available depending on the situation. Before OS Disk swap existed, your only options for fixing a broken VM were to restore it from Azure Backup, to download the VHD and hope to fix it from a Hyper-V machine or, in the worst case, to redeploy it. Now you have the option of just taking a snapshot of the VM, doing your stuff and, if something happens, swapping in the good disk 🙂

How do I do it?

Swapping the OS Disks is a simple operation that can be done via PowerShell or CLI (no portal support yet). You need either the latest AzureRM PowerShell Module installed on your computer or just use the Azure Cloud Shell –

For this example, I will use my WorkStation VM in Azure located in the Workstation RG

Let’s start by setting the VM object in a variable:

Now it’s time to stop the VM in order to have a consistent snapshot. You can skip this step if you don’t want to stop the VM.

While the VM is running or stopped you can now proceed to snapshot the current OS Disk:

Caption of the Snapshot

Let’s assume that you utterly broke the VM beyond repair, so now you have to start the OS Disk Swap procedure. This part requires the VM to be stopped.

The Update command will start up the VM and you will have reverted a disaster 🙂
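For reference, the whole sequence can also be laid out with the Azure CLI instead of PowerShell. The sketch below is a dry run in Python that only builds and prints the `az` commands in order, so you can review them before running anything. The resource group and VM name come from the example above; the disk and snapshot names are placeholders I made up, and `disk_swap_plan` is a hypothetical helper.

```python
import shlex


def disk_swap_plan(resource_group, vm_name, os_disk, snapshot, new_disk):
    """Return the az CLI calls for snapshot, restore and OS disk swap, in order."""
    rg = ["--resource-group", resource_group]
    return [
        # 1. Snapshot the current OS disk before touching the VM.
        ["az", "snapshot", "create", *rg, "--name", snapshot, "--source", os_disk],
        # 2. When things break, create a fresh managed disk from the snapshot.
        ["az", "disk", "create", *rg, "--name", new_disk, "--source", snapshot],
        # 3. The VM has to be stopped (deallocated) for the swap.
        ["az", "vm", "deallocate", *rg, "--name", vm_name],
        # 4. Swap the OS disk for the one restored from the snapshot.
        ["az", "vm", "update", *rg, "--name", vm_name, "--os-disk", new_disk],
        # 5. Start explicitly, in case the update leaves the VM stopped.
        ["az", "vm", "start", *rg, "--name", vm_name],
    ]


# Disk and snapshot names below are placeholders.
plan = disk_swap_plan(
    resource_group="Workstation",
    vm_name="WorkStation",
    os_disk="WorkStation_OsDisk",
    snapshot="WorkStation-pre-change",
    new_disk="WorkStation_OsDisk_restored",
)

for step in plan:
    print(" ".join(shlex.quote(arg) for arg in step))
```

Steps 1 and 2 to 5 happen at different times in practice: the snapshot is taken before the risky change, and the rest only runs if you need to roll back.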

Managed Disk from Snapshot
Swap completed successfully.

After you RDP / SSH to the VM and validate that everything is working as before, you can just take the old disk image and play around with it in a Hyper-V instance so you can see what happened, or just delete it.

Signing off. Have a good one!
