Running Multi-Containers in Azure Web Apps

Running a single container in the cloud is quite easy these days, but what about running multiple containers in one web app?

In this post, we will find out about the multi-container feature in Azure Web Apps and how we can leverage it.

Getting started with the multi-container feature in Azure Web Apps is quite easy, as a matter of fact. The documentation is a good starting point, but the real problems start when you want to run it in production.
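To make this concrete, the multi-container feature is driven by a Docker Compose (or Kubernetes) configuration. Here is a minimal, hypothetical compose file; the image names and port are assumptions, not taken from the docs:

```yaml
version: '3.3'
services:
  web:
    # Hypothetical application image; replace with your own registry/image
    image: myregistry.azurecr.io/myapp:latest
    ports:
      - "8080:8080"
  cache:
    # A second container running alongside the web app
    image: redis:alpine
```

You would then point the web app at this file, e.g. with `az webapp create ... --multicontainer-config-type compose --multicontainer-config-file docker-compose.yml`.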

Azure Proximity Placement Groups

This month, Proximity Placement Groups were announced as a public preview offering, and this post is here to tell you about them.

For a long time, we’ve been using availability sets to bring our application tiers as close together as possible to ensure the lowest latency. However, this couldn’t always be achieved, because the cloud is shared among multiple customers and you’re deploying resources alongside other people’s. This means that if you’re looking to run a latency-sensitive application in Azure, then Availability Sets or Availability Zones are not always the answer.
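As a sketch, creating a proximity placement group and dropping a VM into it can be done from the Azure CLI; the resource names and region below are assumptions:

```shell
# Create the proximity placement group
az ppg create --name myPPG --resource-group myRG --location westeurope

# Create a VM inside it, so it lands physically close to other members
az vm create --name myVM --resource-group myRG \
  --image UbuntuLTS --ppg myPPG
```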


Azure Bastion – Managed Jump Server

Ever heard of a jump server or bastion server? No? Well then, this post is for you.

Before we dive into what Azure Bastion is, we should understand how things work right now with regular jump servers or bastion hosts.

A jump server/bastion host is a virtual machine/server that sits inside a network, whose purpose is to allow remote access to servers and services without assigning them public IPs and thus exposing them to the internet.

Access to that jump server can be granted in numerous ways but the most common are:

  • VPN Access
  • Public Endpoint with Access Control authentication e.g., Cloudflare Access rules
  • Public Endpoint with a Just In Time solution
  • Remote Desktop Gateway access with AD Authentication

The list can go on; the idea is that the endpoint used to access the production network must be as secure as possible, because it’s being exposed in one way or another.

Azure Service Fabric in production – Field notes

I’ve been holding off on this post for a while now to gather more information, figure out how things can be done better, and so on. This post is a culmination of the issues I’ve had with Service Fabric and how I solved them, and it will hopefully solve your issues before you have a disaster on your hands.

Starting from the beginning. What is Service Fabric?

Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices and containers. Service Fabric also addresses the significant challenges in developing and managing cloud native applications. Developers and administrators can avoid complex infrastructure problems and focus on implementing mission-critical, demanding workloads that are scalable, reliable, and manageable. Service Fabric represents the next-generation platform for building and managing these enterprise-class, tier-1, cloud-scale applications running in containers.

Source: Azure docs

That being said, I have a small list of recommendations that should be enforced in practice so that you don’t repeat the mistakes I had to fix.


Azure Storage Explorer – A hidden gem

Azure Storage Explorer is a tool built by Microsoft that allows you to securely access your storage accounts in Azure.

Many of you know that you can access Storage Accounts in Azure through the portal, which is an easy way to manage and maintain them, but when it comes to copying data in or out of them, the portal is not enough. When you want to mass-copy files in a blob storage account, you have to go back to the CLI and use the AzCopy tool.

AzCopy is a command-line utility that lets you copy files to or from your storage accounts. The problem with AzCopy is that it’s not built into the operating system; you need to install it separately in order to use it. My preferred method is Chocolatey -> choco install azcopy
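As a quick, hedged example of what that looks like (AzCopy v10 syntax; the account, container and SAS token below are placeholders):

```shell
# Recursively upload a local folder into a blob container
azcopy copy "C:\data" \
  "https://myaccount.blob.core.windows.net/mycontainer?<SAS-token>" \
  --recursive
```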

Recently, Microsoft added support for AzCopy in the latest build of Azure Storage Explorer.

What is Azure Storage Explorer you might ask?

Azure Storage Explorer is a GUI tool that allows you to graphically access your storage accounts without the need for the Azure Portal. It’s a very useful tool for managing storage accounts. The downside was that copying files in and out of a storage account was slow. I mean very slow; it wasn’t usable that way. With the latest version of Azure Storage Explorer, Microsoft added AzCopy support to the tool, and you can now enjoy huge improvements in upload/download speeds. So now you get the best of both worlds, without ever touching the CLI πŸ™‚

What are the performance benefits?

At the time of writing this post I’m on a rather poor connection, so I will just borrow the examples from the Azure blog.

Multi-File Upload -Source: Azure Blogs
Single file upload -Source: Azure Blogs

What else can I do with the Storage Explorer?

  • Log in to storage accounts with Azure credentials, SAS tokens, or account keys
  • Create containers
  • Drag & drop files to initiate the copy process
  • Take blob snapshots
  • Change the access tier (hot, cool, archive)
  • Sync storage accounts

My personal favorite use for Azure Storage Explorer is when I’m doing migrations to Office 365 and I have to do PST uploads. For those that don’t know, when you need to upload PST files to Office 365, you are given a storage account with write access where you upload your PST files, and after that you map them with a CSV file. For those savvy with the command line this is not a problem, but for those lazy like me, it’s much simpler to just log in to that storage account with Storage Explorer and copy-paste the PST files.

That being said, I hope you found this post useful and, as always, have a good one!

What is Network Watcher – Azure Cloud NetMonitoring on steroids

Network troubleshooting in the cloud was always a pain. Let’s talk about the Azure Network Watcher and what it can do for you.

Running workloads in the cloud can be very easy, but troubleshooting something you don’t have access to can prove to be quite a challenge.

As you may know, you’re not dealing with the regular on-premises network stack that you’re used to; you’re dealing with software-defined networking, or SDN for short. This means that everything is virtualized and you don’t need to manage switches, routers, or any other type of networking equipment. In the cloud you manage virtual networks, subnets, IPs, VPN devices, network rules and so on, but at a software level.

While everything is nice and fun with SDN, you will encounter in Azure most of the networking problems that you encounter on-premises. Emphasis on most: you will not deal with hardware problems, VLANs, STP and so on, but you will deal with firewall rules, routes, priorities, and wrong topologies.

Issues that you might encounter in Azure networking:

  • Loss of network connection
  • VM cannot connect to a service
  • VPN Gateway is not connecting to the on-premises server
  • VMs across VNET Peering cannot connect
  • Everything is wrong; Nothing works πŸ™‚

Sound familiar? It’s mostly the same as on-premises, but you’re dealing with a different technology stack.

What are my options?

In Azure, there’s a nice piece of free technology called Network Watcher, which allows you to debug the network and, most of the time, figure out where the problem is. When I say most of the time, I mean there are cases where you do everything in your power and still cannot figure out where the issue is.

Enabling Network Watcher:

This is the easy part: go to All Services in Azure, type in Network Watcher, then in the overview blade select the region where it should be enabled.
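The same thing can be done from the Azure CLI; a sketch, assuming the default NetworkWatcherRG resource group and West Europe as the region:

```shell
az network watcher configure \
  --resource-group NetworkWatcherRG \
  --locations westeurope \
  --enabled true
```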

Network Watcher capabilities

With Network Watcher you have the following capabilities:

  • Network Topology
  • Connection Monitor
  • Network Performance Monitor
  • IP Flow verification
  • Next Hop validation
  • Effective NSG rules
  • VPN Troubleshooting
  • Packet Capturing
  • Connection Troubleshooting

Network Topology:

Network Topology gives you a network diagram of the virtual network in scope. You select the subscription, resource group and virtual network you need the diagram for, and you get a topology of what’s connected and how it’s connected.

This allows you to visually map how your network is deployed in Azure, and it obviously helps when somebody requests a very detailed network diagram.

Connection Monitor:

Connection Monitor allows you to set up continuous endpoint monitoring, which gives you metrics about the connection over a period of time.

To set up a connection monitor, press Add and then specify what you need to monitor.

If you select an Azure VM as the source, the AzureNetworkWatcherExtension will be installed on that VM. One thing to watch out for when specifying an Azure VM in the source pane is that you will only be able to select Azure VMs that are part of the same VNET, not a peered network or anything else. The workaround to this problem is to just specify an IP address instead.

Once you start the connection monitor, you will get historical data in graph form, along with connection metrics and status. In the example above, I set up a connection monitor from the Azure VM to 8.8.8.8, and you can see that the VM can communicate with Google DNS with a return time of 2 ms (1 ms each way). If I had had a problem, I would have known when it happened and could have checked the Azure Monitor logs to see what went wrong.
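For reference, a monitor like the one above can also be created from the CLI; a hedged sketch with assumed resource names:

```shell
az network watcher connection-monitor create \
  --name vm-to-google-dns \
  --resource-group myRG \
  --source-resource myVM \
  --dest-address 8.8.8.8
```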

IP Flow Verify

IP Flow Verify lets you validate the configuration of your Network Security Group rules. It requires you to input the packet details: protocol, direction, source IP, source port, destination IP and destination port.

Once provided, IP Flow Verify performs an NSG check to validate whether the connection succeeds or fails. This validates your NSG configuration, not your VM’s firewall. Whether it succeeds or fails, it tells you which rule was hit.
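From the CLI, the same check looks roughly like this (the VM name, IPs and ports are assumptions):

```shell
az network watcher test-ip-flow \
  --resource-group myRG --vm myVM \
  --direction Inbound --protocol TCP \
  --local 10.0.0.4:3389 --remote 100.1.2.3:60000
```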

Next Hop

Next Hop is one of the simplest tools available in Network Watcher; it tells you where a packet will go next. It also shows which route table is affecting the packet’s route.

Some examples of next hops are: VNET peering, Internet, VNET, or a network virtual appliance (NGFW).
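A CLI sketch of a next-hop query (names and IPs are assumptions):

```shell
az network watcher show-next-hop \
  --resource-group myRG --vm myVM \
  --source-ip 10.0.0.4 --dest-ip 8.8.8.8
```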

Packet Capture

Packet Capture, or network capture, is the be-all and end-all of network troubleshooting. This tool allows you to take very detailed captures of what’s happening on a target VM.

Before diving into this tool, it’s recommended to have a storage account ready for the captures, because it’s much faster to retrieve and distribute the captured data that way. Otherwise, you have the option of saving the file locally on the VM, but you’re not as flexible at that point.

When you’re specifying the details for the packet capture, you have the possibility of filtering out irrelevant traffic as shown in the picture above.
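A minimal capture can also be started from the CLI; a sketch with assumed names (the capture lands in the storage account you prepared):

```shell
az network watcher packet-capture create \
  --resource-group myRG --vm myVM \
  --name myCapture \
  --storage-account mystorageacct
```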

What about the others?

This article aims to cover the basics of network monitoring in Azure. There’s also Network Performance Monitor, which is a whole beast in itself and requires a hefty deployment to show off its possibilities; VPN Troubleshoot, which gives you a log of what’s happening so you can debug VPN settings (mostly on-premises ones); and the logging part, which will be another article, as the scope is different there.

That being said, thanks for reading and have a good one!

Azure Labs – What is it and use cases

As a trainer, I always have a set of prerequisites when I’m about to deliver a training. Usually those prerequisites are sent weeks in advance, but most of the time, if not always, the participants don’t have them installed. What I keep in my back pocket is an ARM template with two or three predefined images, which I mass-deploy before a training and give the participants access to, so we avoid this hassle.

The reality is that this approach is complicated. My images are created with Packer within an Azure DevOps pipeline, and while it’s all fun and geeky to do everything yourself, you don’t always have time to update the packages, you forget to shut down running VMs, and so on.

I was stoked when Microsoft came out with a new Azure feature called Lab Services, which opened up the possibility of doing everything I just mentioned in a simple, secure setting.

This feature/offering is similar to DevTest Labs, but it provides a new portal that lab creators and lab participants can open without much hassle.

So how can we use it? 

Creating and using Azure Lab Services is pretty simple. To create a Lab Services account, go to the Azure Portal, type in Lab Services, and create it in a resource group.

After you’ve created the lab account, your next step is to add yourself and/or other people to the Lab Creator RBAC role via the IAM blade, because even as an Owner you will not be able to use the labs otherwise. Once that’s done, you can proceed to https://labs.azure.com
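The role assignment itself can be scripted; a sketch, where the user and resource group names are assumptions:

```shell
az role assignment create \
  --assignee trainer@contoso.com \
  --role "Lab Creator" \
  --resource-group myLabServicesRG
```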

At first look, the lab portal is pretty simple. If it’s newly created, you will be prompted to create a new lab.

Step by step process

If you want to create a new lab, go to the new lab icon in the upper left corner, type in a name, and set the maximum number of VMs for the lab. Don’t worry, the number you set there is not permanent and you can change it later if required.

After you press save, the next screen lets you select the virtual machine image to use for your template. A number of images are presented in that list; if you want to expand it, go to the Lab Services resource in the Azure portal and select Marketplace images from the Policies tab, where you have the option of enabling other types of images.

Once you select the image you want and press next, you will be prompted for the username and password for the template VM and all the VMs that will be created from it.

After you press create, the template will be created; you’re going to have to wait a while for it to complete πŸ™‚

Next up is the configuration phase, where you connect to the VM, do your setup, and then mark the lab configuration as complete.

The next screen is a review screen where you can either publish the lab or save it for later.

The publishing phase takes a while so this is the time to get a donut or hit that Netflix show πŸ™‚

What does it look like?

Once the lab is done, it will show up on the main screen, where, if you’re a lab creator, you will have the option of customizing some settings for the lab, such as:

  • Re-configure the template
  • Republish the lab
  • Set up an on/off schedule for the VMs
  • Configure a restricted user list for the lab, or make it public to anyone with the registration link

Caveats 

One of the minor caveats of the solution is that participants are required to log in using either an MSA or a work account. I call it minor because most of the time participants have an MSA or work account, but there are times, like public hands-on labs and workshop settings, where you cannot expect all of the participants to have one.

The solution to this problem is Azure B2C. You create an Azure B2C tenant, link it to your Azure subscription, create B2C accounts, and add them to the lab services. That’s the best solution for these kinds of cases because, first, you don’t deal with personal e-mail accounts or other PII, and second, you have complete control over the user accounts.

Another issue I found is that if you’re a Lab Creator on multiple labs with the same account, the portal will not prompt you for which lab you want, so I’m waiting for a fix on that.

On a final note, this is an excellent offering for me, and I will be using it heavily for my training sessions and workshops.

Azure DevOps and “VSProfessional” licenses

This is something I encountered at a client, and I figured I should write it down here because it took a while to find the solution, and the only answer came via a support ticket to Microsoft.

A while back, when Azure DevOps was called VSTS or Visual Studio Online, you had the possibility to link the tenant to your Azure subscription for billing purposes. This allowed you to purchase basic user licenses for the platform, and it even allowed you to purchase Visual Studio Professional licenses, which licensed the users’ VS Pro installations via the platform.

Azure view

The problem I faced with this customer was that they were in this position, and suddenly their VS Pro licenses started to expire and stop working. We tried to figure out what the problem was and why it didn’t work, but unfortunately we hit a dead end and had to open a support ticket for assistance while investigating in parallel.

We knew that Visual Studio monthly licenses were located in the marketplace – https://marketplace.visualstudio.com/items?itemName=ms.vs-professional-monthly – but we didn’t understand the correlation between the two.

On a hunch, we purchased a few VS Pro Monthly licenses for some users to test a theory, and luckily it worked, but we still didn’t have an answer as to why the issue existed.

The answer came from the support person on the Microsoft end, who provided an excellent explanation of why the problem existed and how to fix it.

The problem was that licensing users via the Azure Portal was deprecated a while ago, and Microsoft didn’t have a solution for seamless migration to the new licensing model, so they let it keep working for existing customers while removing the capability from the portal.

The licenses that appeared on the billing invoice were called “VSPRO – Monthly”, which coincidentally matched the name of the VS Pro licenses from the marketplace. In reality, the licenses you could get from the Azure Portal were “Professional” licenses tied to the old VS Online model, which were allowed to work in parallel until the model died by itself.

Basically, the old Professional license allowed you to run Visual Studio Professional and be a licensed user in VSTS / Azure DevOps, but because it was deprecated, updates and newer versions of Visual Studio (starting from 2017 and continuing with 2019) simply could no longer parse the licensing info assigned to the user’s work account, and the installations ended up in Extended Trial mode.

The solution to this problem was to simply purchase the licenses from the marketplace, assign them to all the “Professional” users, and after a day or two remove the old offering from the Azure Portal.

After doing the whole operation, everything licensed correctly and the issue was solved.

Signing out. Have a good one!

Azure – Application Security Groups

Security is not something to kid about, and when it comes to the cloud, you have to be very thorough when deploying your cloud infrastructure. This means that you are still required to apply defense in depth, use anti-malware systems, and configure extended monitoring, logging and reporting mechanisms. When you’re going to the cloud, you have to be aware of the Shared Responsibility Matrix, which applies to any cloud provider.

As you can see in the image above, you as a customer still have a responsibility to secure your cloud environment. So those skills you’ve developed while working on-premises will still be of value in the cloud.

The subject for today’s topic is managing Network Security Groups using a feature in Azure, called Application Security Groups.

What are they?

Application Security Groups are a mechanism to group virtual machines that reside in the same virtual network and apply Network Security Groups to them.

The way you used to deploy NSGs in an Azure subscription was to assign them to a network interface or a subnet and then configure them in a granular manner, based on the deployment type. The reality is that it’s utopian to keep this clean, and it gets messy after a couple of months. So ASGs came to the rescue: they let you group a set of VMs by role, like web, DB, middleware, etc., and apply NSG allow / deny rules to them.

By using an ASG, you simplify your management overhead: you just add newly created VMs to those groups and the security policies from your NSG are applied automatically.

ASG Example – Source Ignite

Getting Started

Creating and using Application Security Groups is easy: go to the Azure Portal -> Create a resource -> type in Application Security Group and press create.

Or you can simply use PowerShell:

#PS Example for creating Application Security Groups.
$testAsg = New-AzureRmApplicationSecurityGroup -ResourceGroupName asgTest -Name testAsg -Location westeurope

After you’ve created the ASG, the next thing you need to do is assign it to some VMs, which can be done via the Portal or PowerShell.

#PS Example for attaching ASGs
$Nic = Get-AzureRmNetworkInterface -Name test134 -ResourceGroupName asgtest
$Nic.IpConfigurations[0].ApplicationSecurityGroups = $testAsg
Set-AzureRmNetworkInterface -NetworkInterface $Nic

The next step is to add or modify your inbound / outbound rules to use the new ASGs you’ve created. This is very simple, and you can also do it via the portal or PowerShell.

#PS Example for adding an ASG-based rule to an NSG
Get-AzureRmNetworkSecurityGroup -Name testNSG -ResourceGroupName asgtest |
Add-AzureRmNetworkSecurityRuleConfig -Name RDP-rule -Description "Allow RDP" -Access Allow -Protocol Tcp -Direction Inbound -Priority 100 -SourceApplicationSecurityGroup $srcAsg -SourcePortRange * -DestinationApplicationSecurityGroup $destAsg -DestinationPortRange 3389 |
Set-AzureRmNetworkSecurityGroup

As you can see, it’s pretty easy to secure your VMs this way, which matters considering that managing NSGs can become a pain even for simple deployments. I’m not even talking about very complex ARM deployments that deploy tens of VMs and link them together πŸ™‚

AKS – Working with GPUs

I’ve discussed AKS before, but recently I have been doing a lot of production deployments of AKS, and the most recent one involved NVIDIA GPUs.

This blog post will take you through my learnings from a deployment of this type, because some things are not as simple as they look.

The first problems come right after deploying the cluster. Most of the time, if not always, the NVIDIA driver doesn’t get installed and you cannot deploy any GPU-constrained resources. The solution is basically to install the NVIDIA device plugin DaemonSet and go from there, but that also depends on the AKS version.

For example, if your AKS cluster is running version 1.10 or 1.11, then the NVIDIA device plugin must be 1.10 or 1.11, or whatever matches your version, located here

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: nvidia-device-plugin
  namespace: gpu-resources
spec:
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure.  This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: nvidia/k8s-device-plugin:1.10 # Update this tag to match your Kubernetes version
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
      nodeSelector:
        beta.kubernetes.io/os: linux
        accelerator: nvidia

The snippet above creates a DaemonSet that runs the NVIDIA device plugin on every node provisioned in your cluster, so for three nodes you will have three NVIDIA pods.
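To verify the plugin is healthy and the GPUs are actually schedulable, something like the following should work (the label matches the DaemonSet above):

```shell
# Device plugin pods, one per node
kubectl get pods -n gpu-resources -l name=nvidia-device-plugin-ds

# Allocatable GPU count per node; a non-empty value means the GPU is schedulable
kubectl get nodes \
  "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```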

The problem appears when you upgrade your cluster. You go to Azure, upgrade the cluster and, guess what, you forgot to update the YAML file, and everything that relies on those GPUs dies on you.

The best example I can give is the TensorFlow Serving container, which crashed with a very “informative” error saying the NVIDIA version was wrong.

Another problem is monitoring. How can I monitor GPU usage? What tools should I use?

Here you have a good solution that can be deployed via Helm: if you do a helm search for prometheus-operator, you will find the best way to monitor your cluster and your GPU πŸ™‚

The prometheus-operator chart comes with Prometheus, Grafana and Alertmanager, but out of the box you will not get the GPU metrics required for monitoring, because of an error in the Helm chart that scrapes the cAdvisor metrics over HTTPS. The solution is to set the kubelet exporter’s https value to false:

kubelet:
  enabled: true
  namespace: kube-system

  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    ##
    interval: ""

    ## Enable scraping the kubelet over https. For requirements to enable this see
    ## https://github.com/coreos/prometheus-operator/issues/926
    ##
    https: false

Then import the dashboard required to monitor your GPUs, which you can find here: https://grafana.com/dashboards/8769/revisions, and set it up as a ConfigMap.

In most cases, you will want to monitor your cluster from outside; for that you need to install / upgrade the prometheus-operator chart with the grafana.ingress.enabled value set to true and grafana.ingress.hosts={domain.tld}.
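A hedged sketch of that install; the release name and hostname are assumptions, and the chart lived in the stable Helm repo at the time:

```shell
helm upgrade --install monitoring stable/prometheus-operator \
  --set grafana.ingress.enabled=true \
  --set "grafana.ingress.hosts={grafana.domain.tld}"
```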

Next in line, you have to deploy the actual containers that use the GPU. As a rule, a container cannot use part of a GPU, only a whole GPU, so tread carefully when you’re sizing your cluster, because as of now you can only scale horizontally.

When you’re defining the pod, add the following snippet to the container spec:

    resources:
      limits:
       nvidia.com/gpu: 1

End result would look like this deployment:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: tensorflow
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/serving:latest
        imagePullPolicy: IfNotPresent
        resources:
         limits:
          nvidia.com/gpu: 1

What happens if everything blows up and nothing works?

In some rare cases, the NVIDIA driver may blow up your data nodes. Yes, that happened to me, and I needed to solve it.

The manifestation looks like this: the ingress controller works intermittently, pods show as evicted, the NVIDIA device plugin restarts frequently, and your GPU containers are stuck in Pending.

The way to fix it is to first delete the evicted / failed pods by running this command:

kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -

And then restart all the data nodes from Azure. You can find them in the resource group called MC_<ClusterRG>_<ClusterName>_<Region>

That being said, it’s fun times to run AKS in production πŸ™‚

Signing out.
