Azure Proximity Placement Groups

This month, Proximity Placement Groups were announced as a public preview, and this post is here to tell you about them.

For a long time, we’ve been using Availability Sets to bring our application components as close together as possible and keep latency low. However, that couldn’t always be achieved, because the cloud is shared with multiple customers and you’re deploying resources alongside other people. This means that if you’re looking to run a latency-sensitive application in Azure, Availability Sets or Availability Zones are not always the answer.

Proximity Placement Groups are here to remove that latency obstacle for your deployments. They introduce the concept of co-location: everything you deploy in a PPG is constrained to the same datacenter and placed as close together as possible. This used to be easy to achieve with Availability Sets, but as Azure grows, datacenters grow apart and network latency increases.

Before you start using Proximity Placement Groups, take note that by demanding lower latency you’re restricting your VM placement, and that can cause deployment issues. You will see more frequent deployment failures because you’re limiting where Azure can place your VMs; with lower latency comes less capacity.

Getting started with PPGs is not that simple for the GUI folks because Portal support hasn’t been added yet, so you have to fall back to the good old ARM template 🙂

Sample ARM Template.
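If Azure PowerShell in your environment already includes the PPG cmdlets, the same deployment can also be sketched from the command line. A minimal example, where the resource group, VM names and credentials are placeholders:

```powershell
# Create the proximity placement group
$ppg = New-AzProximityPlacementGroup -ResourceGroupName "rg-ppg-demo" -Name "ppg-demo" `
    -Location "westeurope" -ProximityPlacementGroupType Standard

# Deploy two VMs into the same PPG so they land as close together as possible
$cred = Get-Credential   # local admin credentials for the new VMs
New-AzVM -ResourceGroupName "rg-ppg-demo" -Name "vm-app-01" -Location "westeurope" `
    -ProximityPlacementGroupId $ppg.Id -Credential $cred
New-AzVM -ResourceGroupName "rg-ppg-demo" -Name "vm-db-01" -Location "westeurope" `
    -ProximityPlacementGroupId $ppg.Id -Credential $cred
```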

Have a good one!

Azure Service Fabric in production – Field notes

I’ve been holding off on this post for a while now to gather more information, figure out how things can be done better, and so on. This post is a culmination of the experiences I’ve had with Service Fabric and how I solved the problems I ran into, and hopefully it will help you fix your issues before you have a disaster on your hands.

Starting from the beginning. What is Service Fabric?

Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices and containers. Service Fabric also addresses the significant challenges in developing and managing cloud native applications. Developers and administrators can avoid complex infrastructure problems and focus on implementing mission-critical, demanding workloads that are scalable, reliable, and manageable. Service Fabric represents the next-generation platform for building and managing these enterprise-class, tier-1, cloud-scale applications running in containers.

Source: Azure docs

That being said, I have a small list of recommendations that should be enforced in practice so that you don’t repeat the mistakes I had to fix.

Recommendation 1: Provision your Service Fabric clusters from a pipeline like VSTS, Jenkins, or TeamCity, and use ARM as much as possible. I learned from the master Jeffrey Snover, when he talked about PowerShell DSC, that you should treat your VMs as cattle, not pets. I took that advice to heart and it saved me multiple times. If you put in the time to develop the scripts/ARM templates that deploy your clusters, then when something bad happens you can simply start from scratch.

I’ve done it twice in the last four years. Everything Service Fabric related is automated in such a way that if a major cluster failure happens, the cluster is back up and running in 30 minutes.

Recommendation 2: When in doubt, have multiple application packages. This one is more of a design decision and evolved from multiple failures in production. It started out as a single app package with N services under it, and the problem was that if one service died, all of them died. That’s not the idea of microservices, so we decided to decouple them so that a single crash doesn’t take everything down.

Recommendation 3: Don’t use self-signed certificates in production clusters (yes, I’ve seen this one happen multiple times). I mean it, never use them, and as a second best practice, use separate client and server certificates for your endpoints. If for whatever reason the certificate expires before you update the cluster, you’re going to have a total meltdown which will force you to redeploy the cluster.

When I say that you’re going to have a total meltdown, I mean that the cluster goes into an insecure state, the nodes stop trusting each other, and no matter what you do it won’t restart. If debugging the problem takes more time than redeploying from scratch, delete and redeploy.

Recommendation 4: If you really want HTTPS for your microservices, then you need to be explicit about it because HTTP is hardcoded in the template code. First of all, you need to declare an HTTPS endpoint for each of your microservices in ServiceManifest.xml and bind the certificate in code.

If you check OwinCommunicationListener::OpenAsync() you will find that this.listeningAddress has HTTP hardcoded, so obviously it doesn’t work with HTTPS out of the box.

Recommendation 5: Plan for outages and deploy accordingly. With Service Fabric you’re limited to an Availability Set, and you don’t have access to Availability Zones to gain that 99.99% availability SLA. That being said, even with AZs you should have DR ready for a serious outage, like the case where you need to wait 30 minutes for the cluster to be recreated. Use Traffic Manager in priority mode pointed at a cluster-specific endpoint (not an application) and test it every six months.
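As a rough sketch of that failover setup, here is what a priority-based Traffic Manager profile could look like in Azure PowerShell; the resource group, DNS names and endpoint targets below are hypothetical placeholders:

```powershell
# Priority routing: traffic goes to the primary cluster endpoint unless its health probe fails
New-AzTrafficManagerProfile -Name "tm-sf-cluster" -ResourceGroupName "rg-sf" `
    -TrafficRoutingMethod Priority -RelativeDnsName "sf-cluster-demo" -Ttl 30 `
    -MonitorProtocol HTTPS -MonitorPort 443 -MonitorPath "/"

# Primary region cluster endpoint
New-AzTrafficManagerEndpoint -Name "sf-primary" -ProfileName "tm-sf-cluster" `
    -ResourceGroupName "rg-sf" -Type ExternalEndpoints `
    -Target "sf-primary.westeurope.cloudapp.azure.com" -EndpointStatus Enabled -Priority 1

# Secondary (DR) region cluster endpoint, used only while the primary is unhealthy
New-AzTrafficManagerEndpoint -Name "sf-secondary" -ProfileName "tm-sf-cluster" `
    -ResourceGroupName "rg-sf" -Type ExternalEndpoints `
    -Target "sf-secondary.northeurope.cloudapp.azure.com" -EndpointStatus Enabled -Priority 2
```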

Recommendation 6: If you’re a decision-maker, don’t accept VPNs to connect to third-party services or any other things like that. This complicates the design and everything attached to it. I’ve been in this situation and it’s not pretty.

That being said, Service Fabric supports deployment in dual load balancer mode, with external and internal LBs. I’ve deployed a configuration like this and it works quite well. ARM Template below:

ARM Template to deploy double LB clusters – External and Internal.

That’s all folks. Have a good one!

What is Network Watcher – Azure Cloud NetMonitoring on steroids

Network troubleshooting in the cloud has always been a pain. Let’s talk about Azure Network Watcher and what it can do for you.

Running workloads in the cloud can be very easy, but troubleshooting infrastructure you don’t have direct access to can prove to be quite a challenge.

As you may know, you’re not dealing with the regular on-premises network stack that you’re used to; you’re dealing with software-defined networking, or SDN for short. This means that everything is virtualized and you don’t need to manage switches, routers or any other type of networking equipment. In the cloud, you manage virtual networks, subnets, IPs, VPN devices, network rules and so on, but at a software level.

While everything is nice and fun with SDN, you will encounter in Azure most of the networking problems that you encounter on-premises. Emphasis on most. You will not deal with hardware problems, VLANs, STP and so on, but you will still deal with firewall rules, routes, priorities and wrong topologies.

Issues that you might encounter in Azure networking:

  • Loss of network connection
  • VM cannot connect to a service
  • VPN Gateway is not connecting to the on-premises server
  • VMs across VNET Peering cannot connect
  • Everything is wrong; Nothing works 🙂

Sound familiar? It’s mostly the same as on-premises, but you’re dealing with a different technology stack.

What are my options?

In Azure, there’s a nice piece of free technology called Network Watcher which lets you debug the network and, most of the time, figure out where the problem is. When I say most of the time, I mean that there are cases where you do everything in your power and still cannot figure out where the issue is.

Enabling Network Watcher:

This is the easy part: go to All Services in the Azure Portal, type in Network Watcher, and then, in the Overview blade, select the region where it should be enabled.
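If you’d rather not click through the Portal, a minimal PowerShell sketch looks like this (resource group name and region are placeholders; newer subscriptions often get the regional instance created automatically):

```powershell
# Network Watcher is a regional resource; enable one instance per region you want to monitor
New-AzResourceGroup -Name "NetworkWatcherRG" -Location "westeurope" -Force
New-AzNetworkWatcher -Name "NetworkWatcher_westeurope" -ResourceGroupName "NetworkWatcherRG" -Location "westeurope"
```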

Network Watcher capabilities

With Network Watcher you have the following capabilities:

  • Network Topology
  • Connection Monitor
  • Network Performance Monitor
  • IP Flow verification
  • Next Hop validation
  • Effective NSG rules
  • VPN Troubleshooting
  • Packet Capturing
  • Connection Troubleshooting

Network Topology:

Network Topology gives you a network diagram of the virtual network in scope. You select the subscription, resource group and virtual network you need the diagram for, and you get a topology of what’s connected and how it’s connected.

This allows you to visually map how your network is deployed in Azure, and it obviously helps when somebody requests a very detailed network diagram.
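The same topology data can be pulled programmatically as well; a small PowerShell sketch, with placeholder resource names:

```powershell
$nw = Get-AzNetworkWatcher -Name "NetworkWatcher_westeurope" -ResourceGroupName "NetworkWatcherRG"

# Returns the resources in the target resource group and how they are connected to each other
$topology = Get-AzNetworkWatcherTopology -NetworkWatcher $nw -TargetResourceGroupName "rg-production-network"
$topology.Resources | Select-Object Name, Associations
```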

Connection Monitor:

Connection monitor allows you to set up a continuous endpoint monitor which gives you metrics about the connections over a period of time.

To set up a connection monitor, press Add and then specify what you need to monitor.

If you select an Azure VM as the source, the AzureNetworkWatcherExtension will be installed on that VM. One thing to watch out for when specifying an Azure VM in the source pane is that you will only be able to select Azure VMs that are part of the same VNET, not a peered network or anything else. The workaround to this limitation is to just specify an IP address and that’s it.

Once you start the connection monitor, you will be presented with historical data in a graph, together with connection metrics and status. In the example above, you can see that I set up a connection monitor from the Azure VM to 8.8.8.8, that the VM can communicate with Google DNS, and that the return time is 2 ms (1 ms round trip). If there had been a problem, I would have known when it happened and could have checked the Azure Monitor logs to see what went wrong.
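For reference, a similar monitor can be created from PowerShell. This is a sketch using the classic connection monitor cmdlets; the resource names are placeholders and port 53 is an assumption, since the destination is Google DNS:

```powershell
$nw = Get-AzNetworkWatcher -Name "NetworkWatcher_westeurope" -ResourceGroupName "NetworkWatcherRG"
$vm = Get-AzVM -ResourceGroupName "rg-demo" -Name "vm-source-01"

# Probe 8.8.8.8 on port 53 from the source VM every 60 seconds
New-AzNetworkWatcherConnectionMonitor -NetworkWatcher $nw -Name "monitor-google-dns" `
    -SourceResourceId $vm.Id -DestinationAddress "8.8.8.8" -DestinationPort 53 `
    -MonitoringIntervalInSeconds 60
```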

IP Flow Verify

IP Flow Verify lets you validate the configuration of your Network Security Group rules; it requires you to input the packet details: protocol, direction, source IP, source port, destination IP and destination port.

Once provided, IP Flow Verify runs an NSG check to determine whether the connection is allowed or denied. This validates your NSG configuration, not your VM’s firewall. Whether it succeeds or fails, it tells you which rule was hit.
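The same check from PowerShell looks roughly like this; the VM, ports and IP addresses are placeholders:

```powershell
$nw = Get-AzNetworkWatcher -Name "NetworkWatcher_westeurope" -ResourceGroupName "NetworkWatcherRG"
$vm = Get-AzVM -ResourceGroupName "rg-demo" -Name "vm-web-01"

# Would an outbound HTTPS connection from this VM be allowed by its effective NSG rules?
Test-AzNetworkWatcherIPFlow -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -Direction Outbound -Protocol TCP `
    -LocalIPAddress "10.0.0.4" -LocalPort "60000" `
    -RemoteIPAddress "13.107.21.200" -RemotePort "443"
# The result contains Access (Allow/Deny) and RuleName (the NSG rule that was hit)
```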

Next Hop

Next Hop is one of the simplest tools available in Network Watcher; it’s basically a tool that tells you where a packet will go next. It also tells you which route table is affecting the packet’s route.

Some examples of next hop types are: VNET peering, Internet, VNET or a network appliance (NGFW).
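A quick PowerShell equivalent, with placeholder names and IPs:

```powershell
$nw = Get-AzNetworkWatcher -Name "NetworkWatcher_westeurope" -ResourceGroupName "NetworkWatcherRG"
$vm = Get-AzVM -ResourceGroupName "rg-demo" -Name "vm-web-01"

# Where does a packet from this VM to 10.1.0.4 go next, and which route table decided that?
Get-AzNetworkWatcherNextHop -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -SourceIPAddress "10.0.0.4" -DestinationIPAddress "10.1.0.4"
# The result contains NextHopType, NextHopIpAddress and RouteTableId
```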

Packet Capture

Packet Capture, or network capture, is the be-all and end-all of network troubleshooting. This tool allows you to take very detailed captures of what’s happening on a target VM.

Before diving into this tool, it’s recommended to have a storage account ready for the captures because it’s much faster to retrieve and distribute the captured data that way. Otherwise, you have the option of saving the file locally on the VM, but you’re not as flexible at that point.

When you’re specifying the details for the packet capture, you can filter out irrelevant traffic, as shown in the picture above.
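Starting a filtered capture from PowerShell could look like the sketch below; the storage account, VM and filter values are placeholders, and the Network Watcher VM extension has to be present on the target:

```powershell
$nw = Get-AzNetworkWatcher -Name "NetworkWatcher_westeurope" -ResourceGroupName "NetworkWatcherRG"
$vm = Get-AzVM -ResourceGroupName "rg-demo" -Name "vm-web-01"
$sa = Get-AzStorageAccount -ResourceGroupName "rg-demo" -Name "stpcapdemo"

# Capture only TCP traffic to/from remote port 443 and stop automatically after 5 minutes
$filter = New-AzPacketCaptureFilterConfig -Protocol TCP -RemotePort "443"
New-AzNetworkWatcherPacketCapture -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -PacketCaptureName "pcap-https-issue" -StorageAccountId $sa.Id `
    -TimeLimitInSeconds 300 -Filter $filter
```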

What about the others?

This article aims to cover the basics of network monitoring in Azure. There’s Network Performance Monitor, which is a whole beast in itself and requires a hefty deployment to show off its possibilities; VPN Troubleshoot, which gives you a log of what’s happening so you can debug VPN settings (mostly the on-premises ones); and the logging part, which will be another article, as the scope is different there.

That being said. Thanks for reading and have a good one!

Azure – Application Security Groups

Security is not something to kid about, and when it comes to the cloud, you have to be very thorough when you’re deploying your infrastructure. That means you are still required to do defense in depth, use anti-malware systems, and configure extended monitoring, logging and reporting mechanisms. When you’re going to the cloud, you have to be aware of the shared responsibility matrix, which applies to any cloud provider.

As you can see in the image above, you as a customer still have a responsibility to secure your cloud environment. So those skills you’ve developed while working on-premises will still be of value in the cloud.

Today’s topic is managing Network Security Groups using a feature in Azure called Application Security Groups.

What are they?

Application Security Groups are a mechanism to group virtual machines that reside in the same virtual network and apply Network Security Group rules to them.

The way you deployed NSGs in an Azure subscription was that you would assign them to a network interface or a subnet and then configure them in a granular manner, based on the deployment type. The reality was that it’s utopian to do this cleanly, and it gets messy after a couple of months. So ASGs came to the rescue: they help you group a set of VMs based on roles like web, DB, middleware, etc. and apply NSG allow/deny rules to them.

By using an ASG, you simplify your management overhead: you just add the VMs you create to those groups and they automatically get the security policies applied from your NSG.

ASG Example – Source Ignite

Getting Started

Creating and using Application Security Groups is easy. Go to the Azure Portal -> Create a resource -> type in Application Security Group and press Create.

Or you can simply use PowerShell:
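Something along these lines; the resource group, names and location are placeholders:

```powershell
# One ASG per role; NSG rules will reference these groups instead of IP addresses
$webAsg = New-AzApplicationSecurityGroup -ResourceGroupName "rg-app" -Name "asg-web" -Location "westeurope"
$dbAsg  = New-AzApplicationSecurityGroup -ResourceGroupName "rg-app" -Name "asg-db"  -Location "westeurope"
```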

After you’ve created the ASG, the next thing you need to do is assign it to some VMs, which can be done via the Portal or PowerShell.
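In PowerShell, the ASG is attached at the level of the NIC’s IP configuration. A sketch with placeholder names, reusing the web-tier ASG created above:

```powershell
$webAsg = Get-AzApplicationSecurityGroup -ResourceGroupName "rg-app" -Name "asg-web"

# Attach the web-tier ASG to the VM's NIC IP configuration and persist the change
$nic = Get-AzNetworkInterface -ResourceGroupName "rg-app" -Name "vm-web-01-nic"
$nic | Set-AzNetworkInterfaceIpConfig -Name $nic.IpConfigurations[0].Name `
    -SubnetId $nic.IpConfigurations[0].Subnet.Id -ApplicationSecurityGroup $webAsg
$nic | Set-AzNetworkInterface
```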

The next step is to add or modify your inbound/outbound rules to use those new ASGs you’ve created. Doing that is very simple, and you can also do it via the Portal or the CLI.
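As a sketch (PowerShell here, but the CLI equivalent follows the same shape), here is an inbound rule that allows only the web tier to reach the DB tier, with no IP addresses involved; names and priority are placeholders:

```powershell
$webAsg = Get-AzApplicationSecurityGroup -ResourceGroupName "rg-app" -Name "asg-web"
$dbAsg  = Get-AzApplicationSecurityGroup -ResourceGroupName "rg-app" -Name "asg-db"

# Allow SQL traffic from the web ASG to the DB ASG and persist the rule on the NSG
$nsg = Get-AzNetworkSecurityGroup -ResourceGroupName "rg-app" -Name "nsg-app"
$nsg | Add-AzNetworkSecurityRuleConfig -Name "Allow-Web-To-Db" -Access Allow -Direction Inbound `
    -Priority 200 -Protocol Tcp -SourceApplicationSecurityGroup $webAsg -SourcePortRange "*" `
    -DestinationApplicationSecurityGroup $dbAsg -DestinationPortRange 1433
$nsg | Set-AzNetworkSecurityGroup
```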

As you can see, it’s pretty easy to secure your VMs this way, especially considering that managing NSGs can become a pain even for simple deployments. I’m not even talking about very complex ARM deployments that roll out tens of VMs and link them together 🙂

Resetting RHEL Root PW with Azure Serial Console

Oh Snap.

Has this problem ever happened to you? If yes, then you know that the way to solve it is by booting the distro into single-user mode. But how do you do that in Azure? Well, Serial Console to the rescue!

Usually this is easily solvable using Run Command or the Reset Password blade, but in this case imagine that they don’t work. That’s the case with SAP deployments using RHEL VMs. You cannot do anything if you’ve lost access, and if the VM crashes it’s even worse.

Nope, No SYSRQ for you.

You need to get into GRUB so you can boot the VM in single-user mode. The problem here is that the VM boots too fast for the serial console to connect and for you to press the ESC key at the magic moment.

So what can you do?

The solution to that problem is to stop the VM without deallocating it. This means that the VM on the backend Hyper-V host is preserved rather than removed, and it gives you the chance to keep the serial console on standby for that magic moment. How do you know you’re in the right state? Check figs. 1 and 2, and see the PowerShell snippet after the figures.

Fig. 1. This is where you have to be.
Fig. 2. If you’re here, repeat the first step.
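In PowerShell, the stopped-but-not-deallocated state can be reached like this; the resource group and VM name are placeholders:

```powershell
# -StayProvisioned stops the OS but keeps the VM allocated on the host,
# so the serial console can reconnect quickly once you start it again
Stop-AzVM -ResourceGroupName "rg-sap" -Name "vm-rhel-01" -StayProvisioned -Force
Start-AzVM -ResourceGroupName "rg-sap" -Name "vm-rhel-01"
```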

Once you get to the screen showing that the VM is starting, this is what you need to watch for before mashing the ESC key:

Once you’ve managed to enter GRUB, you’re home free to reset the password using the steps below. Press e in the Serial Console to edit the first OS entry, then:

  • Go to the kernel line, which starts with linux16
  • Add rd.break to the end of the line, which will break the boot cycle. If SELinux is enabled, add rd.break enforcing=0 instead
  • Press Ctrl+X to exit GRUB and boot with the rd.break option applied
  • During this boot, the VM will drop into emergency mode, where you have to remount the system root as read-write using the “mount -o remount,rw /sysroot” command
  • This puts you in single-user mode, where you type chroot /sysroot to switch into the sysroot jail and then reset the password for the root user with passwd
  • Edit the sshd_config file (“nano /etc/ssh/sshd_config” or your preferred editor) and enable root access over the Serial Console by setting PermitRootLogin yes
  • Once you’re done, reboot the VM and you’ll have root access.

GIF from Azure Docs – GRUB editing representation

After you’re done resetting all the passwords and installing all the agents so you’re not confronted with this again, set PermitRootLogin no and you’re golden 🙂

Have a good one!
