The benefits of using Azure Active Directory outweigh the costs because a lot of the issues that you have on-premises are handled automatically by Microsoft, and the only thing you have to do is manage the service properly from a management and security standpoint.
Single Sign-on is not something new or fancy or “next-gen”; a lot of people have been doing it using ADFS, third-party tools or a combination of both. The problem was that this required a lot of knowledge of how the authentication mechanism works. ADFS was no joke to set up, and if there was no tutorial out there, you were left to figure it out on your own.
This blog post will cover the quirks that come up during the SSO / provisioning setup 🙂
To start off, you need administrative access to the application(s) that you want to federate with. If you don’t have access, things get a little more complicated because you have to hand the right information to the person who manages that application.
Let’s assume that you have all the access that’s required to the third-party applications. Check the AAD tutorials to see whether your application is listed.
If your application is missing from the tutorial gallery, chances are it’s not out-of-box supported and you will have to build a custom application for it.
If you find your application in the gallery, don’t do the victory dance yet, because you still need to check whether the third-party application supports custom SSO with a specific IDP on your current plan or requires a plan upgrade. If all of that checks out, validate that the documentation is not outdated, because if it is, you’re going to have issues during the configuration step.
Next up is the testing phase, which is the most painful one if the app documentation doesn’t match what you see. Once you’ve set up everything for the application to work with AAD, you have a testing step where you can validate that SSO is working. If SSO is not working, then you need to run some debugging steps.
The best place you can look for the error codes that pop up is here. Some error codes are simple to debug. Most of them pop up during misconfiguration but some of them are ugly as hell and require a different type of debugging using Fiddler.
One example that I can provide is when I configured Google Apps with Azure Active Directory. The users were correctly provisioned in Google, but for some reason, trying to open any Google app resulted in a generic “Server Error”.
How do you debug this one? The screenshot from above is the whole error. This can throw you off because you don’t think it’s an AAD problem but a Google problem right?
Well, the reality is that it was indeed an AAD problem, but the problem started on-premises. In AD the users didn’t have the E-Mail field populated and AAD Connect didn’t map them correctly in Office 365. After a thorough inspection, it turned out that the error was related to the fact that the new users got a domain suffix based on the primary domain, which was in the xyz.onmicrosoft.com format, and that information was exclusively visible in the Exchange Online portal, not in the AAD or Office 365 user portals.
This was found during a Fiddler trace where we decoded the SAML response and presto, we found that AAD was sending the onmicrosoft.com domain to Google and not the correctly configured one.
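The decoding step from that Fiddler trace can be sketched in a few lines of Python. This is a minimal sketch, assuming the SAMLResponse form field was copied from an HTTP-POST binding (URL-encoded, then Base64) and that the response is not encrypted. The function names and the sample user jane@xyz.onmicrosoft.com are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import unquote

# Standard SAML 2.0 assertion namespace.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}


def extract_name_id(saml_response: str) -> Optional[str]:
    """Decode a SAMLResponse form value and return the NameID it carries."""
    # The value captured in a trace is usually URL-encoded on top of Base64.
    xml_bytes = base64.b64decode(unquote(saml_response))
    root = ET.fromstring(xml_bytes)
    node = root.find(".//saml:Subject/saml:NameID", SAML_NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


def uses_default_domain(name_id: str) -> bool:
    """True when the identity still carries the *.onmicrosoft.com suffix."""
    return name_id.lower().endswith(".onmicrosoft.com")


# Hypothetical response, built here only so the sketch is self-contained.
sample_xml = (
    '<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
    'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">'
    "<saml:Assertion><saml:Subject>"
    "<saml:NameID>jane@xyz.onmicrosoft.com</saml:NameID>"
    "</saml:Subject></saml:Assertion></samlp:Response>"
)
encoded = base64.b64encode(sample_xml.encode("utf-8")).decode("ascii")

name_id = extract_name_id(encoded)
print(name_id, uses_default_domain(name_id))
```

If the NameID ends in onmicrosoft.com while the application expects your custom domain, you are most likely looking at the same mapping problem described above.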
So whilst Google returned the most generic (and useless) error one can think of, the error was correct.
Auto-provisioning is another fun thing available in AAD; it allows you to configure automatic provisioning for third-party applications based on direct user assignment or via groups.
The problems that start here can be related to throttling, service issues or simple “un-optimized” coding.
The screenshot from above shows the automatic provisioning for Google Apps where you see that I have to press an Authorize button and enter a set of credentials there. The problem as you might have guessed is that auth token expires after a while (hours, days) if you have SSO configured with the same app. The solution to this one is to have another app for Google with Auto-Provisioning enabled. Some apps like Salesforce require a specific set of credentials which get encrypted or Service Principals.
The errors appear right in the Provisioning tab of the Enterprise Application, and there you can find out if you have any type of problem.
Beware that some applications simply do not have automatic provisioning enabled, and that’s where you will need your own mechanism to synchronize those identities. Azure Functions might work flawlessly for that 🙂
That’s all folks! Have a good one!
LE: Updated blog post and screenshots to be more accurate to today’s portal 🙂
There are times when, as a consultant, you’re challenged with very interesting tasks. My latest one was to migrate a client’s identity provider from Google to Microsoft, and to do that in such a seamless manner that nobody would notice the change.
Doing these types of migrations requires a lot of planning, risk management and fallback scenarios. This is not an easy job to do and your mileage will vary depending on your use cases.
I’m not going to cover every aspect in this blog post because there are too many variables that depend on the current environment.
First, you start off with the discovery phase, where you document all the applications that have SSO with the current identity provider. This is a very important step because if you miss this one, you will have major problems.
Then you have to assess which of those applications can be migrated to the new identity provider or need some type of code modification. In general, code modifications are required for custom apps but there are some 3rd party SaaS apps which might require a different monthly plan to get custom IDP support.
Second, the analysis phase, where you analyze each application and come up with migration paths. The length of this process depends on how many applications have SSO with the current identity provider and how many are custom applications which require code changes.
Third, planning the migration. This is the part where you REALLY take your time. Plan for disaster. Set up a stark mirror environment with some identities in Google and some in AAD and test everything out. Even with everything in the green, still, assume the worst because you never know what changes or what appears till you do the migration.
When I planned for the migration, I asked some simple questions:
How do we migrate the user identities?
How do we “migrate” the passwords to Azure Active Directory?
How will we handle the custom apps?
How will we handle third-party apps?
Where do we draw the line and fall back?
How do we fall back to the old provider?
When is the point of no return?
These are some of the questions that you should think about, and each of them should be tested properly in a “mirrored” environment because you cannot afford to fail or fumble around when you’re doing the migration.
I will list some of the migration paths that you have at hand, which can be used or not, depending on your particular scenario. This is just a guideline and I won’t assume any responsibility for the environments you’re working on.
Let’s get started.
Solution 1 – Double hop SSO
This path works by setting up Azure Active Directory as the primary identity provider for Google. This solution allows you to perform a graceful transition of the SSO applications from Google identity to Azure Active Directory with minimal user impact and downtime if you have an on-premises environment with Active Directory.
If you have Active Directory then the solution requires you to install AADConnect, synchronize the identities to AAD and then perform the reverse SSO.
If you don’t have an on-premises footprint then you have the following solutions at hand:
Set up Google for SSO with auto provisioning in AAD and set up Self Service Password Reset.
Use a third party identity provider like OKTA to synchronize the identities and then use Self Service Password Reset.
Export the accounts from Google with the GAM tool (https://github.com/jay0lee/GAM) and import them in AAD with PowerShell
The solution gets tricky if you’re cloud only and need to synchronize the passwords from one provider to another. As far as I know, there’s no solution to cover this problem and this is why you need to communicate very transparently with your users.
Let’s say you solved the user and password problem. How do you authenticate to Google and to the SSO apps configured with Google? Simple, just follow this tutorial (https://docs.microsoft.com/en-us/azure/active-directory/saas-apps/google-apps-tutorial). Basically, you will sign in to Google using AAD and the same will go for the third-party apps. Third-party apps will go to Google and Google will go to Microsoft. When the authentication is done, everything will go back in reverse and you’re authenticated.
Pros
• Longer transition period.
• Lower downtime while re-configuring applications.
• One single password configured on-premises in Active Directory (if applicable).
• No immediate reconfiguration for custom SSO applications.
• Lower impact on the IT Support team.
Cons
• Double-hop identity might cause lag in authentication and confusion.
• Redirect loops may happen under very slow or unstable internet connections.
• Users will forget to set their passwords and will generate a little back & forth.
Solution 2 – Two passwords for Google and Microsoft
This process involves having two sets of credentials for Google and Microsoft. In this solution, users will have a password for Google and SSO applications and one password for AAD (Azure, E-mail, SharePoint, Office Suite, etc.). The problem here appears when you ask your users to have two passwords for different services.
Pros
• Lower impact on users currently using SSO apps.
• No double hop identity jumps between Google and AAD.
Cons
• Users will have to use two sets of credentials (AAD, Google / SSO non-Microsoft apps).
• User disruption when moving SSO apps from one identity provider to another.
Solution 3 – Hard Cutover
The hard-cutover process involves reconfiguring all the SSO applications to use Azure Active Directory as their primary identity provider. This solution implies that all the SSO applications that are configured to use Google as their primary identity provider are accounted for and documented.
Pros
• One single identity provider between on-premises and cloud.
• Full control over all the SSO / SAML applications.
Cons
• Extremely high user impact.
• Custom SSO applications will require reconfiguration.
• High impact on the IT support team.
I highly recommend you take this approach only as a last resort because you will face a lot of issues. This may work in small companies, but when you have a lot of users, it becomes a huge issue.
Reverse federation might add a double hop authorization flow for applications that are configured to use Google as their primary identity provider. The flow requires an authentication/authorization from Microsoft that goes down to Google and the SSO application. In case of an unstable or very slow internet connection (<1 Mbps), authentication might fail.
Risk Mitigation
• Authentication flow failures occur only on unstable or slow internet connections and will require a retry, either via an F5 in the browser or by logging in to the application again.
• Cookies that get corrupted during the authentication flow will have to be cleared from the browser and the user will have to retry the authentication.
• SSO applications might throw authorization errors if the user doesn’t exist in the Google or Microsoft identity system. Verification has to be done to check that the user exists in both Google and Microsoft for the authentication flow to work.
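The "user exists in both systems" verification can be as simple as comparing two exported lists of sign-in names. Here is a minimal sketch, assuming you already have the two lists exported as plain e-mail addresses; the function name and the sample addresses are hypothetical.

```python
def find_missing_users(google_users, aad_users):
    """Compare two lists of sign-in names, ignoring case and stray spaces."""
    google = {u.strip().lower() for u in google_users}
    aad = {u.strip().lower() for u in aad_users}
    return {
        "missing_in_aad": sorted(google - aad),
        "missing_in_google": sorted(aad - google),
    }


# Hypothetical exports from both directories.
report = find_missing_users(
    ["Jane@contoso.com", "bob@contoso.com "],
    ["jane@contoso.com", "alice@contoso.com"],
)
print(report)
```

Anyone who shows up in either list of the report is a candidate for an authorization error during the double hop, so it is worth clearing that report before the migration window.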
Two passwords for Google and Microsoft
Risks
• Users will not know exactly which service requires what password.
Risk Mitigation
• Applications will require a re-authentication with the new system once they are migrated and will require the user to provide the credentials for the Microsoft IDP.
• Users will have to manage two sets of credentials: the current set of Google credentials as before and a new set of credentials for AAD.
Risks
• SSO application authorization failure.
• SSO application configuration flow failures.
• Re-authentication of SSO and other applications.
• Custom applications failure.
Risk Mitigation
• SSO applications will need users to be propagated correctly via the auto / manual provisioning flow.
• SSO applications will need to be properly configured for SSO with Azure Active Directory as per the documentation provided by Microsoft.
• Re-authentication using the Microsoft provided credentials.
• Custom applications will require a reconfiguration to be re-pointed to AAD.
I will stop this blog post here as I went on too far with it and it became a huge wall of text. I hope this blog post is useful for you and if you have any suggestions, please write them down in the comments 🙂
On the 19th of April, we hosted the second edition of our premium cloud conference in Bucharest, which we have improved greatly since the first edition, and boy, this year was fun 🙂 . The event lasted for one full day with two tracks (Business and Technical) with subjects about Cloud migration, AI, Containers and other fun stuff. This year we had over 250 participants, which is a big improvement from last year’s 170 🙂
You can find the event photos here: https://www.facebook.com/search/str/conferinta+de+cloud/photos-keyword
My session was on Container Solutions in Azure
You’ve probably heard of containers by now, and I’m pretty sure that you’ve already used one or two in some development scenarios or even production. As to why use containers it is pretty simple to answer. Instead of virtualizing the whole hardware stack as in the case of a virtual machine, you just virtualize the operating system which will run atop of the OS kernel. Translation: They are smaller and faster! That being said, containers need to run somewhere, and in this session, we will be talking about what container solutions are in the Azure Cloud and how we can use them.
I hope you were at the event and had as much fun as I did. 🙂
Every time we wanted to deploy an application to Azure that needed to connect to a MySQL database, we had a small problem. We either had to use a third-party MySQL provider like ClearDB and pay extra fees for the service or, option two, create and maintain our own MySQL instance in a virtual machine. The problem with ClearDB was that performance was horrible, prices were high, and the service was not neatly integrated into Azure. The problem with a virtual machine was that we had to install, configure, secure and maintain a MySQL instance. One instance of MySQL is almost OK to support, but if you need High-Availability with data replication and the whole shebang, then you would be dealing with a disaster waiting to happen. For example, configuring SQL Always-On is in some way simple to do because the mechanisms are already integrated into the product, but you don’t have that with MySQL Community Edition or MariaDB Community Edition. I tried once to configure HA on a MariaDB instance, and oh boy, there were so many problems I couldn’t wrap my head around.
Last year Microsoft announced the public preview of the MySQL and PostgreSQL database-as-a-service offerings in Azure, which provided us mainly with a SQL Database-like experience. Well, recently Microsoft announced that these services are now Generally Available, so I took the time to write about my experience with them from a management and performance standpoint.
What is Azure Database for MySQL?
Azure Database for MySQL is a database offering based on MySQL Community Edition with built-in high availability, scalability, security and point-in-time backups. From a management standpoint, we’re relieved of patching, security, HA, backup and so on; that duty is offloaded to Azure at no extra cost to you as a user. The biggest benefit that you get with the service is that you can dynamically scale it on demand.
Comparable to the Azure SQL Database offering, we have multiple tiers of performance with different price points depending on our needs. At this point we have the following tiers available:
Basic
This one should only be used in dev/test scenarios. This tier will not offer you predictable performance or a high number of MySQL connections.
The number of Max Connections is 50 per core; this tier only allows a maximum of 2 vCores and it runs on Standard storage (HDD).
General Purpose
This is the tier that should be used for production applications. The tier offers scalable and predictable I/O. If we want to compare this to the Azure SQL Database offering, we can compare it to the Standard tier.
The number of Max Connections is dependent on the vCore count:
2 vCores – 300 Connections
4 vCores – 625 Connections
8 vCores – 1250 Connections
16 vCores – 2500 Connections
32 vCores – 5000 Connections
Memory Optimized
This is the highest tier available at the moment and the most expensive. This tier is great for high concurrency and fast transaction processing. As this tier is not cheap to start with, I suggest first testing whether your application actually uses the added benefits of the tier.
The number of Max Connections is dependent on the vCore count:
2 vCores – 300 Connections
4 vCores – 625 Connections
8 vCores – 1250 Connections
16 vCores – 2500 Connections
32 vCores – 5000 Connections
*The Max Connection numbers presented above can change at any point so consult the Azure documentation before you size a database for an application
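To make the sizing exercise concrete, here is a small sketch that picks the smallest vCore count whose connection limit covers what your application needs. The limits are copied from the General Purpose list in this post and may well be outdated by the time you read this; the function name is made up for illustration.

```python
# Limits as listed in this post (vCores -> max connections); check the
# Azure documentation for the current values before relying on them.
GENERAL_PURPOSE_MAX_CONNECTIONS = {2: 300, 4: 625, 8: 1250, 16: 2500, 32: 5000}


def smallest_vcore_count(required_connections, limits=GENERAL_PURPOSE_MAX_CONNECTIONS):
    """Return the smallest vCore count that allows the required connections.

    Returns None when even the largest size listed is not enough.
    """
    for vcores in sorted(limits):
        if limits[vcores] >= required_connections:
            return vcores
    return None


print(smallest_vcore_count(700))
```

If the function returns None, scaling up alone will not save you and the application has to open fewer connections.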
If your application is very chatty with the database server and you’re using too many connections you will receive the following error: ERROR 1040 (08004): Too many connections
This is a standard MySQL error. If you’re hitting that wall, then your options are to either scale up the Database or modify your application so that it doesn’t initiate that many TCP connections.
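On the application side, the usual way of not initiating that many connections is to reuse a small, bounded set of them. Below is a minimal sketch of such a pool, not a production implementation: BoundedPool is a made-up class, and the connect function stands in for whatever your MySQL driver offers. Most drivers and ORMs ship their own pooling, which you should prefer when it is available.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hands out at most max_size connections and reuses idle ones."""

    def __init__(self, factory, max_size):
        self._factory = factory
        self._slots = queue.LifoQueue(maxsize=max_size)
        # Each slot starts empty (None); a connection is created lazily.
        for _ in range(max_size):
            self._slots.put(None)

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a slot is free; raises queue.Empty after the timeout.
        conn = self._slots.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._factory()
            yield conn
        finally:
            # Give the slot back. A real pool would also check that the
            # connection is still healthy before reusing it.
            self._slots.put(conn)


# Stand-in for a real driver call such as a connect() function.
created = []


def fake_connect():
    created.append(object())
    return created[-1]


pool = BoundedPool(fake_connect, max_size=2)
for _ in range(5):
    with pool.connection() as conn:
        pass  # run queries here

print(len(created))
```

Five sequential units of work end up sharing a single connection, and no matter how busy the application gets, it will never hold more than max_size connections against the server.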
Another limitation that you will hit if you’re porting a legacy application to this service is that the engine doesn’t support MyISAM databases. If you’re using MyISAM, then you will have to convert it to InnoDB. The reason why MyISAM is not supported in this scenario is that it’s not scalable and cannot work in distributed environments.
Converting the database can be simple, or it can be hard. The simple way is just to run “ALTER TABLE table_name ENGINE=InnoDB;” but you don’t get that many free lunches in life.
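For the simple path, you can generate the ALTER TABLE statements instead of typing them by hand. This is a minimal sketch, assuming you have already listed your tables as (schema, table, engine) rows, for example from a query against information_schema.TABLES; the function names and the sample rows are made up. Run the generated statements against a copy of the database first.

```python
def quote_identifier(name):
    """Quote a MySQL identifier, doubling any backticks inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_innodb_migration(tables):
    """Build ALTER TABLE statements for every MyISAM table in the listing."""
    statements = []
    for schema, table, engine in tables:
        if engine.lower() == "myisam":
            statements.append(
                f"ALTER TABLE {quote_identifier(schema)}.{quote_identifier(table)} ENGINE=InnoDB;"
            )
    return statements


# Hypothetical listing of tables and their current engines.
rows = [
    ("shop", "orders", "MyISAM"),
    ("shop", "users", "InnoDB"),
    ("shop", "audit_log", "MyISAM"),
]
for statement in build_innodb_migration(rows):
    print(statement)
```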
So Microsoft is announcing that it will add to its DMA (Database Migration Assistant) the possibility of migrating your on-premises / IaaS MySQL/PostgreSQL instances to Azure Database for MySQL.
As the “server admin,” you do not have any super admin or DBA role privileges which means that modifying specific settings is not permitted. This is intended so that you don’t cause any issues with the database server and cause a service disruption by mistake. You have the possibility of importing databases using mysqlimport and mysqldump on the database server.
What we should know before using the service
As with any PaaS offering, we have to understand that we are provisioning a database server and deploying databases in a shared environment. So depending on the plan we will be using, we will be affected in some way by the other databases that sit on the same servers as ours. With that in mind, we have to know the limits of each tier that’s available to us and test the application while applying load.
If you’re getting the “Too many connections” error, then you might have to scale-up the database or re-write some code. Another factor that will affect the performance of our application is that the database server is not in the same virtual network or on the same VM as the application, so you have to take into account the network latency factor. The latency will not be huge, but if your application is expecting super fast responses because it found the database in memory, then you will have a significant issue.
Another factor that will cause problems for your application is transient errors. These types of errors occur naturally in a cloud environment because the cloud provider is dealing with millions of servers and failure is pretty frequent. These transient failures usually occur when your database was moved to another server and the load balancer that’s handling your requests didn’t switch yet, so you get a timeout. That timeout is very short, but if your application doesn’t have a retry mechanism (ideally paired with something like a circuit breaker), then you will get an exception.
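A retry mechanism doesn’t have to be complicated. Here is a minimal sketch of a retry with exponential backoff and a bit of jitter. TransientError is a placeholder exception defined here for illustration; in a real application you would catch the connection or timeout exceptions raised by your MySQL driver instead. make_flaky only simulates a database call that fails a couple of times.

```python
import random
import time


class TransientError(Exception):
    """Placeholder for the timeout/connection errors your driver raises."""


def call_with_retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries, let the caller deal with it
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter keeps many clients from retrying at the same instant.
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky(failures):
    """Return a fake operation that fails the first `failures` calls."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated timeout")
        return "ok"

    return operation


print(call_with_retry(make_flaky(2), base_delay=0.01))
```

The delays here are tiny so the example finishes quickly; for a real database you would start somewhere around a few hundred milliseconds and cap the number of attempts so users don’t wait forever.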
How do I start?
Creating an Azure Database for MySQL is pretty simple. You can do it from the Portal or from the CLI.
From the portal you will have to go to Create a resource -> Type in “Azure MySQL” -> Select Azure Database for MySQL -> press create.
After you press create, you will be presented with a new blade asking you to fill out some parameters. After you fill out all the parameters shown in the screenshot below, you can select the pricing and performance tier.
From the CLI, you have to run the following commands:
#Create the RG
az group create --name MYSQL-RG --location westeurope
#Add the RDBMS extension to the Cloud Shell
az extension add --name rdbms
#Create a General Purpose MySQL Server on a Gen5 server with 2 vCores, running MySQL version 5.7. The --sku-name value GP_Gen5_2 translates to pricing tier (General Purpose), server generation (Gen5) and number of vCores (2).
az mysql server create --resource-group MYSQL-RG --name azsqlserver --location westeurope --admin-user adminuser --admin-password <server_admin_password> --sku-name GP_Gen5_2 --version 5.7
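If the SKU naming looks cryptic, this small sketch spells out how a name like GP_Gen5_2 breaks down. The parse_sku helper is made up for illustration and only knows the three tier prefixes (B, GP and MO); it is not part of the Azure CLI.

```python
# Tier prefixes used in the SKU names.
TIERS = {"B": "Basic", "GP": "General Purpose", "MO": "Memory Optimized"}


def parse_sku(sku_name):
    """Split a SKU name such as GP_Gen5_2 into tier, generation and vCores."""
    tier_code, generation, vcores = sku_name.split("_")
    return {
        "tier": TIERS[tier_code],
        "generation": generation,
        "vcores": int(vcores),
    }


print(parse_sku("GP_Gen5_2"))
```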
After you press create, you wait a few minutes for the server to be provisioned and after that, you’re ready to connect to it. What you need to know is that the firewall and SSL settings are enforced by default. You will have to add your IP to the whitelist so you can connect with MySQL Workbench, and allow Azure services if you need a VM or App Service to connect to it. When you’re connecting your application, you will have to change the connection string to use encrypted connections; otherwise, you have to disable SSL enforcement.
You can dynamically scale the CPU / Storage based on your needs but you cannot change the pricing / performance tier after the server has been provisioned. Storage can only be increased and not lowered and you cannot change to LRS from GRS or vice versa for the Backup Redundancy Option.
In a previous blog post, I talked about how excellent the managed Kubernetes service is in Azure and in another blog post I spoke about Azure Container Instances. In this blog post, we will be combining them so that we get the best of both worlds.
We know that we can use ACI for some simple scenarios like task automation, CI/CD agents like VSTS agents (Windows or Linux), simple web servers and so on, but it’s another thing that we need to manage. Even though ACI has almost no strings attached (no VM management, custom resource sizing and fast startup), we still may want to control the instances from a single pane of glass.
ACI doesn’t provide you with auto-scaling, rolling upgrades, load balancing or affinity/anti-affinity; that’s the work of a container orchestrator. So if we want the best of both worlds, we need an ACI connector.
The ACI Connector is a virtual kubelet that gets installed on your AKS cluster, and from there you can deploy containers simply by referencing the node.
If you’re interested in the project, you can take a look here.
To install the ACI Connector, we need to cover some prerequisites.
The first thing that we need to do is to create a service principal for the ACI connector. You can follow this document here on how to do it.
When you’ve created the SPN, grant it contributor rights on your AKS Resource Group and then continue with the setup.
I won’t be covering the Windows Subsystem for Linux or any other bash system as those have different prerequisites. What I will cover in this blog post is how to get started using the Azure Cloud Shell.
So pop open an Azure Cloud Shell and (assuming you already have an AKS cluster) get the credentials.
az aks get-credentials -g RG -n AKSNAME
After that, you will need to install helm and upgrade tiller. For that, you will run the following.
The reason that you need to initialize helm and upgrade tiller is not very clear to me but I believe that helm and tiller should be installed and upgraded to the latest version every time.
Once those are installed, you’re ready to install the ACI connector as a virtual kubelet. Azure CLI installs the connector using a helm chart. Type in the command below using the SPN you created.
az aks install-connector -g <AKS RG> -n <AKS name> --connector-name aciconnector --location westeurope --service-principal <applicationID> --client-secret <applicationSecret> --os-type both
As you can see in the command above, I typed both for --os-type. ACI supports Windows and Linux containers so there’s no reason not to get both 🙂
After the install, you can query the Kubernetes cluster for the ACI Connector.
kubectl --namespace=default get pods -l "app=aciconnector-windows-virtual-kubelet-for-aks" # Windows
kubectl --namespace=default get pods -l "app=aciconnector-linux-virtual-kubelet-for-aks" # Linux
Now that the kubelet is installed, all you need to do is run kubectl create -f <YAML file>, and you’re done 🙂
If you want to target the ACI Connector with the YAML file, you need to reference a nodeName of virtual-kubelet-ACICONNECTORNAME-linux or windows.
- name: nginx
- containerPort: 80
You run that example from above and the AKS cluster will provision an ACI for you.
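If you’d rather generate the manifest than write it by hand, the sketch below builds a minimal pod definition that targets the connector node. It assumes the connector was installed with the name aciconnector, as in the install command earlier; the build_aci_pod helper and the nginx image are illustrative choices. The output is JSON, which kubectl accepts just like YAML. Depending on the connector version, you may need extra fields (tolerations, for example) that are not shown here.

```python
import json


def build_aci_pod(name, image, connector_name, os_type="linux", port=80):
    """Build a minimal pod manifest pinned to the ACI connector node."""
    if os_type not in ("linux", "windows"):
        raise ValueError("os_type must be 'linux' or 'windows'")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }
            ],
            # Node naming pattern described above:
            # virtual-kubelet-ACICONNECTORNAME-linux or -windows.
            "nodeName": f"virtual-kubelet-{connector_name}-{os_type}",
        },
    }


manifest = build_aci_pod("nginx", "nginx", "aciconnector")
print(json.dumps(manifest, indent=2))
```

Save the printed output to a file, for example nginx-aci.json, and pass it to kubectl create -f the same way you would pass a YAML file.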
What you should know
The ACI connector allows the Kubernetes cluster to connect to Azure and provision Container Instances for you. That doesn’t mean that it will provision the containers in the same VNET as the K8 is so you can do some burst processing or those types of workloads. This is let’s say an alpha concept which is being built upon and new ways of using it are being presented every day. I have been asked by people, what’s the purpose of this thing because I cannot connect to it, but the fact is that you cannot expect that much from a preview product. I have given suggestions on how to improve it, and I suggest you should too.
Well that’s it for today. As always have a good one!
Having a disaster recovery plan is not something new. The thing that we need to be aware of is that outages and security breaches are becoming more and more common and they will never go away. The cloud brought the possibility of moving part of our datacenter workloads there and leveraging the high availability of the providers’ solutions. Leveraging cloud solutions doesn’t exempt us from having DR plans and solutions in place for our business-critical applications. Remember the Amazon S3 outage? That caused a lot of problems for a lot of companies, and they were down until Amazon solved the problem.
The Amazon problem is just one issue in a bucket of other problems. There we had a human error that caused the outage, but we also have natural disasters like hurricanes that take out datacenters without any mercy. Harvey, Irma and Maria hit the US pretty severely, so even if cloud providers are pretty resilient to these types of natural disasters, we still have to protect our business-critical applications.
Traditional backup solutions run on servers, those servers require storage, and depending on backup times and data retention the costs can go up tenfold. This is the main thing that happens on-premises. In the cloud, you would have the same type of solution, but this time you’re not buying physical servers; instead, you’re renting compute cycles from a provider. You still have to deal with maintaining the system so that it works when such a problem happens. Azure has a service called Backup and Site Recovery that offers a “One-Click” backup and disaster recovery solution that doesn’t require any maintenance from you.
Protecting Azure VMs with ASR is very simple. You have to provision a Recovery Services vault in the region where you want to do DR and after that follow a couple of simple steps. The reason for creating the vault in a different region is that if you create it in the same region as your VMs and that region goes down, you will have a problem. Fortunately, if you do it by mistake, you will get an error from Azure telling you that you cannot do that.
The first thing you have to do is to create a Recovery Services vault. To do that, you have to create a new resource, and in the marketplace search field you have to write “Backup and Site Recovery (OMS)”. After you click on it, you will be asked for a name, a region where it should be deployed and of course a Resource Group. The deployment is done in a matter of seconds, so you don’t have to wait too long for the solution to be ready.
Once you have the recovery vault up and running, you have to enable replication for your IaaS VMs. In your Vault, you go to Site Recovery and press on Replicate Application.
In the source environment, you have the option of protecting on-premises environments or Azure environments. Take note that this service is still in preview so it might have issues 🙂
Select the source location and source resource group.
The next step is to select the Virtual Machines that should get replicated to the DR region.
Once you’re at the next screen, you can choose the target location for DR. You can deploy to a different region than the one where the vault is located, so the vault doesn’t limit you here. Remember that you will be paying for transfer costs, but if you’re in a DR case, I don’t think those costs matter that much. You can edit some settings like the target Resource Group, Network, Storage and Availability sets, and you can also modify the replication policy.
After you’re done configuring your settings, press on Create Target Resources, located in the Step 3 blade, and wait for them to be created without closing the blade. Once they are done, you will be able to enable replication and you’re done 🙂
Replication takes a while depending on the number of VMs you’re protecting so this is the point where you start doing something else.
Once the replication is done, you can now setup recovery plans and do test failovers and complete failovers from a single pane of glass.
That’s it. Simple, no? If you’re not doing DR for your IaaS environment, then I would seriously ask you to take ASR into consideration and see what it can give you.
On the 13th of February, we hosted a Winter ITCamp Community event in Cluj-Napoca. At the event, we talked about containers, SOLID principles and Blockchain.
Winter ITCamp Community Event (Free)
Tuesday, Feb 13, 2018, 6:00 PM
The Office Bulevardul 21 Decembrie 1989, nr. 77 Cluj-Napoca, RO
78 ITCamp-ers Went
• ITCamp Community invites you to meet IT specialists from Cluj-Napoca on February 13 at a free event. The event is organized by the community for the community and is fully supported by Yonder. • Agenda: 18:00-18:10 – Networking and coffee 18:10-19:00 – Container solutions in Azure (Florin Loghiade) 19…
A lot of people showed up that were interested in containers, microservices, blockchain and SOLID principles.
We had a lot of fun at the event and the people that joined had a lot of questions 🙂
Container solutions in Azure – Florin Loghiade Abstract:
Container solutions in Azure: You’ve probably heard of containers by now, and I’m pretty sure that you’ve already used one or two in some development scenarios or even in production. As to why use containers it is pretty simple to answer. Instead of virtualizing the whole hardware stack as in the case of a virtual machine, you just virtualize the operating system which will run atop of the OS kernel. Translation: They are smaller and faster! That being said, containers need to run somewhere, and in this session, we will be talking about what container solutions are in the Azure Cloud and how we can use them.
SOLID for Everyone – Daniel Costea Abstract:
The presentation will show what these principles are and how you can use them in a practical way, using the C# language, following a series of refactoring steps on an unoptimized sample of code.
Azure Blockchain Service – myth or reality (Radu Vunvulea) Abstract:
This is a session dedicated to blockchain. We will talk about mining inside a cloud provider and why blockchain is so attractive to any company nowadays. In the second part of the session we will talk about a new service from Azure that is allowing us to use blockchain as a service (SaaS)
Visual Studio Team Services or VSTS is Microsoft’s cloud offering that provides a complete set of tools and services that ease the life of small teams or enterprises when they are developing software.
I don’t want to get into a VSTS introduction in this blog post, but what we need to know about VSTS is that it’s the most integrated CI/CD system with Azure. The beautiful part is that Microsoft has a marketplace with lots of excellent add-ons that extend the functionality of VSTS.
Creating a CI/CD pipeline in VSTS to deploy containers to Kubernetes is quite easy. I will show in this blog post a straightforward pipeline design to build the container and deploy it to the AKS cluster.
The prerequisites are the following:
VSTS Tenant and Project – Create for free here with a Microsoft Account that has access to the Azure subscription
VSTS Task installed – Replace Tokens Task
Azure Container Registry
Before we even start building the VSTS pipeline, we need to get some connection prerequisites out of the way. To deploy containers to the Kubernetes cluster, we need to have a working connection with it.
Open a Cloud Shell in Azure and type in:
az aks get-credentials -g AKS_RG -n AKS_NAME
It will tell you that the current context is located in “/home/NAME/.kube/config.”
Now open the /home/NAME/.kube/config with nano or cat and paste everything from there in a notepad. You need that wall of text to establish the connection to the cluster using VSTS.
Let’s go to VSTS where we will create a service endpoint to our Kubernetes cluster.
At the project dashboard, press on the wheel icon and then on Services.
Press on the New Service Endpoint and select Kubernetes.
Paste in the details from the .kube/config file in the kubeconfig box and the https://aksdns
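The server address that the endpoint needs sits in the `server:` field of the kubeconfig you just copied. As a minimal sketch, assuming a standard kubeconfig layout, the Python below pulls that field out with a regular expression. The function name and the sample host name are made up for illustration; they are not taken from a real cluster.

```python
import re
from typing import List


def kubeconfig_servers(kubeconfig_text: str) -> List[str]:
    """Return every API server URL found in the 'server:' fields of a kubeconfig."""
    pattern = re.compile(r"^\s*server:\s*(\S+)\s*$", re.MULTILINE)
    return pattern.findall(kubeconfig_text)


# Hypothetical kubeconfig fragment; the host name is a placeholder.
sample = """\
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://example-aks-dns.hcp.westeurope.azmk8s.io:443
  name: AKS_NAME
"""

if __name__ == "__main__":
    for url in kubeconfig_servers(sample):
        print(url)
```

A regular expression keeps the sketch free of third-party dependencies; a YAML parser would be the sturdier choice if one is available.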
Create a repository and add the following files and contents to it:
*I know it would be easier to clone from my GitHub repo, but when I’m learning I like copying and pasting stuff into VS Code, analysing it and then uploading it.
Path to publish = deploy.yaml
Artifact name = deploy
Artifact publish location = Visual Studio Team Service/TFS
Now go to triggers, select Continuous integration and check “Enable continuous integration” then press the arrow on Save & queue and press save.
The build has been defined; now we need to create a release.
Go to Build and Releases and press on Releases.
Press on the plus sign and then on “Create release definition”.
In the New Release Definition pane, select the “Deploy to Kubernetes Cluster” template and press on Apply.
Now that the template is pre-populated to deploy to the Kubernetes Cluster, you need to add an artifact, select the Build Definition and add it.
Now it’s time to enable Continuous deployment so press on the lightning bolt that’s located in the upper right corner of the artifact and enable the CD trigger.
Now go to the Tasks tab located near the Pipeline tab and modify the kubectl apply task:
Kubernetes Service Connection = Select the K8 connection that you created
Command = Apply
Use Configuration files = Checked
Configuration File = press on the three dots and reference the deploy.yaml or copy what is below.
Now press save, queue a new build and wait for the container to get deployed. When it’s done, type kubectl get services in the Azure Cloud Shell and the IP will pop up.
So you finished configuring the CI/CD pipeline and deployed your first container to an AKS cluster. This might seem complicated at first, but once you do this a couple of times, you will be a pro at it, and the problems you will face will be about how to make it more modular. I do similar things at clients most of the time when I’m automating application deployments for cloud-ready or legacy applications. This type of CI/CD pipeline is quite easy to set up; when you want to automate a full-blown microservices infrastructure, you will have a lot more tasks to configure. My largest CI/CD pipeline consisted of 150 tasks that were needed to automate a legacy application.
What I would consider some best practices for CI/CD pipelines in VSTS or any other CI/CD tool is to never hard code parameters into tasks and make use of variables/variable groups. Tasks like the “Replace Tokens” one permit you to reference those variables so when one changes or you create one dynamically, they just get filled in the code. This is very useful when your release pipeline deploys to more than one environment, and you can have global variables and environment specific variables.
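To make the idea concrete, here is a minimal Python sketch of what token replacement does. It is not the task's implementation. It assumes tokens are written as `#{NAME}#` (I believe that is the Replace Tokens task's default pattern, but treat it as an assumption), and the variable names and registry address are hypothetical.

```python
import re

# Assumed token syntax: #{NAME}#
TOKEN = re.compile(r"#\{\s*([\w.]+)\s*\}#")


def replace_tokens(text, variables):
    """Replace every #{NAME}# token in text with the matching variable value."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly instead of deploying a half-rendered manifest.
            raise KeyError("no value defined for token " + name)
        return str(variables[name])

    return TOKEN.sub(substitute, text)


# Hypothetical manifest fragment and variables, for illustration only.
template = "image: #{REGISTRY}#/demo:#{BUILD_ID}#\nreplicas: #{REPLICAS}#\n"
global_vars = {"REGISTRY": "example.azurecr.io", "REPLICAS": 1}
prod_vars = {"REPLICAS": 3}

# Environment-specific values override the global ones.
variables = {**global_vars, **prod_vars, "BUILD_ID": "42"}

if __name__ == "__main__":
    print(replace_tokens(template, variables))
```

Merging the dictionaries in that order mirrors the global versus environment-specific split: the same template renders differently per environment without any value being hard-coded in a task.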
Coming from the infrastructure world, I would say that I had a bit of a hard time wrapping my head around how you would manage containers when they get out of control. When you’re playing around with 1-2 containers, that’s not a big deal, but when you’re getting into the hundreds, that’s where the problems start. As an infrastructure guy, I always ask the nasty questions, such as:
Where do I keep them?
How do I secure them?
How do I update them?
How do I protect myself from the 2 AM calls?
Containers are immutable images that work everywhere, but when you’re building a very complex application that runs on containers, you’re asking yourself “where do I put them?”. The answer to that question is a container orchestrator, but which one? You just search, and you find out that there are multiple ones. If your operations are mostly in the cloud, you look for container orchestrators in the marketplace offerings, and you find the Azure Container Service, which provides you with deployment options for Docker Swarm, DC/OS and Kubernetes. The question that arises at that moment is “Which one should I pick?”
ACS just provides you with a consistent way of deploying those container orchestrators but in IaaS fashion. You will still have to do patch and security management. Kubernetes is considered a first tier product in Azure, and it’s the most integrated orchestrator in Azure. When you deploy containers in a Kubernetes cluster, you don’t have to allocate IPs or provision disks. The system calls Azure’s APIs and does that for you, out of the box without any extra work.
With all that in mind, Microsoft brought forth a new offering in preview called Azure Container Service (AKS) that builds a highly available Kubernetes cluster from scratch, one which you don’t have to manage entirely. The parts that are under your management are the agent nodes where your containers will sit. When you need to do scale-out operations, you just tell the system that you want to scale out, and it will do that by itself. Think of DSC (Desired State Configuration) or ARM (Azure Resource Manager) Templates: you declare what you want, and the system proceeds to do it.
Creating an AKS cluster
Before you start creating an AKS cluster you need to create a service principal in your Azure Active Directory tenant and generate an SSH private key.
Creating an Azure Service Principal is just as easy as creating an SSH key. You can do that by following this article here
I generate SSH keys with PuTTY; you can do that by following this article here
After you create the Service Principal, grant it contributor rights on the subscription; otherwise, it will not be able to deploy disks, file shares or IPs in its Resource Group. For production scenarios, you create the SPN, grant it contributor access and, after deploying the AKS cluster, use RBAC to grant it contributor access to the AKS RG. We have to do this workaround because the RG doesn’t exist before the deployment, so there is nothing to grant permissions on.
Save the Application ID, secret and SSH private key in a text file because we will use them later.
You have two simple options for creating an AKS cluster; Portal or the CLI.
From the Azure marketplace, you search for AKS and the Azure Container Service (AKS) preview will show up. Click on it and let’s follow the steps.
In the first phase we will have to give the cluster a name, a DNS prefix (if we want to), choose the Kubernetes version (preferably the latest one), select the subscription, and create an RG and location.
In the next phase we will use the generated Service Principal and SSH key and paste them accordingly. The Node count means the number of agent nodes we will have available. This is not a hardcoded number, so if we want to scale out, we will have this option without an issue. You can see from here that you are not asked to specify the number of master nodes. This is the part that’s managed by Azure.
Once you’re done and the deployment finishes, you will have two new resource groups in your subscription: the resource group you referenced, in my case AKS-RG, and a resource group named after the RG, cluster name and location: MC_AKS-RG_lfa-aks_westeurope
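If you script against that second resource group, it helps to derive its name instead of typing it. The Python below is a small sketch based only on the pattern seen above (MC_, then the RG, the cluster name and the location, joined by underscores). The function name is made up, and the pattern is an observation from this deployment, not a documented guarantee.

```python
def node_resource_group(resource_group, cluster_name, location):
    """Build the name of the auto-created resource group, following the
    MC_<resource group>_<cluster name>_<location> pattern observed above."""
    return "MC_{}_{}_{}".format(resource_group, cluster_name, location)


if __name__ == "__main__":
    print(node_resource_group("AKS-RG", "lfa-aks", "westeurope"))
```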
The CLI way is much simpler. You pop up a Cloud Shell, or you can go to shell.azure.com and paste this in:
az aks create --resource-group AKS-RG --name AKSCLUS --node-count 5 --generate-ssh-keys
This will quickly create an AKS cluster for you and give you the SSH Keys.
So which one is simpler? Apparently, the CLI way but do remember that we don’t always have access to everything in an Azure Subscription. If we do not have access to the Azure Active Directory tenant, then we won’t be able to create that Service Principal and somebody with the right permissions will have to give them to us.
I have a cluster, now what?
When I first started playing around with AKS, I tried the hard way of installing all the required tools so that I could manage it, and to be honest I got bored fast. If you want to do this on your machine, then for starters you need Azure CLI installed and connected to the subscription, and after that you will need kubectl and helm for cluster management and package management. Once you’re done with that, you can start working with it. I found that the best way around everything is either to use shell.azure.com or to configure Cloud Shell in VS Code.
In the CLI you can type az aks get-credentials -n clustername -g RGName and it will save the credentials that will be used to connect to the cluster in the current context.
az aks get-credentials -n lfa-aks -g aks-rg
Once all that’s done, you can leverage kubectl to play around with the cluster
kubectl get nodes # current number of agent nodes
kubectl get pods # current number of pods
# Create a deployment
kubectl create -f ./my-manifest.yaml # create container resources from a file
kubectl create -f ./my1.yaml -f ./my2.yaml # create container resources from multiple files
kubectl create -f ./dir # create resources based on yaml files from a directory
kubectl create -f https://rawgithuburl # create resources from a URL
Creating a container is pretty simple. I create a deployment with kubectl create -f yaml file
- port: 6379
- name: master
image: k8s.gcr.io/redis:e2e # or just image: redis
- containerPort: 6379
- port: 6379
- name: slave
- name: GET_HOSTS_FROM
# If your cluster config does not include a dns service, then to
# instead access an environment variable to find the master
# service's host, comment out the 'value: dns' line above, and
# uncomment the line below:
# value: env
- containerPort: 6379
# if your cluster supports it, uncomment the following to automatically create
# an external load-balanced IP for the frontend service.
- port: 80
- name: php-redis
- name: GET_HOSTS_FROM
# If your cluster config does not include a dns service, then to
# instead access environment variables to find service host
# info, comment out the 'value: dns' line above, and uncomment the
# line below:
# value: env
- containerPort: 80
Then I type in kubectl get service --watch and wait for Azure to provision a public IP for the service I just created. This process can take a few seconds or a few minutes; this is the part where you depend on Azure 🙂
kubectl get service --watch
After the deployment is done, you will get a public IP address that you can use to access the application.
Scaling up the deployment is straightforward. You run kubectl scale with the --replicas flag and the deployment name, and the deployment scales up.
If you want to use the autoscaler, you need to have CPU request and limits defined in the yaml file.
Once your yaml file contains the requests and limits for the service that you want to autoscale, you can enable autoscaling for its deployment with kubectl autoscale.
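The reason the CPU request matters is that the autoscaler measures CPU utilization as a percentage of that request. The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The Python below is a simplified sketch of that formula with made-up function names; it ignores the tolerance band and stabilization behaviour that the real autoscaler applies.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU utilization as a percentage of the container's CPU request."""
    return 100 * usage_millicores / request_millicores


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Simplified Horizontal Pod Autoscaler calculation, clamped to the
    configured minimum and maximum replica counts."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical numbers: 3 pods, each using 180m of a 200m request, 50% target.
    utilization = cpu_utilization_percent(180, 200)
    print(desired_replicas(3, utilization, 50, min_replicas=1, max_replicas=10))
```

Without a request in the yaml file there is nothing to divide by, which is why the autoscaler cannot work on CPU utilization for such a deployment.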
The procedure for scaling out the cluster is similar to the pod scaling. You run the AZCLI command to increase the node numbers, and that’s it.
az aks scale --resource-group=aks-rg --name=lfa-aks --node-count 10
Upgrading the cluster
Upgrading the cluster is just as simple as scaling-out, but the problem is that being a preview offering, you might meet some issues as I have. I for one couldn’t manage to upgrade any AKS offering from the CLI due to problems in either Azure CLI or the AKS offering. This is not a big problem at the moment because it’s in preview but be warned that this is not production ready, and if you deploy a critical business application on the cluster, you might have problems.
The upgrade process is pretty simple; you first have to run the AZCLI command to find out what version is available and then just run the upgrade command.
az aks get-upgrades --name lfa-aks --resource-group aks-rg --output table # get the available upgrade
az aks upgrade --name lfa-aks -g aks-rg --kubernetes-version 1.8.9 # upgrade the cluster
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/n): y
The AKS offering is pretty solid from what I’ve seen while playing around with it, and deploying a cluster manually end-to-end is not a pleasant experience. ACS and AKS allow you to deploy container orchestrators in a snap and just get on with your life. My little gripe with AKS is that the agent nodes are on standard VMs and not VMSS (Virtual Machine Scale Sets) and I don’t quite understand why they chose this way of doing things. Service Fabric runs on VMSS, DC/OS runs on VMSS so I don’t see why Kubernetes would be a problem. Time will tell regarding this one.
There are some limitations at the moment, mostly around the availability of the offering and public IP limits. You might not be able to create an AKS cluster, and if that happens, you just try again. From a services standpoint, you’re limited to 10 IPs because of the basic load balancer limitation.
From a pricing standpoint, I must say that it’s the best thing you can get. You pay just for the VMs. You’re not paying for anything that’s on top of it which is a big plus when compared to the other cloud providers which bill you for the service as well. What you need to know when it comes to billing is that when you create an AKS cluster, be aware that Azure is provisioning three master-K8 VM nodes which you will not see, but you will pay for them.
We will see how AKS will grow, but from what I’m seeing, it’s going in the right direction.
You have some options in Azure when you want to have a financially backed SLA for your VM deployments. When you can go into a distributed model, you can get 99.95, when you can’t then you have the option of getting 99.9% SLA when you’re using Premium Disks. But what if I want more?
If you want more, then it’s going to cost you more but before we jump into solutions, let’s understand what the numbers mean and why we should care.
You probably heard of the N nines SLA; three nines, four nines, five, six. To explain what that means, down below we have an excellent table which illustrates to us what those numbers mean in actual downtime.
In Azure for IaaS deployments, we have the option of getting a 99.9% or a 99.95% SLA. 99.9% translates into an acceptable downtime of around 8 hours and 45 minutes per year, while 99.95% translates into around 4 hours and 22 minutes per year. Now does this mean that we will have 4 or 8 hours of downtime for all of our IaaS deployments? Of course not, but it might happen; that’s why you need to take all the necessary precautions so that your business critical application stays online all the time. We didn’t have the option of receiving a financially backed SLA for single VMs until recently, so this is a big plus.
Recently, at Ignite, Microsoft announced the public preview of Availability Zones, which boost the SLA number to 99.99%, lowering the downtime to around 52 minutes in a year. But what are they exactly?
Availability Zones are the actual datacenters in a single region. All regions start with three zones, but during this preview you might not be able to deploy services to all of them. If we’re talking about West Europe, then this region has three datacenters that are physically separated to all intents and purposes. For Microsoft to financially back a 99.99% SLA, all the datacenters in a region have different power, network, and cooling providers, so that if something happens to one provider you won’t have a full region downtime. They are also 30 km apart from each other, so they are protected from physical faults as well.
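If you want to check these numbers yourself, the arithmetic is short. The Python below is a minimal sketch that assumes an average year of 365.25 days; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length, leap years included


def allowed_downtime_hours(sla_percent):
    """Yearly downtime, in hours, that still satisfies the given SLA percentage."""
    return (100 - sla_percent) / 100 * HOURS_PER_YEAR


if __name__ == "__main__":
    for sla in (99.9, 99.95, 99.99):
        hours = allowed_downtime_hours(sla)
        print("{}% -> {:.2f} hours ({:.0f} minutes) per year".format(sla, hours, hours * 60))
```

Under that assumption, 99.9% works out to roughly 8.8 hours per year, 99.95% to roughly 4.4 hours and 99.99% to a little under 53 minutes.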
With Availability Zones, they also released Zone aware SKUs for some services like the Standard Load Balancer and Standard Public IP. At the time of writing we have the possibility of deploying VMs, VMSS, Managed Disks and IPs in an Availability Zone and SQL DB, Cosmos DB, Web Apps and Application Gateway already span three zones.
If you want to benefit from the four nine SLA, then you either deploy directly into availability zones or you redeploy your VMs.
As you can see from the above diagram, you need to use services that span zones, and after that, you need to deploy them in pairs just as you would do with Availability Sets. You clone your deployments, implement them in different zones, and you benefit from the 99.99% SLA.
*Preview Service: You have no guaranteed SLA while this service is in a preview. Once it goes GA, you will receive a financially backed SLA.
Achieving the SLA.
We have a couple of SLA numbers in our head, let’s now understand how to obtain them.
99.9% SLA – Single VMs – All your single VM deployments have to be backed by premium storage. That means that both the OS and Data disks have to be SSDs. We cannot mix and match and still qualify for the financially backed SLA. The best candidates for single VMs are the ones running relational databases or systems that cannot run in a distributed model. I wouldn’t recommend running web servers in a single VM; you have App Services for that.
99.95% SLA – Availability Sets – All your distributed systems should run in Availability Sets to benefit from the 99.95% SLA and compared to single VM deployments, it doesn’t matter if you’re running Standard or Premium storage on them. AV Sets work nicely for Web Servers or other types of applications that are stateless or keep their state somewhere else. If your application has to keep its state on the actual VM, then your options are limited to the Load Balancer which can be set to have Sticky Sessions, but you will have problems in the long run. For stateful applications, it’s best to keep their state in a Redis Cache, Database or Azure Files Shares. This type of deployment works very well for most apps out there.
99.99% SLA – Availability Zones – This is the strongest SLA you can get at this time for your IaaS VMs. Availability Zones are similar in concept to the Availability Set deployment; you need to be aware of what candidates you’re deploying to the zones from an application standpoint and also from a financial standpoint. I’m saying financial because you need to use zone spanning services like the Standard SKU for the Public Load Balancer and Public IP. The standard Load Balancer is not free like the basic one; you pay for the number of load balancing rules you have, and you also pay for the data processed by it.
Financially backed SLA
Now that we have a basic understanding of SLAs, we have to understand what financially backed means regarding any cloud provider. When they say that the SLAs are financially supported, they mean that if something on the provider’s side causes an SLA breach, they will reimburse the running costs of the VM when the downtime occurred.
The formula looks like this:
Multiple VMs in Availability Sets
Monthly Uptime % = (Maximum Available Minutes-Downtime) / Maximum Available Minutes X 100
Maximum Available Minutes – This is the total number of runtime minutes for two or more VMs in a month.
Downtime – This is the total number of minutes where there was no connectivity on any of the VMs in the AV Set.
This means that if the Monthly Uptime percentage is lower than 99.95%, you can ask Microsoft to grant you service credits.
Single VMs with Premium Storage
Monthly Uptime % = (Minutes in the Month – Downtime) / Minutes in the Month X 100
Minutes in a Month – Total number of minutes in a month.
Downtime – Total number of downtime minutes from the Minutes in a Month metric.
This means that if the calculated Monthly Uptime percentage is lower than 99.9%, then you can ask Microsoft to grant you service credits.
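Both formulas have the same shape, so one small function covers them. The Python below is a sketch with made-up function names; the 30-day month (43,200 minutes) and the downtime figures are example inputs, not data from a real incident.

```python
def monthly_uptime_percent(max_available_minutes, downtime_minutes):
    """Monthly Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100"""
    return (max_available_minutes - downtime_minutes) / max_available_minutes * 100


def sla_breached(uptime_percent, sla_percent):
    """True when the measured uptime falls below the SLA threshold."""
    return uptime_percent < sla_percent


if __name__ == "__main__":
    minutes_in_month = 30 * 24 * 60  # example: a 30-day month
    for downtime in (20, 30, 50):
        uptime = monthly_uptime_percent(minutes_in_month, downtime)
        print("{} min down -> {:.4f}% | below 99.95%: {} | below 99.9%: {}".format(
            downtime, uptime, sla_breached(uptime, 99.95), sla_breached(uptime, 99.9)))
```

In a 30-day month, that leaves about 21.6 minutes of downtime before the 99.95% threshold is crossed and about 43.2 minutes before the 99.9% one.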
You might ask; How do I know that I had an SLA breach?
Well, you need to measure the uptime of your application. In the end, you might not care if one VM from your Availability Set is down for say 10 minutes, but you will care if somebody calls you when the website is down. You have multiple options out there to measure the availability of your application like UptimeRobot, Monitis, Pingdom, etc. You also have the possibility of doing measurements in Azure with Azure Monitor, but you’re not getting application uptime, so you need the best of both worlds to have an accurate view of the situation. I configure both because I want to know when something happens to a VM, and I also want to know if the application is up and healthy. The reason is that if you’re using say VMs and PaaS services, you need to know which one caused the downtime and if it was a human error. Microsoft will not pay for your mistakes, so you need to have self-healing systems in place to avoid human error. There are a lot of Configuration Management systems out there, systems like DSC / Chef / Puppet, which ensure that your configuration stays the way you defined it. Azure has Desired State Configuration integrated into it, for example, which grants you the ability to enforce states on VMs based on a configuration manifest.
That being said, gaining a financially backed SLA in Azure is not rocket science. I hope you obtained some useful information from this post 🙂