ACE-8, GCP Associate Cloud Engineer – scaling, MIG, template, Load Balancers



Today, on The Certified Q&A, let’s tackle another question for the Google Cloud Platform Associate Cloud Engineer. It’s very important, both for understanding Google Cloud and for passing the certification exam, that you go through the question and attempt to answer it yourself first. So pause the video, work through the question, we’ll catch up in just a little while, and I’ll show you how I do it.

In this scenario, you have a definition of an instance template that contains a web application. You are asked to deploy the application so that it can scale based on the HTTP traffic it receives. What should you do?

Before we look at each of the options, let’s consider the key requirements. One, there is an instance template. So, there is a blueprint of what the machine should be. Two, we are serving a web application, so not just static files; some compute is going to happen every time a request comes in. Three, we want to be able to scale based on the HTTP traffic. So, if we have low traffic, we do not want to incur a lot of cost on the VMs that we provision, but as more and more traffic comes in, we are okay with provisioning more machines and incurring that cost.

So, looking at each of the options, let’s first try option A. Create a VM from the instance template. Create a custom image from the VM’s disk. Export the image to Cloud Storage. Create an HTTP Load Balancer, and add the Cloud Storage Bucket as its backend service. I have illustrated what happens in each of these steps. From the Compute Engine instance template, we create a VM instance, possibly many of them. We then create an image of the disk, put it on Cloud Storage, and set that bucket as the backend for the HTTP(S) Load Balancer. Now, will it work with the HTTP(S) Load Balancer? It is possible to connect an HTTP(S) Load Balancer to Cloud Storage, or to a backend service, and allow for services to scale. However, we can see that this is a very roundabout method. There is just too much manual work to be done to make this happen. And how are we going to scale this? We don’t have a direct way of scaling based on Cloud Storage alone, because we are not just storing static files, we also have to serve an application. A disk image sitting in a bucket does not run anything, so just setting the Load Balancer backend to Cloud Storage is not going to serve that purpose. Based on both of these considerations, I am going to eliminate this option.

How about option B? Create a VM from the instance template. Then, create an App Engine application in Automatic Scaling mode that forwards all traffic to the VM. Already we are thinking, what exactly does it mean to say, forward all traffic to ‘the VM’? It is not going to be just one VM. We want to be able to have multiple VMs, and this option suggests that we would have to create each of those VMs ourselves. Now, App Engine is a Platform as a Service (PaaS). You do not manually provision VMs here. Instead, what App Engine does is take a particular machine type that you select, and then automatically scale that based on the number of requests that come in, or other parameters that you set.
You have three options in choosing the way to scale, which are automatic, basic, or manual. But none of them has you provisioning VMs manually and then assigning them to App Engine. So, App Engine plus creating a VM instance from an instance template, that’s just not possible. Given that, we can eliminate option B.

Now, let’s look at option D, which starts off suggesting we create the necessary amount of instances required for peak user traffic based upon the instance template. I already have an issue with this, which is that, if you provision the number of machines based on the maximum load that you expect, you’re going to over-provision. Most of our cost is going to come from these instances that are created. Therefore, every instance that you’ve provisioned but are not using is wasted money. Ideally, you want the number of machines that are provisioned to follow the load that you receive. Therefore, provisioning to maximum load would not be the right approach.

Another part that I have an issue with is the suggestion that, based on the instance template, you create VMs, then create an Unmanaged Instance Group, and add the instances to that instance group. Let’s find out a little more about Unmanaged Instance Groups. Comparing Managed and Unmanaged Instance Groups, we see that a Managed Instance Group has homogeneous instances, whereas an unmanaged one has heterogeneous instances. The homogeneous instances, in the case of a Managed Instance Group, come from an instance template. Whereas for the heterogeneous instances, you can create whatever you want. The Managed Instance Group offers autoscaling, autohealing, regional deployment and automatic updating, none of which is supported by the Unmanaged Instance Group. So, the Managed Instance Group is suitable for highly available and scalable services, whereas the Unmanaged Instance Group is not.
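To make the over-provisioning argument concrete, here is a rough sketch in Python. It is not how the Compute Engine autoscaler is implemented; it only illustrates the idea of sizing a group so that each VM stays at or below a target utilization, clamped between a minimum and a maximum. The traffic numbers, the per-VM capacity, and the function name `instances_needed` are all made up for illustration.

```python
def instances_needed(requests_per_sec, capacity_per_instance,
                     target_utilization_pct=80,
                     min_instances=1, max_instances=10):
    """Smallest instance count that keeps each VM at or below the target
    utilization, clamped to [min_instances, max_instances].

    Integer arithmetic is used so the rounding up is exact.
    """
    numerator = requests_per_sec * 100
    denominator = capacity_per_instance * target_utilization_pct
    raw = (numerator + denominator - 1) // denominator  # ceiling division
    return max(min_instances, min(max_instances, raw))


# Hypothetical traffic for one day, in requests per second, one value per hour.
hourly_rps = [20, 20, 20, 20, 40, 80, 160, 320, 400, 400, 320, 240,
              240, 320, 400, 400, 320, 240, 160, 80, 40, 20, 20, 20]
CAPACITY = 100  # hypothetical: requests/sec a single VM can serve flat out

# Option C style: the group size follows the load hour by hour.
autoscaled = [instances_needed(rps, CAPACITY) for rps in hourly_rps]
autoscaled_hours = sum(autoscaled)

# Option D style: run the peak number of instances all day.
peak_hours = max(autoscaled) * len(hourly_rps)

print("instance-hours when following the load:", autoscaled_hours)
print("instance-hours when provisioned for peak:", peak_hours)
```

With this invented traffic shape, the group that follows the load uses far fewer instance-hours than the one sized for the peak, which is the cost argument against option D. A real autoscaler also applies things like cool-down periods, which this sketch leaves out.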
Considering all of that, provisioning the maximum number of instances would be over-provisioning and wasteful. Ideally, scaling should follow the load as it is received. Also, creating an Unmanaged Instance Group based on an instance template is not possible. And Unmanaged Instance Groups are not suitable for automatic scaling. So, given all of these, we are going to eliminate option D.

How about option C? Option C suggests that we create a Managed Instance Group, or MIG, based on the instance template. Then you configure autoscaling based on HTTP traffic and configure the instance group as a backend service of an HTTP Load Balancer. It seems to be looking right, because we already understand that a Managed Instance Group can be created based on an instance template. Given that it is based on the same instance template, whenever we scale, whether we create more machines or reduce the number of machines, we are working with exactly the same kind of instance. That is very useful for scalability.

Since we haven’t looked at Load Balancers yet, let’s spend some time studying the various kinds of Load Balancers, and what they are used for. Load Balancing allows you to scale your app. As the number of requests increases, it is possible for you to route these requests to different places, and also have the application scale by creating new VMs based on the parameters that you’ve set. You’re able to support heavy traffic, because scalability is there and because routing to local regions is available. With Load Balancing, you can also do regular health checks, to ensure that only healthy VMs receive requests, whereas unhealthy instances can be removed. Routing traffic to the closest set of virtual machines also means that you not only have scale but also low latency.
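The health-check idea described above can be sketched in a few lines of Python. This is only an illustration of the principle that unhealthy backends are skipped; it is not how Google’s Load Balancers are implemented, which probe backends periodically and use more sophisticated balancing modes. The `Backend` class, the `route` function, and the VM names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    name: str
    healthy: bool = True  # in reality this would come from a periodic probe


def route(requests, backends):
    """Assign each request to a healthy backend in round-robin order."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends available")
    turn = cycle(healthy)
    return [(request, next(turn).name) for request in requests]


backends = [Backend("vm-a"), Backend("vm-b", healthy=False), Backend("vm-c")]
for request, vm in route(["r1", "r2", "r3"], backends):
    print(request, "->", vm)
```

Here `vm-b` never receives a request because it is marked unhealthy, while the other two VMs share the traffic.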
Let’s look a little deeper and find out the different types of Load Balancers, their different categories, which one you should choose, and the best way to configure their backends. There are two sets of Load Balancers: Global Load Balancing and Regional Load Balancing. If your backends are distributed across multiple regions, then you should use Global Load Balancing. If you want a single point at which your requests come in, where requests can come from anywhere in the world and a single IP address is visible all across the world, then you should also use Global Load Balancing. And it is only Global Load Balancing that supports IPv6. Regional Load Balancing is used when all your backends are in one region, and it supports only IPv4. The Global Load Balancing options are the HTTP(S) Load Balancer, the SSL Proxy Load Balancer, and the TCP Proxy Load Balancer. Whereas the Regional Load Balancers are the Internal TCP/UDP, the Internal HTTP(S), and the Network TCP/UDP Load Balancers.

The other way of categorizing Load Balancers is external versus internal. External Load Balancers distribute traffic coming from the internet into your GCP network, whereas internal Load Balancers distribute traffic only within your GCP network. As with other decision-making processes, there is a flow chart that allows you to choose the correct Load Balancer option depending on the kind of traffic that you receive, and also depending on where you’re serving that traffic from. Now, in our web app requirement, it receives HTTP traffic. The question does not say whether that traffic is internal or external, but that doesn’t matter, because it would work with either of the HTTP(S) Load Balancers.

Now, what are Managed Instance Groups (MIGs)? Managed Instance Groups have identical VMs. So, you create an instance template which defines the blueprint for all the copies of this VM that you’re going to be making.
When you create instances of the same type, it becomes easy to load balance all the requests, and say that any of these requests can go to any of these machines, because they all function the same way. Managed Instance Groups provide high availability and scalability, and it’s also possible to update these VMs consistently because they are all identical. Because the autoscaler on the Managed Instance Group can automatically provision new machines depending on load, we can also have the number of provisioned machines follow the load changes. If there is greater load, the number of instances increases, whereas when the load decreases, the number of instances comes down. So, you can have your cost follow the load, as opposed to an option that says, provision to max load.

Now, how do we connect the Load Balancer to a backend instance group? If you go to Network services and choose Load Balancing, you can then choose a backend configuration, which allows you to select a backend service. In this backend service, one of the options, as I’ve shown in step 4, is to choose an instance group. And in step 5, we can select the instance group that we have defined. In the configuration you can also say, ‘Hey, you know what, I do not want more than a particular number of instances’, which means that you can cap the cost of the automatic scaling. Using a template for the VMs makes the instances consistent and allows you to scale easily. And the autoscaler can then increase or decrease the number of VMs, based on the request load that the Load Balancer is serving. Given that, we can qualify answer C as the correct option, which takes care of all the requirements in our question.

With that, let us look at some of the key learnings from this question. It is important that you know the different types of Load Balancers. Are they serving regional traffic or are they serving global traffic? Is the traffic within your GCP VPC, the Virtual Private Cloud, or is it going to come from outside also?
What protocols do they support? Do you need HTTP and HTTPS? Do you need TCP or UDP?

What is the relationship between an instance template, an instance, and the Managed Instance Group? You create the instance template, and the Managed Instance Group creates instances based on it, so they are all identical copies. So, when you set the backend of a Load Balancer to a Managed Instance Group, the group will consistently create only copies of that particular instance template. In that process I showed you a few screenshots of how to connect a Load Balancer to a scalable backend, which would be a Managed Instance Group. You can additionally look at the different kinds of backends that a Load Balancer can send traffic to. These are the Instance Group, the Network Endpoint Group, and the Storage Bucket. A Storage Bucket would be used if you’re serving static data.

I’ll leave you with a few references. Again, understand what Cloud Storage and Buckets are. Understand how to configure either a backend Bucket or a Managed Instance Group as the backend that a Load Balancer will send traffic to. You should also study the flow chart that allows you to choose the right Load Balancer option, depending on the kind of traffic you receive, and on whether the load that you’re serving is regional or global. If you’re interested in picking up loads more learning on Google Cloud, go ahead and subscribe, right away!
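As a study aid for the flow chart mentioned above, the categories covered in this walkthrough can be written down as a small Python function. This is only one simplified reading of those categories, not the official flow chart, and Google has renamed its Load Balancer products over time, so check the current documentation before relying on the names. The function `choose_load_balancer` and its parameters are hypothetical.

```python
def choose_load_balancer(protocol, internal, multi_region=False, ssl_offload=False):
    """Pick a Load Balancer type from the categories named in the walkthrough.

    protocol:     "HTTP", "HTTPS", "TCP" or "UDP"
    internal:     True if traffic stays inside the VPC
    multi_region: True if backends are spread across regions
    ssl_offload:  True if the Load Balancer should terminate SSL for TCP traffic
    """
    protocol = protocol.upper()
    if protocol in ("HTTP", "HTTPS"):
        if internal:
            return "Internal HTTP(S) Load Balancer"
        return "HTTP(S) Load Balancer"
    if protocol not in ("TCP", "UDP"):
        raise ValueError(f"unsupported protocol: {protocol}")
    if internal:
        return "Internal TCP/UDP Load Balancer"
    if protocol == "TCP" and ssl_offload:
        return "SSL Proxy Load Balancer"
    if protocol == "TCP" and multi_region:
        return "TCP Proxy Load Balancer"
    return "Network TCP/UDP Load Balancer"


# The web app in this question receives HTTP traffic from outside.
print(choose_load_balancer("HTTP", internal=False))
```

For the question discussed here, both the internal and the external HTTP(S) branches lead to a Load Balancer that can front a Managed Instance Group, which is why the walkthrough does not need to know where the traffic originates.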
