CPU Overallocation and Poor Network Performance in vCD – Beware of Resource Pools
For the longest time all VMware administrators have been told that resource pools are not folders and that they should only be used under circumstances where the impact of applying the resource settings is fully understood. From my point of view I’ve been able to utilize resource pools for VM management without too much hassle since I first started working on VMware Managed Service platforms and from a managed services point of view they are a lot easier to use as organizational “folders” than vSphere folders themselves. For me, as long as the CPU and Memory Resources Unlimited checkbox was ticked nothing bad happened.
Working with vCloud Director however, resource pools are heavily utilized as the control mechanism for resource allocation, sharing and management. It’s still a topic that can cause confusion when trying to wrap ones head around the different allocation models vCD offers. I still reference blog posts from Duncan Epping and Frank Denneman written nearly seven years ago to refresh my memory every now and then.
Before moving onto an example of how overallocation or client undersizing in vCloud Director can cause serious performance issues it’s worth having a read of this post by Frank that goes through in typical Frank detail around what resource management looks like in vCloud Director.
Proper Resource management is very complicated in a Virtual Infrastructure or vCloud environment. Each allocation models uses a different combination of resource allocation settings on both Resource Pool and Virtual Machine level
Undersized vDCs Causing Network Throughput Issue:
The Allocation Pool model was the one that I worked with the most and it used to throw up a few client related issues when I worked at Zetttagrid. When using the Allocation Pool method which is the default model you are specifying the amount of resources for your Org vDC and also specifying how much of these resources are guaranteed. The guarantee means that a reservation will be set and that the amount of guaranteed resources is taken from the Provider vDC. The total amount of resources specified is the upper boundary, which is also the resource pool limit.
Because tenants where able to purchase Virtual Datacenters of any size there was a number of occasions where the tenants undersized their resources. Specifically, one tenant came to us complaining about poor network performance during a copy operation between VMs in their vDC. At first the operations team thought that is was the network causing issues…we where also running NSX and these VMs where also on a VXLAN segment so fingers where being pointed there as well.
Eventually, after a bit of troubleshooting we where able to replicate the problem…it was related to the resources that the tenant had purchased or lack thereof. In a nutshell because the allocation pool model allows the over provisioning or resources not enough vCPU was purchased. The vDC resource pool had 1000Mhz of vCPU with a 0% reservation but he had created 4 dual vCPU VMs. When the network copy job started it consumed CPU which in turn exhausted the vCD CPU allocation.
What happened next can be seen in the video below…
With the resource pool constrained ready time is introduced to throttle the CPU which in turn impacts the network throughput. As shown in the video when the resource pool has the the unlimited button checked the ready goes away and the network throughput returns to normal.
Again, its worth checking out the impact on the network throughput in the video as it clearly shows what happens what tenants underprovision or overallocate their Virtual Datacenters in vCloud Director. Outside of vCloud Director it’s also handy to understand the impact of applying reservations on Resource Pools in terms of VM compute and networking performance.
It’s not always the network!