vSphere's Virtual CPUs - Avoiding the vCPU to pCPU ratio trap

2011 was a year where despite the economic constraints everything Big was seemingly good; Big Data, Big Clouds, Big VMs etc. Caught in the industry’s lust for this excess, 2011 was also the year I lost count of how many overprovisioned resources to ‘Big’ Production VMs I witnessed. More often than not this was a typical reaction from System Admins trying to alleviate their fears of potential performance problems to important VMs. It was the year where I began to hear justifications such as “yes we are overprovisioning our production VMs..but apart from the cost savings, overallocating our available underlying resources to a VM isn’t a bad thing, in fact it allows it to be scalable”. Despite this 2011 was also the year where I lost count of the amount of times I had to point out that sometimes overprovisioning a VM does lead to performance problems - specifically when dealing with Virtual CPUs.
VMware refers to CPU as pCPU and vCPU. pCPU or ‘physical’ CPU in its simplest terms refers to a physical CPU core i.e. a physical hardware execution context (HEC) if hyper-threading is unavailable or disabled. If hyperthreading has been enabled then a pCPU would consitute a logical CPU. This is because hyperthreading enables a single processor core to act like two processors i.e. logical processors. So for example, if an ESX 8-core server has hyper-threading enabled it would have 16 threads that appear as 16 logical processors and that would constitute 16 pCPUs.
As for a virtual CPU (vCPU) this refers to a virtual machine’s virtual processor and can be thought of in the same vein as the CPU in a traditional physical server. vCPUs run on pCPUs and by default, virtual machines are allocated one vCPU each. However, VMware have an add-on software module named Virtual SMP (symmetric multi-processing) that allows virtual machines to have access to more than one CPU and hence be allocated more than one vCPU. The great advantage of this is that virtualized multi-threaded applications can now be deployed on multi vCPU VMs to support their numerous processes. So instead of being constrained to a single vCPU, SMP enables an application to use multiple processors to execute multiple tasks concurrently, consequently increasing throughput. So with such a feature and all the excitement of being ‘Big’ it was easily assumed by many that taking advantage of such a feature by provisioning additional vCPUs could only ever be beneficial – but if only it was that simple.

The typical examples I faced entailed performance problems that were either being blamed on the Storage or the SAN and not CPU constraints especially as overall CPU utilization for the ESX server that hosted the VMs would be reported as low. Using Virtual Instruments’ VirtualWisdom I was able to quickly conclude that the problem was not at all related to the SAN or Storage but the hosts themselves. By being able to historically trend and correlate the vCenter, SAN and Storage metrics of the problematic VMs on a single dashboard it was apparent that the high number of vCPUs to each VM was the cause. This was indicated by a high reading of what is termed the 'CPU Ready' metric.
To elaborate, CPU Ready is a metric that measures the amount of time a VM is ready to run against the pCPU i.e. how long a vCPU has to wait for an available core when it has work to perform. So while it’s possible that CPU utilization may not be reported as high, if the CPU Ready metric is high then your performance problem is most likely related to CPU. In the instances that I saw, this was caused by customers assigning four vCPUs and in some cases eight to each Virtual Machine. So why was this happening?
VirtualWisdom Dashboard indicating high CPU Ready

Well firstly the hardware and its physical CPU resource is still shared. Coupled with this the ESX Server itself also requires CPU to process storage requests and network traffic etc. Then add the situation that sadly most organizations still suffer from the ‘silo syndrome’ and hence there still isn’t a clear dialogue between the System Admin and the Application owner. The consequence being that while multiple vCPUs are great for workloads that support parallelization but this is not the case for applications that don’t have built in multi-threaded structures. So while a VM with 4 vCPUs will require the ESX server to wait for 4 pCPUs to become available, on a particularly busy ESX server with other VMs this could take significantly longer than if the VM in question only had a single vCPU.

To explain this further let’s take an example of a four pCPU host that has four VMs, three with 1 vCPU and one with 4 vCPUs. At best only the three single vCPU VMs can be scheduled concurrently. In such an instance the 4 vCPU VM would have to wait for all four pCPUs to be idle. In this example the excess vCPUs actually impose scheduling constraints and consequently degrade the VM’s overall performance, typically indicated by low CPU utilization but a high CPU Ready figure. With the ESX server scheduling and prioritising workloads according to what it deems most efficient to run, the consequence is that smaller VMs will tend to run on the pCPUs more frequently than the larger overprovisioned ones. So in this instance overprovisioning was in fact proving to be detrimental to performance as opposed to beneficial. Now in more recent versions of vSphere the scheduling of different vCPUs and de-scheduling of idle vCPUs is not as contentious as it used to be. Despite this, the VMKernel still has to manage every vCPU, a complete waste if the VM’s application doesn’t use them!

To ensure your vCPU to pCPU ratio is at its optimal level and that you reap the benefits of this great feature there are some straightforward considerations to make. Firstly there needs to be dialogue between the silos to fully understand the application’s workload prior to VM resource allocation. In the case of applications where the workload may not be known, it’s key to not overprovision virtual CPUs but rather start with a single vCPU and scale out as and when is necessary. Having a monitoring platform that can historically trend the performance and workloads of such VMs is also highly beneficial in determining such factors. As mentioned earlier CPU Ready is a key metric to consider as well as CPU utilization. Correlating this with Memory and Network statistics, as well as SAN I/O and Disk I/O metrics enables you to proactively avoid any bottlenecks and correctly size your VMs and hence avoid overprovisioning. This can also be extended in considering how many VMs you allocate to an ESX Server and in ensuring that its physical CPU resources are sufficient to meet the needs of your VMs.  As businesses’ key applications become virtualized it’s an imperative that whether they are old legacy single threaded workloads or new multi threaded workloads the correct vCPU to pCPU ratio is allocated. In this instance size isn’t always everything it’s what you do with your CPU that counts.