The SANMAN: vSphere 5 & VAAI Demand Radical Changes to Storage Arrays

The launch of vSphere 5 and its new storage related features will set the precedent for a complete rethink on how a new datacenter’s storage infrastructure should be designed and deployed. vSphere 5’s launch is not only an unabashed attempt at cornering every single aspect of the server market but is also a result for the growing need for methodical scalability that merges the I.T. silos and consequently combines the information of applications, servers, SANs and storage into a single comprehensive stack. In an almost ironic shift back towards the original principles of mainframe, VMware’s importance has already influenced vendors such as EMC with their VMAX and HDS with their VSP in adopting a scale out as opposed to scale up approach to Storage. With this direction being at the forefront of most storage vendors’ roadmaps for the foreseeable future it subsequently dictates a criterion far beyond the storage capacity requirement I.T. departments are traditionally used to. With such considerations the end of the traditional Storage Array could be sooner than we think as a transformation takes place on the makeup of future datacenters.

With the emergence of the Private Cloud, Storage Systems are already no longer being considered or accepted by a majority of end users as ‘data banks’ but rather ‘on demand global pools’ of processors and volumes. End users have finally matured into accepting dynamic tiering, thin / dynamic provisioning, wide striping, SSDs etc. as standard for their storage arrays as they demand optimum performance. What will dictate the next phase is that the enterprise storage architecture will no longer be accepted as a model for unused capacity but rather one that facilitates further consolidation and on demand scalability. Manoeuvres towards this direction have already taken place as some vendors have taken to adopting Sub-Lun tiering (in which LUNs are now merely containers) and 2.5 inch SAS disks. The next key shift towards this datacenter transformation is the new integration of the server and storage stack as brought about with vSphere 5’s initiatives, such as VAAI, VASA, SDRS, Storage vMotion etc.

Leading this revolution is Paul Maritz, who upon becoming the head of VMware, set his vision clear: rebrand VMware from a virtualization platform to the essential element of the Cloud. In doing that VMware needed to showcase their value beyond the Server Admin’s milieu. Introduced with vSphere 4.1, vSphere API Array integration (VAAI) was the first major shift towards the comprehensive Cloud vision that incorporated the Storage stack. Now with further enhancements in vSphere 5, VAAI compatibility will be an essential feature for any Storage Array.

VAAI has several features named primitives aimed at driving this change forward. There are several primitives associated with the vSphere API Array integration (VAAI) with the first being the Full Copy primitive. Essentially with VMware when copying of data occurs whether via VM cloning or Storage vMotion, the ESX server becomes responsible for first having to read every single block that has to be copied and then writing it back to the new location. Of course this adds a fundamental load and effect on the ESX server. So for example when deploying a 100GB VM from a template the entire 100 GB will have to be read by the vSphere host and then subsequently written requiring a total of 200 GB of I/O.

Instead of this host intensive process the Full Copy primitive operates by issuing a single SCSI command called the XCOPY. The XCOPY is sent for a contiguous set of blocks which in essence is a command to the storage to do the copying of each block from one logical block address to another. So hence a significant load is taken off the ESX server and instead put upon the Storage array that can more than easily deal with the copy operations resulting in very little I/O between the host and array.

The Second Primitive, Block Zeroing is also related to the virtual machine cloning process which in essence is merely a file copy process. When a VM is copied from one datastore to another it would copy all of the files that make up that VM. So for a 100 GB virtual disk file with only 5 GB of data, there would be blocks that are full as well as empty ones with free space i.e. where data is yet to be written. Any cloning process would entail not just IOPS for the data but also numerous repetitive SCSI commands to the array for each of the empty blocks that make up the virtual disk file.

Block zeroing instead removes the need for having to send these redundant SCSI commands from the host to the array. By simply informing the storage array of which blocks are zeros the host offloads the work to the array without having to send commands to zero out every block within the virtual disk.

The third VAAI primitive is named Hardware Lock Assist. The original implementation of VMFS used SCSI reservations to prevent data corruption when several servers shared a LUN. What typically occurs without VAAI in such situations though, are what are termed SCSI reservation conflicts. Being a normal part of the SCSI protocol, SCSI reservations occur to give exclusive access to a LUN so that competing devices do not cause data corruption. This especially used to occur during VMotion transfers and metadata updates of earlier versions of ESX and caused serious performance problems and poor response times as devices had to wait for access to the same LUN. Although these reservations are typically less than 1ms, many of them in rapid succession can cause a performance plateau with VMs on that datastore.

Hardware-Assisted Locking in essence is the elimination of this LUN-level locking based on SCSI reservations. How it works is that initially the ESX server will make a first read on the lock. If the lock is free, the server will then send a Compare and Swap command that is not only the lock data that the server wants to place into the lock but also the original free contents of the lock. The storage array will then read the lock again and compare the current data in the lock to what is in the Compare And Write command. If they are found to be the same, new data will be written into the lock. All of this is treated as a single atomic operation and is applied at the block level (not the LUN level) making parallel VMFS updates possible.

With the advent of vSphere 5, enhancements with VAAI have also come about for infrastructures deploying array-based thin-provisioning. Thin-provisioned LUNs have brought many benefits but also challenges such as the monitoring of space utilization and the reclamation of dead space.
To counter this the VAAI Thin Provisioning primitive has an Out-of-Space Condition which monitors the space usage on thin-provisioned LUNs. Via advanced warning users can prevent themselves from catastrophically running out of physical space.

Another aspect of this primitive is what is termed Dead Space Reclamation (originally coined as SCSI UNMAP). This provides the ability to reclaim blocks of thin-provisioned LUNs whenever virtual disks are deleted or migrated to different datastores.

Previously when a deletion of a snapshot, a VM or a Storage vMotion took place the VMFS would delete a file and sometimes leading to the filesystem reallocating pointers instead of issuing a SCSI WRITE ZERO command to zero out the blocks. This would lead to blocks previously used by the virtual machine being still reported as in use, resulting in the array providing incorrect information with regards to storage consumption.

With vSphere 5 this has subsequently changed as now instead of a SCSI WRITE ZERO command or reallocation of VMFS pointers, a SCSI UNMAP command is used. This in fact enables the array to release the specified Logical Block Address back to the pool, leading to better reporting and monitoring of disk space consumption as well as the reclamation of unused blocks.

Back in 2009, Paul Maritz boldly proclaimed the dawning of ‘a software mainframe’. With vSphere 5 and VAAI, part of that strategy is aiming at transforming the role of the Storage Array from a monolithic box of capacity to an off-loadable virtual resource of processors for ESX servers to perform better. Features such as SRM 5.0, SDRS, VASA (details on these to follow soon in upcoming blogs) are aimed at further enhancing this push. At VMworld 2011, Maritz proudly stated that a VM Is born every 6 seconds and that there are more than 20 million VMs in the world with more than 5.5 vmotions per second. With such serious figures, whether you’re deploying VMware or not, it’s a statement that would be foolish to ignore when budgeting and negotiating for your next Storage Array.