vSphere's Virtual CPUs - Avoiding the vCPU to pCPU ratio trap


2011 was a year in which, despite the economic constraints, everything Big was seemingly good: Big Data, Big Clouds, Big VMs etc. Caught in the industry’s lust for this excess, 2011 was also the year I lost count of how many ‘Big’ Production VMs with overprovisioned resources I witnessed. More often than not this was a typical reaction from System Admins trying to alleviate their fears of potential performance problems on important VMs. It was the year where I began to hear justifications such as “yes we are overprovisioning our production VMs, but apart from the cost savings, overallocating our available underlying resources to a VM isn’t a bad thing, in fact it allows it to be scalable”. Despite this, 2011 was also the year where I lost count of the number of times I had to point out that sometimes overprovisioning a VM does lead to performance problems, specifically when dealing with Virtual CPUs.
VMware refers to CPU as pCPU and vCPU. A pCPU or ‘physical’ CPU in its simplest terms refers to a physical CPU core i.e. a physical hardware execution context (HEC) if hyper-threading is unavailable or disabled. If hyper-threading has been enabled then a pCPU would constitute a logical CPU. This is because hyper-threading enables a single processor core to act like two processors i.e. logical processors. So for example, if an 8-core ESX server has hyper-threading enabled it would have 16 threads that appear as 16 logical processors and that would constitute 16 pCPUs.
As for a virtual CPU (vCPU), this refers to a virtual machine’s virtual processor and can be thought of in the same vein as the CPU in a traditional physical server. vCPUs run on pCPUs and by default, virtual machines are allocated one vCPU each. However, VMware have an add-on software module named Virtual SMP (symmetric multi-processing) that allows virtual machines to have access to more than one CPU and hence be allocated more than one vCPU. The great advantage of this is that virtualized multi-threaded applications can now be deployed on multi vCPU VMs to support their numerous processes. So instead of being constrained to a single vCPU, SMP enables an application to use multiple processors to execute multiple tasks concurrently, consequently increasing throughput. So with such a feature and all the excitement of being ‘Big’ it was easily assumed by many that provisioning additional vCPUs could only ever be beneficial, but if only it were that simple.

The typical examples I faced entailed performance problems that were being blamed on either the Storage or the SAN rather than on CPU constraints, especially as overall CPU utilization for the ESX server that hosted the VMs would be reported as low. Using Virtual Instruments’ VirtualWisdom I was able to quickly conclude that the problem was not at all related to the SAN or Storage but to the hosts themselves. By being able to historically trend and correlate the vCenter, SAN and Storage metrics of the problematic VMs on a single dashboard it was apparent that the high number of vCPUs assigned to each VM was the cause. This was indicated by a high reading of what is termed the ‘CPU Ready’ metric.
To elaborate, CPU Ready is a metric that measures the amount of time a VM is ready to run but is waiting to be scheduled on a pCPU i.e. how long a vCPU has to wait for an available core when it has work to perform. So while it’s possible that CPU utilization may not be reported as high, if the CPU Ready metric is high then your performance problem is most likely related to CPU. In the instances that I saw, this was caused by customers assigning four vCPUs, and in some cases eight, to each Virtual Machine. So why was this happening?
Figure: VirtualWisdom dashboard indicating high CPU Ready
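
For readers who want to put a number on CPU Ready, here is a minimal sketch of the usual conversion from the millisecond "summation" counter that vCenter charts expose into a percentage of the sampling interval. The function name is hypothetical, the 20 second default assumes the real-time chart interval, and dividing by the vCPU count to get a per-vCPU figure is a common convention rather than anything prescribed above.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into a per-vCPU percentage.

    ready_ms   -- CPU Ready summation reported for one sampling interval
    interval_s -- length of that interval in seconds (assumed 20 s for real-time charts)
    vcpus      -- number of vCPUs the summation covers
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    # Total time the vCPUs could have been running during the interval, in ms.
    available_ms = interval_s * 1000.0 * vcpus
    return 100.0 * ready_ms / available_ms


# A single-vCPU VM that spent 2000 ms of a 20 s interval waiting for a core.
print(cpu_ready_percent(2000, interval_s=20, vcpus=1))
```

What counts as "high" depends on the workload, so treat any threshold you apply to this percentage as a rule of thumb to validate in your own environment.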

Well firstly the hardware and its physical CPU resource is still shared. Coupled with this the ESX Server itself also requires CPU to process storage requests, network traffic etc. Then add the situation that sadly most organizations still suffer from the ‘silo syndrome’ and hence there still isn’t a clear dialogue between the System Admin and the Application owner. The consequence is that multiple vCPUs get allocated regardless of the workload: while they are great for workloads that support parallelization, they bring no benefit to applications that don’t have built-in multi-threaded structures. Meanwhile a VM with 4 vCPUs will require the ESX server to wait for 4 pCPUs to become available, and on a particularly busy ESX server with other VMs this could take significantly longer than if the VM in question only had a single vCPU.

To explain this further let’s take an example of a four pCPU host that has four VMs, three with 1 vCPU and one with 4 vCPUs. At best only the three single vCPU VMs can be scheduled concurrently. In such an instance the 4 vCPU VM would have to wait for all four pCPUs to be idle. In this example the excess vCPUs actually impose scheduling constraints and consequently degrade the VM’s overall performance, typically indicated by low CPU utilization but a high CPU Ready figure. With the ESX server scheduling and prioritising workloads according to what it deems most efficient to run, the consequence is that smaller VMs will tend to run on the pCPUs more frequently than the larger overprovisioned ones. So in this instance overprovisioning was in fact proving to be detrimental to performance as opposed to beneficial. Now in more recent versions of vSphere the scheduling of different vCPUs and de-scheduling of idle vCPUs is not as contentious as it used to be. Despite this, the VMkernel still has to manage every vCPU, a complete waste if the VM’s application doesn’t use them!
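
The four pCPU example can be played out with a deliberately simplified toy model. This is only a sketch: it assumes strict co-scheduling (a VM runs in a time slot only if all of its vCPUs fit on free pCPUs), that every VM always has work to do, and a plain round-robin queue. It is not how the actual ESX scheduler is implemented, and the function and VM names are made up for illustration.

```python
from collections import deque


def simulate(vms, pcpus, slots):
    """Toy strict co-scheduler.

    vms   -- dict of VM name -> number of vCPUs (every VM is always runnable)
    pcpus -- number of physical CPUs on the host
    slots -- number of time slots to simulate
    Returns a dict of VM name -> (slots run, slots spent waiting).
    """
    queue = deque(vms)  # round-robin order, front of the queue is served first
    ran = {name: 0 for name in vms}
    for _ in range(slots):
        free = pcpus
        scheduled = []
        for name in list(queue):
            # A VM is only placed if ALL of its vCPUs fit on the free pCPUs.
            if vms[name] <= free:
                free -= vms[name]
                scheduled.append(name)
        for name in scheduled:
            ran[name] += 1
            queue.remove(name)
            queue.append(name)  # VMs that ran go to the back of the queue
    return {name: (ran[name], slots - ran[name]) for name in vms}


right_sized = simulate({"app": 1, "a": 1, "b": 1, "c": 1}, pcpus=4, slots=8)
oversized = simulate({"app": 4, "a": 1, "b": 1, "c": 1}, pcpus=4, slots=8)
print("app with 1 vCPU  (ran, waited):", right_sized["app"])
print("app with 4 vCPUs (ran, waited):", oversized["app"])
```

In this model the right-sized VM never waits, while giving the same VM 4 vCPUs makes it (and its neighbours) sit in the ready queue for half of the slots, even though one pCPU is left idle whenever the small VMs run.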

To ensure your vCPU to pCPU ratio is at its optimal level and that you reap the benefits of this great feature there are some straightforward considerations to make. Firstly there needs to be dialogue between the silos to fully understand the application’s workload prior to VM resource allocation. In the case of applications where the workload may not be known, it’s key not to overprovision virtual CPUs but rather to start with a single vCPU and add more as and when necessary. Having a monitoring platform that can historically trend the performance and workloads of such VMs is also highly beneficial in determining such factors. As mentioned earlier CPU Ready is a key metric to consider as well as CPU utilization. Correlating this with Memory and Network statistics, as well as SAN I/O and Disk I/O metrics, enables you to proactively avoid any bottlenecks, correctly size your VMs and hence avoid overprovisioning. This can also be extended to considering how many VMs you allocate to an ESX Server and to ensuring that its physical CPU resources are sufficient to meet the needs of your VMs. As businesses’ key applications become virtualized it’s imperative that, whether they are old legacy single threaded workloads or new multi threaded workloads, the correct vCPU to pCPU ratio is allocated. In this instance size isn’t always everything; it’s what you do with your CPU that counts.

Exchange Completion Time - SAN Storage Performance Redefined

Roll back several years and certain vendors had you believe that Fibre Channel was dead and that the future would be iSCSI. A few years later and certain vendors were then declaring that Fibre Channel was dead again and that the future was FCoE. So while this blog is not an iSCSI vs FC or FC vs FCoE comparison list (there are plenty of good ones out there and iSCSI and FCoE each have immense merit), the point being made here is that Fibre Channel, unlike Elvis, really is alive and well. Moreover Fibre Channel still remains the protocol of choice for most Mission Critical Applications despite the FUD that surrounds its cost, manageability and future existence. Most Storage folk who run Enterprise class infrastructures are advocates of Fibre Channel not only because of its high performance connectivity infrastructure but also due to its reliability, security and scalability. Incredibly this is all with the majority of Fibre Channel implementations being vastly underutilized, poorly managed (due to lack of visibility) and running at a far from optimized state due to the constant day to day operations of most SAN Storage administrators. Indeed if Storage folk were empowered with a metric that could enable them to gain a better insight into and understanding of their SAN Storage’s performance and utilization, the so called impending death of Fibre Channel may have to take an even further rain check. Well that metric does exist; cue what is termed the “Exchange Completion Time.”

It’s now common for me to visit customer environments that run Fibre Channel SANs yet have various factions, whether server, VM, Network or Storage teams, that complain they are suffering performance issues due to lack of bandwidth or throughput. In every single instance FC utilization has actually been incredibly low with peaks of 10% at the most, and that's with 4Gb/s environments, not 8Gb/s! At worst there may be an extremely busy backup server that singlehandedly causes bottlenecks and creates the impression that the whole infrastructure is saturated, but even these occasions are rare. What seems to be the cause of this misconception is the lack of clarity between what is deemed throughput and what is an actual cause of bottlenecks and performance slowdowns i.e. I/O latency.

Sadly (and I am the first to admit that I was also once duped), Storage folk have been hoodwinked into accepting metrics that just aren’t sufficient to meet their requirements. Much like the folklore and fables of Santa Claus that are told to children during Christmas, storage administrators, architects and engineers have also been spun a yarn that MB/s and IOPS are somehow an accurate determination of performance and design considerations. In a world where application owners, server and VM admins are busily speaking the language of response times, Storage folk are engrossed in a foreign vocabulary that revolves around RAID levels, IOPS and MB/s and then numerous calculations to try and correlate the two languages together. But what if an application owner requested Storage with a 10ms response time that the Storage Administrator could then allocate with a guarantee of that performance? That would entail the Storage engineer not just looking at a one dimensional view from the back end of the Storage Array but one that incorporated the comprehensive transaction time i.e. from the Server to the Switch port to the LUN. That would mean considering the Exchange Completion Time.

To elaborate, using MB/s as a measurement of performance is almost akin to how people used to count cars as a measurement of road traffic. Harking back to my days as a student and before all of the high tech cameras and satellites that now monitor road traffic, I was ‘lucky’ enough to have a job of counting the number of cars that went through Trafalgar Square at lunchtime. It was an easy job, I'd see five cars and I'd click five times, but this was hardly accurate as when there was a traffic jam and all of the lanes were occupied I was still clicking five cars. Here also lies the problem with relying on MB/s as a measurement of performance. As with the car counting situation a more accurate way would have been to instead watch each single car and measure its time from its origin to its destination. In the same vein, to truly measure performance in a SAN Storage infrastructure you need to measure how long a transaction takes from being initiated by the host, to being received by the storage, to being acknowledged back to the host, and to do so in real time as opposed to relying on averages. This is what is termed the Exchange Completion Time.
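
To make the point about averages concrete, here is a small sketch that works on per-exchange timings. It assumes you already have a start and an end timestamp (in milliseconds) for each exchange from whatever monitoring source you use; the function names and the sample data are invented purely for illustration.

```python
from statistics import mean


def exchange_completion_times(exchanges):
    """Turn (start_ms, end_ms) pairs into per-exchange completion times in ms."""
    return [end - start for start, end in exchanges]


def summarize(ect_ms, sla_ms):
    """Report the average alongside the figures an average tends to hide."""
    return {
        "mean_ms": mean(ect_ms),
        "max_ms": max(ect_ms),
        "over_sla": sum(1 for t in ect_ms if t > sla_ms),
    }


# Invented sample: 99 exchanges that complete in 2 ms and one that takes 500 ms.
sample = [(i * 10, i * 10 + 2) for i in range(99)] + [(990, 1490)]
print(summarize(exchange_completion_times(sample), sla_ms=10))
```

The average for this sample sits comfortably under a 10 ms target, yet one exchange took half a second, which is exactly the kind of event that only per-exchange measurement reveals.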

While many storage arrays have tools that provide information on IOPS and MB/s, to get a better picture of a SAN Storage environment and its underlying latency it’s also key to consider the number of Frames per second. In Fibre Channel a Frame is comparable to a word, a Sequence a sentence and an Exchange the conversation. A standard FC Frame has a maximum Data Payload of 2112 bytes i.e. roughly a 2K payload. So for example an application that has an 8K I/O will require 4 FC Frames to carry that data portion. In this instance this would equate to 1 I/O being 4 Frames and subsequently 100 IOPS of the same size equating to 400 Frames per second. Hence to get a true picture of utilization looking at IOPS alone is not sufficient because there can be orders of magnitude of difference between particular applications and their I/O sizes, with some ranging from 2K to even 256K. With backup applications the I/O sizes can be even larger. Hence it’s a mistake not to take into consideration the number of Frames/sec when trying to measure SAN performance or when trying to identify whether data is being passed efficiently. For example even if you are witnessing a high throughput in MB/s you may be missing the fact that there is only a minimal payload of data and the Exchange (conversation) is failing to complete. This is often the case when there’s a slow draining device, flapping SFP etc. in the FC SAN network, where instead of data frames causing the traffic you have a number of management frames dealing with issues such as logins and logouts, loss of sync or some other optic degradation or physical layer issue. Imagine the scenario: a Storage Administrator is measuring the performance of his infrastructure or troubleshooting a performance issue and is seeing lots of traffic via MB/s, unaware that many of the environment’s transactions are actually being cancelled across the Fabric!
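
The frame arithmetic above is easy to reproduce. The following is a minimal sketch that counts only the data-carrying frames of an I/O, using the 2112 byte payload quoted in the text; it ignores command, response and other control frames, and the function names are hypothetical.

```python
FC_MAX_PAYLOAD_BYTES = 2112  # data payload of a standard FC frame, as quoted above


def frames_per_io(io_size_bytes: int, payload_bytes: int = FC_MAX_PAYLOAD_BYTES) -> int:
    """Number of data frames needed to carry one I/O of the given size."""
    if io_size_bytes <= 0 or payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partially filled last frame still counts.
    return -(-io_size_bytes // payload_bytes)


def data_frames_per_second(iops: int, io_size_bytes: int) -> int:
    """Data frames per second generated by a steady stream of equal-sized I/Os."""
    return iops * frames_per_io(io_size_bytes)


print(frames_per_io(8 * 1024))                 # 8K I/O
print(data_frames_per_second(100, 8 * 1024))   # 100 IOPS of 8K
print(data_frames_per_second(100, 256 * 1024)) # 100 IOPS of 256K
```

The same 100 IOPS produces a very different frame load depending on I/O size, which is why IOPS on its own says little about how busy a link really is.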

This lack of visibility into transactions has also led to many storage architects being reluctant to aggressively use lower tiers of storage, as poor I/O performance is often attributed to the storage arrays when bottlenecks in the storage infrastructure are actually the root cause. Measuring performance via Exchange Completion Times enables measurement and monitoring of storage I/O performance, hence ensuring that applications can be correlated and assigned to their most cost-effective storage tier without sacrificing SLAs. With many Storage vendors adopting automated tiering within their arrays some would feel this challenge has now been met. The reality of automated tiering though is that LUNs or sub-LUNs are only dynamically relocated to different tiers based on the frequency of data access i.e. frequently accessed data is deemed more valuable so should reside on a higher tier and infrequently accessed data should be moved to lower tiers. So while using historical array performance and capacity data may seem a sufficient way to tier, it’s still too simplistic and lacks the insight for more optimized tiering decisions. Such an approach may have been sufficient to determine optimum data placement in the days of DAS when the I/O performance bottleneck was disk transfer rate, but in the world of SANs and shared storage to look just at external transfer rates between SSD, Fibre Channel or SATA drives is a detached and inaccurate way to measure the effect of SAN performance on an application’s response time. For example congestion or problems in the SAN can result in severely degraded response times or cancelled transactions that fail to be acknowledged by the back end of the array. Furthermore incorrect HBA queue depths, the difference between sequential and random requests, and link and physical layer errors all have an impact on response times and in turn application latency. By incorporating the Exchange Completion Time metric, i.e. measuring I/O conversations across the SAN infrastructure, into your tiering considerations, tiering can now accurately be based on comprehensive real time performance as opposed to device specific views.

Monitoring your FC SAN Storage environment in a comprehensive manner that incorporates the SAN fabric and provides metrics such as the Exchange Completion Time rapidly changes FC SAN troubleshooting from a reactive to proactive exercise. It also enables Server, Storage and Application administrators to have a common language of ‘response times’ thus eliminating any potential silos. With the knowledge of application I/O latency down to the millisecond, FC SAN Storage administrators can quickly be transformed from the initial point of blame to the initial point of resolution, while also ensuring optimum performance and availability of your mission critical data.

Demystifying the Cloud: IaaS, PaaS, SaaS, MaaS, CaaS & XaaS


Generally IT folk, whether in Storage, Virtualization, Change Management or Project Management, love the use of acronyms and synonyms to express key concepts amongst each other. What other industry would allow an individual to spout a line such as "Have SOX seen the BCP and CAB approval for our VDC's DR SAN and will this then be added to the CMDB by CoB today?" without immediately flinching or bringing in a logopaedics specialist for help? More often than not, IT folk have also used these synonyms and acronyms as smokescreens to prevent outsiders from realizing "well this IT stuff is actually quite easy to understand and quite straightforward".


Hence it is no surprise that when the seemingly simple concept of Cloud Computing took off, so did an abundance of acronyms and synonyms, spawning a new breed of I.T. professionals who were the only ones that could correctly understand them, i.e. ‘The Cloud Specialist’. Despite this, the beauty of the Cloud (or as most people are starting to realise, the synonym for the Internet) is that it not only encompasses the IT industry and its business demands but also the average end user whose only experience with IT is their iPhone and its App Store. So while EMC’s extensive airport advertising may have initially confused a lot of tourists into thinking that the ‘Journey to the Cloud’ was a slogan for an up and coming budget airline, the general public are certainly now becoming aware of ‘The Cloud’. End users are now bombarded with Clouds, from Microsoft claiming that Windows 7 is your ‘Path to the Cloud’, to Pizza Restaurants offering free access to ‘the Cloud’, to Apple iPhone owners having iCloud enforced upon them (no comment on the security issues of your email contacts and personal photos being uploaded to Apple’s database). So while the idea of Public, Private and Hybrid Clouds becomes more familiar and understood even amongst the masses, it’s with surprise that I often find people within the IT industry who are still unaware or unsure of Cloud Service acronyms such as IaaS, PaaS, SaaS, MaaS, CaaS or XaaS.


To understand why there are so many acronyms with the Cloud, it is important to appreciate that the Cloud offers a number of services, each of which these acronyms classify. The first of these, IaaS (Infrastructure as a Service), is when the consumer does not deal with the infrastructure; instead the responsibility for the equipment is outsourced to the Service Provider. The Service Provider not only owns the equipment but is also responsible for its running and maintenance, while the consumer is charged on a ‘pay as you use’ basis. IaaS is often offered as a horizontally integrated service that includes not only the server and storage but also the connectivity domains. For example while the consumer may deploy and run their own applications and operating systems, the IaaS provider would typically provide the replication, backup and archiving (Storage), the powerful computing requirements (Server) or the network load balancing and firewalls (Connectivity domains).
PaaS (Platform as a Service) provides the capability for consumers to have applications deployed without the burden and cost of buying and managing the hardware and software. In other words these are either consumer created or acquired web applications or services that are entirely accessible from the Internet. Usually created with programming languages and tools supported by the service provider, these web applications enable the consumer to have control over the deployed applications and in some circumstances the application-hosting environment, but without the complexity of the infrastructure i.e. the servers, operating systems or storage. Offering a quick time to market and services that can be provisioned as an integrated solution over the web, PaaS facilitates immediate business requirements such as application design, development and testing at a fraction of the normal cost.
Software as a Service (SaaS) is the ability for a consumer to use on demand software that is provided by the service provider via a thin client device e.g. a web browser over the Internet. With SaaS the consumer not only has no management or control of the infrastructure such as the storage, servers, network, or operating systems, but also no control over the application's capabilities. Evolved from what were originally referred to as Application Service Providers (ASPs), SaaS is a quick and efficient delivery model for key business applications such as customer relationship management (CRM), enterprise resource planning (ERP), HR and payroll.


Monitoring as a Service (MaaS) is at present still an emerging piece of the Cloud jigsaw but an integral one for the future. In the same way that businesses realised that their infrastructure and key applications required monitoring tools that would ensure the proactive elimination of any downtime risks, Monitoring as a Service provides the option to offload a large majority of those costs by having it run as a service as opposed to a fully invested in-house tool. So for example by logging onto a thin client or central web based dashboard which is hosted by the service provider, the consumer can monitor the status of their key applications regardless of location. Add the advantages of an easy set up and purchasing process and MaaS could be a key ‘pay as you use’ model for the de-risking of applications that are initially being migrated to the Cloud.


Communication as a Service (CaaS) enables the consumer to utilize Enterprise level VoIP, VPNs, PBX and Unified Communications without the costly investment of purchasing, hosting and managing the infrastructure. With the service provider also responsible for the management and running of these services, the other advantage the consumer has is that they needn't employ their own trained personnel, bringing significant OPEX as well as CAPEX savings.


Finally XaaS or ‘anything as a service’ is the delivery of IT as a Service through hybrid Cloud computing and is a reference to either one or a combination of Software as a Service (SaaS), Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Communication as a Service (CaaS) or Monitoring as a Service (MaaS). XaaS is quickly emerging as a readily recognized term as services that were previously separated on either private or public Clouds are becoming transparent and integrated.


So as the term ‘The Cloud' finally breaks into the minds of the masses and takes meaning, the next phase will be to take the numerous services that are offered by the Cloud, mature them and enable consumers to fully understand their benefits. From Enterprise to SMB to end users, Cloud Services will inevitably bring immense benefits and cost savings. All that is now required is for consumers to know what all those unnecessarily complicated acronyms mean!

vSphere 5 & VAAI Demand Radical Changes to Storage Arrays


The launch of vSphere 5 and its new storage related features will set the precedent for a complete rethink on how a new datacenter’s storage infrastructure should be designed and deployed. vSphere 5’s launch is not only an unabashed attempt at cornering every single aspect of the server market but is also a result of the growing need for methodical scalability that merges the I.T. silos and consequently combines the information of applications, servers, SANs and storage into a single comprehensive stack. In an almost ironic shift back towards the original principles of the mainframe, VMware’s importance has already influenced vendors such as EMC with their VMAX and HDS with their VSP into adopting a scale out as opposed to scale up approach to Storage. With this direction being at the forefront of most storage vendors’ roadmaps for the foreseeable future, it subsequently dictates criteria far beyond the storage capacity requirement I.T. departments are traditionally used to. With such considerations the end of the traditional Storage Array could be sooner than we think as a transformation takes place in the makeup of future datacenters.

With the emergence of the Private Cloud, Storage Systems are already no longer being considered or accepted by a majority of end users as ‘data banks’ but rather ‘on demand global pools’ of processors and volumes. End users have finally matured into accepting dynamic tiering, thin / dynamic provisioning, wide striping, SSDs etc. as standard for their storage arrays as they demand optimum performance. What will dictate the next phase is that the enterprise storage architecture will no longer be accepted as a model for unused capacity but rather one that facilitates further consolidation and on demand scalability. Manoeuvres in this direction have already taken place as some vendors have taken to adopting sub-LUN tiering (in which LUNs are now merely containers) and 2.5 inch SAS disks. The next key shift towards this datacenter transformation is the new integration of the server and storage stack as brought about with vSphere 5’s initiatives, such as VAAI, VASA, SDRS, Storage vMotion etc.

Leading this revolution is Paul Maritz, who upon becoming the head of VMware made his vision clear: rebrand VMware from a virtualization platform to the essential element of the Cloud. In doing that VMware needed to showcase their value beyond the Server Admin’s milieu. Introduced with vSphere 4.1, vStorage APIs for Array Integration (VAAI) was the first major shift towards the comprehensive Cloud vision that incorporated the Storage stack. Now with further enhancements in vSphere 5, VAAI compatibility will be an essential feature for any Storage Array.

VAAI has several features, named primitives, aimed at driving this change forward, with the first being the Full Copy primitive. Essentially with VMware, when copying of data occurs, whether via VM cloning or Storage vMotion, the ESX server becomes responsible for first having to read every single block that has to be copied and then writing it back to the new location. Of course this adds a fundamental load on the ESX server. So for example when deploying a 100 GB VM from a template the entire 100 GB will have to be read by the vSphere host and then subsequently written, requiring a total of 200 GB of I/O.


Instead of this host intensive process the Full Copy primitive operates by issuing a single SCSI command called XCOPY. The XCOPY is sent for a contiguous set of blocks and in essence is a command to the storage to do the copying of each block from one logical block address to another. Hence a significant load is taken off the ESX server and instead put upon the Storage array, which can more than easily deal with the copy operations, resulting in very little I/O between the host and array.

The second primitive, Block Zeroing, is also related to the virtual machine cloning process, which in essence is merely a file copy process. When a VM is copied from one datastore to another, all of the files that make up that VM are copied. So for a 100 GB virtual disk file with only 5 GB of data, there would be blocks that are full as well as empty ones with free space i.e. where data is yet to be written. Any cloning process would entail not just IOPS for the data but also numerous repetitive SCSI commands to the array for each of the empty blocks that make up the virtual disk file.

Block zeroing instead removes the need for having to send these redundant SCSI commands from the host to the array. By simply informing the storage array of which blocks are zeros the host offloads the work to the array without having to send commands to zero out every block within the virtual disk.

The third VAAI primitive is named Hardware Lock Assist. The original implementation of VMFS used SCSI reservations to prevent data corruption when several servers shared a LUN. What typically occurs without VAAI in such situations, though, is what are termed SCSI reservation conflicts. Being a normal part of the SCSI protocol, SCSI reservations give exclusive access to a LUN so that competing devices do not cause data corruption. These conflicts especially used to occur during vMotion transfers and metadata updates in earlier versions of ESX, and caused serious performance problems and poor response times as devices had to wait for access to the same LUN. Although these reservations typically last less than 1ms, many of them in rapid succession can cause a performance plateau for VMs on that datastore.


Hardware-Assisted Locking in essence is the elimination of this LUN-level locking based on SCSI reservations. How it works is that initially the ESX server will make a first read of the lock. If the lock is free, the server will then send a Compare and Write command that contains not only the lock data that the server wants to place into the lock but also the original free contents of the lock. The storage array will then read the lock again and compare the current data in the lock to what is in the Compare and Write command. If they are found to be the same, the new data will be written into the lock. All of this is treated as a single atomic operation and is applied at the block level (not the LUN level), making parallel VMFS updates possible.
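
The read, compare and write sequence can be sketched in a few lines. This is a toy model only: the class and function names are invented, the lock record is a plain byte string, and a Python mutex stands in for the array's guarantee that the compare and the write happen as one atomic step.

```python
import threading

FREE = b"FREE"  # hypothetical marker for an unowned lock record


class LockBlock:
    """Toy model of a single on-disk lock record held by the array."""

    def __init__(self):
        self._value = FREE
        self._mutex = threading.Lock()  # stands in for the array's atomicity

    def read(self) -> bytes:
        with self._mutex:
            return self._value

    def compare_and_write(self, expected: bytes, new: bytes) -> bool:
        """Write `new` only if the record still holds `expected`."""
        with self._mutex:
            if self._value != expected:
                return False  # another host changed the lock in the meantime
            self._value = new
            return True


def try_acquire(block: LockBlock, host_id: bytes) -> bool:
    """Host side: read the lock, then attempt the atomic compare-and-write."""
    seen = block.read()
    if seen != FREE:
        return False  # lock already held, no write is attempted
    return block.compare_and_write(seen, host_id)


lock = LockBlock()
print(try_acquire(lock, b"esx-01"))  # first host takes the lock
print(try_acquire(lock, b"esx-02"))  # second host finds it held
```

Because only the one lock record is compared and written, hosts working on other records of the same LUN are not blocked, which is the contrast with a LUN-wide reservation.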

With the advent of vSphere 5, enhancements with VAAI have also come about for infrastructures deploying array-based thin-provisioning. Thin-provisioned LUNs have brought many benefits but also challenges such as the monitoring of space utilization and the reclamation of dead space.
To counter this the VAAI Thin Provisioning primitive has an Out-of-Space Condition which monitors the space usage on thin-provisioned LUNs. Via advance warning users can prevent themselves from catastrophically running out of physical space.

Another aspect of this primitive is what is termed Dead Space Reclamation (originally coined as SCSI UNMAP). This provides the ability to reclaim blocks of thin-provisioned LUNs whenever virtual disks are deleted or migrated to different datastores.

Previously when a snapshot or a VM was deleted, or a Storage vMotion took place, VMFS would delete a file, sometimes with the filesystem simply reallocating pointers instead of issuing a SCSI WRITE ZERO command to zero out the blocks. This would lead to blocks previously used by the virtual machine still being reported as in use, resulting in the array providing incorrect information with regards to storage consumption.

With vSphere 5 this has changed, as now instead of a SCSI WRITE ZERO command or reallocation of VMFS pointers, a SCSI UNMAP command is used. This enables the array to release the specified Logical Block Addresses back to the pool, leading to better reporting and monitoring of disk space consumption as well as the reclamation of unused blocks.
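
The difference UNMAP makes to what the array reports can be shown with a toy model. This sketch assumes a thin LUN that simply tracks which logical block addresses have been written, and a datastore that either does or does not tell the array when a file's blocks are freed; all class and method names are hypothetical.

```python
class ThinLun:
    """Toy thin-provisioned LUN: the array only knows which LBAs were written."""

    def __init__(self):
        self.allocated = set()

    def write(self, lbas):
        self.allocated.update(lbas)

    def unmap(self, lbas):
        # Release the given LBAs back to the pool.
        self.allocated.difference_update(lbas)

    def used_blocks(self) -> int:
        return len(self.allocated)


class Datastore:
    """Toy filesystem on top of a thin LUN."""

    def __init__(self, lun: ThinLun, use_unmap: bool):
        self.lun = lun
        self.use_unmap = use_unmap
        self.files = {}

    def create_file(self, name, lbas):
        self.files[name] = set(lbas)
        self.lun.write(lbas)

    def delete_file(self, name):
        lbas = self.files.pop(name)
        # Without UNMAP only the filesystem's own pointers change,
        # so the array never learns that these blocks are free.
        if self.use_unmap:
            self.lun.unmap(lbas)


for use_unmap in (False, True):
    lun = ThinLun()
    ds = Datastore(lun, use_unmap=use_unmap)
    ds.create_file("vm1.vmdk", range(0, 100))
    ds.delete_file("vm1.vmdk")
    print("UNMAP" if use_unmap else "no UNMAP", "-> array still reports", lun.used_blocks(), "blocks in use")
```

In the first run the datastore is empty but the array still counts every block as consumed; in the second the blocks return to the pool as soon as the file is deleted.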

Back in 2009, Paul Maritz boldly proclaimed the dawning of ‘a software mainframe’. With vSphere 5 and VAAI, part of that strategy is aimed at transforming the role of the Storage Array from a monolithic box of capacity to an off-loadable virtual resource of processors that helps ESX servers perform better. Features such as SRM 5.0, SDRS and VASA (details on these to follow soon in upcoming blogs) are aimed at further enhancing this push. At VMworld 2011, Maritz proudly stated that a VM is born every 6 seconds and that there are more than 20 million VMs in the world with more than 5.5 vMotions per second. With such serious figures, whether you’re deploying VMware or not, it’s a statement that would be foolish to ignore when budgeting and negotiating for your next Storage Array.

Silos will prevent Tier 1 Apps reaching the Cloud

On a recent excursion to a tech event I had the pleasure of meeting a well-known ‘VM Guru’, (who shall remain nameless). Having read some of this individual’s material I was excited and intrigued to know his thoughts on how he was tackling the Storage challenges related to VMware especially with Fibre Channel SANs.

“Storage, that’s nothing to do with me, I’m a VirtGuy”, he proudly announced.

To which I retorted, “yes but if there are physical layer issues in your SAN fabric, or poorly configured Storage etc. it will affect the performance of your Virtual Machines and their applications, hence surely you also need some visibility and understanding beyond your Server’s HBAs?”

Seemingly annoyed with the question, he answered, “Why? I have SAN architects and a Storage team for that, it’s not my problem. I told you I’m a VirtGuy, I have my tools so I can check esxtop, vCenter etc…” as he then veered off into glorious delusions of grandeur of how he’d virtualized more servers than I’d had hot dinners. As fascinating as it was to hear him, it was at this point that my mind was sidetracked into realizing that despite all the industry talk of ‘unified platforms’ and ‘Apps, Servers & Storage as a Service’ i.e. the Cloud, the old challenge of bridging the gap between silos still had a long way to go.

Let’s face it, Virtualization and the Cloud have brought unprecedented benefits but they’ve also brought challenges. One such challenge that is dangerously being overlooked is that of the silos that exist within most IT infrastructures. Indeed it’s the silos that have led to the new phenomenon that is coined as ‘The Virtual Stall’. The Virtual Stall was never an issue several years ago as Virtualization was happily adopted by Application owners to consolidate many of their ‘Crapplications’ that meant little or nothing to them and certainly didn’t carry the burden of an SLA. Storage teams were none the wiser as VM admins requested large capacities of storage for their VMFS and despite the odd performance problem no one was too bothered as these VMs rarely hosted Tier 1 Apps. With the advent of VDI, large VM backups and critical applications such as Exchange and SQL being virtualized, the ordeal of maintaining performance took root, resulting in the inevitable ‘blame game’ between silos. Fast forward to today and despite all the talk of Private Clouds, the fear of performance degradation resulting from the virtualization of mission critical applications has led to the ‘Virtual Stall’.

Business and Management have been convinced by the benefits of consolidation, reduction in data footprint, power/cooling etc. that they initially saw with the virtualization of their low tier applications. This has led them to want more of the same for higher end applications, leading to what many organizations are terming a ‘VMware First’ policy. Under pressure from them, the silo of the application owners still doesn’t have a true understanding of server virtualization and hence is reluctant for its Tier 1 apps to be migrated from their physical platforms. At best they may accept two mission critical VMs on a physical server. Under pressure to prove the Application owners wrong and maintain the performance of virtualized applications, the silo of the VMware administrators will often over-provision from their pool of Memory, CPU and storage resources. Furthermore the VM Admin silo also lacks a real understanding of Storage and at best will think in terms of capacity for their VMFS stores, while the Storage Admin will think in terms of IOPS. As this lack of understanding and communication between the silos exists and grows, so too do the challenges of making the most of the benefits of server virtualization.

One of the key mistakes is that it’s often overlooked that whether on a virtualized or non-virtualized platform, application performance is heavily affected by its underlying storage infrastructure. The complexity of correctly configuring storage in accordance with application demands can range from deciding the right RAID level, number of disks per LUN and array cache sizes to the correct queue depth and fan-in / fan-out ratio. These and other variables can drastically influence how I/O loads are handled and ultimately how applications respond. With virtualized environments the situation is no different, with Storage related problems often being the cause of most VMware infrastructure mis-configurations that inadvertently affect performance.

Even with the option of Raw Device Mapping, the alternative for VMware storage configuration, VMFS is often preferred due to its immediate advantages in terms of provisioning and zoning. With VMFS several Virtual Machines are able to access the same LUN or a pool of LUNs. This is far simpler than the one to one mapping of a LUN to each Virtual Machine that the RDM option requires. Additionally this makes backups far easier as only the VMFS for the given Virtual Machines need be dealt with instead of numerous individual LUNs that are mapped to many Virtual Machines. VMFS volumes can be as big as 2TB and with the concatenation of additional partitions, which are termed VMFS extents, this can then be as large as 64TB i.e. 32 extents. With a Storage Admin unaware of such distinctions within VMware, it’s easy to also be unaware of the best practices with extents, such as creating these on new physical LUNs to provide additional LUN queues and relieve throughput congestion. Coupled with this, if extents are not assigned the same RAID and disk type you quickly fall into a quagmire of horrendous performance problems. In fact it can be argued that the majority of VMware performance problems are initiated at the beginning of the provisioning process or even earlier at the design phase and are a result of the distance between the silos.
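As a rough illustration of the extent arithmetic and the best practices just mentioned, here is a minimal sketch. The `Extent` record and `extent_warnings` helper are hypothetical, and the checks only encode the three points made above (extent count, one extent per physical LUN, matching RAID and disk type).

```python
from dataclasses import dataclass

MAX_EXTENT_TB = 2   # per-extent size quoted above
MAX_EXTENTS = 32    # extents per VMFS volume quoted above


@dataclass(frozen=True)
class Extent:
    lun: str          # backing physical LUN (illustrative identifier)
    size_tb: float
    raid_level: str
    disk_type: str


def max_datastore_tb():
    """Largest VMFS volume reachable by concatenating extents."""
    return MAX_EXTENT_TB * MAX_EXTENTS


def extent_warnings(extents):
    """Return human-readable warnings for a proposed set of extents."""
    warnings = []
    if len(extents) > MAX_EXTENTS:
        warnings.append("too many extents")
    luns = [e.lun for e in extents]
    if len(set(luns)) != len(luns):
        warnings.append("extents share a physical LUN (no extra LUN queue)")
    if len({(e.raid_level, e.disk_type) for e in extents}) > 1:
        warnings.append("mixed RAID level or disk type")
    return warnings


proposed = [
    Extent("lun-01", 2, "RAID5", "15K FC"),
    Extent("lun-02", 2, "RAID5", "15K FC"),
    Extent("lun-02", 2, "RAID6", "SATA"),
]
print(max_datastore_tb(), extent_warnings(proposed))
```

A check like this is the kind of thing a VM Admin and a Storage Admin could agree on at design time rather than discover during troubleshooting.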

As mentioned already, application owners will pressure VM administrators to overprovision Memory and CPU to avoid any potential application slowdowns, while the VM administrator will falsely think along the lines of capacity for their VMFS in terms of Storage. At best a VM Admin may request the RAID level and the type of Storage e.g. 15K RPM FC disks but it is here that the discrepancy arises for the Storage administrator. The Storage Admin, used to provisioning LUNs on the basis of application requirements, will instead not be thinking of capacity but rather in terms of IOPS and RAID levels. Eventually though, as there is no one to one mapping and the requested LUN is to be merely added to a VMFS, the storage administrator, not wishing to be the bottleneck of the process, will proceed to add the requested LUN to the pool. Herein is also the source of a lot of eventual performance problems as overly busy LUNs begin to affect all of their aligned virtual machines as well as those that share the same datastore. Moreover if the LUN is part of a very busy RAID group on the backend of the storage array, such saturated I/O will impact all of the related physical spindles and hence all of the LUNs they share. What needs to be appreciated is that the workload of individual applications presented to individual volumes will be significantly different to that of multiple applications being consolidated onto a single VMFS volume. The numerous I/Os of multiple applications alone, even if sequential, will push the Storage array to deal with these numerous requests as random, thus requiring different RAID level, LUN layout, cache capacity etc. considerations than those for individual applications.

Once these problems exist there is a customary troubleshooting procedure that VM and Storage administrators often follow, which draws on the metrics found in vCenter, esxtop, vscsiStats, IOMeter, Solaris IOSTAT, PerfMON and the Array management tool. This somewhat laborious process usually includes measuring the effective bandwidth and resource consumption between the VM and storage, moving to and using other paths between the VMs and storage and even reconfiguring cache and RAID levels. To have even got to this point, days if not weeks would have been spent in checking for excessive LUN and RAID group demands, understanding the VMFS LUN layout on the backend of the storage’s physical spindles, investigating the array’s front end, cache and processor utilization as well as bottlenecks on the ESX host ports. Some may even go to the lengths of playing around with the Queue Depth settings, which without an accurate insight is at best a guessing game based on rule of thumb. Despite all of these measures there is still no guarantee that this will identify or eliminate the performance issues, leaving VMware to be erroneously blamed as the cause or the application to be declared ‘unfit’ to be virtualized. Ironically so many of these problems could have been proactively avoided had there been better understanding and communication between the silos in the design and provisioning phase.

While it could be argued that Application, Server / VM and Storage teams all have their own expertise and should stick to what they know, in today’s unified Cloud-driven climate remaining in a bat cave of ignorance justified by the knowledge that you’re an expert in your own field is nothing short of disastrous. Application owners, VMware and Storage Admins have to sit and communicate with each other and destroy the first barrier erected by silos i.e. the lack of knowledge sharing. This does not require that a Storage Admin set up a DRS cluster or a VM Admin start provisioning LUNs but what it does mean is that as projects roll out a common understanding of the requirements and the challenges is reached. As the technology brings everything into one stack with vStorage APIs, VAAI and terminology such as orchestration that describes single management panes which allow you to provision VMs and their Storage with a few clicks, the need for the ‘experts’ of their field to sit and share their knowledge has never been greater. Unless the challenge of breaking the silos is addressed we could be seeing Kate Bush’s premonition of Cloudbursting sooner than we think.


The True Optimum Queue Depth for VMware / vSphere

An array’s Queue Depth in its most basic terms is the physical limit of exchanges that can be open on a storage port at any one time. The Queue Depth setting on the HBA specifies how many exchanges can be sent to a LUN at one time. Generally most VM Admins leave their Queue Depth settings at the manufacturer’s default, with only the requirement to facilitate a small number of I/O intensive VMs/servers leading them to make an increase. Yet Queue Depths that are not at their optimum, whether they were changed or left alone, can have severe detrimental effects on performance, as any outstanding I/O queuing can cause bottlenecks. For example if Queue Depth settings are set too high the Storage ports will quickly become overrun or congested, leading to poor application and VM performance or even worse data corruption or loss. Alternatively if Queue Depth settings are set too low, the Storage ports become underutilized, leading to poor SAN efficiency. On the other hand should the Queue Depth be correctly optimized, performance of VMs and their corresponding LUNs can be vastly improved, hence a methodology to accurately determine it is imperative.

Generally VM Admins use esxtop to check for I/O Queue Depths and latency with the QUED column showing the queuing levels. With VirtualWisdom though, end users are now empowered with the only platform that can measure real-time aggregated queue depth regardless of storage vendor or device i.e. in a comprehensive manner that takes into consideration the whole process from Initiator to Target to LUN. VirtualWisdom’s unique ability to do this ensures accurately that storage ports are optimized for maximum application health, performance, and SAN efficiency.

So to begin with it is important to prevent the storage port from being over-run by considering both the number of servers that are connected to it as well as the number of LUNs it has available. By knowing the number of exchanges that are pending at any one time it is possible to manage the storage Queue Depths.

In order to properly manage the storage Queue Depths one must consider both the configuration settings at the host bus adapter (HBA) in a server and the physical limits on the storage arrays. It is important to determine what the Queue Depth limits are for each storage array. All of the HBAs that access a storage port must be configured with this limit in mind. Some HBA vendors allow setting HBA and LUN level Queue Depths, while some allow HBA level setting only.
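To make the relationship between HBA settings and the storage port limit concrete, here is a minimal sketch. It assumes the common rule of thumb that the worst case load on a port is the sum of (LUNs × per-LUN Queue Depth) over every HBA zoned to it; the `HostPath` record and the port limit of 2048 are made-up illustrations, so take the real limit from the array vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostPath:
    """One HBA zoned to the storage port (all values are illustrative)."""
    host: str
    lun_count: int        # LUNs this HBA reaches through the port
    lun_queue_depth: int  # per-LUN Queue Depth configured on the HBA


def worst_case_exchanges(paths):
    """Exchanges the port would see if every HBA filled every LUN queue."""
    return sum(p.lun_count * p.lun_queue_depth for p in paths)


def is_oversubscribed(port_limit, paths):
    return worst_case_exchanges(paths) > port_limit


def max_safe_lun_queue_depth(port_limit, paths):
    """Largest uniform per-LUN Queue Depth that cannot over-run the port."""
    total_luns = sum(p.lun_count for p in paths)
    return port_limit // total_luns


PORT_LIMIT = 2048  # hypothetical physical limit of the storage port
paths = [HostPath(f"esx{i:02d}", lun_count=8, lun_queue_depth=32)
         for i in range(32)]

print(worst_case_exchanges(paths))
print(is_oversubscribed(PORT_LIMIT, paths))
print(max_safe_lun_queue_depth(PORT_LIMIT, paths))
```

This is deliberately pessimistic, since real hosts rarely fill every queue at once, which is exactly why measured pending exchanges are a better guide than the worst case alone.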

The default value for the HBA can vary a great deal by manufacturer and version and is often set higher than what is optimal for most environments. If you set the queue depths too low on the HBA it could significantly impair the HBA’s performance and lead to underutilization of the capacity on the storage port (i.e. underutilizing storage resources). This occurs both because the network will be underutilized and because the storage system will not be able to take advantage of its caching and serialization algorithms that greatly improve performance. Queue Depth settings on HBAs can also be used to throttle servers so that the most critical servers are allowed greater access to the necessary storage and network bandwidth.

To deal with this the initial step should be to baseline the Virtual environment to determine which servers already have their optimal settings and which ones are set either too high or too low. Using VirtualWisdom, real time Queue Depth utilization can be reported for a given period. Such a report will show all of the initiators and the maximum queue depths that were recorded during the recording period. This table can be used to compare the settings on the servers to the relative values of the applications that they support. The systems that are most critical should be set to higher Queue Depths than those that are less critical. However, Queue Depth settings should still be within the vendor specified range. Unless Storage ports have been dedicated to a server, VirtualWisdom often shows that optimum Queue Depth settings should be in the range of 2-8, despite industry defaults tending to be between 32 and 256. To explain this further, VirtualWisdom can produce a report that shows in descending order the Maximum Pending Exchanges and their corresponding initiators and server names. The Maximum Pending Exchanges include not only the exchanges opened during the interval being recorded but also the exchanges that were opened in previous intervals and have not yet closed.

So for example if a report such as this was produced for 100 ESX servers it’s important to consider whether your top initiators are hosting your highest priority applications and whether your initiators with low queue depth settings are hosting your lowest priority applications. Once the appropriate Queue Depth settings have been determined, an alarm can be created for any new HBAs that are added to the environment, especially any HBA that violates the assigned Queue Depth policy.
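The review of such a report can be sketched in a few lines. This is not VirtualWisdom's API or export format: the `InitiatorStat` fields, the sample rows and the priority-to-Queue-Depth policy are all hypothetical, purely to show the ranking and the policy check described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitiatorStat:
    initiator: str    # HBA alias (made up here)
    server: str
    priority: int     # 1 = most critical application
    queue_depth: int  # Queue Depth configured on the HBA
    max_pending: int  # Maximum Pending Exchanges in the report period


def rank_by_max_pending(stats):
    """Descending order, as in the report described above."""
    return sorted(stats, key=lambda s: s.max_pending, reverse=True)


def policy_violations(stats, allowed_by_priority):
    """Initiators configured above the Queue Depth allowed for their priority."""
    return [s for s in stats
            if s.queue_depth > allowed_by_priority[s.priority]]


stats = [
    InitiatorStat("hba-a", "esx01", priority=1, queue_depth=8, max_pending=7),
    InitiatorStat("hba-b", "esx02", priority=3, queue_depth=32, max_pending=30),
    InitiatorStat("hba-c", "esx03", priority=2, queue_depth=4, max_pending=3),
]
policy = {1: 8, 2: 4, 3: 2}  # hypothetical Queue Depth ceiling per priority

for s in rank_by_max_pending(stats):
    print(s.server, s.max_pending)
print([s.initiator for s in policy_violations(stats, policy)])
```

In this made-up sample the top initiator is hosting the lowest priority application, which is precisely the mismatch the report is meant to expose.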

Once this is established the VirtualWisdom dashboard can then be used to ensure that the combined Pending Exchanges from all of the HBAs are well balanced across the array and SAN fabric.

Making SAN Cheaper than NAS

Are you making the most of your FC SAN?

There is a common myth that FC SAN is expensive, difficult to manage and troubleshoot. Coupled with this there are heavily marketed agendas to move customers away from FC SAN to new and allegedly more cost effective solutions.
But what if there was a way to…

• Reduce the amount of physical adapters, FC cables, SAN ports and Storage ports while concurrently improving your application response time, availability and SLAs?

• Simplify server FC I/O provision, enabling a more agile, scalable and dynamic deployment model?

• Gain the insight that on average most FC SAN and Storage ports are greatly underutilized averaging 5-10%, with only a minority of them needing their full bandwidth and therefore enabling you to drive higher levels of resource utilization and performance out of an already deployed solution?

• Know your FC SAN Storage related problems before they occur enabling you to transform your environment from a reactive to proactive one?

• Reduce the time it takes for troubleshooting your FC SAN Storage environment from days to minutes?

• Reduce the OPEX and CAPEX of your existent FC SAN Storage infrastructure such that it’s deemed not only the most reliable and secure but also the most cost effective solution of your environment?

I'm absolutely delighted to have the opportunity to discuss and answer some of these issues with ex-EMC vSpecialist and VM Guru Stephen Spellicy, in an upcoming webinar on the 30th of March. I welcome all to partake in the Q and A session and discussion as both Stephen and I showcase the way to a complete solution to optimize and consolidate your FC connectivity while ensuring the reliability and performance you have come to know and love with Fibre Channel. Find out how to also improve resource utilization, deployment and ongoing management of your FC connectivity while easily accessing and analyzing critical application-to-storage transaction data that allows you to pinpoint latency down to the millisecond.

Registration is free and available here:
http://info.virtualinstruments.com/webinar-fc-san-myths.html

In the meantime here's an insight into the exciting new technology that is IOV:

CRC Errors, Code Violation Errors, Class 3 Discards & Loss of Sync - Why Storage isn't Always to Blame!


Storage is often automatically pinpointed as the source of all problems. From System Admins, DBAs and Network guys to Application owners, all are quick to point the finger at SAN Storage given the slightest hint of any performance degradation. Not really surprising though, considering it’s the common denominator amongst all silos. On the receiving end of this barrage of accusation is the SAN Storage team, who are then subjected to hours of troubleshooting only to prove that their Storage wasn’t responsible. On this circle goes until there comes a point when the Storage team are faced with a problem for which they can’t absolve themselves of blame, even though they know the Storage is working completely fine. With array-based management tools still severely lacking in their ability to pinpoint and solve storage network related problems and with server based tools doing exactly that i.e. looking at the server, there really is little if anything available to prove that the cause of latency is a slow draining device such as a flapping HBA, damaged cable or failing SFP. Herein lies the biggest paradox in that 99% of the time when unidentifiable SAN performance problems do occur, they are linked to trivial issues such as a failing SFP. In a 10,000 port environment, the million dollar question is ‘where do you begin to look for such a minuscule needle in such a gargantuan haystack?’

To solve this dilemma it’s imperative to know what to look for and have the right tools to find them, enabling your SAN storage environment to be a proactive and not a reactive fire-fighting / troubleshooting circus. So what are some of the metrics and signs that should be looked for when the Storage array, application team and servers all report everything as fine yet you still find yourself embroiled in performance problems?

To understand the context of these metrics / signs and the makeup of FC transmissions, let’s use the analogy of a conversation. The Frames would be considered the words, the Sequences the sentences and an Exchange the conversation that they are all part of. With that premise it is important to first address the most basic of physical layer problems, namely Code Violation Errors. Code Violation Errors are the consequence of bit errors caused by corruption that occurs in the sequence – i.e. any character corruption. A typical cause of this would be a failing HBA that would eventually start to suffer from optic degradation prior to its complete failure. I also recently came across Code Violation Errors at one site when several SAN ports had been left enabled after their servers had been decommissioned. Some might think what’s the problem if they have nothing connected to them? In fact this scenario was creating millions of Code Violation Errors, causing a CPU overhead on the SAN switch and subsequent degradation. With mission critical applications connected to the same SAN switch, performance problems became rife and without the identification of the Code Violation Errors this could have led to weeks of troubleshooting with no success.

The build up of Code Violation Errors becomes even more troublesome as they eventually lead to what is referred to as a Loss of Sync. A Loss of Sync is usually indicative of incompatible speeds between points and again this is typical of optic degradation in the SAN infrastructure. For example if an SFP is failing, its optic signal will degrade and hence will not be at, for example, the 4Gbps it’s set at. Case in point: a transmitting device such as an HBA is set at 4Gbps while the receiving end i.e. the SFP (unbeknownst to the end user) has degraded down to 1Gbps. Severe performance problems will occur as the two points constantly struggle with their incompatible speeds. Hence it’s imperative to be alerted to any Loss of Sync as ultimately it is also an indication of an imminent Loss of Signal i.e. when the HBA or SFP is flapping and about to fail. This leads to the nightmare scenario of an unplanned path failure in your SAN storage environment and worse still a possible outage if failover cannot occur.

One of the biggest culprits, and a sure-fire place to look when resolving performance problems, is what are termed CRC errors. CRC Errors usually indicate some kind of physical problem within the FC link and are indicative of code violation errors that have led to consequent corruption inside the FC data frame. Usually caused by a flapping SFP or a very old / bent / damaged cable, once CRC errors are detected by the receiver, the receiver rejects the request, leaving the Frame having to be resent. As an analogy imagine a newspaper delivery boy, who while cycling to his destination loses some of the pages of the paper prior to delivery. Upon delivery the receiver would request for the newspaper to be redelivered with the missing pages. This would entail the delivery boy having to cycle back to find the missing pages and bring back the newspaper as a whole. In the context of a CRC error a Frame that should typically take only a few milliseconds to deliver could take up to 60 seconds in being rejected and resent. Such response times can be catastrophic to a mission critical application and its underlying business. By gaining an insight into CRC errors and their root cause one can immediately pinpoint which bent cable or old SFP is responsible and proactively replace it long before it starts to cause poor application response times or even worse a loss to your business.
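How a receiver spots such corruption can be shown with a small sketch. It uses Python's `zlib.crc32` as a stand-in for the frame CRC and a made-up frame layout (payload followed by a 4-byte checksum); the real Fibre Channel framing, bit ordering and recovery behaviour are not modelled here.

```python
import zlib


def build_frame(payload: bytes) -> bytes:
    """Append a 32-bit CRC to the payload (simplified, illustrative layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_is_valid(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the one received."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a single bit error from a bad cable or failing SFP."""
    data = bytearray(frame)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


frame = build_frame(b"SCSI WRITE block 42")
damaged = flip_bit(frame, 13)

print(frame_is_valid(frame))    # intact frame is accepted
print(frame_is_valid(damaged))  # corrupted frame is rejected and must be resent
```

The check itself is cheap; the cost described above comes from the time spent waiting for the rejected frame to be sent again.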

The other FC SAN gremlin is what is termed a Class 3 discard. Of the various services of data transport defined by the Fibre Channel ANSI Standard, the most commonly used is Class 3. Ideal for high throughput, Class-3 is essentially a datagram service based on frame switching and is a connectionless service. Class 3’s main advantage comes from not giving an acknowledgement that a frame has been rejected or busied by a destination device or Fabric. The benefits of this are that it firstly significantly reduces the overhead on the transmitting device and secondly allows for more bandwidth availability for transmission which would otherwise be reduced. Furthermore the lack of acknowledgements removes the potential delays between devices caused by round-trips of information transfers. As for data integrity, Class 3 Flow control has this handled by higher-level protocols such as TCP due to Fibre Channel not checking the corrupted or missing frames. Hence any discovery of a corrupted packet by the higher-level protocol on the receiving device instantly initiates a retransmission of the sequence. All of this sounds great until the non-acknowledgement of rejected frames starts to also bring about Class 3’s disadvantage. This is that inevitably a Fabric will become busy with traffic and will consequently discard frames, hence the name Class 3 discards. Due to this the receiving device’s higher-level protocol’s subsequent request for retransmission of sequences will then degrade the device and fabric throughput.

Another cause of Class 3 discards is a zoning conflict, where a frame has been transmitted and cannot reach its destination, resulting in the SAN initiating a Class 3 discard. This is caused by either legacy configuration or zoning mistakes where for example a decommissioned Storage system was not unzoned from a server or vice versa, leading to continuous frames being discarded and degraded throughput as sequences are retransmitted. This then results in performance problems, potential application degradation and automatic finger pointing at the Storage System for a problem that can’t automatically be identified. By resolving the zoning conflict and spreading the load of the SAN throughput across the right ports, the heavy traffic or zoning issues which cause the Class 3 discards can be quickly removed, bringing immediate performance and throughput improvements. By gaining an insight into the occurrence and amount of Class 3 discards, huge performance problems can be headed off before they occur, which is another reason why the Storage shouldn’t automatically be blamed.

These are just some of the metrics / signs to look for which can ultimately save you from weeks of troubleshooting and guessing. By first acknowledging these metrics, identifying when they occur and proactively eliminating them, the SAN storage environment will quickly evolve and transform into a healthy, proactive and optimized one. Furthermore by eliminating each of these issues you also empower yourself by eliminating their consequent problems such as application slowdown, poor response times, unplanned outages and long drawn out troubleshooting exercises which eventually lead to finger-pointing fights. Ideally what will occur is a paradigm shift where instead of application owners complaining to the Storage team, the Storage team will proactively identify problems before they take hold. Here lies the key to making the ‘always blaming the Storage’ syndrome a thing of the past.

Undressing Victoria – Could Hitachi's New VSP Rock the EMC boat?



Back in 2004 HDS launched the USP, which was then followed by the great but not so radically different USP-V in 2007. Within that same time frame, HDS’ main rival in the Enterprise Storage market, EMC, busily went about launching the Symmetrix DMX-3, then the DMX-4 and most recently the VMAX. Launching so-called revolutionary features such as FAST (which HDS had been doing for years i.e. Tiered Storage Manager), EMC’s marketing machine quickly created an atmosphere wherein the Storage World became obsessed with all things ‘V’, namely vSphere, VMAX and VPLEX. With marketing so powerful that it extended to international airport posters advertising EMC’s ability to ‘take you to the Private Cloud’, you could easily forgive Hitachi for possibly becoming complacent and content with being a company renowned by the masses for just making great vacuum cleaners. Well thank goodness, after three years in the making, a codename of Victoria, a semi-decent marketing campaign and a ‘V’ to its final name, HDS have at last launched the new VSP Enterprise array…and yes it’s been worth the wait.

Marketed as a 3D scaling storage system, it was pleasing to realize that it wasn’t a reference to the tinted glasses needed to look at its rather revolting vomit green cabinet. (So yes it certainly can’t compare to the Knight Rider looks of the VMAX and probably won’t be appearing in an episode of ‘24’). Aesthetics aside, and more importantly, the 3D refers to the terms scale up, scale out and scale deep. What HDS mean by this is that you can scale up by adding more resources to the VSP system, you can scale out by adding more disk blocks, host connections and Virtual Storage Directors, and you can scale deep by virtualising external heterogeneous arrays behind the VSP. From this premise it’s also evident that HDS are looking at the VSP to be the foundational block of the recently announced but yet to be released cloud platform, the UCP.

While the scale deep is an old tradition that HDS have mastered for years, it’s easy to note that the scale out and Virtual Storage Directors terms bear more than a passing resemblance to the concept introduced by EMC’s VMAX. With four Virtual Storage Directors in each system and with four cores within each Virtual Storage Director the VSP houses a total of 16 cores. Essentially the masterminds of the machine, Virtual Storage Directors are responsible for managing the VSP’s internal operations such as mapping and partitioning. The VSP can then be expanded into a mammoth system of 32 Cores by combining two VSP systems using the PCIe Hitachi Data Switch, scaling up to 2048 drives with a Terabyte of cache. So while an EMC aficionado may immediately point out that the VMAX can offer 128 cores, which dwarfs the VSP’s 32 Cores, it’s worth remembering that with Storage Virtualization the number of cores that can potentially be housed behind the VSP is in the hundreds.

Another point, there is no equivalent to the USPVM - the mini-me USPV which couldn’t scale up to the size of its big brother. Instead the VSP starts as a single pair of Virtual Storage Directors with no internal storage that can act as a pure virtualization platform to homogenize externally attached multi-vendor arrays. With such a proposition, one can just imagine the quivering of DS8000s, VMAXs, Clariions and EVAs ‘confettied’ within datacenters now faced with the prospect of being marginalized as a portion of the potential 255PB of LUNs that can sit behind the VSP’s Directors.

Of course this is also a great sales pitch to eventually get the same VSP stacked up later with internal storage that can range from 256 SSDs in either STEC’s 200GB 2.5-inch or 400GB 3.5-inch format to as much as 2.5PB of 3.5-inch SATA drives. Add to that, HDS have taken the pioneering route of adopting the capability to house up to 1.2PB of SAS 2.5-inch drives. Yes, that’s right, the HDS VSP has a SAS backend and it’s ready to have a 6Gbps SAS interface. While I’m no fan of SSDs sitting on the backend of a Storage system behind a RAID controller, processors, SAN switches etc. (can’t wait for DRM to hit the mainstream market), nevertheless a full duplex SAS backend is a definite improvement in taking advantage of the IOPS and throughput capability of SSDs. With up to 128 paths out to the disks and solid state drives, HDS are calling this switching fabric the Grid Switch Layer. Of course when you add in the idea of 2.5-inch drives using less power, an increase in IOPS due to a higher spindle count and one less cabinet on your datacenter floor, you suddenly see a nice ROI figure being mustered up by your local HDS account manager. Expect EMC and co. to follow suit.

Also gone are those somewhat prehistoric battery backups that resided in the USPV and were legacy from the USP. Instead you will find between the aforementioned Grid Switch layer and the back end enclosures that the VSP hosts an extra layer of cache. This feature eliminates the need for the old battery backups. Instead the Virtual Storage Director’s data is stored in this cache and de-staged to solid state memory in the event of a power loss etc. hence ensuring data protection. It’s a simple idea but a welcome one for field engineers who can vouch for the pain of having to replace one of those battery packs. Indeed other legacy complications have been reduced due to the fact that the Control Memory (still responsible for all the metadata of the VSP’s operations) is now located on Virtual Storage Director boards and DIMMs, removing the requirement of separate dedicated Shared Memory and Control Memory boards.

Furthermore despite having borrowed the VMAX concept of coupling engines as well as using Intel processors for their Virtual Storage Directors, HDS have still retained a unique stamp by forsaking the Rapid IO interconnects chosen by EMC for their much more familiar Star Fabric architecture. So unlike EMC’s complete overhaul of their Direct Matrix architecture, HDS have maintained their non-blocking crossbar architecture switch to the back-end while having their global cache shared amongst multiple controllers. This familiar HDS method is the internal network of the VSP that manages its data via the Drives, Virtual Storage Directors, BEDs, FEDs and Cache.

So while HDS have inadvertently acknowledged EMC’s insight to go the Intel route they’ve also seemingly taken a leaf out of VMware’s DRS book by having custom I/O routing ASICs. Point being that on both the FEDs and BEDs of the VSP, data accelerator ASICs designed by Hitachi themselves now manage the I/O traffic. Unlike the USPV where the ACP and CHP processors were tied to particular ports, the VSP instead creates a pool of CPU resources which the ASICs can then assign to any front end or back end port that requires them at any given time. Personally I think this is a fantastic idea and step forward as it quickly eliminates a lot of the performance tuning that was previously required to get the same effect. With such a VMware-esque feature it’s somewhat ironic then that the VSP doesn’t yet support VAAI, although news is that it’s coming very soon.

Another ground-breaking step and one I’m most excited about is the VSP’s new Sub Lun Tiering feature. Using the now (thanks partly to Marc Farley’s terrific YouTube rant) infamous HDS 42MB page size, new policy based tiering will work at the page level instead of the LUN level. Hence as a particular page becomes more or less active or “hot”, the VSP will automatically upgrade or downgrade the tier for that page only, regardless of whether it’s on external or internal storage. The objective here is pretty clear - an attempt to optimize your usage of SSDs so you can justify buying more of them. Also ironically, the 42MB page size that was once considered HDS’ Achilles heel with regards to storage efficiency now works out to be ideal. Imagine the nightmares of a smaller page size - valuable Storage Processors’ CPU utilized in the desperate search for numerous 50KB pages that heat up and need to be moved up to tier 0; not a pretty thought. As this feature is sure to be emulated by other vendors it will be interesting to see what page sizes they’ll be coming up with.
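To put the page size argument into numbers, here is a rough sketch in Python. It is not how the VSP actually implements tiering; the function names, the decimal units and the "hottest pages win" policy are all assumptions made purely for illustration. It simply counts how many pages an array would have to track for 1 TB of capacity at each page size, and shows the general idea of promoting only the busiest pages.

```python
# Illustrative sketch only: names, units and policy are assumptions,
# not Hitachi's implementation.

def pages_to_track(capacity_bytes, page_bytes):
    """Number of pages needed to cover a capacity (rounded up)."""
    return -(-capacity_bytes // page_bytes)

def hottest_pages(io_counts, tier0_slots):
    """Pick the page ids with the highest I/O counts for tier 0."""
    ranked = sorted(io_counts, key=io_counts.get, reverse=True)
    return set(ranked[:tier0_slots])

TB = 10**12                                   # decimal terabyte (assumed)
large = pages_to_track(TB, 42 * 10**6)        # 42MB pages
small = pages_to_track(TB, 50 * 10**3)        # 50KB pages
print(large, small, small // large)

# Hypothetical per-page I/O counters gathered over a monitoring window.
heat = {"page-0": 5, "page-1": 900, "page-2": 40}
print(hottest_pages(heat, tier0_slots=1))
```

Under these assumptions a terabyte needs roughly 24 thousand 42MB pages against 20 million 50KB pages, which is the bookkeeping burden the paragraph above is worried about.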

Also speaking of other vendors, HP, who recently achieved the takeover of the year with their purchase of 3PAR, have also launched the VSP, albeit with a much nicer cabinet and the OEM moniker of P9500. What is interesting here is that the P9500 (VSP) is clearly a higher range platform than the InServ arrays and, if indications are correct, HP have no intention of disbanding their EVA range (reports have already surfaced of an EVA now called P6000). So with the OEM deal still intact, HP currently have every intention of also marketing and pushing forward the VSP / P9500. Indeed while at a meeting at one of HP’s headquarters during the week of the P9500’s release I was delightfully told of the P9500’s amazing APEX functionality. APEX sounded incredible as I was told of an application-level QoS control, which would give Pillar’s similar feature a run for its money. Strange then that I hadn’t heard of any such feature during the HDS launch. Upon further reading about APEX, it was explained that mission-critical data could be given bandwidth priority over less important data. It was then that I suddenly realized something familiar. This was nothing but a remarketed version of HDS’ Server Priority Manager functionality, which had been around for years (you’ve probably never heard of it because of HDS’ poor marketing but it’s actually very good). In fact the only uniqueness of APEX is that for HP-UX platforms it does indeed allow the prioritization of CPU, cache and storage resources. So not really that significant a differentiator from the VSP, especially if you don’t run HP-UX (and to be honest I think they’d have more success pitching how much nicer their cabinet looks). Nonetheless, differentiators or not, the addition of the P9500 to HP’s storage portfolio will only add further credence to their growing status as a Storage powerhouse.

Another welcome addition / change is the replacement of the demonically slow Storage Navigator management GUI with a much faster and greener looking GUI. HDS have also announced a whole new refurbishment of their Command Suite software. As well as being quicker and more user friendly there’s also better integration with VMware, allowing you to manage storage for Virtual machines. A welcome change for an SRM that often looked and performed in an outdated manner that was not befitting of the array (I still have nightmares of carving up LDEVs on the USP pre Quick Format days).

So with new features still to be released such as integration with VMware’s VAAI, support for FCoE and primary deduplication, the VSP has come a long way from its predecessor the USPV. Taking the best from their competitors and integrating it with their own way of doing things is not a new concept for HDS and with the VSP they certainly have done that. But HDS now have a genuinely new product which goes well beyond the minor step taken between the USP and USPV and which successfully incorporates its characteristic tools such as dynamic provisioning and virtualization with bleeding edge technology such as Sub Lun Tiering. There will be inevitable criticisms from competitors. There will be inevitable squabbles between the vendors. There will be inevitable comparisons between arrays. One thing’s for sure though, expect a lot of the VSP’s new features to be incorporated in other upcoming arrays pretty soon, Hitachi or not. In the words of Simon Cowell, “Glad to see them back in the game!”


N.B. I received a great explanation and post about APEX from Calvin Zito - also known as HPStorageGuy. He clarifies the fact that there is more of a distinction than what was originally posted by me - or in his words "Bottom line, there is no HDS equivalent of APEX" (-:
Here's the link: http://h30507.www3.hp.com/t5/Around-the-Storage-Block-Blog/Application-Performance-Extender-setting-the-record-straight/ba-p/83533#feedback-success

'Well Ours Goes to 8' - Why Going 8Gbps From 4Gbps Doesn't Necessitate Double the Bandwidth

A wise man once told me that if there were a major car crash further up the highway, having a faster car would only get me to the accident quicker. Obvious right? Not so it seems when the wisdom of these words is applied to the growing number of SAN infrastructures currently upgrading from 4Gbps to 8Gbps. ‘Faster means quicker, means better’ is the commonly heard sales pitch used to seduce vulnerable IT Directors who dream of ‘a guaranteed performance improvement that would solve the headache of their ever slowing applications’. Sadly though for many of those that bit the 8Gbps apple, the significant improvement never came and like a culprit with no shame the same voices returned claiming that this was the fault of the outdated servers, HBAs and storage systems which also now needed to be upgraded. So down the 8Gbps road they went, which now extended from the fabric all the way to the server platform, but still there was no significant improvement, or at least not one that could justify such a heavy investment. As with any infrastructure, being unaware of what is happening in the SAN inevitably means that any unseen problems such as CRC errors, physical link errors, protocol errors, code violations, class 3 discards etc. (i.e. the car crash) would remain, regardless of whether you get there at 4Gbps or 8Gbps. So how could such a simple concept be lost amongst the numerous 4Gbps to 8Gbps upgrades that are now taking place across the SAN stratosphere?

The main reason is that there are clearly several seemingly instant advantages with the 8Gbps standard. With Fibre Channel’s 8b/10b encoding putting ten bits on the wire for every byte of data, a nominal 800 MB per second gives you the immediate impression that you can potentially double the transmission of your data within the same single cable. Logic would then dictate that with both SAN switches and storage systems having 8Gbps ports, you also now have the freedom to double the number of hosts to a single storage port without the fear of any performance impact. Logic would also conclude that extra bandwidth would be a blessing in a virtual environment where dozens of VMs scramble for a limited number of ports while blade servers struggle to find the physical space for their growing HBA demands. Couple this with the ever-nearing cost equivalence to their 4Gbps component counterparts and the upgrade starts to look like an unavoidable choice for end users.
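For those who like to see the sums, the headline figures can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the line rates of 4.25 and 8.5 gigabaud and the 8b/10b encoding are the published Fibre Channel figures for these two generations, while the function name and the decision to ignore frame headers and inter-frame gaps are my own simplifications.

```python
# Back-of-the-envelope sketch; framing overhead is deliberately ignored.
LINE_RATE_GBAUD = {"4GFC": 4.25, "8GFC": 8.5}

def raw_mb_per_s(line_rate_gbaud, data_bits=8, coded_bits=10):
    """One-direction byte rate once 8b/10b encoding overhead is removed."""
    data_bits_per_s = line_rate_gbaud * 1e9 * data_bits / coded_bits
    return data_bits_per_s / 8 / 1e6

for name, rate in LINE_RATE_GBAUD.items():
    print(name, raw_mb_per_s(rate), "MB/s before framing overhead")
```

The results come out slightly above the nominal 400 and 800 MB per second that the vendors quote, the difference being the framing overhead left out here. Either way, the doubling only holds on a clean link.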

Indeed it’s the drive for ‘more throughput’ in this virtualisation era that has really kicked the 8Gbps juggernaut into top gear. Pre-virtualisation world, (which surprisingly wasn’t even that long ago yet already seems like an aeon) the relationship between server, application, SAN and storage were straightforward and one-dimensional. A single host with one application would connect to a dual redundant SAN fabric that in turn would be mapped to a single LUN. Today everything has multiplied, with a single physical server hosting numerous virtual servers and applications being connected to several storage interfaces and numerous LUNs.

Solutions such as N_Port ID Virtualization (NPIV) and N_Port Virtualization (NPV) have gone even further by enabling the virtualization of host and switch ports. Via NPIV your single HBA can be termed an N_Port and consequently register multiple WWPNs and N_Port ID numbers. So what was once just a single physical server can now house numerous virtual machines each with their own Port IDs, which in turn allows them to be independently zoned or mapped to LUNs. On the switch side, NPV presents the switch port as an NPIV host to the other switches. Hence a SAN can be rapidly expanded without the burden of worrying about multiple domain IDs.

So while the case to upgrade to 8Gbps is at the outset quite compelling, further analysis would show that this isn’t necessarily the case. Reality and not logic shows that a lot of the aforementioned advantages have been related to ‘guess work’ and assumptions. Moreover and ironically the rush to 8Gbps is actually causing more problems than previously existed within data centers, unbeknownst to the majority of end users due to their inability to soundly monitor what’s happening in the SAN. To begin with, if we revisit the concept of FC bit rates and their constant increase from 2Gbps to 4Gbps and now 8Gbps, one should be aware that the consequence is a bit period that shrinks in inverse proportion: double the bit rate and each bit gets half the time on the wire. Hence this now shrunken window of data requires an even more robust physical infrastructure than before and becomes even more susceptible to potential errors - think Michael Schumacher driving his Ferrari top speed on the same public road Morgan Freeman took Miss Daisy.
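The shrinking bit period is easy to quantify. The sketch below uses the published Fibre Channel line rates (2.125, 4.25 and 8.5 gigabaud); the 30 picoseconds of jitter is a made-up figure chosen only to show how the same physical imperfection eats a larger share of each bit as the speed goes up.

```python
# Illustrative arithmetic; the jitter value is hypothetical.
LINE_RATE_GBAUD = {"2GFC": 2.125, "4GFC": 4.25, "8GFC": 8.5}

def bit_period_ps(line_rate_gbaud):
    """Time one bit occupies on the wire, in picoseconds."""
    return 1e12 / (line_rate_gbaud * 1e9)

def jitter_fraction(jitter_ps, line_rate_gbaud):
    """Share of a single bit period consumed by a fixed amount of jitter."""
    return jitter_ps / bit_period_ps(line_rate_gbaud)

ASSUMED_JITTER_PS = 30.0  # hypothetical, for illustration only
for name, rate in LINE_RATE_GBAUD.items():
    print(name,
          round(bit_period_ps(rate), 1), "ps per bit,",
          round(100 * jitter_fraction(ASSUMED_JITTER_PS, rate), 1), "% lost to jitter")
```

Whatever the real jitter figure on a given link, the point stands: it takes up twice the proportion of the bit period at 8Gbps that it did at 4Gbps.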

While you may not have had performance issues with 4Gbps, by upgrading to 8Gbps and its greater sensitivity to light budget you instantly expose yourself to more bit stream errors, higher bit-error rates and multiple retries i.e. delays, disruption and performance degradation of your mission critical applications. Of course this isn’t always the case but when FC cables are bent at 70 degrees or more, optical transceivers / in-line connectors are not upgraded to sufficient quality or small specks of dirt reside on the face of optical cable junctions, your environment suddenly becomes doubly susceptible to jitter and major errors on your SAN fabric. Factors which were previously transparent at 4Gbps become significant performance degraders in the highly sensitive mould of 8Gbps.

So as organizations upgrade to 8Gbps without having taken these factors into consideration, we see countless troubleshooting exercises and even HBA replacements as there is no real insight into these transmission errors from current SRM tools. Orange OM2 fiber-optic cables may get replaced with aqua OM3 fiber-optic cables and SFP transceivers swapped for SFP+ transceivers, leaving administrators thinking they’ve solved the problem. Worst of all though, such fire-fighting tactics often lead to a temporary elimination of performance problems, only for them to rear their ugly head again without any explicable reason, like a persistent zombie from a horror flick that refuses to die.

Given the recent revelations in the industry that SAN fabrics are being over-provisioned on average by at least a factor of five, there clearly is little reason for most companies to upgrade to 8Gbps. When all of your applications are receiving the bandwidth that only 5% of your applications actually need, going straight to 8Gbps leads to even poorer configuration and further waste. This scenario becomes even more complicated given the fact that server virtualization has led administrators to over-provision their SAN infrastructure out of fear that they can’t accommodate their bandwidth requirements. Also with an increase of SSDs being deployed in the majority of enterprise infrastructures, going up to 8Gbps seems a natural way of making the most of their expensive disk investment. Problem is that SSDs running on upgraded yet over-provisioned links which already suffer from jitter may give some performance improvement over their mechanical disk counterparts but are hardly running at optimum levels.
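As a simple illustration of what "over-provisioned by a factor of five" means in practice, the following Python sketch compares what a link can carry with what it is actually asked to carry. The host names and peak figures are invented for the example; in a real environment they would come from whatever is measuring your links.

```python
# Hypothetical figures for illustration; not measurements from any real SAN.

def overprovision_factor(provisioned_mb_s, peak_demand_mb_s):
    """How many times larger the provisioned bandwidth is than the peak demand."""
    if peak_demand_mb_s <= 0:
        raise ValueError("peak demand must be positive")
    return provisioned_mb_s / peak_demand_mb_s

def links_needing_more_than(observed_peaks, threshold_mb_s):
    """Names of links whose observed peak exceeds the given threshold."""
    return sorted(name for name, peak in observed_peaks.items()
                  if peak > threshold_mb_s)

NOMINAL_4G_MB_S = 400.0
observed_peaks = {"esx01-hba0": 80.0, "esx02-hba0": 65.0, "db01-hba0": 410.0}

print(overprovision_factor(NOMINAL_4G_MB_S, observed_peaks["esx01-hba0"]))
print(links_needing_more_than(observed_peaks, NOMINAL_4G_MB_S))
```

In this invented example only one link in three has any case for more than 4Gbps, which is the kind of answer you can only reach once the actual demand has been measured.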

To solve such a dilemma and gain the true benefits of an 8Gbps upgrade it’s important to have an instrument which captures both directions of every SCSI I/O transaction from start to finish on every link carrying your business-critical data. In a recent discussion with IBM’s DS8000 specialist Jens Wissenbach, it was agreed that deploying TAPs on all the key links within the data center is the only way to truly measure light levels, signal quality, throughput metrics, latency and response times, as well as detect protocol violations. With such real-time visibility into your FC infrastructure the administrator can quickly determine whether any of the applications are in actual need of more than 4Gbps, or where in fact the performance problems are coming from, whether that be a bent cable, a speck of dirt or an outdated SFP.

TAPs such as those provided by the company Virtual Instruments will soon be the natural replacement for patch panels across all enterprise data centers. But they could also be the tool that allows end users to provision their SAN links to properly accommodate their SSD and VMware requirements without over-provisioning and without being blinded by performance degradation that is beyond the scope of their SRM tool. So as Fibre Channel vendors are planning to start rolling out 16Gbps products for next year and with the news that the standard for 32Gbps Fibre Channel is already being worked on, it’s imperative that such upgrades take place with the correct preparation so as to maximize the benefits of such an investment.