Undressing Victoria – Could Hitachi's New VSP Rock the EMC boat?



Back in 2004 HDS launched the USP, which was then followed by the great but not so radically different USP-V in 2007. Within that same time frame, HDS’ main rival in the enterprise storage market, EMC, busily went about launching the Symmetrix DMX-3, then the DMX-4 and most recently the VMAX. Launching so-called revolutionary features such as FAST (something HDS had already been doing for years with Tiered Storage Manager), EMC’s marketing machine quickly created an atmosphere in which the storage world became obsessed with all things ‘V’, namely vSphere, VMAX and VPLEX. With marketing so powerful that it extended to international airport posters advertising EMC’s ability to ‘take you to the Private Cloud’, you could easily forgive Hitachi for becoming complacent and content with being a company renowned by the masses for making great vacuum cleaners. Well, thank goodness, after three years in the making, a codename of Victoria, a semi-decent marketing campaign and a ‘V’ in its final name, HDS have at last launched the new VSP enterprise array…and yes, it’s been worth the wait.

Marketed as a 3D scaling storage system, it was pleasing to realize that this wasn’t a reference to the tinted glasses needed to look at its rather revolting vomit-green cabinet. (So no, it certainly can’t compare to the Knight Rider looks of the VMAX and probably won’t be appearing in an episode of ‘24’.) Aesthetics aside, and more importantly, the 3D refers to scale up, scale out and scale deep. What HDS mean by this is that you can scale up by adding more resources to the VSP system, scale out by adding more disk blocks, host connections and Virtual Storage Directors, and scale deep by virtualising external heterogeneous arrays behind the VSP. From this premise it’s also evident that HDS see the VSP as the foundational block of the recently announced but yet to be released cloud platform, the UCP.

While scale deep is an old tradition that HDS have mastered for years, it’s easy to note that the scale out and Virtual Storage Director terms bear more than a passing resemblance to the concept introduced by EMC’s VMAX. With four Virtual Storage Directors in each system and four cores within each Virtual Storage Director, the VSP houses a total of 16 cores. Essentially the masterminds of the machine, the Virtual Storage Directors are responsible for managing the VSP’s internal operations such as mapping and partitioning. The VSP can then be expanded into a mammoth 32-core system by combining two VSP systems over the PCIe Hitachi Data Switch, scaling up to 2,048 drives with a terabyte of cache. So while an EMC aficionado may immediately point out that the VMAX can offer 128 cores, which dwarfs the VSP’s 32, it’s worth remembering that with storage virtualization the number of cores that can potentially be housed behind the VSP is in the hundreds.

Another point: there is no equivalent of the USPVM, the mini-me USPV which couldn’t scale up to the size of its big brother. Instead the VSP starts as a single pair of Virtual Storage Directors with no internal storage, able to act as a pure virtualization platform that homogenizes externally attached multi-vendor arrays. With such a proposition, one can just imagine the quivering of the DS8000s, VMAXs, CLARiiONs and EVAs ‘confettied’ across datacenters, now faced with the prospect of being marginalized as a portion of the potential 255PB of LUNs that can sit behind the VSP’s directors.

Of course this is also a great sales pitch to eventually get the same VSP stacked up with internal storage, which can range from up to 256 SSDs in either STEC’s 200GB 2.5-inch or 400GB 3.5-inch format to as much as 2.5PB of 3.5-inch SATA drives. Add to that, HDS have taken the pioneering route of housing up to 1.2PB of 2.5-inch SAS drives. Yes, that’s right: the HDS VSP has a SAS backend, ready for a 6Gbps SAS interface. While I’m no fan of SSDs sitting at the back end of a storage system behind a RAID controller, processors, SAN switches etc. (can’t wait for DRM to hit the mainstream market), a full duplex SAS backend is nevertheless a definite improvement in taking advantage of the IOPS and throughput capability of SSDs. With up to 128 paths out to the disks and solid state drives, HDS are calling this switching fabric the Grid Switch layer. Of course when you add in 2.5-inch drives using less power, an increase in IOPS due to a higher spindle count and one less cabinet on your datacenter floor, you suddenly see a nice ROI figure being mustered up by your local HDS account manager. Expect EMC and co. to follow suit.

Also gone are those somewhat prehistoric battery backups that resided in the USPV as a legacy of the USP. Instead, between the aforementioned Grid Switch layer and the back-end enclosures, the VSP hosts an extra layer of cache. This eliminates the need for the old battery backups: the Virtual Storage Directors’ data is held in this cache and de-staged to solid state memory in the event of a power loss, hence ensuring data protection. It’s a simple idea but a welcome one for field engineers who can vouch for the pain of having to replace one of those battery packs. Indeed, other legacy complications have been reduced now that the Control Memory (still responsible for all the metadata of the VSP’s operations) is located on the Virtual Storage Director boards and DIMMs, removing the requirement for separate dedicated Shared Memory and Control Memory boards.

Furthermore, despite having borrowed the VMAX concept of coupling engines as well as using Intel processors for their Virtual Storage Directors, HDS have still retained a unique stamp by forsaking the RapidIO interconnects chosen by EMC in favour of their much more familiar Star Fabric architecture. So unlike EMC’s complete overhaul of their Direct Matrix architecture, HDS have maintained their non-blocking crossbar switch architecture to the back end while keeping their global cache shared amongst multiple controllers. This familiar HDS method is the internal network of the VSP that manages its data across the drives, Virtual Storage Directors, BEDs, FEDs and cache.

So while HDS have inadvertently acknowledged EMC’s insight in going the Intel route, they’ve also seemingly taken a leaf out of VMware’s DRS book with their custom I/O-routing ASICs. The point being that on both the FEDs and BEDs of the VSP, data accelerator ASICs designed by Hitachi themselves have now been built in to manage the I/O traffic. Unlike the USPV, where the ACP and CHP processors were tied to particular ports, the VSP instead creates a resource pool of CPU which the ASICs can assign to any front-end or back-end port that requires it at any given time. Personally I think this is a fantastic idea and a step forward, as it quickly eliminates a lot of the performance tuning that was previously required to get the same effect. With such a VMware-esque feature it’s somewhat ironic that the VSP doesn’t yet support VAAI, although news is that it’s coming very soon.
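To make the shift concrete, here’s a rough Python sketch of the difference between the two models. It’s purely illustrative: the port names, core counts and I/O figures are invented and bear no relation to Hitachi’s actual firmware. The point is simply that a busy port can borrow from a shared pool rather than being stuck with its own dedicated processor.

from collections import deque

# Illustrative sketch only: contrasts fixed port-to-processor binding (USPV style)
# with a shared processor pool that any port can draw from (VSP style).
# Port names, core counts and I/O numbers are all invented for the example.

PORTS = ["FED-1A", "FED-2A", "BED-1B", "BED-2B"]

# Old model: each port owns one fixed processor, so a hot port saturates its own
# CPU while the processors belonging to quiet ports sit idle.
fixed_binding = {port: f"cpu-{i}" for i, port in enumerate(PORTS)}
print(f"fixed bindings (old model): {fixed_binding}")

# New model: a single pool of cores; whichever port has work grabs free cores.
cpu_pool = deque(f"core-{i}" for i in range(8))

def service_io(port, pending_ios):
    """Temporarily assign free cores from the shared pool to a busy port."""
    assigned = [cpu_pool.popleft() for _ in range(min(pending_ios, len(cpu_pool)))]
    print(f"{port}: {pending_ios} IOs serviced by {assigned}")
    cpu_pool.extend(assigned)  # cores return to the pool for any other port to use

# A burst on one front-end port can temporarily use most of the pool,
# something the fixed binding above could never do.
service_io("FED-1A", pending_ios=6)
service_io("BED-2B", pending_ios=2)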

Another ground-breaking step, and the one I’m most excited about, is the VSP’s new Sub-LUN Tiering feature. Using the now (thanks partly to Marc Farley’s terrific YouTube rant) infamous HDS 42MB page size, the new policy-based tiering works at the page level rather than the LUN level. Hence as a particular page becomes more or less active, or “hot”, the VSP will automatically upgrade or downgrade the tier for that page only, regardless of whether it sits on external or internal storage. The objective here is pretty clear – an attempt to optimize your usage of SSDs so you can justify buying more of them. Also, ironically, what was once considered HDS’ Achilles heel with regards to storage efficiency, the 42MB page size, now works out to be ideal. Imagine the nightmares of a smaller page size - valuable storage processor CPU utilized in the desperate search for numerous 50KB pages that heat up and need to be moved up to tier 0; not a pretty thought. As this feature is sure to be emulated by other vendors it will be interesting to see what page sizes they come up with.
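As a back-of-the-envelope illustration of how page-level tiering works, here’s a small Python sketch. The 42MB page size is HDS’ real figure, but the tier names, heat thresholds and promotion policy below are entirely made up for the example rather than anything Hitachi has published.

# Illustrative sketch of page-level (sub-LUN) tiering: track I/O "heat" per 42MB page
# and promote or demote only the pages that cross a threshold, never the whole LUN.
PAGE_SIZE_MB = 42
TIERS = ["SSD", "SAS", "SATA"]          # tier 0, 1, 2

pages = {  # page_id -> [current_tier_index, io_count_this_interval]
    0: [2, 900],   # a SATA page that has suddenly become hot
    1: [0, 3],     # an SSD page that has gone cold
    2: [1, 120],   # a SAS page with steady activity
}

PROMOTE_ABOVE = 500   # IOs per interval before a page moves up a tier (invented)
DEMOTE_BELOW = 10     # IOs per interval before a page moves down a tier (invented)

def rebalance(pages):
    for page_id, (tier, io_count) in pages.items():
        if io_count > PROMOTE_ABOVE and tier > 0:
            pages[page_id][0] = tier - 1
            print(f"page {page_id}: {io_count} IOs -> promote to {TIERS[tier - 1]}")
        elif io_count < DEMOTE_BELOW and tier < len(TIERS) - 1:
            pages[page_id][0] = tier + 1
            print(f"page {page_id}: {io_count} IOs -> demote to {TIERS[tier + 1]}")
        pages[page_id][1] = 0   # reset the heat counter for the next interval

rebalance(pages)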

Also speaking of other vendors, HP, who recently achieved the takeover of the year with their purchase of 3PAR, have also launched the VSP, albeit with a much nicer cabinet and the OEM moniker of P9500. What is interesting here is that the P9500 (VSP) is clearly a higher-range platform than the InServ arrays, and if indications are correct HP have no intention of disbanding their EVA range (reports have already surfaced of an EVA now called P6000). So with the OEM deal still intact, HP currently have every intention of also marketing and pushing forward the VSP / P9500. Indeed, while at a meeting at one of HP’s headquarters during the week of the P9500’s release, I was delightedly told of the P9500’s amazing APEX functionality. APEX sounded incredible as I was told of an application-level QoS control which would give Pillar’s similar feature a run for its money. Strange then that I hadn’t heard of any such feature during the HDS launch. Upon further reading into APEX, it was explained that mission-critical data could be given bandwidth priority over less important data. It was then that something suddenly felt familiar. This was nothing but a remarketed version of the functionality of HDS’ Server Priority Manager, which has been around for years (you’ve probably never heard of it because of HDS’ poor marketing but it’s actually very good). In fact the only uniqueness of APEX is that for HP-UX platforms it does indeed allow the prioritization of CPU, cache and storage resources. So not really that significant a differentiator from the VSP, especially if you don’t run HP-UX (and to be honest I think they’d have more success pitching how much nicer their cabinet looks). Nonetheless, differentiators or not, the addition of the P9500 to HP’s storage portfolio will only add further credence to their growing status as a storage powerhouse.

Another welcome change is the replacement of the demonically slow Storage Navigator management GUI with a much faster and greener-looking one. HDS have also announced a whole new refurbishment of their Command Suite software. As well as being quicker and more user-friendly, there’s also better integration with VMware, allowing you to manage storage for virtual machines. A welcome change for an SRM suite that often looked and performed in an outdated manner not befitting the array (I still have nightmares of carving up LDEVs on the USP in the pre-Quick Format days).

So with new features still to be released, such as integration with VMware’s VAAI, support for FCoE and primary deduplication, the VSP has come a long way from its predecessor the USPV. Taking the best from their competitors and integrating it with their own way of doing things is not a new concept for HDS, and with the VSP they have certainly done that again. But HDS now have a genuinely new product, one that goes well beyond the minor gap filled between the USP and USPV and successfully incorporates characteristic tools such as dynamic provisioning and virtualization alongside bleeding-edge technology such as Sub-LUN Tiering. There will be inevitable criticisms from competitors. There will be inevitable squabbles between the vendors. There will be inevitable comparisons between arrays. One thing’s for sure though: expect a lot of the VSP’s new features to be incorporated into other upcoming arrays pretty soon, Hitachi or not. In the words of Simon Cowell, “Glad to see them back in the game!”


N.B. I received a great explanation and post about APEX from Calvin Zito - also known as HPStorageGuy. He clarifies that there is more of a distinction than I originally posted - or in his words, "Bottom line, there is no HDS equivalent of APEX" (-:
Here's the link: http://h30507.www3.hp.com/t5/Around-the-Storage-Block-Blog/Application-Performance-Extender-setting-the-record-straight/ba-p/83533#feedback-success

'Well Ours Goes to 8' - Why Going From 4Gbps to 8Gbps Doesn't Necessarily Double Your Bandwidth

A wise man once told me that if there were a major car crash further up the highway, having a faster car would only get me to the accident quicker. Obvious, right? Not so, it seems, when the wisdom of these words is applied to the growing number of SAN infrastructures currently upgrading from 4Gbps to 8Gbps. ‘Faster means quicker, means better’ is the commonly heard sales pitch used to seduce vulnerable IT directors who dream of ‘a guaranteed performance improvement that would solve the headache of their ever-slowing applications’. Sadly though, for many of those that bit the 8Gbps apple the significant improvement never came, and like a culprit with no shame the same voices returned claiming that this was the fault of the outdated servers, HBAs and storage systems, which also now needed to be upgraded. So down the 8Gbps road they went, a road which now extended from the fabric all the way to the server platform, but still no significant improvement, and certainly not one that could justify such a heavy investment. As with any infrastructure, a lack of visibility into the SAN inevitably means that any unseen problems - CRC errors, physical link errors, protocol errors, code violations, class 3 discards etc. (i.e. the car crash) - remain, regardless of whether you get there at 4Gbps or 8Gbps. So how could such a simple concept be lost amongst the numerous 4Gbps to 8Gbps upgrades now taking place across the SAN stratosphere?

The main reason is that there are several seemingly instant advantages to the 8Gbps standard. With one byte consisting of 8 bits, a nominal throughput of around 800 MB per second gives you the immediate impression that you can potentially double the transmission of your data within the same single cable. Logic would then dictate that with both SAN switches and storage systems having 8Gbps ports, you also now have the freedom to double the number of hosts on a single storage port without fear of any performance impact. Logic would also conclude that the extra bandwidth would be a blessing in a virtual environment where dozens of VMs scramble for a limited number of ports, while blade servers struggle to house the physical space for their growing HBA demands. Couple this with the ever-nearing cost equivalence to their 4Gbps component counterparts and such advantages become unavoidable choices for end users.
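For those who like to see the arithmetic behind the pitch, here’s a quick Python calculation of the nominal numbers. The line rates and 8b/10b encoding are standard Fibre Channel figures; the raw results ignore frame headers and inter-frame gaps, which is why the commonly quoted per-direction figures land at roughly 400MB/s and 800MB/s.

# Back-of-the-envelope arithmetic behind the 'double the bandwidth' pitch.
# 4GFC and 8GFC both use 8b/10b encoding; the raw numbers below ignore frame
# headers and inter-frame gaps, hence they sit slightly above the commonly
# quoted nominal figures of ~400MB/s and ~800MB/s per direction.
LINE_RATES_GBAUD = {"4GFC": 4.25, "8GFC": 8.5}
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 10 bits on the wire per 8 bits of data

for name, gbaud in LINE_RATES_GBAUD.items():
    data_gbps = gbaud * ENCODING_EFFICIENCY
    mb_per_s = data_gbps * 1000 / 8
    print(f"{name}: {gbaud} Gbaud -> ~{data_gbps:.1f} Gbps of data -> ~{mb_per_s:.0f} MB/s per direction")

# Doubling the link rate doubles the ceiling, not the actual throughput; the I/O
# still has to be generated, and the 'car crash' further up the highway remains.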

Indeed it’s the drive for ‘more throughput’ in this virtualisation era that has really kicked the 8Gbps juggernaut into top gear. In the pre-virtualisation world (which surprisingly wasn’t even that long ago, yet already seems like an aeon away), the relationship between server, application, SAN and storage was straightforward and one-dimensional. A single host with one application would connect to a dual redundant SAN fabric that in turn would be mapped to a single LUN. Today everything has multiplied, with a single physical server hosting numerous virtual servers and applications, connected to several storage interfaces and numerous LUNs.

Solutions such as N_Port ID Virtualization (NPIV) and N_Port Virtualization (NPV) have gone even further by enabling the virtualization of host and switch ports. Via NPIV, a single HBA acting as an N_Port can register multiple WWPNs and N_Port IDs. So what was once just a single physical server can now house numerous virtual machines, each with their own port IDs, which in turn allows them to be independently zoned or mapped to LUNs. On the switch side, NPV presents the switch port as an NPIV host to the other switches, hence a SAN can be expanded rapidly without the burden of worrying about multiple domain IDs.
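As a simple way to picture what NPIV changes for zoning, here’s a hedged Python sketch; the WWPNs, VM names and zone names are invented purely for illustration and don’t correspond to any real fabric.

# A minimal sketch of what NPIV means for zoning: one physical HBA (N_Port) logs in
# with several virtual WWPNs, one per VM, so each VM can be zoned and LUN-masked
# independently. All identifiers below are made up for the example.
physical_hba = {
    "physical_wwpn": "50:06:0e:80:10:00:00:01",
    "virtual_ports": {   # virtual WWPN -> the VM it belongs to
        "c0:50:76:00:00:00:00:01": "vm-oracle-01",
        "c0:50:76:00:00:00:00:02": "vm-exchange-01",
        "c0:50:76:00:00:00:00:03": "vm-web-01",
    },
}

# Zoning and LUN masking can now be done per virtual WWPN rather than per server.
zones = {
    "zone_oracle": ["c0:50:76:00:00:00:00:01", "array_port_1A"],
    "zone_exchange": ["c0:50:76:00:00:00:00:02", "array_port_2A"],
}

for vwwpn, vm in physical_hba["virtual_ports"].items():
    zoned = any(vwwpn in members for members in zones.values())
    print(f"{vm}: {vwwpn} {'is zoned independently' if zoned else 'not yet zoned'}")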

So while the case to upgrade to 8Gbps is at the outset quite compelling, further analysis shows that this isn’t necessarily so. Reality, not logic, shows that a lot of the aforementioned advantages have been based on guesswork and assumptions. Moreover, and ironically, the rush to 8Gbps is actually causing more problems than previously existed within data centers, unbeknownst to the majority of end users due to their inability to soundly monitor what’s happening in the SAN. To begin with, if we revisit FC bit rates and their constant increase from 2Gbps to 4Gbps and now 8Gbps, one should be aware that the consequence is a proportionally decreasing bit period. This now shrunken window of data requires an even more robust physical infrastructure than before and becomes even more susceptible to potential errors - think Michael Schumacher driving his Ferrari at top speed on the same public road Morgan Freeman took Miss Daisy down.
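The shrinking ‘window’ is easy to quantify. A quick Python calculation using the standard FC line rates shows just how little time each bit gets on the wire as speeds climb.

# The time available to transmit a single bit falls in proportion to the line rate,
# so the same physical flaws eat up a much bigger fraction of each bit at 8Gbps
# than they did at 4Gbps.
LINE_RATES_GBAUD = {"2GFC": 2.125, "4GFC": 4.25, "8GFC": 8.5}

for name, gbaud in LINE_RATES_GBAUD.items():
    bit_period_ps = 1 / (gbaud * 1e9) * 1e12   # picoseconds per bit on the wire
    print(f"{name}: ~{bit_period_ps:.0f} ps per bit")

# Roughly 470ps at 2GFC, 235ps at 4GFC and 118ps at 8GFC: the same jitter or dirt
# on a connector that was tolerable at 4Gbps can now push bits outside their window.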

While you may not have had performance issues at 4Gbps, by upgrading to 8Gbps and its tighter optical light budget you instantly expose yourself to more bit stream errors, higher bit-error rates and multiple retries, i.e. delays, disruption and performance degradation for your mission-critical applications. Of course this isn’t always the case, but when FC cables are bent at 70 degrees or more, optical transceivers and in-line connectors are not upgraded, or small specks of dirt reside on the face of optical cable junctions, your environment suddenly becomes doubly susceptible to jitter and major errors on your SAN fabric. Factors which were previously transparent at 4Gbps become significant performance degraders in the highly sensitive mould of 8Gbps.

So as organizations upgrade to 8Gbps without having taken these factors into consideration, we see countless troubleshooting exercises and even HBA replacements, as there is no real insight into these transmission errors from current SRM tools. Orange OM2 fiber-optic cables may get replaced with aqua OM3 cables and SFP transceivers swapped for SFP+ transceivers, leaving administrators thinking they’ve solved the problem. Worst of all, such fire-fighting tactics often eliminate the performance problems only temporarily, before they rear their ugly head again without any explicable reason, like a persistent zombie from a horror flick that refuses to die.

Given the recent revelations in the industry that SAN fabrics are being over-provisioned on average by at least a factor of five, there is clearly little reason for most companies to upgrade to 8Gbps. When all of your applications are already receiving the bandwidth that only 5% of them actually need, going straight to 8Gbps leads to even poorer configuration and further waste. This scenario becomes even more complicated given that server virtualization has led administrators to over-provision their SAN infrastructure for fear that they can’t accommodate their bandwidth requirements. Also, with an increase of SSDs being deployed in the majority of enterprise infrastructures, going up to 8Gbps seems a natural way of making the most of an expensive disk investment. The problem is that SSDs running on upgraded yet over-provisioned links which already suffer from jitter may give some performance improvement over their mechanical disk counterparts, but they are hardly running at optimum levels.

To solve such a dilemma and gain the true benefits of an 8Gbps upgrade, it’s important to have an instrument which captures both directions of every SCSI I/O transaction from start to finish on every link carrying your business-critical data. In a recent discussion with IBM’s DS8000 specialist Jens Wissenbach, it was agreed that deploying TAPs on all the key links within the data center is the only way to truly measure light levels, signal quality, throughput metrics, latency and response times, as well as protocol violations. With such real-time visibility into your FC infrastructure, the administrator can quickly determine whether any of the applications actually need anything in excess of 4Gbps, or where in fact the performance problems are coming from, whether that be a bent cable, a speck of dirt or an outdated SFP.
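To give a flavour of the kind of metric such an instrument derives, here’s a small Python sketch of exchange completion time (ECT) calculated from captured command and status timestamps. The exchange IDs, timestamps and 5ms threshold are invented for illustration and aren’t taken from any particular product.

from statistics import mean

# (exchange_id, command_timestamp_s, status_timestamp_s) as captured off the link
captured_exchanges = [
    ("0x1a01", 10.000000, 10.000450),
    ("0x1a02", 10.000100, 10.006900),   # a slow outlier worth investigating
    ("0x1a03", 10.000200, 10.000610),
]

# Exchange completion time: command frame leaving the initiator to status frame returning
ect_ms = [(xid, (done - start) * 1000) for xid, start, done in captured_exchanges]
for xid, ms in ect_ms:
    flag = "  <-- above 5ms, investigate link/port" if ms > 5 else ""
    print(f"exchange {xid}: {ms:.2f} ms{flag}")
print(f"average ECT: {mean(ms for _, ms in ect_ms):.2f} ms")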

TAPs such as those provided by the company Virtual Instruments will soon be the natural replacement for patch panels across enterprise data centers. But they could also be the tool that allows end users to provision their SAN links to properly accommodate their SSD and VMware requirements without over-provisioning, and without being blinded by performance degradation that is beyond the scope of their SRM tool. So as Fibre Channel vendors plan to start rolling out 16Gbps products next year, and with the news that the standard for 32Gbps Fibre Channel is already being worked on, it’s imperative that such upgrades take place with the correct preparation so as to maximize the benefits of the investment.



Cloud Wars: The 3PAR Strikes Back


A long time ago (November 2009 to be precise), in a Cloud far far away, the Rebel Alliance of EMC, Cisco and VMware joined forces to form what are now dubbed Acadia and the VCE coalition. Soon after came the launches of VBlocks 0, 1 and 2, each respectively incorporating the EMC Celerra, Clariion or VMAX with a stack of Cisco blades and switches and a layer of VMware virtualization to suit. Marketed as ‘best of breed’ and ‘ready configured to client specifications’, it was an immediate launch pad for any customer looking to deploy a private cloud with a single pane of management. Whether it was an already mature virtualization infrastructure looking to quickly expand or an organization wanting to enter the virtualization stratosphere with minimum fuss, in-house training etc., the VBlock was quickly carving itself a significant market share. Hence no surprise that EMC’s VP of Global Marketing and CTO Chuck Hollis can hardly hide his glee in interviews at the current lack of direct competition to the VCE’s destiny to rule the Cloud’s Galactic Empire. But one should never underestimate the power of the HP side, now counterstriking with their audaciously exciting sweeping aside of Dell and consequent takeover of the brilliant storage platforms of 3PAR. The Cloud Wars begun they have.


At $33 a share, 3PAR now belong to HP. So despite being the household name that they are, and arguably the world’s biggest technology company in sales terms, HP are only now entering the storage market with a product worthy of their clout. HP EVA administrators may beg to differ, but any end user who’s had to go through the nightmare of a firmware upgrade on an HP EVA will certainly join me in the chorus of disapproval at the very mention of ‘Enterprise’ within the EVA name. A good modular storage system – yes; enterprise – certainly not, hence why HP have had an OEM deal with HDS to rebrand the USPV and USPVM range as the XP24000 and XP12000 respectively, albeit with some microcode changes. It was a nice arrangement, but surely all of this will now change with the acquisition of 3PAR.

3PAR is one of those classic examples of a company with pioneering developments and products but an inability to turn a profit. 3PAR have arguably led the way with their Utility Storage range: architectures that provide a multi-tenant platform on which service providers can deliver both virtualized and scalable enterprise IT as a utility service, in other words Software as a Service (SaaS) and Infrastructure as a Service (IaaS) models. Then there’s their InServ storage server range, which comprises models such as the T400 and T800 that directly compete with other enterprise arrays such as the EMC VMAX and the HDS USPV, or should I say the HP XP24000. With an operating system software suite named InForm, they have a management pane that scales across both their high-end T-Class and modular F-Class range, i.e. the F200 and F400. A much better transition model than the MSA, EVA to XP range currently being offered by HP.

Another pioneering achievement for 3PAR was their revolutionary thin provisioning, a mechanism now adopted by practically every storage vendor on the market. Therein though emerged the ironic twist of fate, as the very technology which gave them a major edge over their competitors was the same one that companies such as EMC and HDS adopted, marketed and created the illusion of being their very own. So now, with the increased sales channels that come from being under the HP umbrella, the 3PAR range can finally penetrate a customer base worthy of its quality product and showcase its technology firsthand.

Another point is that the new technology that comes with 3PAR will also enable HP to offer their existing client base something far greater and more beneficial than their current storage offerings. I personally now fear greatly for the future of HDS, who have already suffered the blow of the end of the OEM agreement with Sun and now face what will inevitably be the end of their relationship with HP. It makes no business sense for HP to continue with HDS and their XP range for several reasons. As great as the USPV is, nothing has happened or changed for several years, while HP’s competitors such as IBM, NetApp and EMC continue to develop and enhance their storage system ranges. Due to this, HP’s high-end range has also stagnated, and in a market where customers are being told of the wonders of a VMAX and VPLEX, it’s a hard sell to keep plugging a product such as the XP/USPV which hasn’t changed for more than four years. Step in the 3PAR R&D team and you have gurus who are adapting and changing with the times, bringing some of the latest and greatest developments in storage – are HP really going to wait around for the USPV2 while EMC corner the market left, right and centre?


It’s the new developments that will also give HP an added edge in the market and bring the 3PAR technology to the forefront, while also allowing HP to offer a direct competitive model to the VBlock. Just recently 3PAR announced the latest integration of their InServ with VMware’s vSphere 4 to quickly build cloud infrastructures for their shared, virtualized utility service offerings. 3PAR have also made no secret of their Utility Storage range being specifically designed for virtual datacenters and the delivery of IT as a service. With the addition of Adaptive Optimization, their autonomic storage tiering application, and tight integration with VMware’s vStorage APIs, 3PAR are already a storage company embracing the Cloud phenomenon. 3PAR’s virtual storage solutions will not only position HP as a one-stop storage shop, but also carry the massive potential of an integrated HP blade / VMware virtualization / 3PAR storage cloud platform that directly targets the VCE coalition’s VBlock.


So while some may consider HP’s takeover of 3PAR a satisfactory result for EMC, with no 3PAR product now threatening their Dell deals, I would strongly beg to differ. The HP 3PAR takeover is a statement of intent. No need for HDS and the XP range. No more playing second fiddle to EMC in the storage market. No more allowing VPLEX free rein when they now have a virtualized and autonomically managed cloud storage system. And no more allowing the VBlock to corner the Cloud market. Arise HP, the Force is strong with you.


Monitoring the SAN shine with Virtual Instruments

It was about three months ago that one of my friends informed me he was leaving HDS to join a company named Virtual Instruments. ‘Virtual Instruments?’ I asked myself, trying to fathom whether I’d heard of them before, only to realize that I had once seen a write-up on their SAN monitoring solution, which was then termed NetWisdom. I was then, inevitably, asked to mention Virtual Instruments in one of my blogs - nice try pal, but I had made it clear several times before to vendors requesting the same that I didn’t want my blog to become an advertising platform. Despite this I was still intrigued by what could have persuaded someone to leave a genuinely stable position at HDS for a company I hadn’t really had much exposure to myself. Fast forward a few months, several whitepapers and numerous discussions, and I find myself writing a blog about that very company.

Simple fact is, it’s rare to find a solution or product in the storage and virtualization market that can truly be regarded as unique. More often than not, most new developments fall victim to what I term the ‘six month catch-up’ syndrome, in which a vendor brings out a new feature only for its main competitor to initially bash it and then release a rebranded and supposedly better version six months later. The original proponents of thin provisioning, automated tiered storage, deduplication, SSD flash drives etc. can all attest to this. It is hence why I have taken great interest in a company that currently occupies a niche in the SAN monitoring market and as yet doesn’t seem to have a worthy competitor, namely Virtual Instruments.



My own experience of storage monitoring has always been a pain, in the sense that nine times out of ten it was a defensive exercise in proving to the applications, database or server guys that the problem didn’t lie with the storage. Storage most of the time is fairly straightforward: if there are any performance problems with the storage system, they’ve usually stemmed from some recent change. For example, provision a write-intensive LUN to an already busy RAID group and you only have to count the seconds before your IT director rings your phone on the verge of a heart attack at how significantly his reporting times have increased. But then there was always the other situation, when a problem would occur with no apparent changes having been made. Such situations required the old-hat method of troubleshooting supposed storage problems by pinpointing whether the issue was between the storage and the SAN fabric or between the server and the SAN, but therein dwelled the Bermuda Triangle at the centre of it all, i.e. the SAN. Try to get a deeper look into the central meeting point of your storage infrastructure and see what real-time changes have occurred on your SAN fabric, and you’d subsequently enter a labyrinth of guesses and predictions.

Such a situation occurred to me when I was asked to analyze and fix an ever-slowing backup of an Oracle database. Having bought more LTO4 tapes, incorporated a destaging device, spent exorbitant amounts of money on man days for the vendor’s technical consultants, played around with the switches’ buffer credits and even considered buying more FC disks, the client still hadn’t resolved the situation. Now enter yours truly into the labyrinth of guesses and predictions. Thankfully I was able to solve the issue by staying up all night running Solaris iostat while simultaneously having the storage system up on another screen. Eventually I was able to pinpoint (albeit with trial-and-error tactics) the problem to rather large block sizes and particular LUNs that were using the same BEDs and causing havoc on their respective RAID groups. With several more sleepless nights to verify the conclusion, the problem was finally resolved. Looking back, surely there was a better, more cost-effective and more productive way to have solved this issue. There was, but I just wasn’t aware of it.

Furthermore, ask any storage guy that’s familiar with SAN management/monitoring software such as HDS’ Tuning Manager, EMC’s ControlCenter, HP’s Storage Essentials and their like, and they’ll know full well that despite all the SNIA SMI-S compliance they still fail to provide metrics beyond the customary RAID group utilization, historic IOPS, cache hit rate, disk response times etc. In other words, from the perspective of the end user there really is little to monitor and hence troubleshoot. Frustratingly, such solutions still fail to provide performance metrics from an application-to-storage-system view and thus also fail to allow the end user to verify whether they are indeed meeting the SLAs for that application. Put this scenario in the ever-growing virtual server environment and you are further blinded by not knowing the relation between the I/Os and the virtual machines from which they originated.

Moreover, storage vendors don’t seem to be in a rush to solve this problem either, and the pessimist in me says this is understandable when such a solution would inevitably lead to the non-procurement of unnecessary hardware. With precise analysis and pinpointing of performance problems comes the annulment of the haphazard ‘let’s throw some more storage at it’, ‘let’s buy SSDs’ or ‘let’s upgrade our storage system’ solutions that are currently music to the ears of storage vendor sales guys. So amidst these partial-view, vendor-provided monitoring tools, which lack that essential I/O transaction-level visibility, Virtual Instruments (VI) pushes forth its solution, which boldly claims to encompass the most comprehensive monitoring and management of end-to-end SAN traffic. From the intricacies of a virtual machine’s application to the Fibre Channel cable that’s plugged into your USPV, VMAX etc., VI say they have an insight. So looking back, had I had VI’s ability to instantly access trending data on metrics such as MB/sec, CRC errors, logins and logouts etc., I could have pinpointed and resolved many of the labyrinth quests I had ventured through so many times in the past.

Looking even closer at VI, there are situations beyond the SAN troubleshooting syndrome in which it can benefit an organization. Like most datacenters, if you have one of the Empire State Building-esque monolithic storage systems, it is more than likely being underutilized, with the majority of its residing applications not requiring the cost and performance of such a system. So while most organizations are aware of this and look to save costs by tiering their infrastructure onto cheaper storage via the alignment of their data values to the underlying storage platform, it’s seldom a reality due to the headaches and lack of insight related to such operations. Tiering off an application onto a cheaper storage platform requires justification from the storage manager that there will be no performance impact to the end users, but due to the lack of precise monitoring information, many are not prepared to take that risk. In an indirect acknowledgement of this problem, several storage vendors have looked at introducing automated tiering software for their arrays, which in essence merely looks at LUN utilization before migrating LUNs to either higher-performance drives or cheaper SATA drives. In reality this is still a rather crude way of tiering an infrastructure when you consider it ignores SAN fabric congestion or improper HBA queue depths. In such a situation, a monitoring tool that tracks I/Os across the SAN infrastructure without being pigeonholed to a specific device is essential to enabling performance optimization and the consequent delivery of Tier 1 SLAs with cheaper storage – cue VI and their VirtualWisdom 2.0 solution.

In the same way that server virtualisation exposed the underutilization of physical server CPU and memory, the VirtualWisdom solution is doing the same for the SAN. While vendors are more than pleased to sell yet more upgraded modules packed with ports for their enterprise directors, it is becoming increasingly apparent that most SAN fabrics are significantly over-provisioned, with utilization rates often less than 10%. With many SAN fabric architects overlooking fan-in ratios and oversubscription rates in a rush to finish deployments within specified project deadlines, underutilized SAN ports are now an ever-increasing reality that in turn bring with them the additional costs of switch and storage ports, SFPs and cables.

Within the context of server virtualisation itself, which has undoubtedly brought many advantages with it, one irritating side effect has been the rapid expansion of FC traffic to accommodate the increased number of servers going through a single SAN switch port, and the complexity now required to monitor it. Then there’s the virtual maze which starts with applications within the virtual machines, which in turn run on multi-socket and multi-core servers, which are then connected to a VSAN infrastructure, only to finally end up on storage systems that also incorporate virtualization layers, whether that be externally attached storage systems or thinly-provisioned disks. Finding an end-to-end monitoring solution in such a cascade of complexities seems all but impossible. Not so, it seems, for the team at Virtual Instruments.

Advancing upon the original NetWisdom premise, VI’s updated VirtualWisdom 2.0 has a virtual software probe named ProbeV. The ProbeV collects the necessary information from the SAN switches via SNMP; on a port-by-port basis, metrics such as frame and byte counts are collated alongside potential faults such as CRC errors, synchronization loss, packet discards or link resets/failures. Then, via the installation of splitters (which VI name TAPs - Traffic Access Points) between the storage array ports and the rest of the SAN, a percentage of the light from the fibre cable is copied to a data recorder for playback and analysis. VI’s Fibre Channel probes (ProbeFCXs) then analyze every frame header, measuring every SCSI I/O transaction from beginning to end. This enables a view of traffic performance at the LUN, HBA, read/write or application level, allowing the user to instantly detect application performance slowdowns or transmission errors. The concept seems straightforward enough, but it’s a concept no one else has yet been able to put into practice, despite growing competition from products such as Akorri's BalancePoint, Aptare's StorageConsole or Emulex's OneCommand Vision.
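To illustrate the port-by-port arithmetic side of this (the polled switch counters rather than the TAP’d frames), here’s a minimal Python sketch; the port names, counter values and verdicts are invented, and in reality the counters would come from polling the switch rather than being hard-coded.

# Illustrative only: the kind of per-port delta arithmetic a switch-polling probe performs.
# Assume two successive polls of a switch's port counters have already been collected;
# the deltas between polls reveal which ports are quietly throwing errors.
poll_t0 = {"port1": {"tx_frames": 10_000_000, "crc_errors": 2,  "link_resets": 0},
           "port2": {"tx_frames":  8_000_000, "crc_errors": 40, "link_resets": 1}}
poll_t1 = {"port1": {"tx_frames": 10_900_000, "crc_errors": 2,  "link_resets": 0},
           "port2": {"tx_frames":  8_600_000, "crc_errors": 95, "link_resets": 3}}

for port in poll_t0:
    frames = poll_t1[port]["tx_frames"] - poll_t0[port]["tx_frames"]
    crcs = poll_t1[port]["crc_errors"] - poll_t0[port]["crc_errors"]
    resets = poll_t1[port]["link_resets"] - poll_t0[port]["link_resets"]
    status = "suspect cable/SFP" if crcs or resets else "clean"
    print(f"{port}: {frames} frames, {crcs} new CRC errors, {resets} link resets -> {status}")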

Added to this, VI’s capabilities can also provide a clear advantage in preparing for a potential virtualization deployment or, dare I fall for the marketing terminology, a move to the private cloud. Lack of insight into performance metrics has evidently stalled the majority of organizations from virtualising their tier 1 applications. Server virtualization has reaped many benefits for many organizations, but ask those same organizations how many of them have migrated their I/O-intensive tier 1 applications from their SPARC-based physical platforms to an Intel-based virtual one and you’re probably looking at a paltry figure. The simple reason is risk and fear of performance degradation, despite logic showing that a virtual platform with resources set up as a pool could potentially bring numerous advantages. Put this now in the context of a world where cloud computing is the new buzz, as more and more organizations look to outsource many of their services and applications, and you have even fewer willing to launch their mission-critical applications from the supposed safety and assured performance of the in-house datacenter into the unknown territory of the clouds. It is here where VirtualWisdom 2.0 has the potential to be absolutely huge in the market and at the forefront of the inevitable shift of tier 1 applications to the cloud. While I admittedly find it hard to currently envision a future where a bank launches its OLTP into the cloud, based on security issues alone, I’d be blinkered not to realize that there is a future where some mission-critical applications will indeed take that route. With VirtualWisdom’s ability to pinpoint virtualized application performance bottlenecks in the SAN, it’s a given that the consequence will be significantly higher virtual infrastructure utilization and subsequent ROI.

The VI strategy is simple: by recognizing I/O as the largest cause of application latency, VirtualWisdom’s baseline comparisons of I/O performance, bandwidth utilization and average I/O completions comfortably provide the insight fundamental to any major virtualization or cloud considerations an organization may be planning for. With its ProbeVM, a virtual software probe that collects status from VMware servers via vCenter, the data flow from virtual machine through to the storage system can be comprehensively analyzed with historical and real-time performance dashboards, leading to an enhanced as well as accurate understanding of resource utilization and performance requirements. With a predictive analysis feature based on real production data, the tool also provides the user the ability to accurately understand the effects of any potential SAN configuration or deployment changes. With every transaction from virtual machine to LUN being monitored, latency sources can quickly be identified, whether in the SAN or the application itself, enabling a virtual environment to be easily diagnosed and remedied should any performance issues occur. With such metrics at their disposal and the resultant confidence given to the administrator, the worry of meeting SLAs could quickly become a thing of the past while rapidly hastening the shift towards tier 1 applications running on virtualized platforms. So despite growing attention being given to other VM monitoring tools such as Xangati or Hyperic, their solutions still lack the comprehensive nature of VI.
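The baseline-comparison idea itself is simple enough to sketch. Below is an illustrative Python snippet that flags any application whose latest I/O completion times drift well beyond its own recorded baseline; the application names, figures and the 50% threshold are all invented for the example.

# A simple sketch of baseline comparison: record typical I/O completion times per
# application while things are healthy, then flag any interval that drifts well
# beyond its own baseline.
baselines_ms = {"oracle-oltp": 4.0, "exchange": 6.5, "file-serving": 12.0}
latest_interval_ms = {"oracle-oltp": 4.2, "exchange": 11.8, "file-serving": 12.5}

DEVIATION_THRESHOLD = 0.5   # alert when latency exceeds baseline by more than 50%

for app, baseline in baselines_ms.items():
    current = latest_interval_ms[app]
    deviation = (current - baseline) / baseline
    if deviation > DEVIATION_THRESHOLD:
        print(f"ALERT {app}: {current:.1f}ms vs baseline {baseline:.1f}ms (+{deviation:.0%})")
    else:
        print(f"ok    {app}: {current:.1f}ms (baseline {baseline:.1f}ms)")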


The advantages to blue-chip, big corporate customers are obvious, and as their SAN and virtual environments continue to grow, an investment in a VirtualWisdom solution should soon become compulsory for any end-of-year budget approval. In saying that though, the future of VI also quite clearly lies beyond the big corporates, with benefits which include real-time proactive monitoring and alerting, consolidation, preemptive analysis of any changes within the SAN or virtual environment, and comprehensive trend analysis of application, host HBA, switch, virtualization appliance, storage port and LUN performance. Any company therefore looking to consolidate their costly over-provisioned SAN, accelerate troubleshooting, improve their VMware server utilization and capacity planning, implement a tiering infrastructure or migrate to a cloud would find the CAPEX improvements that come with VirtualWisdom a figure too hard to ignore. So while storage vendors don’t seem to be in any rush to fill this gap themselves, they too have an opportunity to undercut their competitors by working alongside VI and promoting its benefits as a complement to their latest hardware, something which EMC, HDS, IBM and most recently Dell have cottoned on to, having signed agreements to sell the VI range as part of their portfolios. Despite certain pretenders claiming to take its throne, FC is certainly here to stay for the foreseeable future. If the market and customer base are allowed to fully understand and recognize its need, then there’s no preventing a future in which just about every SAN fabric comes part and parcel with a VI solution ensuring its optimal use. Whether VI eventually get bought out by one of the large whales or continue to swim the shores independently, there is no denying that companies will need to seriously consider the VI option if they’re to avoid drowning in the apprehension of virtual infrastructure growth or the ever-increasing costs of under-utilized SAN fabrics.


VDI – A Vulnerably Dangerous Investment or A Virtual Dream Inclusion?

PCs are part of everyday life in just about every organization. First there’s the purchase of the hardware and the necessary software, followed by an inventory recorded and maintained by the IT department. Normal procedure would then dictate that the same IT department install all required applications before delivering the machines physically to the end users. Over a period of time the laptop/PC is maintained by the IT department with software updates, patches, troubleshooting etc. to keep employees fully productive. Once the PC/laptop becomes outdated, the IT department is then left with the monotonous task of removing the hardware, deleting sensitive data and removing any installed applications to free up licenses. All of this is done to enable the whole cycle to be repeated all over again. So in this vicious circle there are obvious opportunities to better manage resources and save unnecessary OPEX and CAPEX costs, one such solution being virtual desktops.
Having witnessed the financial rewards of server virtualization, enterprises are now taking note of the benefits of virtualization to support their desktop workloads. Consolidation and centralization are no longer buzzwords used for marketing spin but are instead tangible realities for IT managers who initially took that unknown plunge into what was then the deep mystical waters of virtualization. Now they’re also realizing that by enabling thin clients, the cost of their endpoint hardware is significantly driven down by the consequent lifespan extension of existing PCs. Indeed the future of endpoint devices is one that could revolutionize their existing IT offices – a future of PC/laptop-less office desks replaced by thin-client-compatible portable iPads? Anything is now possible.

There’s also no doubting that VDI brings with it further advantages, one being improved security. With data always being administered via the datacenter rather than from the vulnerability of an end user’s desktop, risks of data loss or theft are instantly mitigated. No longer can sensitive data potentially walk out of the company’s front doors. Also, with centralized administration, data can instantly be protected in scenarios where access needs to be limited or copying needs to be prevented. For example, a company that has numerous outsourcers or contractors on site can quickly restrict or even turn off their data and application access. Indeed there is nothing stopping an organization from setting up a ‘contractor’ desktop template which can be provisioned instantly and then decommissioned the moment the outsourced party’s contract expires.

With a centralized infrastructure, fully compliant backup policies also become significantly easier. Whereas PCs and hard drives constantly crash, leading to potential data loss, the centralized virtual desktop sits on an underlying infrastructure which is continuously backed up. Additionally, with the desktop instance not bound to the PC’s local storage but instead stored on the server, recovery from potential outages is significantly quicker, with even the option of reverting virtual desktops back to their last known good states. Imagine the amount of work the employees who constantly bombard the IT helpdesk with “help, I’ve accidentally deleted my hard drive” phone calls could actually get done now, not to mention the amount of time it would free up for your IT helpdesk team. In fact you might even end up with an IT helpdesk that gets to answer the phone instead of taking you straight to voicemail.

Additionally, an IT helpdesk team would be better utilized with the centralized, server-based approach, which allows for the maintenance of both desktop images and specific user data without ever having to visit the end user’s office. With nothing needing to be installed on the endpoint, deployment becomes far faster and easier with VDI than with traditional PC desktop deployment. This also extends to the laborious practice of having to individually visit each desktop to patch applications, provision and decommission users, and upgrade to newer operating systems. By removing such activities, the OPEX savings are more than substantial.

OPEX savings can also be seen in the added benefit of optimizing the productivity of highly paid non-technical end users by sparing them from needlessly maintaining their desktop applications and data. Furthermore, the productivity of employees can be improved significantly by centralized control of which applications end users run and full monitoring of their usage, so long gone should be the days of employees downloading torrents or mindlessly chatting away on social networks during working hours. Even the infamously slow start-up time of Windows, which has brought with it the traditional yet unofficial morning coffee/cigarette break, can be eradicated with the faster boot times found with VDI. And lack of access to an employee’s corporate PC can no longer be used as an excuse not to log in from home or elsewhere when required – a manager’s dream and a slacker’s nightmare.

So with all these benefits, where lies the risk or obstacle to adopting a VDI infrastructure for your company? Well, as with most technology, there rarely exists a one-size-fits-all solution and VDI is no different. Prior to any consideration of VDI, a company must first assess its infrastructure and whether VDI would indeed reap these benefits or possibly cause more problems.

One of the first issues to look for is whether the organization has a high percentage of end users who manipulate complex or very large files. In other words, if a high proportion of end users constantly need multimedia, 2D or 3D modeling applications, or VoIP, then VDI should possibly be reconsidered in favour of a better managed desktop environment. The performance limitations that came with server-based computing platforms such as Microsoft's Terminal Services with regards to bandwidth, latency and graphics capabilities are still fresh in the minds of many old-school IT end users, and without the correct pre-assessment those old monsters could rear their ugly head. For example, an infrastructure that has many end users running high-performance or real-time applications should think carefully before going down the VDI route, regardless of what the sales guys claim.

That said, if having taken all of this into consideration you realize your environment is suited to a VDI deployment, the benefits and consequent savings are extensive despite the initial expenditure. As for which solution to choose, that requires another careful consideration, and one that needs to be investigated beyond the usual vendor marketing hype.

Firstly, when it comes to server virtualization there is currently no threatening competition (certainly not in the enterprise infrastructure) to VMware’s vSphere 4. In the context of desktop virtualization, though, the story has been somewhat different. Those who’ve deployed Citrix’s XenDesktop certainly know that it has better application compatibility than VMware View 3. Add to that the multimedia freeze-framing problems that would often occur with the View 3 solution and Citrix looked to have cornered a market in the virtual sphere which initially seemed destined to be monopolized by VMware. Since then VMware have hit back with View 4, which brought in the vastly improved PCoIP display protocol that dwarfs the original RDP, and simplified their integration with Active Directory and the overall installation of the product, but in performance terms XenDesktop still has an edge. So it comes as no surprise that rumours are rife that VMworld 2010, which takes place in a couple of weeks, will be the launching pad for View 4.5 and a consequent onslaught on the Citrix VDI model. Subsequent retaliation is bound to follow from Citrix, who seem to have moved their focus away from the server virtualization realm in favour of the VDI milieu, which can only be better for the clients that they are aiming for. Already features such as Offline Desktop, which allows end users to download and run their virtual desktops offline and later resynchronize with the data center, are being developed beyond the beta stage.

So the fact remains that quickly provisioning desktops from a master image and instantly administering policies, patches and updates without affecting user settings, data or preferences is an advantage many will find hard to ignore. While VDI still has many areas for improvement, depending on your infrastructure it may already be an appropriate time to reap the rewards of its numerous benefits.

vSphere 4 still leaves Microsoft Hyper V-entilating


When faced with a tirade of client consultations and disaster recovery proposals/assessments, you can’t help but be inundated with opportunities to showcase the benefits of server virtualization and more specifically VMware’s Site Recovery Manager. It’s a given that if an environment has a significant number of applications running on x86 platforms, then virtualization is the way to go, not just for all the consolidation and TCO savings but for the ease with which high availability, redundancy and business continuity can be deployed. Add to that the benefit of a virtualized disaster recovery solution that can easily be tested, failed over or failed back. What was once a complex procedure can now be tested via a simple GUI-based recovery plan. Thus one should see the eradication of the trepidation that often existed in testing out how foolproof an existing DR procedure actually was. Long gone should be the days of the archaic approach of the 1000-page Doomsday Book-like disaster recovery plans which the network, server and storage guys had to rummage through during a recovery situation, often becoming a disaster in itself. Hence there really is little argument not to go with a virtualized DR site and more specifically VMware’s Site Recovery Manager, but not so it seems if you’ve been cornered and indoctrinated by the Microsoft Hyper-V sales team.


Before I embark further, let’s be clear that I am not an employee or sales guy for VMware - I’m just a techie at heart who loves to showcase great technology. Furthermore, let it go on record that I’ve never really had a bone of contention with Microsoft before – their Office products are great, Exchange still looks fab and I still run Windows on my laptop (albeit on VMware Fusion). I didn’t even take that much offense when I recently purchased Windows 7 only to realize that it was just a well marketed patch for the heir to the disastrous Windows ME throne, i.e. Windows Vista. I also took it with a pinch of salt that Microsoft were falsely telling customers that Exchange would run better on local disks as opposed to the SAN, in an attempt to safeguard themselves from the ongoing threat of Google Apps (a point well exposed and iterated in David Vellante’s Wikibon article, “Why Microsoft has its head up its DAS”). Additionally, my purchase of Office 2010, in which I struggled to fathom any significant difference from Office 2007, still didn’t irk me that much. What has turned out to be the straw that broke the camel’s back, though, is the constant claims Microsoft are making that Hyper-V is somehow an equally good substitute for VMware, consequently pushing customers to avoid a disaster recovery plan that includes Site Recovery Manager. So what exactly are the main differences between the two hypervisors, and why is it that I so audaciously refuse to even consider Hyper-V as an alternative to vSphere 4?


Firstly, one of the contentions often faced when virtualizing is the notion that some applications don’t perform well, if at all, on a virtualized platform. This is true in the context of Hyper-V, which currently limits guests to only 4 vCPUs. That’s pretty much a no-go for CPU-hungry applications, leading to the erroneous idea that a large set of applications should be excluded from virtualization. This is not the case with vSphere 4, where guests can have up to 8 vCPUs. In an industry which is following a trend of CPUs scaling up by adding cores instead of increasing clock rates, the future of high-end x86 servers provides vast potential for just about any CPU-hungry application to run on a virtualized platform – something vSphere 4 is already taking the lead in.


Then there’s the management infrastructure, in which Hyper-V uses software named System Center (SC) and more specifically the System Center Virtual Machine Manager (SCVMM), whereas the vSphere 4 equivalent is named vCenter Server. With Hyper-V being part of a complete Microsoft virtualization solution, System Center is generally used to manage Windows Server deployments. The System Center Virtual Machine Manager, on the other hand, not only manages Hyper-V-hosted guests but also Virtual Server, VMware Server and VMware ESX and GSX guests. Ironically this can even be extended to managing vMotion operations between ESX hosts (perhaps an inadvertent admission from Microsoft that vMotion wipes the floor with their equivalent Live Migration). This comes across as somewhat paltry compared to vCenter Server, which can be either a physical or a virtual machine and now offers the ability for multiple vCenter Servers to be linked together and controlled from a single console, enabling consolidated management of thousands of virtual machines and several datacenters. Add to this that vCenter Server provides a search-based navigation tool that enables the finding of virtual machines, physical hosts and other inventory objects based on user-defined criteria, and you have the ability to quickly find unused virtual machines or resources in the largest of environments, all through a single management pane.

Taking the linked management capabilities of vCenter further, VSphere 4 also offers what they term the vNetwork Distributed Switch. Previously, a virtual network switch had to be provisioned, configured and managed on each individual ESX server. With the vNetwork Distributed Switch, virtual switches can now span multiple ESX servers while also allowing the integration of third-party distributed switches. The Cisco Nexus 1000v, for example, is the gateway for the network gurus to enter the world of server virtualization and take the reins of the virtual network that were previously held by VM system admins. Put this in the context of multiple vCenter Servers in the new linked mode and end users have the capability to manage not only numerous virtual machines but also the virtual network switches. In an Enterprise environment with hundreds of servers and thousands of virtual machines, what previously would have been a per-ESX switch configuration change can now be done centrally and in one go with the vNetwork Distributed Switch. Hyper-V as of yet has no equivalent.
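To make the difference concrete, here is a rough Python sketch – purely illustrative classes and names, not the VMware SDK – contrasting a per-host switch change, repeated on every ESX server, with a single change on a distributed switch object shared by a hundred hosts:

# Illustrative model only -- not the VMware SDK. It contrasts per-host
# virtual switches (one config object per ESX host) with a distributed
# switch (one config object referenced by every host).
from dataclasses import dataclass, field

@dataclass
class PortGroupConfig:
    name: str
    vlan_id: int

@dataclass
class StandardSwitch:            # one per ESX host: a change must be repeated per host
    host: str
    port_groups: dict = field(default_factory=dict)

@dataclass
class DistributedSwitch:         # one object spanning many hosts: a change is made once
    hosts: list = field(default_factory=list)
    port_groups: dict = field(default_factory=dict)

# Per-host model: a VLAN change means touching every host individually.
standard = [StandardSwitch(host=f"esx{i:02d}") for i in range(1, 101)]
for vswitch in standard:
    vswitch.port_groups["Prod"] = PortGroupConfig("Prod", vlan_id=200)

# Distributed model: the same change is a single, central operation.
dvs = DistributedSwitch(hosts=[f"esx{i:02d}" for i in range(1, 101)])
dvs.port_groups["Prod"] = PortGroupConfig("Prod", vlan_id=200)

print(len(standard), "per-host edits vs 1 central edit covering", len(dvs.hosts), "hosts")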

That broad approach has also pushed VMware to incorporate not only the network guys into their world, but also the security and backup gurus. With VSphere 4’s VMsafe, VMware have now enabled the use of 3rd party security products within their virtual machines – an avenue for the security guys to at last enter a virtual matrix they previously had little or no input in. Then there’s the doorway VSphere 4 has opened for backup vendors such as Veeam to plug into virtual machines and take advantage of the latest developments such as Changed Block Tracking and the vStorage APIs, bringing customers a more sophisticated and sound approach to VM backups. Hyper-V still has no VMsafe equivalent and certainly no Changed Block Tracking.
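For those unfamiliar with Changed Block Tracking, the idea itself is simple enough to sketch. The Python below is a conceptual illustration only – the block size, class and method names are mine, not the vStorage APIs – showing a backup pass that copies only the blocks written since the last run:

# Conceptual sketch of a changed-block-tracking style incremental backup.
# Not the vStorage APIs -- block size and structures are illustrative.
BLOCK_SIZE = 4096

class TrackedDisk:
    """A virtual disk that records which blocks change between backups."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.changed = set()                 # indices written since the last backup

    def write(self, index, data):
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
        self.changed.add(index)

    def incremental_backup(self):
        """Copy only the blocks written since the previous backup."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()                 # reset the tracking map for the next cycle
        return delta

disk = TrackedDisk(num_blocks=1000)
disk.write(7, b"updated row")
disk.write(42, b"new log entry")
print("blocks in incremental:", len(disk.incremental_backup()), "of", len(disk.blocks))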


Furthermore, as Microsoft flaunt Hyper-V’s latest developments, scrutiny shows that they are merely features that have been available on VMware for several years, and even then they still don’t measure up in terms of performance. Case in point: Hyper-V’s rather ironically titled ‘Quick Migration’. For high availability and protection against unplanned downtime, Hyper-V clusters have functionality that restarts virtual machines on other cluster nodes if a node fails, and with Quick Migration a virtual machine can then be moved between cluster hosts. Where it fails, though, is in its inability to do so instantly, as is the case with VMware’s vMotion and HA features. This hardly exudes confidence in Hyper-V when a move that can take several seconds leaves you exposed to the risk of dropped network connections and, consequently, further unplanned downtime. Quick Migration’s inability to seamlessly move virtual machines across physical platforms also means downtime has to be scheduled for any potential server maintenance. This is certainly not the case with VMware and vMotion, wherein server maintenance requiring downtime is a thing of the past.


Moreover, so seamless is the vMotion process that end users have no idea their virtual machine has just crossed physical platforms while they were inputting new data. This leads us to Hyper-V’s reaction and improved offering, now termed Live Migration, which Microsoft claim is on a par with vMotion. Upon further inspection this still isn’t the case, as the number of simultaneous live migrations that can be performed between physical servers is still far more limited with Hyper-V. Additionally, while Hyper-V claims to be gaining ground, VMware in return have shot even further ahead with VSphere 4’s Storage vMotion capability, which allows ‘on the fly’ relocation of virtual disks between the storage resources within a given cluster. So as VMware advances and fine-tunes features such as Distributed Resource Scheduler, Distributed Power Management (DPM), Thin Provisioning, High Availability (HA) etc., Hyper-V is only just announcing similar functions.



Another issue with Hyper-V is that it’s simply an add-on to Windows Server which relies on a Windows 2008 parent partition, i.e. it’s not a bare-metal hypervisor, as virtual machines have to run on top of the physical system’s operating system (something akin to VMware’s Workstation). Despite Microsoft’s claims that the device drivers have low-latency access to the hardware, thus providing a hypervisor-like layer that runs alongside the full Windows Server software, in practical terms those who have deployed both Hyper-V and VMware can testify that the performance stats are still not comparable. One of the reasons for this is that VMware have optimized their drivers with the hardware vendors themselves, unlike Hyper-V, which sadly is stuck in the ‘Windows’ world.


This leads to my next point: with VSphere 4 there is no reliance on a general-purpose operating system, and the list of operating systems supported by VMware continues to grow. Microsoft, on the other hand, being the potential sinking ship that it is in the Enterprise datacenter, have tried to counter this advantage by marketing Hyper-V as being able to run on a larger variety of hardware configurations. One snag they don’t talk about so much is that it has to be a hardware configuration designed to support Windows. Ironic, when one of the great things about virtualization is that virtual machines with just about any operating system can now run together on the same physical server, sharing pools of resources – not so for Microsoft and Hyper-V, who desperately try to corner customers into remaining on a made-for-PC operating system that somehow got drafted into datacenters. The question now is how many more inevitable reboots it will take on a Windows Enterprise Server before IT managers say enough is enough.


Then there are some of the new features introduced in VSphere 4 which have still failed to take similar shape in the Hyper-V realm. For example, VMDirectPath I/O allows device drivers in virtual machines to bypass the virtualization layer and access physical resources directly – a great feature for workloads that need constant and frequent access to I/O devices.


There’s also the Hot-Add feature, wherein a virtual machine running Windows 2000 or above can have its network cards, SCSI adaptors, sound cards and CD-ROMs added or removed while still powered on. VMware go even further by letting a Windows 2003 or above VM hot-add memory or CPU and even extend its VMDK files – all while the machine is still running. There’s still nothing ‘hot’ to add from the Hyper-V front.


Also, instead of the headache-inducing complexities that come with Microsoft’s Cluster Service, VSphere 4 comes with Fault Tolerance – a far easier alternative for mission-critical applications that can’t tolerate downtime or data loss. By simply creating a duplicate virtual machine on a separate physical host and using vLockstep technology to keep the two in sync, VSphere 4 offers a long-awaited and straightforward alternative to complex clustering that further enhances the benefits of virtualization. No surprise then that the Microsoft Hyper-V sales guys currently tend to belittle it as no great advantage.

Another VSphere 4 feature which also holds great benefits and is non-existent in Hyper-V is memory overcommitment. This feature allows more RAM to be allocated to virtual machines than is physically available on the host. Via techniques such as transparent page sharing, virtual machines can share their common memory pages, leading to significant savings in the all too common situation of having to add more memory to an existing server at a cost that can exceed the price of the server itself.
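As a rough illustration of transparent page sharing, the following Python sketch – page size and data entirely illustrative, nothing to do with ESX internals – hashes guest memory pages and stores each unique page only once, which is what allows the sum of guest RAM to exceed physical RAM:

# Conceptual sketch of transparent page sharing: identical guest memory
# pages are stored once on the host, so the sum of guest RAM can exceed
# physical RAM. Page size and contents are illustrative, not ESX internals.
import hashlib

PAGE_SIZE = 4096

def share_pages(guest_pages):
    """Map every guest page to a single stored copy of its content."""
    store = {}                                   # digest -> one physical page
    for page in guest_pages:
        digest = hashlib.sha256(page).hexdigest()
        store.setdefault(digest, page)           # keep only the first copy
    return store

# Three guests booted from the same template share most of their pages.
common_os = [b"OS" * (PAGE_SIZE // 2)] * 200     # identical OS/application pages
guests = [common_os + [f"guest{i}-page{j}".encode().ljust(PAGE_SIZE, b"\0")
                       for j in range(20)]       # a little unique data per guest
          for i in range(3)]

all_pages = [page for guest in guests for page in guest]
shared = share_pages(all_pages)
print(f"guest pages allocated: {len(all_pages)}, physical pages used: {len(shared)}")
# 660 allocated vs 61 physical here -- the basis of memory overcommitment.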

So while Hyper-V has also recently caught up with a Site Recovery Manager equivalent in the Citrix Essentials for Hyper-V package, it’s still doing just that, i.e. playing catch-up. One of the main arguments for Hyper-V is that it’s free or nearly free, but again that’s the marketing jargon that fails to elaborate that you have to buy a Windows Server license first and hence help maintain the dwindling lifespan of Microsoft within the datacenter. Another selling point for Hyper-V was that it is better aimed at small to medium-sized businesses due to its cheaper cost; the recent announcement of VSphere 4.1 may now also put that claim to bed. So, like all great empires, collapse is inevitable, and while I don’t believe Microsoft are headed for the I.T. black hole, they certainly don’t look likely to catch up with VMware in the ever-emerging and growing market of virtualization.


NetApp Justifies Storage Efficiency Tag with Primary Deduplication



The pendulum has shifted. We are in an era in which Storage Managers are in the ascendancy, while vendors must shape up to meet customer demands in order to survive the current economic plight. Long gone are the days of disk-happy vendors who could easily shift expensive boxes of FC disks, or Account Managers who boasted of huge margins from selling skyscraper storage systems to clients facing an uphill struggle to meet their constantly growing storage demands. With responses such as thin/dynamic/virtual provisioning arrays and automated storage tiering, vendors have taken a step towards giving customers solutions that enable them to use more of what they already have as well as utilise cheaper disks. Another feature now starting to really prick the conscience of vendors as customers become more savvy is primary deduplication, or the more aptly termed ‘data reduction’. So as this cost-saving surge continues, some vendors have cheekily tried to counteract it with sales pitches for exorbitantly priced Flash SSDs (which promise 10 times the performance yet shamelessly sit on the back end of storage systems, dependent on the latency of their BEDs and RAID controllers) as a means to keep margins up. But not the WAFL kings NetApp….

Mention deduplication and you most likely think of backup environments, where redundant data is eliminated, leaving only one copy of the data and an index of the duplicates should they ever be required for restoration. With only unique data stored, the immediate benefits of deduplication are obvious: a reduction in backup storage capacity, power, space and cooling requirements, and a reduction in the amount of data sent across the WAN for remote backups, replication and disaster recovery. Not only that, deduplication savings have also shifted the backup paradigm from tape to disk, allowing quicker restores and fewer media handling errors (and yes, I have made no secret of giving kudos to Data Domain in this respect). Shift this concept to primary storage, though, and you have a different proposition with different challenges and advantages.

Primary storage is constantly being read from and written to, so any deduplication process must be fast enough to avoid adding overhead or delay to data access. Add to the equation that primary storage does not contain anywhere near the proportion of duplicate data found in backup data, and you also have a lesser yield in deduplication ratios. Despite this, NetApp have taken primary deduplication by the horns and are offering genuine data reduction that extends beyond the false marketing of archiving and tiering as data reduction techniques, when in fact all they do is shove data onto different platforms.

Most vendors on the ‘data reduction’ bandwagon have gone with file-level deduplication, which looks at the file system itself, replacing identical files with a single copy and links for the duplicates. Hence there is no requirement for the file to be decompressed or reassembled upon end-user request, since the same data merely has numerous links pointing at it. The main advantage is therefore that data access incurs no added latency. In real terms, though, this minimalist approach doesn’t produce data reduction ratios that yield anything significant for the user to be particularly excited about.
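For the curious, file-level deduplication is simple enough to sketch in a few lines of Python. The example below is purely illustrative – the path is hypothetical and this is not any particular vendor’s implementation – and it replaces byte-identical files with hard links to a single stored copy:

# Minimal sketch of file-level deduplication: identical files are replaced
# with hard links to a single stored copy (same filesystem assumed).
import hashlib
import os

def dedupe_directory(root):
    """Replace byte-identical files under `root` with hard links."""
    seen = {}                                   # content digest -> first path seen
    reclaimed = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen:
                reclaimed += os.path.getsize(path)
                os.remove(path)                 # drop the duplicate...
                os.link(seen[digest], path)     # ...and link it to the original
            else:
                seen[digest] = path
    return reclaimed

# saved = dedupe_directory("/data/projects")   # hypothetical path
# print(f"reclaimed {saved / 2**20:.1f} MiB")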

On the flip side, what is referred to as sub-file level deduplication takes an approach familiar to those who already use deduplication for their backups. Using hash-based technology, files are first broken into chunks. Each chunk of data is assigned a unique identification, whereupon chunks with duplicated identifications are replaced with a pointer to the original chunk. Such an approach brings the added advantage of discovering duplicate patterns in random places, regardless of how the data is saved. With the addition of compression, end users can also significantly reduce the size of chunks. Of course this also introduces a catch-22: deduplication achieves better efficiency with smaller chunks, while compression is more effective with larger chunks – which is why NetApp have yet to incorporate compression alongside their sub-file level deduplication. Despite this, NetApp are showing results that, when put in a virtual context, are more than impressive.
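Conceptually the sub-file approach looks something like the following Python sketch, in which the 4KB chunk size, the fingerprinting and the pointer store are illustrative only and not NetApp’s implementation:

# Conceptual sketch of sub-file (chunk-level) deduplication: data is split
# into fixed-size chunks, each chunk is fingerprinted, and duplicate chunks
# are replaced with pointers to the stored original.
import hashlib

CHUNK_SIZE = 4096

def dedupe(stream):
    """Return (unique chunk store, per-file pointer list, dedup ratio)."""
    store = {}        # fingerprint -> chunk bytes (stored once)
    pointers = []     # the file becomes a list of fingerprints
    for offset in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)
        pointers.append(fingerprint)
    logical = len(stream)
    physical = sum(len(chunk) for chunk in store.values())
    return store, pointers, logical / physical

# Two nearly identical "virtual disks" end up sharing almost all of their chunks.
data = (b"base-image-block" * 256) * 50 + b"guest-specific-tail" * 10
store, pointers, ratio = dedupe(data)
print(f"chunks referenced: {len(pointers)}, chunks stored: {len(store)}, ratio: {ratio:.1f}x")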

One of the first major vendors to incorporate primary data deduplication, NetApp are comfortably verifying their ‘storage efficiency’ selling tag when put in the context of server and desktop virtualisation. One of the many benefits of VMware (or other server virtualisation platforms) is the ability to rapidly deploy new virtual machines from stored templates. Each of these VM templates includes a configuration file and several virtual disk files. It is these virtual disk files that contain the operating system, common applications and patches or updates, and it is these that are duplicated each time a cloned VM is deployed. Imagine now a deployment of 200 like-for-like VMs, add NetApp’s primary deduplication process wherein multiple machines end up sharing the same physical blocks in a FAS system, and you’ve got some serious reduction numbers and storage efficiency.

With reduction results of 75% to 90%, NetApp’s advantage comes from their long-established, snapshot-magic-producing WAFL (Write Anywhere File Layout) technology. With its in-built CRC checksum for each block of data stored, WAFL already has block-based pointers. By running the deduplication at scheduled times, all checksums are examined, with the filer doing a block-level comparison of blocks if any of the checksums match. If a match is confirmed, one of the WAFL block-based pointers simply replaces the duplicated block. Because the operation is scheduled to occur during quiet periods, the performance impact is not that intrusive, giving the NetApp solution significant storage savings, especially when similar operating systems and applications are grouped into the same datastores. Add to the mix that NetApp’s PAM (Performance Acceleration Module) is also dedupe-aware, so common block reads are quickly satisfied from cache, bringing even faster responses by not having to search through every virtual disk file (VMDK). NetApp also ‘go further, faster’, so to speak, with their FlexClone technology, which rapidly deploys VM clones that are already pre-deduplicated.
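The scheduled pass described above can be sketched roughly as follows. Again, this is illustrative Python rather than WAFL internals – block size and data structures are mine – but it shows the cheap checksum match followed by a byte-level comparison before a pointer replaces the duplicate block:

# Illustrative sketch of a scheduled block-level dedup pass: cheap checksum
# match first, byte-level verification second, pointer replacement last.
import zlib
from collections import defaultdict

BLOCK = 4096

def scheduled_dedup(blocks):
    """blocks: list of 4KB byte strings. Returns (unique indices, pointer map)."""
    by_checksum = defaultdict(list)          # checksum -> indices of candidate blocks
    for i, block in enumerate(blocks):
        by_checksum[zlib.crc32(block)].append(i)

    pointers = list(range(len(blocks)))      # block index -> index actually stored
    for candidates in by_checksum.values():
        keep = candidates[0]
        for i in candidates[1:]:
            if blocks[i] == blocks[keep]:    # byte-level compare guards against collisions
                pointers[i] = keep           # the duplicate now points at the kept block
    unique = sorted(set(pointers))
    return unique, pointers

# 200 cloned VMs whose system blocks are identical dedupe down to one copy.
blocks = [b"A" * BLOCK] * 200 + [b"B" * BLOCK, b"C" * BLOCK]
unique, pointers = scheduled_dedup(blocks)
print(f"{len(blocks)} logical blocks -> {len(unique)} physical blocks")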

So while arguments may be raised that NetApp’s sub-file level deduplication suffers from the physical-layer constraints of WAFL’s 4KB block size, or from their lack of compression, the truth is that they have deliberately avoided such alternatives. Had they opted for sliding-block chunking, where a window is passed along the file stream to seek out more naturally occurring internal boundaries, or added compression algorithms, the overhead that comes with such additions would render most of the advantages of primary dedupe worthless. Yes, Ocarina and Storwize have appliances that compress and decompress data as it is stored and read, but what performance overhead do such technologies carry when hundreds of end users concurrently access the same email attachment? As for Oracle’s Solaris ZFS sub-file level deduplication, which is yet to see the light of day, one wonders how much hot water it will get Oracle into should it turn out to be a direct rip-off of the NetApp model.
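To see why, here is a toy Python sketch of sliding-window (content-defined) chunking, with entirely illustrative window and mask parameters. Note that the rolling checksum has to be updated and tested at every single byte of the stream, which is exactly the kind of per-byte overhead the fixed-block approach avoids:

# Toy sketch of sliding-window (content-defined) chunking: a rolling checksum
# is recomputed at every byte, and a chunk boundary is declared whenever its
# low bits hit a target value. Parameters are illustrative only.
import random

WINDOW = 48          # bytes in the rolling window
MASK = 0x0FFF        # boundary roughly 1 in 4096 positions (~4KB average chunk)

def content_defined_chunks(data):
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = (rolling + byte) & 0xFFFFFFFF                    # add incoming byte
        if i >= WINDOW:
            rolling = (rolling - data[i - WINDOW]) & 0xFFFFFFFF    # drop outgoing byte
        if (rolling & MASK) == MASK or i == len(data) - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    return chunks

random.seed(0)
data = bytes(random.getrandbits(8) for _ in range(512 * 1024))     # ~512 KB of sample data
chunks = content_defined_chunks(data)
print(f"{len(data)} bytes -> {len(chunks)} variable-size chunks")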

The bottom line is that as long as the primary deduplication model you employ gives you reduction numbers worth the inevitable overhead, it’s more than a beneficial cost-saving feature. Furthermore, while I’m the first to admit that NetApp certainly have their flaws, when it comes to primary deduplication and consequent data reduction they really are making your storage more efficient.