Monitoring the SAN shine with Virtual Instruments

It was about three months ago that one of my friends informed me he was leaving HDS to join a company named Virtual Instruments. ‘Virtual Instruments?’ I asked myself, trying to recall whether I’d heard of them before, only to realize that I had once seen a write-up on their SAN monitoring solution, which was then termed NetWisdom. I was then casually asked to mention Virtual Instruments in one of my blogs - nice try pal, but I had made it clear several times before to vendors requesting the same that I didn’t want my blog to become an advertising platform. Despite this I was still intrigued by what could have persuaded someone to leave a genuinely stable position at HDS for a company I hadn’t really had much exposure to myself. Fast forward a few months, several whitepapers and numerous discussions, and I find myself writing a blog about that very company.

Simple fact is it’s rare to find a solution or product in the storage and virtualization market that can truly be regarded as unique. More often than not, new developments fall victim to what I term ‘six month catch-up’ syndrome, in which a vendor brings out a new feature only for its main competitor to initially bash it and then release a rebranded and supposedly better version six months later. The original proponents of thin provisioning, automated tiered storage, deduplication, SSD flash drives etc. can all attest to this. Hence my great interest in a company that currently occupies a niche in the SAN monitoring market and as yet doesn’t seem to have a worthy competitor, namely Virtual Instruments.



My own experience of storage monitoring has always been a pain, in the sense that nine times out of ten it was a defensive exercise in proving to the applications, database or server guys that the problem didn’t lie with the storage. Storage most of the time is fairly straightforward: if there are performance problems with the storage system, they’ve usually stemmed from whatever change was made most recently. For example, provision a write-intensive LUN to an already busy RAID group and you only have to count the seconds before your IT director rings your phone on the verge of a heart attack at how significantly his reporting times have increased. But then there was always the other situation, when a problem would occur with no apparent changes having been made. Such situations required the old hat method of troubleshooting supposed storage problems by pinpointing whether the issue lay between the storage and the SAN fabric or between the server and the SAN, but therein dwelled the Bermuda Triangle at the centre of it all i.e. the SAN. Try to get a deeper look into the central meeting point of your storage infrastructure and see what real-time changes have occurred on your SAN fabric, and you’d enter a labyrinth of guesses and predictions.

Such a situation occurred when I was asked to analyze and fix an ever-slowing backup of an Oracle database. The client had already bought more LTO4 tapes, brought in a destaging device, spent exorbitant amounts of money on man-days for the vendor’s technical consultants, played around with the switches’ buffer credits and even considered buying more FC disks, yet still hadn’t resolved the situation. Now enter yours truly into the labyrinth of guesses and predictions. Thankfully I was able to solve the issue by staying up all night running Solaris iostat, while simultaneously having the storage system up on another screen. Eventually I was able to pinpoint (albeit with trial-and-error tactics) the problem to rather large block sizes and particular LUNs that were using the same BEDs and causing havoc on their respective RAID groups. With several more sleepless nights to verify the conclusion, the problem was finally resolved. Looking back, surely there was a better, more cost-effective and more productive way to have solved this issue. There was, but I just wasn’t aware of it.
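For the curious, the sort of correlation I was doing by eye that night can be sketched in a few lines of Python. This is only an illustration of the approach, not a tool I used at the time; the column positions assume Solaris "iostat -xn" output and the thresholds are entirely arbitrary.

# Rough sketch of the manual correlation described above: parse
# "iostat -xn <interval>" samples and flag devices whose average I/O
# size or service time looks suspicious. Thresholds are illustrative.
import sys

KB_PER_IO_LIMIT = 256      # flag unusually large average block sizes
SVC_TIME_LIMIT_MS = 20.0   # flag slow average service times

def parse_iostat(lines):
    """Yield (device, r_s, w_s, kr_s, kw_s, asvc_t) tuples from iostat -xn output."""
    for line in lines:
        parts = line.split()
        # Solaris iostat -xn data rows have 11 columns ending with the device name
        if len(parts) < 11 or parts[0] == "r/s":
            continue
        try:
            r_s, w_s, kr_s, kw_s = map(float, parts[0:4])
            asvc_t = float(parts[7])
        except ValueError:
            continue
        yield parts[-1], r_s, w_s, kr_s, kw_s, asvc_t

for dev, r_s, w_s, kr_s, kw_s, asvc_t in parse_iostat(sys.stdin):
    iops = r_s + w_s
    if iops == 0:
        continue
    avg_kb_per_io = (kr_s + kw_s) / iops
    if avg_kb_per_io > KB_PER_IO_LIMIT or asvc_t > SVC_TIME_LIMIT_MS:
        print(f"{dev}: {iops:.0f} IOPS, {avg_kb_per_io:.0f} KB/IO, "
              f"{asvc_t:.1f} ms service time  <-- investigate")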

Furthermore, ask any storage guy who’s familiar with SAN management/monitoring software such as HDS’ Tuning Manager, EMC’s ControlCenter, HP’s Storage Essentials and their like, and they’ll know full well that despite all the SNIA SMI-S compliance they still fail to provide metrics beyond the customary RAID group utilization, historic IOPS, cache hit rate, disk response times etc. In other words, from the perspective of the end user there really is little to monitor and hence troubleshoot. Frustratingly, such solutions still fail to provide performance metrics from an application-to-storage-system view and thus also fail to allow the end user to verify whether they are indeed meeting the SLAs for that application. Put this scenario in the ever-growing virtual server environment and you are further blinded by not knowing the relation between the I/Os and the virtual machines from which they originated.

Moreover, storage vendors don’t seem to be in a rush to solve this problem either, and the pessimist in me says this is understandable when such a solution would inevitably lead to the non-procurement of unnecessary hardware. With precise analysis and pinpointing of performance problems and degradation comes the annulment of the haphazard ‘let’s throw some more storage at it’, ‘let’s buy SSDs’ or ‘let’s upgrade our storage system’ solutions that are currently music to the ears of storage vendor sales guys. So amidst these partial-view, vendor-provided monitoring tools, which lack that essential I/O transaction-level visibility, Virtual Instruments (VI) pushes forth its solution, which boldly claims to encompass the most comprehensive monitoring and management of end-to-end SAN traffic. From the intricacies of a virtual machine’s application to the Fibre Channel cable that’s plugged into your USPV, VMax etc., VI say they have insight. So looking back, had I had VI’s ability to instantly access trending data on metrics such as MB/sec, CRC errors, logins and logouts etc., I could have quickly pinpointed and resolved many of the labyrinth quests I had ventured through so many times in the past.

Looking even closer at VI, there are situations beyond the SAN troubleshooting syndrome in which it can benefit an organization. Like most datacenters, if you have one of the Empire State Building-esque monolithic storage systems, it is more than likely underutilized, with the majority of its residing applications not requiring the cost and performance of such a system. So while most organizations are aware of this and look to save costs by tiering their infrastructure onto cheaper storage via the alignment of their data values to the underlying storage platform, it’s seldom a reality due to the headaches and lack of insight related to such operations. Tiering off an application onto a cheaper storage platform requires justification from the Storage Manager that there will be no performance impact to the end users, but due to the lack of precise monitoring information, many are not prepared to take that risk. In an indirect acknowledgement of this problem, several storage vendors have looked at introducing automated tiering software for their arrays, which in essence merely looks at LUN utilization before migrating LUNs to either higher-performance drives or cheaper SATA drives. In reality this is still a rather crude way of tiering an infrastructure when you consider that it ignores SAN fabric congestion or improper HBA queue depths. In such a situation, a monitoring tool that tracks I/Os across the SAN infrastructure without being pigeonholed to a specific device is essential to enabling performance optimization and the consequent delivery of Tier 1 SLAs with cheaper storage – cue VI and their VirtualWisdom 2.0 solution.

In the same way that server virtualisation exposed the underutilization of physical server CPU and memory, the VirtualWisdom solution is doing the same for the SAN. While vendors are more than pleased to sell yet more upgraded modules packed with ports for their enterprise directors, it is becoming increasingly apparent that most SAN fabrics are significantly over-provisioned, with utilization rates often less than 10%. While many SAN fabric architects seem to overlook fan-in ratios and oversubscription rates in a rush to finish deployments within specified project deadlines, underutilized SAN ports are now an ever-increasing reality that in turn bring with them the additional costs of switch and storage ports, SFPs and cables.
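To put some rough numbers on that over-provisioning, the back-of-the-envelope arithmetic below is the kind of sanity check a fabric design rarely gets. Every figure is invented purely for illustration; substitute your own port counts and measured throughput.

# Back-of-the-envelope fan-in / utilisation check for an edge switch.
# All figures are illustrative assumptions, not measurements from any real fabric.
PORT_SPEED_GBPS = 8
USABLE_MB_PER_S = PORT_SPEED_GBPS * 1000 / 10   # roughly 800 MB/s after 8b/10b encoding

host_ports = 48          # server-facing ports on the edge switch
isl_ports = 4            # inter-switch links up to the core
fan_in_ratio = host_ports / isl_ports            # 12:1 before speeds are even considered

avg_measured_mb_per_s = 45   # assumed per-host-port average from monitoring
port_utilisation = avg_measured_mb_per_s / USABLE_MB_PER_S * 100
print(f"Fan-in {fan_in_ratio:.0f}:1, average port utilisation {port_utilisation:.0f}%")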

Within the context of server virtualisation itself, which has undoubtedly brought many advantages with it, one irritating side effect has been the rapid expansion of FC traffic to accommodate the increased number of servers going through a single SAN switch port, and the complexity now required to monitor it. Then there’s the virtual maze which starts with applications inside Virtual Machines, which in turn run on multi-socket and multi-core servers, which are then connected to a VSAN infrastructure, only to finally end up on storage systems which themselves incorporate virtualization layers, whether via externally attached storage systems or thinly-provisioned disks. Finding an end-to-end monitoring solution in such a cascade of complexities seems all but impossible. Not so, it seems, for the team at Virtual Instruments.

Advancing upon the original NetWisdom premise, VI’s updated VirtualWisdom 2.0 has a virtual software probe named ProbeV. The ProbeV collects the necessary information from the SAN switches via SNMP, and on a port-by-port basis metrics such as the number of frames and bytes are collated alongside potential faults such as CRC errors, synchronization loss, packet discards or link resets/failures. Then, via the installation of splitters (which VI name TAPs - Traffic Access Points) between the storage array ports and the rest of the SAN, a percentage of the light from the fibre cable is copied to a data recorder for playback and analysis. VI’s Fibre Channel probes (ProbeFCXs) then analyze every frame header, measuring every SCSI I/O transaction from beginning to end. This enables a view of traffic performance whether related to the LUN, HBA, read/write level or application level, allowing the user to instantly detect application performance slowdowns or transmission errors. The concept seems straightforward enough, but it’s a concept no one else has yet been able to put into practice, despite growing competition from products such as Akorri's BalancePoint, Aptare's StorageConsole or Emulex's OneCommand Vision.
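To make the switch-polling half of that concrete, here’s a deliberately simplified sketch of what a probe does with per-port counters: sample them, sample them again, and shout about any port whose error counters are still climbing. The counter names and sample data are my own inventions for illustration; a real probe pulls them from the switch MIBs.

# Simplified illustration of per-port counter polling: compare two samples
# taken some interval apart and flag ports with growing error counters.
# Counter names and data are invented for the example.
from typing import Dict

ERROR_COUNTERS = ("crc_errors", "loss_of_sync", "discards", "link_resets")

def delta_errors(before: Dict[str, Dict[str, int]],
                 after: Dict[str, Dict[str, int]]) -> None:
    for port, counters in after.items():
        for name in ERROR_COUNTERS:
            delta = counters.get(name, 0) - before.get(port, {}).get(name, 0)
            if delta > 0:
                print(f"port {port}: {name} increased by {delta} since last poll")

sample_t0 = {"fc1/7": {"crc_errors": 12, "loss_of_sync": 0, "discards": 3, "link_resets": 1}}
sample_t1 = {"fc1/7": {"crc_errors": 57, "loss_of_sync": 0, "discards": 3, "link_resets": 2}}
delta_errors(sample_t0, sample_t1)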

Added to this, VI’s capabilities can also provide a clear advantage in preparing for a potential virtualization deployment or, dare I fall for the marketing terminology, a move to the private cloud. Lack of insight into performance metrics has evidently stalled the majority of organizations from virtualising their tier 1 applications. Server virtualization has reaped many benefits for many organizations, but ask those same organizations how many of them have migrated their I/O-intensive tier 1 applications from their SPARC-based physical platforms to an Intel-based virtual one and you’re probably looking at a paltry figure. The simple reason is risk and fear of performance degradation, despite logic showing that a virtual platform with resources set up as a pool could potentially bring numerous advantages. Put this now in the context of a world where cloud computing is the new buzz as more and more organizations look to outsource many of their services and applications, and you have even fewer willing to launch their mission-critical applications from the supposed safety and assured performance of the in-house datacenter into the unknown territory of the clouds. It is here that VirtualWisdom 2.0 has the potential to be absolutely huge in the market and at the forefront of the inevitable shift of tier 1 applications to the cloud. While I admittedly find it hard to currently envision a future where a bank launches its OLTP into the cloud, based on security issues alone, I’d be blinkered not to realize that there is a future where some mission-critical applications will indeed take that route. With VirtualWisdom’s ability to pinpoint virtualized application performance bottlenecks in the SAN, it’s a given that the consequence will be significantly higher virtual infrastructure utilization and, in turn, ROI.

The VI strategy is simple: by recognizing I/O as the largest cause of application latency, VirtualWisdom’s inclusion of baseline comparisons of I/O performance, bandwidth utilization and average I/O completions comfortably provides the insight fundamental to any major virtualization or cloud considerations an organization may be planning. With its ProbeVM, a virtual software probe that collects status from VMware servers via vCenter, the data flow from virtual machine through to the storage system can be comprehensively analyzed, with historical and real-time performance dashboards leading to an enhanced as well as accurate understanding of resource utilization and performance requirements. With a predictive analysis feature based on real production data, the tool also gives the user the ability to accurately understand the effects of any potential SAN configuration or deployment changes. With every transaction from virtual machine to LUN being monitored, latency sources can quickly be identified, whether from the SAN or the application itself, enabling a virtual environment to be easily diagnosed and remedied should any performance issues occur. With such metrics at their disposal and the resultant confidence given to the administrator, the worry of meeting SLAs could quickly become a thing of the past while also hastening the shift towards tier 1 applications running on virtualized platforms. So despite growing attention being given to other VM monitoring tools such as Xangati or Hyperic, their solutions still lack the comprehensive nature of VI’s.
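The baselining concept itself is simple enough to sketch: record completion times over a known-good period, then flag anything that drifts too far beyond them. The snippet below is my own crude illustration of the idea, with invented figures, and not a description of VirtualWisdom’s actual method.

# Crude illustration of I/O baselining: build an average completion time
# from a known-good period and flag drift beyond a tolerance.
from statistics import mean

def build_baseline(samples_ms):
    """Average exchange completion time (ms) over a known-good period."""
    return mean(samples_ms)

def check_against_baseline(app, baseline_ms, current_ms, tolerance=0.30):
    drift = (current_ms - baseline_ms) / baseline_ms
    if drift > tolerance:
        print(f"{app}: completion time {current_ms:.1f} ms is "
              f"{drift:.0%} above baseline ({baseline_ms:.1f} ms)")

baseline = build_baseline([4.1, 3.9, 4.3, 4.0, 4.2])   # illustrative figures
check_against_baseline("oracle-oltp", baseline, 6.8)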


The advantages to blue-chip, big corporate customers are obvious, and as their SAN and virtual environments continue to grow, an investment in a VirtualWisdom solution should soon become compulsory for any end-of-year budget approval. In saying that though, the future of VI also quite clearly lies beyond the big corporates, with benefits that include real-time proactive monitoring and alerting, consolidation, pre-emptive analysis of any changes within the SAN or virtual environment, and comprehensive trend analysis of application, host HBA, switch, virtualization appliance, storage port and LUN performance. Any company therefore looking to consolidate their costly over-provisioned SAN, accelerate troubleshooting, improve their VMware server utilization and capacity planning, implement a tiered infrastructure or migrate to a cloud would find the CAPEX improvements that come with VirtualWisdom a figure too hard to ignore. So while storage vendors don’t seem to be in any rush to fill this gap, they too have an opportunity to undercut their competitors by working alongside VI and promoting its benefits as a complement to their latest hardware, something which EMC, HDS, IBM and most recently Dell have cottoned on to, having signed agreements to sell the VI range as part of their portfolios. Despite certain pretenders claiming to take its throne, FC is certainly here to stay for the foreseeable future. If the market and customer base are allowed to fully understand and recognize the need, then there’s no preventing a future in which just about every SAN fabric comes part and parcel with a VI solution ensuring its optimal use. Whether VI eventually get bought out by one of the large whales or continue to swim the shores independently, there is no denying that companies will need to seriously consider the VI option if they’re to avoid drowning in the apprehension of virtual infrastructure growth or the ever-increasing costs of underutilized SAN fabrics.


VDI – A Vulnerably Dangerous Investment or A Virtual Dream Inclusion?

PCs are part of everyday life in just about every organization. First there’s the purchase of the hardware and the necessary software, followed by an inventory recorded and maintained by the IT department. Normal procedure would then dictate that the same IT department installs all required applications before delivering the machines physically to the end users. Over a period of time the laptop/PC is maintained by the IT department with software updates, patches, troubleshooting etc. to keep employees fully productive. Once the PC/laptop becomes outdated, the IT department is then left with the monotonous task of removing the hardware, deleting sensitive data and removing any installed applications to free up licenses. All of this is done so the whole cycle can be repeated all over again. In this vicious circle there are obvious opportunities to better manage resources and save unnecessary OPEX and CAPEX costs, one such solution being virtual desktops.
Having witnessed the financial rewards of server virtualization, enterprises are now taking note of the benefits and usage of virtualization to support their desktop workloads. Consolidation and centralization are no longer buzzwords used for marketing spin but are instead tangible realities for IT managers who initially took that unknown plunge into what were then the deep mystical waters of virtualization. Now they’re also realizing that by enabling thin clients, the cost of their endpoint hardware is significantly driven down by the consequent lifespan extension of existing PCs. Indeed, the future of endpoint devices is one that could revolutionize their existing IT offices – a future of PC/laptop-less office desks replaced by thin-client-compatible portable iPads? Anything is now possible.

There’s also no doubting that VDI brings with it even further advantages, one being improved security. With data always being administered via the datacenter rather than from the vulnerability of an end user’s desktop, risks of data loss or theft are instantly mitigated. No longer can sensitive data potentially walk out of the company’s front doors. Also, with centralized administration, data can instantly be protected in scenarios where access needs to be limited or copying prevented. For example, a company that has numerous outsourcers/contractors on site can quickly restrict or even turn off their data and application access. Indeed, there is nothing stopping an organization from setting up a ‘contractor’ desktop template which can be provisioned instantly and then decommissioned the moment the outsourced party’s contract expires.

By centralizing the infrastructure, fully compliant backup policies also become significantly easier. Whereas PCs and hard drives constantly crash, leading to potential data loss, the centralized virtual desktop sits on an underlying infrastructure that is continuously backed up. Additionally, with the desktop instance not bound to the PC’s local storage but instead stored on the server, recovery from potential outages is significantly quicker, with even the option of reverting virtual desktops back to their last known good states. Imagine the amount of work the employees who constantly bombard the IT helpdesk with countless “help, I’ve accidentally deleted my hard drive” phone calls could actually get done now, not to mention the amount of time it will free up for your IT helpdesk team. In fact you might even end up with an IT helpdesk that gets to answer the phone instead of taking you straight to voicemail.

Additionally, an IT helpdesk team is better utilized with the centralized, server-based approach, which allows for the maintenance of both desktop images and specific user data without having to visit the end user’s office. With nothing needing to be installed on the endpoint, deployment becomes far faster and easier with VDI than with traditional PC desktop deployment. This also extends to the laborious practice of having to individually visit each desktop to patch applications, provision and decommission users, or upgrade to newer operating systems. By removing such activities, the OPEX savings are more than substantial.

OPEX savings can also be seen in the added benefit of optimizing the productivity of highly paid non-technical end users by sparing them from needlessly maintaining their desktop applications and data. Furthermore, the productivity of employees can be improved significantly by centralized control of which applications end users run and full monitoring of their usage, so long gone should be the days of employees downloading torrents or mindlessly chatting away on social networks during working hours. Even the infamously slow start-up time of Windows, which has brought with it the traditional yet unofficial morning coffee/cigarette break, can be eradicated with the faster Windows boot times found with VDI. Even lack of access to an employee’s corporate PC can no longer be used as an excuse not to log in from home or elsewhere remotely when required – a manager’s dream and a slacker’s nightmare.

So with all these benefits, where lies the risk or obstacle to adopting a VDI infrastructure for your company? Well, as with most technology there rarely exists a one-size-fits-all solution, and VDI is no different. Prior to any consideration of VDI, a company must first assess its infrastructure and whether VDI could indeed reap these benefits or instead cause it more problems.

One of the first issues to look for is whether the organization has a high percentage of end users who manipulate complex or very large files. In other words, if a high proportion of end users constantly need multimedia, 2D or 3D modelling applications, or VoIP, then VDI should perhaps be reconsidered in favour of a well-managed physical desktop environment. The performance limitations that came with server-based computing platforms such as Microsoft's Terminal Services with regards to bandwidth, latency and graphics capabilities are still fresh in the minds of many old-school IT end users, and without the correct pre-assessment those old monsters could rear their ugly heads. For example, an infrastructure that has many end users running high-performance or real-time applications should think carefully before going down the VDI route, regardless of what the sales guys claim.

That said, if having taken all this into consideration you realize your environment is suited to a VDI deployment, the benefits and consequent savings are extensive despite the initial expenditure. As for which solution to take, this leads to another careful consideration, and one that needs to be investigated beyond the usual vendor marketing hype.

Firstly, when it comes to server virtualization there currently is no threatening competition (certainly not in enterprise infrastructure) to VMware’s VSphere 4. In the context of desktop virtualization though, the story has been somewhat different. Those who’ve deployed Citrix’s XenDesktop certainly know that it has better application compatibility than VMware View 3. Add the multimedia freeze-framing problems that would often occur with the View 3 solution, and Citrix looked to have cornered a market in the virtual sphere which initially seemed destined to be monopolized by VMware. Since then VMware have hit back with View 4, which brought in the vastly improved PCoIP display protocol that dwarfs the original RDP protocol, and simplified their integration with Active Directory and the overall installation of the product, but in performance terms XenDesktop still has an edge. So it comes as no surprise that rumours are rife that VMworld 2010, which takes place in a couple of weeks, will be the launching pad for View 4.5 and a consequent onslaught on the Citrix VDI model. Subsequent retaliation is bound to follow from Citrix, who seem to have moved their focus away from the server virtualization realm in favour of the VDI milieu, which can only be better for the clients they are aiming for. Already features such as Offline Desktop, which allows end users to download and run their virtual desktops offline and then later resynchronize with the data center, are being developed beyond the beta stage.

The fact remains that quickly provisioning desktops from a master image and instantly administering policies, patches and updates without affecting user settings, data or preferences is an advantage many will find hard to ignore. So while VDI still has many areas for improvement, depending on your infrastructure it may already be an appropriate time to reap the rewards of its numerous benefits.

VSphere 4 still leaves Microsoft Hyper V-entilating


When faced with a flood of client consultations and disaster recovery proposals/assessments, you can’t help but be inundated with opportunities to showcase the benefits of server virtualization and, more specifically, VMware’s Site Recovery Manager. It’s a given that if an environment has a significant number of applications running on x86 platforms, then virtualization is the way to go, not just for all the consolidation and TCO savings but for the ease with which high availability, redundancy and business continuity can be deployed. Add to that the benefit of a virtualized disaster recovery solution that can easily be tested, failed over or failed back. Testing, once a complex procedure, can now be done via a simple GUI-based recovery plan. That should eradicate the trepidation that often existed in testing how foolproof an existing DR procedure actually was. Long gone should be the days of the archaic approach of 1000-page Doomsday Book-like disaster recovery plans which the network, server and storage guys had to rummage through during a recovery situation, often becoming a disaster in itself. Hence there really is little argument against going with a virtualized DR site and more specifically VMware’s Site Recovery Manager - but not so, it seems, if you’ve been cornered and indoctrinated by the Microsoft Hyper-V sales team.


Before I embark further, let’s be clear that I am not an employee or sales guy for VMware - I’m just a techie at heart who loves to showcase great technology. Furthermore, let it go on record that I’ve never really had a bone of contention with Microsoft before: their Office products are great, Exchange still looks fab and I still run Windows on my laptop (albeit on VMware Fusion). I didn’t even take that much offense when I recently purchased Windows 7 only to realize that it was just a well-marketed patch for the heir to the disastrous Windows ME throne, i.e. Windows Vista. I also took it with a pinch of salt that Microsoft were falsely telling customers that Exchange would run better on local disks as opposed to the SAN in an attempt to safeguard themselves from the ongoing threat of Google Apps (a point well exposed and iterated in David Vellante’s Wikibon article, “Why Microsoft has its head up its DAS”). Additionally, my purchase of Office 2010, in which I struggled to fathom any significant difference from Office 2007, still didn’t irk me that much. What has turned out to be the straw that broke the camel’s back, though, is the constant claim Microsoft are making that Hyper-V is somehow an equally good substitute for VMware, consequently pushing customers to avoid a disaster recovery plan that includes Site Recovery Manager. So what exactly are the main differences between the two hypervisors, and why is it that I so audaciously refuse to even consider Hyper-V as an alternative to VSphere 4?


Firstly, one of the contentions often faced when virtualizing is the notion that some applications don’t perform well, if at all, on a virtualized platform. This is true when put in the context of Hyper-V, which currently limits the number of vCPUs per guest to only 4. That’s pretty much a no-go for CPU-hungry applications, leading to the erroneous idea that a large set of applications should be excluded from virtualization. This is not the case with VSphere 4, where guests can have up to 8 vCPUs. In an industry following a trend of CPUs scaling up by adding cores instead of increasing clock rates, the future of high-end x86 servers provides vast potential for just about any CPU-hungry application to run on a virtualized platform – something VSphere 4 is already taking the lead in.


Then there’s the management infrastructure, in which Hyper-V uses software named System Center (SC) and more specifically System Center Virtual Machine Manager (SCVMM), whereas the VSphere 4 equivalent is named vCenter Server. With Hyper-V being part of a complete Microsoft virtualization solution, System Center is generally used to manage Windows Server deployments. System Center Virtual Machine Manager, on the other hand, not only manages Hyper-V-hosted guests but also Virtual Server, VMware Server and VMware ESX and GSX guests. Ironically this can even be extended to managing vMotion operations between ESX hosts (perhaps an inadvertent admission from Microsoft that vMotion wipes the floor with their equivalent, Live Migration). This comes across as somewhat paltry compared to vCenter Server, which can be either a physical or a virtual machine and which in VSphere 4 now offers the ability to link multiple vCenter servers together and control them from a single console, enabling consolidated management of thousands of virtual machines across several datacenters. Add to this that vCenter Server provides a search-based navigation tool for finding virtual machines, physical hosts and other inventory objects based on user-defined criteria, and you have the ability to quickly find unused virtual machines or resources in the largest of environments, all through a single management pane.

Taking the linked management capabilities of vCenter further, VSphere 4 also offers what they term the vNetwork Distributed Switch. Previously, a virtual network switch was provisioned, configured and managed per ESX server. With the vNetwork Distributed Switch, virtual switches can now span multiple ESX servers while also allowing the integration of third-party distributed switches. For example, the Cisco Nexus 1000v is the gateway for the network gurus to enter the world of server virtualization and take the reins of the virtual network, which were previously held by VM system admins. Put this in the context of multiple vCenter Servers in the new linked mode and end users have the capability to manage not only numerous virtual machines but also the virtual network switches. In an enterprise environment with hundreds of servers and thousands of virtual machines, what previously would have been a per-ESX switch configuration change can now be done centrally and in one go with the vNetwork Distributed Switch. Hyper-V as yet has no equivalent.

That broad approach has also pushed VMware to incorporate not only the network guys into their world, but also the security and backup gurus. With VSphere 4’s VMsafe, VMware have enabled the use of third-party security products within their virtual machines - an avenue for the security guys to at last enter the virtual matrix they previously had little or no input in. Then there’s the doorway VSphere 4 has opened for backup specialists such as Veeam to plug into virtual machines and take advantage of the latest developments such as Changed Block Tracking and the vStorage APIs, bringing customers a more sophisticated and sound approach to VM backups. Hyper-V still has no VMsafe equivalent and certainly no Changed Block Tracking.


Furthermore, as Microsoft flaunt Hyper-V’s latest developments, scrutiny shows that they are merely features that have been available in VMware for several years, and even then they still don’t measure up in terms of performance. Case in point: Hyper-V’s rather ironically titled ‘Quick Migration’. For high availability and unplanned downtime protection, Hyper-V clusters have functionality that restarts virtual machines on other cluster nodes if a node fails. With Quick Migration, a virtual machine is then moved between cluster hosts. Where it fails, though, is in its inability to do so instantly, as is the case with VMware’s vMotion and HA features. This hardly exudes confidence in Hyper-V when a move that can take several seconds leaves you exposed to the risk of a network connection failure, which consequently results in further unplanned downtime. Subsequently, Quick Migration’s inability to seamlessly move virtual machines across physical platforms results in downtime requirements for any potential server maintenance. This is certainly not the case with VMware and vMotion, wherein server maintenance requiring downtime is a thing of the past.


Moreover, so seamless is the vMotion process that end users have no idea their virtual machine has just crossed physical platforms while they were inputting new data. This leads us to Hyper-V’s reaction and improved offering, now termed Live Migration, which Microsoft claim is on a par with vMotion. Upon further inspection this still isn’t the case, as the number of simultaneous migrations that can be performed between physical servers is still far more limited with Hyper-V. Additionally, while Hyper-V claims to be gaining ground, VMware in return have shot even further ahead with VSphere 4’s Storage vMotion capability, which allows ‘on the fly’ relocation of virtual disks between the storage resources within a given cluster. So as VMware advances and fine-tunes features such as Distributed Resource Scheduler, Distributed Power Management (DPM), Thin Provisioning, High Availability (HA) etc., Hyper-V is only just announcing similar functions.



Another issue with Hyper-V is that it’s simply an add-on to Windows Server which relies on a Windows 2008 parent partition, i.e. it’s not a bare-metal hypervisor, as virtual machines have to run on top of the physical system’s operating system (something akin to VMware’s Workstation). Despite Microsoft’s claims that the device drivers have low-latency access to the hardware, thus providing a hypervisor-like layer that runs alongside the full Windows Server software, in practical terms those who have deployed both Hyper-V and VMware can testify that the performance stats are still not comparable. One of the reasons for this is that VMware have optimized their drivers with the hardware vendors themselves, unlike Hyper-V, which sadly is stuck in the ‘Windows’ world.


This leads to my next point: with VSphere 4 there is no reliance on a general-purpose operating system, and the list of operating systems supported by VMware continues to grow. Microsoft on the other hand, being the potential sinking ship that it is in the enterprise datacenter, have tried to counter this advantage by marketing Hyper-V as being able to run on a larger variety of hardware configurations. One snag they don’t talk about so much is that it has to be a hardware configuration designed to support Windows. Ironic, when one of the great things about virtualization is that virtual machines with just about any operating system can now be run together on the same physical server, sharing pools of resources – not so for Microsoft and Hyper-V, who desperately try to corner customers into remaining on a made-for-PC operating system that somehow got drafted into datacenters. The question now is how many more inevitable reboots it will take on a Windows enterprise server before IT managers say enough is enough.


Then there are some of the new features introduced in VSphere 4 which have still failed to take similar shape in the Hyper-V realm. For example, VMDirectPath I/O allows device drivers in virtual machines to bypass the virtualization layer and access physical resources directly – a great feature for workloads that need constant and frequent access to I/O devices.


There are also the Hot Add features, wherein a virtual machine running Windows 2000 or above can have its network cards, SCSI adaptors, sound cards and CD-ROMs added or removed while still powered on. VMware even go further by letting a Windows 2003 or above VM hot-add memory or CPU and even extend its VMDK files – all while the machine is still running. There’s still nothing ‘hot’ to add from the Hyper-V front.


Also, instead of the headache-inducing complexities that come with Microsoft’s Cluster Service, VSphere 4 comes with Fault Tolerance – a far easier alternative for mission-critical applications that can’t tolerate downtime or data loss. By simply creating a duplicate virtual machine on a separate physical host and using vLockstep technology to ensure consistency of data, VSphere 4 offers a long-awaited and straightforward alternative to complex clustering that further enhances the benefits of virtualization. No surprise then that the Microsoft Hyper-V sales guys currently tend to belittle it as no great advantage.

Another VSphere 4 feature which also holds great benefits and is non-existent in Hyper-V is memory overcommitment. This feature allows the allocation of more RAM to virtual machines than is physically available on the host. Via techniques such as transparent page sharing, virtual machines can share their common memory pages, leading to significant savings in the all-too-common situation where adding more memory to an existing server costs more than the server itself.
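The arithmetic behind the saving is worth spelling out. The figures below are invented purely to illustrate the idea; the proportion of pages that can actually be shared will vary wildly with how similar your guests are.

# Crude arithmetic behind memory overcommitment: if a proportion of guest
# pages are identical (same OS, same apps), page sharing lets the host back
# them with a single copy. All figures are illustrative assumptions.
vms = 40
allocated_gb_per_vm = 4
shared_fraction = 0.30          # assumed proportion of pages shared across guests

allocated_gb = vms * allocated_gb_per_vm
physical_gb_needed = allocated_gb * (1 - shared_fraction)
overcommit_ratio = allocated_gb / physical_gb_needed
print(f"Allocated {allocated_gb} GB, physically backed ~{physical_gb_needed:.0f} GB "
      f"(overcommit ratio {overcommit_ratio:.2f}:1)")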

So while Hyper-V has also recently caught up with a Site Recovery Manager equivalent via the Citrix Essentials for Hyper-V package, it’s still doing just that, i.e. playing catch-up. One of the main arguments for Hyper-V is that it’s free or nearly free, but again that’s the marketing jargon that fails to elaborate that you have to buy a license for Windows Server first, and hence help maintain the dwindling lifespan of Microsoft within the datacenter. Another selling point Hyper-V had was that it was better aimed at small to medium-sized businesses due to its cheaper cost… the recent announcement of VSphere 4.1 may now put that claim to bed as well. Like all great empires, collapse is imminent, and while I don’t believe Microsoft are heading into the I.T. black hole, they certainly don’t look like catching up with VMware in the ever-emerging and growing market of virtualization.


NetApp Justifies Storage Efficiency Tag with Primary Deduplication



The pendulum has shifted. We are in an era in which Storage Managers are in the ascendancy while vendors must shape up to meet customer demands in order to survive the current economic plight. Long gone are the days of disk-happy vendors who could easily shift expensive boxes of FC disks, or Account Managers who boasted of huge margins from selling skyscraper storage systems to clients facing an uphill struggle to meet their constantly growing storage demands. With responses such as thin/dynamic/virtual provisioning arrays and automated storage tiering, vendors have taken a step towards giving customers solutions that enable them to use more of what they already have as well as utilise cheaper disks. Another such feature now starting to really prick the conscience of vendors, as customers become more savvy, is primary deduplication, or the more aptly termed ‘data reduction’. So as this cost-saving surge continues, some vendors have cheekily tried to counteract it with sales pitches for exorbitantly priced Flash SSDs (which promise 10 times the performance yet shamelessly sit on the back end of storage systems, dependent on the latency of their BEDs and RAID controllers) as a means to keep margins up. But not the WAFL kings NetApp…

Mention deduplication and you most likely think of backup environments, where redundant data is eliminated, leaving only one copy of the data and an index of the duplicated data should it ever be required for restoration. With only the unique data stored, the immediate benefits of deduplication are obvious, from a reduction in backup storage capacity, power, space and cooling requirements to a reduction in the amount of data sent across the WAN for remote backups, replication and disaster recovery. Not only that, deduplication savings have also shifted the backup paradigm from tape to disk, allowing quicker restores and reduced media handling errors (and yes, I have made no secret of giving kudos to Data Domain in this respect). Shift this concept to primary storage, though, and you have a different proposition with different challenges and advantages.

Primary storage is accessed or written to constantly, necessitating that any deduplication process be fast enough to eliminate any potential overhead or delay to data access. Add to the equation that primary storage does not contain duplicate data in anything like the proportions found in backup data, and you also have a lower yield in deduplication ratios. Despite this, NetApp have taken primary deduplication by the horns and are offering genuine data reduction that extends beyond the false marketing of archiving and tiering as data reduction techniques, when in fact all they are is the shoving of data onto different platforms.

Most vendors on the ‘data reduction’ bandwagon have gone with file-level deduplication, which looks at the file system itself, replacing identical files with one copy and links for the duplicates. Hence there is no requirement for the file to be decompressed or reassembled upon end-user request, since the same data merely has numerous links. The main advantage is therefore that data access comes without any added latency. In real terms, though, this minimalist approach doesn’t produce data reduction ratios that yield anything significant for the user to be particularly excited about.

On the flip side, what is referred to as sub-file-level deduplication takes an approach familiar to those who already use deduplication for their backups. Using hash-based technology, files are first broken into chunks. Each chunk of data is then assigned a unique identification, whereupon duplicate chunks are replaced with a pointer to the original chunk. Such an approach brings the added advantage of discovering duplicate patterns in random places regardless of how the data is saved. With the addition of compression, end users can also significantly reduce the size of the stored chunks. Of course this also adds the catch-22 of deduplication achieving better efficiency with smaller chunks, while compression is more effective with larger chunks - hence NetApp have yet to incorporate compression alongside their sub-file-level deduplication. Despite this, NetApp are showing results that, when put in a virtual context, are more than impressive.
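For those who prefer code to prose, the mechanism described above can be sketched in a few lines. This is a deliberately naive, fixed-size-chunk illustration of the principle, not a description of any vendor’s implementation; real systems use variable-size chunking, collision handling and far more robust metadata.

# Minimal sketch of sub-file (chunk-level) deduplication: split data into
# fixed-size chunks, identify each chunk by a hash, and store duplicates
# as pointers to the first occurrence.
import hashlib

CHUNK_SIZE = 4 * 1024   # 4 KB chunks for this example

def dedupe(data: bytes):
    store = {}        # chunk hash -> chunk bytes (stored once)
    pointers = []     # logical layout as a list of chunk hashes
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
        pointers.append(digest)
    return store, pointers

data = (b"A" * CHUNK_SIZE) * 9 + (b"B" * CHUNK_SIZE)    # highly redundant input
store, pointers = dedupe(data)
print(f"logical {len(pointers)} chunks, physical {len(store)} chunks "
      f"({100 * (1 - len(store) / len(pointers)):.0f}% reduction)")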

One of the first major vendors to incorporate primary data deduplication, NetApp comfortably verify their ‘storage efficiency’ selling tag when put in the context of server and desktop virtualisation. One of the many benefits of VMware (or other server virtualisation platforms) is the ability to rapidly deploy new virtual machines from stored templates. Each of these VM templates includes a configuration file and several virtual disk files. It is these virtual disk files that contain the operating system, common applications and patches or updates, and it is these that are duplicated each time a cloned VM is deployed. Imagine now a deployment of 200 like-for-like VMs, then add NetApp’s primary deduplication process, wherein multiple machines end up sharing the same physical blocks in a FAS system, and you’ve got some serious reduction numbers and storage efficiency. With reduction results of 75% to 90%, NetApp’s advantage comes from their long-established, snapshot-magic-producing WAFL (Write Anywhere File Layout) technology. With its in-built CRC checksum for each block of data stored, WAFL already has block-based pointers. By running the deduplication at scheduled times, all checksums are examined, with the filer doing a block-level comparison of the blocks if any of the checksums match. If a match is confirmed, one of the WAFL block-based pointers simply replaces the duplicated block. Due to the scheduled nature of the operation, occurring during quiet periods, the performance impact is not that intrusive, giving the NetApp solution significant storage savings, especially when similar operating systems and applications are grouped into the same datastores. Add to the mix that NetApp’s PAM (Performance Acceleration Module) is also dedupe-aware, and common block reads are quickly satisfied from cache, bringing even faster responses by not having to search through every virtual disk file (VMDK). NetApp also ‘go further, faster’, so to speak, with their FlexClone technology, which rapidly deploys VM clones that are already deduplicated.
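To show how the scheduled, post-process flavour differs from inline chunk hashing, here’s an equally rough model: walk a catalogue of per-block checksums and only collapse a block into a pointer after a full comparison confirms the match. Again, this is my own simplification for illustration, not NetApp’s actual code.

# Rough model of a scheduled, post-process dedupe pass: match blocks by a
# cheap checksum, verify with a full comparison, then replace the duplicate
# with a pointer to the block already stored.
def scheduled_dedupe(blocks):
    """blocks: list of bytes objects; returns (unique_blocks, pointer_map)."""
    seen = {}          # checksum -> index of a stored block with that checksum
    pointers = []      # logical block index -> physical block index
    unique = []
    for block in blocks:
        checksum = sum(block) & 0xFFFFFFFF   # stand-in for the real per-block checksum
        idx = seen.get(checksum)
        # only share the block once a full comparison confirms the match
        if idx is not None and unique[idx] == block:
            pointers.append(idx)
        else:
            unique.append(block)
            seen[checksum] = len(unique) - 1
            pointers.append(len(unique) - 1)
    return unique, pointers

unique, pointers = scheduled_dedupe([b"os-block"] * 8 + [b"app-block"] * 2)
print(f"{len(pointers)} logical blocks stored as {len(unique)} physical blocks")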

So while arguments may be raised that NetApp’s sub-file-level deduplication suffers from the physical-layer constraints of WAFL’s 4KB block size, or from their lack of compression, the truth is that they have deliberately avoided such alternatives. Had they opted for sliding-block chunking, where a window is passed along the file stream to seek out more naturally occurring internal boundaries, or added compression algorithms, the overhead that would come with such additions would render most of the advantages of primary dedupe worthless. Yes, Ocarina and Storwize have appliances that compress and uncompress data as it is stored and read, but what performance overhead do such technologies incur when hundreds of end users concurrently access the same email attachment? As for Oracle’s Solaris ZFS sub-file-level deduplication, which is yet to see the light of day, one wonders how much hot water it will get Oracle into should it turn out to be a direct rip-off of the NetApp model.

Bottom line: as long as the primary deduplication model you employ gives you reduction numbers worth the inevitable overhead, it’s more than a beneficial cost-saving feature. Furthermore, I’m the first to admit that NetApp certainly have their flaws, but when it comes to primary deduplication and the consequent data reduction, they really are making your storage more efficient.


EMC's VPLEX & Planned Takeover of DataDirect Networks


Talk to the average storage engineer who manages the growth of your datacenter’s modular system about petaflops, exabytes, petabytes of archives or 1TB/sec of sustained bandwidth and you’ll probably find them scratching their heads in disbelief. This is the reality that exists in the world of supercomputing and what is sometimes referred to as Extreme Storage. While some Storage Managers may feel they are suffering with their exponential data growth and decreasing budgets, their problems can’t be classified as ‘Extreme’ unless they’re dealing with exabytes (10^18 bytes) of storage with trillions of data transactions per second, trillions of files and a data transfer rate from storage to application that exceeds a terabyte per second. Couple that with the conundrum that it’s all for relatively few users and requires the data to be secure for both short-term and long-term retention, and then you have a real case for Extreme Storage… well, at least for now.

Such figures, though, are not the concern of the average datacenter manager, nor of storage vendors whose architectures are catered and designed for IOPS-centric, database-driven applications - so much so that even SNIA has yet to give Extreme Storage the relevance of a definition in their Storage Dictionary. Not EMC though, where, if my sources are correct, they deem Extreme Storage not only key to their own future but also to that of the storage industry, to the extent that they are already concocting an audacious takeover plan for the company DataDirect Networks.

Before I embark upon my controversial claim, let’s rewind a few weeks to EMC World in Boston, where most of the buzz centred on the launch of the new VPLEX. A nifty idea that brings cloud computing enthusiasts closer to reality by creating heterogeneous pools of storage that can be accessed and shared over distance. Couple that with VMware integration and you have the ability to VMotion your applications across datacenters that are miles apart - a great idea and one that strikes a double whammy at both the storage vendors HDS/IBM/HP/NetApp and the virtualisation ‘catch-up guys’ MS Hyper-V and Red Hat. As IT Directors yearn for a virtual infrastructure for their applications that goes beyond the physical limitations of the datacenter, EMC’s trump card of setting up a centrally managed pool of virtual resources spread across numerous datacenters via VPLEX is nothing short of a ‘virtual’ revolution. With Site Recovery Manager, VMware already had the edge over their competitors by in essence providing an extended version of their ‘high availability’ concept that could span data centers. With VPLEX, the VMotion concept of moving a virtual server across physical platforms ‘on the fly’ can now also be extended across datacenters. Moreover, while EMC have so far failed with their product Invista to corner the market for virtualisation of heterogeneous storage, dominated by HDS and IBM, the launch of VPLEX now takes that battle head on with the added value of cross-site virtualisation. So how does this link to my bold prediction that Extreme Storage is next on EMC’s radar and, more significantly, a proposed takeover of the company DDN?

The VPLEX model is poised to have four versions, two of which are already available, namely VPLEX Local and VPLEX Metro, with VPLEX Geo and VPLEX Global to follow. VPLEX Local is the straightforward virtualisation of heterogeneous storage systems behind one management pane within your datacenter, a solution that has successfully been offered by HDS for several years. VPLEX Metro, though, allows the concept to stretch up to 100km, hence enabling the virtualisation of storage spanning datacenters across cities. Based on a combination of hardware and software placed between the traditional FC-attached storage systems and the servers, the VPLEX rack virtualizes the heterogeneous mix of disk arrays into what EMC term ‘a federated pool of virtual storage’. As for the hardware itself, it contains a management server, FC switches, Ethernet switches, the standard redundant power supplies and the VPLEX engines. Within each engine rest a pair of quad-core Intel CPUs and directors, each of which contains 32 FC ports with 8Gbps bandwidth. With an active-active cluster spread over one to four VPLEX engines, the requirements to seamlessly VMotion applications across a 100km distance are more than easily met, hence the name VPLEX Metro. The question that now stands is for the proposed VPLEX Geo and VPLEX Global: would such hardware and performance stats add up for, say, data that needs to be VMotioned across continents, as the names suggest? Indeed, such distances and endeavours would not be the requirement of EMC’s regular customer base of industries that demand financial transaction processing, but rather of those facing a content nightmare who need the expertise and performance figures associated with Extreme Storage.

When you’re talking Extreme Storage you’re talking DDN, i.e. DataDirect Networks. While relatively unknown, DDN possess an impressive resume of HPC clients, from NASA and Lawrence Livermore to movie special effects users like Pacific Title & Art Studio. So as far as being a company that can act as a platform from which EMC can build out its ‘Global’ and ‘Geo’ cloud storage offerings, DDN already have the credible references to do so quite easily.

Furthermore, a potential acquisition of DDN would allow EMC to penetrate an HPC customer base with which they’re currently unfamiliar. Fields ranging from high-energy physics laboratories such as Fermilab and DESY, to nuclear research organisations such as CERN, to national security and intelligence are all potential clients that EMC could take on with a new Extreme Storage platform that incorporates VPLEX and deals with large, locally or globally distributed data requiring long-term retention. It would clearly give EMC a major distinction from its current major competitors.

Ironically, though, it’s one of EMC’s current competitors, HP, that has already made moves into Extreme Storage with their Ibrix-based HP 9100 Extreme. The HP 9100 was marketed as an Extreme Storage system and shamelessly targeted web companies and their like who required multiple petabytes of data storage. HP’s aim was to profit from an emerging market of heavy users, such as the ever-growing and popular social networks with their online subscriber information and video content, as well as users of video surveillance systems and research organizations. While this was a brave attempt, even HP had to concede to DDN’s supremacy and expertise in the field when only this week they agreed an OEM relationship under which DataDirect Networks’ S2A9900 disk array will be bundled with the Lustre File System resold by the SCI group within HP. Indeed, HP are now like every large HPC OEM vendor out there – reselling DDN. With partnerships already in place with IBM, Dell and SGI, the one big name missing from the list is EMC. Now, with the VPLEX Global and Geo offerings soon to be unveiled, a relationship with DDN, whether an acquisition or an OEM deal, seems inevitable.

In fact DDN and EMC are certainly no strangers to each other: last year the former launched a direct onslaught on EMC's Atmos cloud storage product with their Web Object Scaler (WOS). Designed for geographically dispersed global storage clouds, the WOS is a geo-cluster of quasi-filers that store objects, with API access to a global namespace. Boasting scalability that is currently growing beyond 200 billion stored files and 1 million object reads per second, such stats are effortlessly achieved through the simultaneous access of numerous WOS boxes. Hence, not only flooring EMC's Atmos in terms of transaction rates and file retrieval rates, the WOS, being a file-based product, also hits the EMC NAS jugular, namely the Celerra. As for how WOS works on a global scale: objects, which are files or groups of files, are each given a unique object ID that identifies both the datacenter containing the WOS system storing the object and the object itself. Datacenters are linked via WOS nodes which form the WOS cloud, while the WOS API is used by servers to read or write objects to the WOS cloud. A straightforward concept, but the question now is how much of this explanation will replace the word DDN with EMC and WOS with VPLEX Global come the launch of the latest EMC masterplan. Put this in the wider context of the upcoming VPLEX Geo and Global, and I have little doubt that EMC execs (renowned for preferring to spend outrageously rather than OEM a potential competitor) are furiously sharpening their pencils, carefully concocting a takeover of the still relatively small yet growing company that is DDN.
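To illustrate the object-addressing idea in the crudest possible terms, the toy sketch below encodes the datacenter into the object ID so a read can be routed to the right cluster. It’s purely my own simplification of the concept; the class and ID format are invented and bear no relation to DDN’s actual implementation.

# Toy model of datacenter-aware object addressing: the object ID carries
# the datacenter name plus a locally unique identifier.
import uuid

class ObjectCloud:
    def __init__(self):
        self.datacenters = {}    # datacenter name -> {local_id: payload}

    def put(self, dc: str, payload: bytes) -> str:
        local_id = uuid.uuid4().hex
        self.datacenters.setdefault(dc, {})[local_id] = payload
        return f"{dc}:{local_id}"          # object ID = datacenter + local ID

    def get(self, object_id: str) -> bytes:
        dc, local_id = object_id.split(":", 1)
        return self.datacenters[dc][local_id]

cloud = ObjectCloud()
oid = cloud.put("london-dc1", b"video-segment-0001")
print(oid, cloud.get(oid))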

After the swift takeover of Data Domain last year, nothing surprises me anymore with regards to the financial clout of EMC. A takeover of DDN would not only remove the competitive edge that DDN currently poses and make EMC’s vision of ‘VPLEXing’ across the globe an instant reality; it would bring even more benefits to EMC’s constantly growing portfolio. In the words of Bob Dylan, “the times they are a-changin’”, and the digital content explosion brought about by the rapid growth of online, nearline and backup data pools has left the traditional storage systems designed by EMC and their like inadequate to compete in such a fast-growing market. Like a crumbling empire, the domination of the transactional data that factored so heavily in the design of storage systems has ended, with an unscrupulous coup d’état by unstructured data requiring extreme performance, scalability and density becoming the mainstream. EMC have cottoned on to this and are pushing their future in this direction. Should a DDN deal go through, EMC would not only advance into a new customer base but would also bring in vast technical expertise, ranging from high-speed FPGA parity-calculation accelerators in place of traditional RAID controllers to high-speed InfiniBand interconnects, that can only enhance their current Enterprise and Modular range. As for EMC’s direct competitors such as HP, IBM and HDS, who will they turn to for an OEM deal or expertise should they also decide to enter the fast-growing market trend towards Extreme Storage… perhaps EMC themselves, if these predicted developments are to bear fruit.

The Unified Storage Battlefield Could Decide the Future of Storage


In the past week HDS finally revealed their response to the VMware-Cisco-EMC alliance with the announcement of a unified computing platform comprising integrated storage, server and networking technology. Built with the aid of Microsoft, the centralized storage, server and networking platform is slated to launch early next year. In the tradition of ‘my enemy’s enemy is my friend’, HDS have also signed an OEM deal with Microsoft under which Microsoft System Center Operations Manager, System Center Virtual Machine Manager and Windows Server 2008 R2 with Hyper-V will be tightly integrated with the platform. Added to this are HDS Dynamic Provisioning and the HDS Storage Cluster for Microsoft Hyper-V. Moreover, despite the secrecy, the networking brains behind the platform are most probably Brocade, the grandfathers of SAN who have also had a sound grip on IP networking since their acquisition of Foundry back in 2008.

Well, it’s no surprise that with the current turmoil brought about by the disbandment of the Sun OEM deal, HDS are desperate to announce a new product despite it being more than six months away. But the trend towards Unified Storage is one that many are following in an attempt to adapt to the economic climate and the rapid drive towards consolidation. While at one point it was a NetApp domain that no one else seemed interested in, demand for Unified Storage has grown considerably as customers see the mass of potential and savings that come with running and managing files and applications from a single device. By consolidating file-based and block-based access in a single platform, and hence supporting FC, iSCSI and NAS, customers immediately reap the benefits of reduced hardware requirements, lower capital expenditure and simplified single-pane management. So the war of vendors has entered a new battlefield in which nearly all are making a bid to usurp the lion’s share of the spoils. But as in every battle, there will ultimately be casualties…

Cisco, the IP kings, have bravely entered the arena and are pushing forward with their combination of networking, blade servers and storage in a single architecture, i.e. the Unified Computing System (UCS) platform. Whether they can convince customers that an IP switch company can build servers remains to be seen, but Cisco have already proved doubters wrong once, when they successfully entered the SAN market by drawing on the IP niche they had established in just about every data center in the world.

HP's acquisition of 3Com, on the other hand, was instigated to provide the networking brains for their ‘converged computing’ model that binds server, storage and networking resources. How the powerhouse of HP will fare is not as difficult to predict, given the success of their blade systems and their credence amongst customers as a server platform provider. But are they entering the arena too late, and how will this sit with their OEM relationship with HDS?

Within this battlefield of generals there are also some charlatans who have cheekily tried to gain market share just by coining the term ‘unified storage’. IBM and NEC, for example, have brought out backup and recovery systems within a single architecture that lack any NAS support, yet are still branded ‘unified storage’. Such pretenders may suffer an early death, especially when smaller companies such as BlueArc go the whole nine yards with their Titan systems, which not only support SAN and NAS but can also utilize WAN optimization via Riverbed's Steelhead networking solution.

Then there’s the Sun 7000 series from Oracle’s Sun Microsystems: a great bargain for the amount of functionality it provides, from unlimited snapshots, integral data compression, iSCSI thin provisioning, virus scanning and remote replication to the expected support for CIFS, NFS, HTTP, FTP and iSCSI. Additionally, the 7000 series supports RAID 5 and RAID 6 arrays as well as ZFS hybrid storage pools, which capitalize on the high performance of flash memory devices and DRAM. Yet despite how great the 7000 is, it comes from a camp that has been mortally wounded by the messy Oracle takeover and the bureaucracy that surrounds it, the effects of which customers are now suffering. Will customers purchase a great product that immerses them in an eon of political wrangling when they need and rely on quick and timely support?

It’s evident that HDS, or anyone else for that matter who coins the term ‘Unified Storage’, is going to have a tough time dealing with EMC. The marketing machine that currently knows no bounds made an unashamed onslaught on the small business market cornered by NetApp when they launched the Celerra. While in essence it was just a Clariion with a NAS gateway, it fully supported SAN and NAS as well as NFS 2, 3 and 4 and CIFS file sharing. Furthermore, EMC’s entry into the market comes with a strategic plan that spans the company as a whole: minimizing its number of different hardware platforms.

When EMC released the V-Max platform, one of the most notable things was its use of hardware components already available on other EMC platforms. From the Clariion-esque disk drives, flash drives, DAEs, LCCs, Intel x64 CPUs and fans to the power supplies, the Celerra is cut from the same mould as the V-Max. With the Clariion, CDL, EDL and Celerra platforms all sharing similar hardware components, it’s only a matter of time before the anomalous architecture of the archive platform, Centera, is either changed to fit the mould or replaced completely in favour of a unified platform that seamlessly integrates with the Celerra or Clariion.

As Cisco had done before them when they added SAN to their IP portfolio, and as NetApp have done to some extent with ONTAP, EMC’s common hardware direction could eventually lead to underlying software being the only thing that distinguishes its different platforms.

So while unified storage currently limits the level of control over file-based versus block-based I/O, and hence gives lower performance than its dedicated block-based counterpart, a strategic approach that takes a long-term view of the term ‘unified’ could change the face of high-end storage systems in the future. As storage systems move further towards consolidation, it is the winner on the battlefield of unified storage that will eventually draw others to a new beginning and approach, and ultimately to the end of the current trend of seven-foot-tall high-end enterprise systems that have housed data centers for so many years. A self-tiering SATA/SSD unified platform without FC disks?…Let’s watch this space.

Data Domain's CPU Centric Deduplication Genius is no Dupe


Last year EMC’s somewhat controversial acquisition of Data Domain right under the noses of NetApp raised several eyebrows, to say the least. Considering the reported $2.1 billion price tag and EMC’s already deduplication-packed portfolio, which consisted of the source-based Avamar, the file-level deduplication/compression of its Celerra filer and its VTLs with integrated Quantum dedupe, some were left scratching their heads as to what actually was the big deal about Data Domain’s target-based deduplication solution. Almost a year on, and with Data Domain’s DD880 being adopted by an ever-growing customer base, the head-scratching has stopped and close attention is being paid to what is probably the most significant advancement in backup technology of the last decade.


With deduplication currently all the rage, possibly only overshadowed by ‘Cloud Computing’, its benefits are becoming an exigency for backup and storage architects. With most backup software producing copious amounts of duplicate data stored in multiple locations, deduplication offers the ability to eliminate those redundancies and hence use less storage, consume less bandwidth for backups and consequently shrink backup windows. Among the source-based and file-level deduplication offerings, it is Data Domain’s target-based solution, i.e. the big black box, that is clearly taking the lead and producing the big percentages in terms of data reduction. So what exactly is so amazing about the Data Domain solution when, upon an initial glance at, for example, the DD880 model, all one can see is just a big black box? Even installing one of the Data Domain boxes hardly requires much brainpower apart from the assignment of an IP address and a bit of cabling. And as for the GUI, one could easily forget about it, as the point of the ‘big black box’ is that you just leave it there to do its thing, and sure enough it does its thing.
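As a rough illustration of why those redundancies collapse so dramatically, the short Python sketch below splits a backup stream into fixed-size chunks and stores each unique chunk only once. It is a generic textbook example under my own assumptions, not Data Domain's actual variable-length, inline implementation.

import hashlib

def dedupe_store(stream, chunk_size=4096):
    # Split the stream into fixed-size chunks; keep one copy of each unique
    # chunk (keyed by its hash) plus a 'recipe' of hashes to rebuild the stream.
    unique_chunks = {}
    recipe = []
    for i in range(0, len(stream), chunk_size):
        chunk = stream[i:i + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()
        unique_chunks.setdefault(digest, chunk)   # duplicate chunks add nothing
        recipe.append(digest)
    return unique_chunks, recipe

# Two weekly fulls that barely differ dedupe down to a fraction of their logical size
week1 = b"A" * 40960
week2 = week1[:-4096] + b"B" * 4096               # only the final chunk changed
unique_chunks, recipe = dedupe_store(week1 + week2)
logical = len(week1 + week2)
physical = sum(len(c) for c in unique_chunks.values())
print(f"logical {logical} bytes -> physical {physical} bytes ({logical // physical}:1)")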


And while the big black box sits there in your data center, the figures start to jump out at you, with an average backup environment seeing a reduction of up to 20 times. For example, a typical environment with a first full backup of 1TB containing only 250GB of unique physical data will immediately see a fourfold reduction. If such an environment were to take weekly backups with a logical growth rate of 1.4TB per week but a physical growth of only 58GB per week, the approximate reduction could exceed 20 times within four months:


Reduction = (First Full + (Weekly Logical Growth x Number of Weeks)) / (Physical Full + (Weekly Physical Growth x Number of Weeks))


e.g. after 25 weeks
Reduction = (1TB + (1.4TB x 25)) / (0.250TB + (0.058TB x 25))
= 36TB / 1.7TB
≈ 21, i.e. roughly 21 times less data is physically stored than is logically backed up
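Expressed as a quick Python helper (using nothing but the worked figures above, which are illustrative rather than measured):

def dedupe_reduction(first_full_tb, weekly_logical_tb, physical_full_tb, weekly_physical_tb, weeks):
    # Approximate cumulative reduction ratio after a number of weekly full backups
    logical = first_full_tb + weekly_logical_tb * weeks
    physical = physical_full_tb + weekly_physical_tb * weeks
    return logical / physical

# Worked example from above: roughly 21x after 25 weeks
print(round(dedupe_reduction(1.0, 1.4, 0.250, 0.058, 25), 1))   # -> 21.2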


So how does Data Domain come up with such impressive results? Upon closer inspection, despite being considered the ‘latest technology’, Data Domain’s target-based deduplication solution has actually been around since 2003; in other words, these guys have been doing this for years. Now in 2010, with the DD880, to term their latest offering ‘cutting edge’ would be somewhat misleading when a more suitable term would be ‘consistently advancing’. Those consistent advancements come from the magic of the big black box being built on a CPU-centric architecture, and hence not being reliant upon adding more disk drives. So whenever Intel unveils a new processor, Data Domain does likewise with its incorporation into the big black box. Consequently the new DD880’s stunning figures stem from its quad-socket, quad-core processor system. With such CPU power the DD880 can easily handle aggregate throughput of up to 5.4 TB per hour and single-stream throughput of up to 1.2 TB per hour while supporting up to 71 TB of usable capacity, leaving its competitors in its wake. Having adopted such an architecture, Data Domain have pretty much guaranteed a future of advancing their inline deduplication architecture by taking advantage of every inevitable advance in Intel's CPUs.


Unlike the source-based offerings, Data Domain’s target-based solution is controlled by the storage system rather than the host, taking the files or volumes from disk and simply dumping them onto the disk-based backup target. The result is a more robust and sounder solution for a high change-rate environment, or one with large databases, where RPOs can be met far more easily than with a source-based dedupe solution.


Another conundrum that Data Domain’s solution brings up is the future of tape-based backups. The cheap, RAID 6 protected 1 TB / 500 GB 7.2k rpm SATA disks used by the DD880, alongside the amount of data reduced via deduplication, also bring into question the whole cost advantage of backing up to tape. If there’s less data to back up, and hence fewer disks required than tapes, what argument remains for avoiding the more efficient disk-to-disk backup procedure? Eliminating redundant data at a factor of 20:1 brings the economics of disk backup closer than ever to those of tape. Couple that with the extra costs of tape backups that frequently fail, the tricky recovery procedures of tape-based backups and backup windows that are under ever-increasing scrutiny, and this could well be the beginning of the end of the Tape Run guys having to do their regular rounds to the safe.
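To make that economics argument concrete, here is a crude back-of-the-envelope sketch in Python; every price in it is a hypothetical placeholder of my own, not a quoted or real-world figure, and it ignores drives, libraries, power and labour.

def media_cost(backup_tb, reduction_factor, cost_per_tb):
    # Media cost only: divide the logical backup size by the dedupe factor
    # (a factor of 1 means no dedupe) and price the remaining capacity.
    return (backup_tb / reduction_factor) * cost_per_tb

TAPE_COST_PER_TB = 30.0     # hypothetical LTO4 media price per TB
DISK_COST_PER_TB = 300.0    # hypothetical SATA disk price per TB

backup_tb = 100.0
print("tape, no dedupe:  ", media_cost(backup_tb, 1, TAPE_COST_PER_TB))    # 3000.0
print("disk, 20:1 dedupe:", media_cost(backup_tb, 20, DISK_COST_PER_TB))   # 1500.0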


Furthermore, with compatibility already in place for CIFS, NFS, NDMP and Symantec OpenStorage, word is already out that development work is being done to integrate more closely with EMC’s other juggernauts, VMware and NetWorker. So while deduplication in its many forms saturates the market and brings major cost savings to backup architectures across the globe, it is Data Domain’s CPU-centric, target-based inline solution that has the most promising foundation and future, and currently unsurpassable results. $2.1 billion? Sounds like a bargain.