The Unified Storage Battlefield Could Decide the Future of Storage

In the past week HDS finally revealed their response to the VMware-Cisco-EMC alliance by announcing a unified computing platform that integrates storage, server, and networking technology. With the aid of Microsoft, HDS have stated that the platform will be launched early next year. In the tradition of my enemy’s enemy is my friend, HDS have also signed an OEM deal with Microsoft under which Microsoft System Center Operations Manager, System Center Virtual Machine Manager and Windows Server 2008 R2 will be tightly integrated with Hyper-V. Added to this are HDS Dynamic Provisioning and the HDS Storage Cluster for Microsoft Hyper-V. Moreover, despite the secrecy, the networking brains behind the platform are most probably Brocade, the grandfathers of SAN, who now also have a sound grip on IP networking since their acquisition of Foundry back in 2008.

Well, it’s no surprise that with the current turmoil brought on by the disbandment of the SUN OEM deal, HDS are desperate to announce a new product despite it being more than six months away. But the trend towards Unified Storage is one that is being followed by many in an attempt to adapt to the economic climate and the rapid drive towards consolidation. While at one point it was NetApp’s domain, in which no one else seemed interested, demand for Unified Storage has grown considerably as customers see the potential savings that come with running and managing files and applications from a single device. By consolidating file-based and block-based access in a single platform, and hence supporting FC, iSCSI, and NAS, customers immediately reap the benefits of reduced hardware requirements, lower capital expenditures and simplified single pane management. So now the war of vendors has entered a new battlefield in which nearly all are making a bid to usurp the lion’s share of the spoils. But like in every battle there will ultimately be casualties…

Cisco, the IP kings, have bravely entered the arena and are pushing forward with their combination of networking, blade servers and storage in a single architecture, i.e. the Unified Computing System (UCS) platform. Whether they can convince customers that an IP switch company can build servers remains to be seen, but Cisco already proved doubters wrong when they successfully entered the SAN market by drawing on the IP niche that they had established in just about every data center in the world.

HP's acquisition of 3Com, on the other hand, was instigated to provide the networking brains for their ‘converged computing’ model that binds server, storage, and networking resources. How the powerhouse of HP will fare is not as difficult to predict, given the success of their blade systems and their credence amongst customers as a server platform provider. But are they entering the arena too late, and how will this fare with their OEM relationship with HDS?

Within this battlefield of generals, there are also some charlatans who have cheekily tried to gain some market share just by adopting the term ‘unified storage’. IBM and NEC, for example, have brought out backup and recovery systems within a single architecture that lack any NAS support, yet still carry the ‘unified storage’ label. Such pretenders may suffer an early death, especially when smaller companies such as BlueArc go the whole nine yards with their Titan systems that not only support SAN and NAS but can also utilize WAN via Riverbed's Steelhead networking solution.

Then there’s the SUN 7000 series from Oracle’s Sun Microsystems. A great bargain for the amount of functionality that it provides: unlimited snapshots, integral data compression, iSCSI thin provisioning, virus scanning and remote replication, as well as the expected support for CIFS, NFS, HTTP, FTP and iSCSI. Additionally the 7000 series supports RAID 5 and RAID 6 arrays and ZFS Hybrid storage pools, which can capitalize on the high performance of Flash memory devices and DRAM. Yet despite how great the 7000 is, it’s coming from a camp that has been mortally wounded by the messy Oracle takeover and the bureaucracy that surrounds it, the effects of which customers are now suffering. Will customers purchase a great product that will immerse them in an eon of political wrangling when they need and rely on quick and timely support?

It’s evident that HDS, or anyone else for that matter who coins the term ‘Unified Storage’, is going to have a tough time dealing with EMC. The marketing machine, which currently knows no bounds, made an unashamed onslaught on the small business market cornered by NetApp when they launched the Celerra. While in essence it was just a Clariion with a NAS gateway, it fully supported SAN and NAS as well as NFS 2, 3 and 4, and CIFS file sharing. Furthermore EMC’s entry into the market appears to be part of a strategic plan that spans the company as a whole: minimizing its different hardware platforms.

When EMC released the V-Max platform, one of the most notable things was its usage of hardware components that were already available on other EMC hardware platforms. From the Clariion-esque disk drives, flash drives, DAEs, LCCs, Intel x64 CPUs and fans to the power supplies, the Celerra, like the V-Max, is made in the same mould. With the Clariion, CDL, EDL and Celerra platforms all sharing similar hardware components, it’s only a matter of time before the anomalous architecture of the archive platform, Centera, is either changed to fit the mould or replaced completely in favour of a unified platform that seamlessly integrates with the Celerra or Clariion.

As Cisco did before them when they added SAN to their IP portfolio, and as NetApp have done to some extent with ONTAP, EMC’s common hardware direction could eventually lead to the underlying software being the only thing that distinguishes different EMC platforms.

So while unified storage currently limits the level of control over file-based versus block-based I/O, and hence delivers lower performance than its dedicated block-based counterpart, a strategic approach that takes a long term look at the term ‘unified’ could change the face of high end storage systems in the future. As storage systems move further towards consolidation, it is indeed the winner in the battlefield of unified storage that will eventually draw others to a new beginning and approach, and ultimately to the end of the current trend of 7 feet tall high end enterprise systems that have filled data centers for so many years. A self tiering SATA / SSD Unified Platform without FC disks?….Let’s watch this space.

Data Domain's CPU Centric Deduplication Genius is no Dupe

Last year EMC’s somewhat controversial acquisition of Data Domain right under the noses of NetApp raised several eyebrows to say the least. Considering the reported amount of $2.1 billion and EMC’s already deduplication-packed portfolio, which consisted of the source based Avamar, the file-level deduplication/compression of its Celerra filer and their Quantum dedupe integrated VTLs, some heads were left scratching as to what actually was the big deal with the target based deduplication solution of Data Domain. Almost a year on, and with Data Domain’s DD880 being adopted by an ever growing customer base, the heads have stopped scratching and are paying close attention to what is probably the most significant advancement in backup technology of the last decade.

With deduplication currently being all the rage, with possibly only ‘Cloud Computing’ overshadowing it, the benefits of deduplication are becoming an exigency for backup and storage architects. With most backup software producing copious amounts of duplicate data stored in multiple locations, deduplication offers the ability to eliminate those redundancies and hence use less storage and less bandwidth for backups, which in turn shrinks backup windows. Even with source based and file level deduplication offerings on the market, it is Data Domain’s target based solution, i.e. the big black box, that is clearly taking the lead and producing the big percentages in terms of data reduction. So what exactly is so amazing about the Data Domain solution, when upon initial glance at, for example, the DD880 model, all one can see is just a big black box? Even installing one of the Data Domain boxes hardly requires much brainpower apart from the assignment of an IP address and a bit of cabling. And as for the GUI, one could easily forget about it, as the point of the ‘big black box’ is that you just leave it there to do its thing, and sure enough it does its thing.

And while the big black box sits there in your data center the figures start to jump out at you, where an average backup environment can see a reduction of up to 20 times. For example, a typical environment with a first full backup of 1TB that occupies only 250GB of physical storage will immediately see a fourfold reduction. If such an environment was to take weekly backups with a logical growth rate of 1.4TB per week but with a physical growth of only 58GB per week, the approximate reduction could go up to more than 20 times within four months:

Reduction =
(First Full + (Weekly Logical Growth x Number of weeks)) / (Physical Full + (Weekly Physical Growth x Number of weeks))

e.g. After 25 weeks
Reduction = (1TB + (1.4TB x 25)) / (0.250TB + (0.058TB x 25))
= 36TB / 1.7TB
= approximately 21 times less data is physically stored
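For anyone who wants to plug in their own numbers, the reduction formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and parameter names are made up here, and the sketch assumes constant weekly growth exactly as the worked example does.

```python
def reduction_ratio(first_full_tb, weekly_logical_tb,
                    physical_full_tb, weekly_physical_tb, weeks):
    """Logical data protected divided by physical data stored after `weeks` weeks.

    Assumes linear growth, as in the worked example; real environments vary.
    """
    logical = first_full_tb + weekly_logical_tb * weeks
    physical = physical_full_tb + weekly_physical_tb * weeks
    return logical / physical


# Figures from the example: 1TB first full stored in 250GB,
# then 1.4TB logical and 58GB physical growth per week.
for weeks in (0, 17, 25):
    ratio = reduction_ratio(1.0, 1.4, 0.250, 0.058, weeks)
    print(f"week {weeks}: {ratio:.1f}x")
```

Week 0 is the fourfold reduction of the first full backup, and the ratio passes 20 at roughly the four month mark.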

So how does Data Domain come up with such impressive results? Upon closer inspection, despite being considered the ‘latest technology’, Data Domain’s target based deduplication solution has actually been around since 2003, so in other words these guys have been doing this for years. Now in 2010 with the DD880, to term their latest ‘cutting edge’ would be somewhat misleading when a more suitable term would be ‘consistently advancing’. Those consistent advancements have come from the magic of the big black box being based on a CPU-centric architecture and hence not reliant upon adding more disk drives. So whenever Intel unveils a new processor, Data Domain does likewise by incorporating it into their big black box. Consequently the new DD880’s stunning results come from its quad-socket quad-core processor system. With such CPU power the DD880 can easily handle aggregate throughput of up to 5.4 TB per hour and single-stream throughput of up to 1.2 TB per hour while supporting up to 71 TB of usable capacity, leaving its competitors in its wake. Having adopted such an architecture, Data Domain have pretty much guaranteed a future of advancing their inline deduplication architecture by taking advantage of every inevitable advance in Intel's CPUs.

Unlike the source based offerings, Data Domain’s target based solution is controlled by a storage system rather than a host, and thus takes the files or volumes from the disk and simply dumps them onto the disk-based backup target. The result is a more robust and sounder solution for a high change-rate environment, or one with large databases, where RPOs can be met a lot more easily than with a source-based dedupe solution.
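To make the idea of a deduplicating backup target a little more concrete, here is a toy sketch in Python. It is emphatically not Data Domain's algorithm, which isn't described here; the `DedupeStore` class, the fixed 4 KB chunk size, the SHA-256 fingerprints and the in-memory dictionary are all assumptions chosen to keep the example short. The point is simply that the host dumps its full backup stream to the target, and the target decides inline which chunks it has already seen.

```python
import hashlib


class DedupeStore:
    """Toy inline, target-side deduplicating store (illustrative only)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}        # fingerprint -> chunk bytes, each stored once
        self.logical_bytes = 0  # everything the hosts have sent

    def write(self, data):
        """Ingest a backup stream; return the list of fingerprints (its recipe)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # keep only unseen chunks
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Rebuild a backup from its recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    @property
    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupeStore()
monday = b"".join(bytes([i]) * 4096 for i in range(8))   # 8 distinct chunks
tuesday = monday[:4096 * 7] + bytes([99]) * 4096          # one chunk changed
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
print(store.logical_bytes, store.physical_bytes)
```

Two full backups go in, but only nine chunks land on disk, because seven of Tuesday's eight chunks were already there. Note that the hashing happens on the target, so the work scales with its CPU rather than with the number of spindles behind it.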

Another conundrum that Data Domain’s solution brings up is the future of tape based backups. The cheap RAID 6 protected 1 TB / 500 GB 7.2k rpm SATA disks used by the DD880, alongside the amount of data reduced via its deduplication, also bring into question the whole cost advantage of backing up to tape. If there’s less data to back up, and hence fewer disks than tapes required, what argument remains for avoiding the more efficient disk to disk backup procedure? An elimination of redundant data by a factor of 20:1 brings the economics of disk backup closer than ever to those of tape backups. Couple that with the extra costs of tape backups often failing, the tricky recovery procedures of tape based backups, and backup windows which are increasingly scrutinized, and this could well be the beginning of the end of the Tape Run guys having to do their regular rounds to the safe.

Furthermore, with compatibility already in place for CIFS, NFS, NDMP and Symantec OpenStorage, word is already out that development work is being done to integrate more closely with EMC’s other juggernauts, VMware and NetWorker. So while deduplication in its many forms saturates the market and brings major cost savings to backup architectures across the globe, it is Data Domain’s CPU based, target based inline solution which has the most promising foundation and future, and currently unsurpassable results. $2.1 billion? Sounds like a bargain.