The SANMAN: March 2012

Storage According to the VMware Admin: SDRS, SIOC, VASA & Storage vMotion

Posted by Archie Hendryx on Sunday, March 25, 2012

System admins were generally the early embracers and end users of VMware ESX, as they immediately recognized the benefits of virtualization. Having been bogged down with the pains of running physical servers, such as downtime for maintenance, patching and upgrades, they were the natural adopters of the bare metal hypervisor. The one-time Windows 2003 system admin was soon configuring virtual networks and VLANs as well as carving up storage datastores, quickly becoming the master of this new domain that was revolutionizing the datacenter. As the industry matured in its understanding of VMware, so did VMware’s recognition that the networking, security and storage work should be opened up to those who had done such work in the physical world. Along came products such as the Nexus 1000v and VMware vShield that enabled the network and security teams to plug into the ‘VM world’, add their expertise and participate in the configuration of the virtual environment. With vSphere 5, VMware took the initiative further by bridging the Storage and VMware gap with new features that Storage teams could also take advantage of. Despite this, terms such as SIOC, Storage DRS, VASA and Storage vMotion still seem to draw blanks from most Storage folk or are looked down upon as ‘a VMware thing’. So what exactly are these features, and why should Storage admins as well as VM admins take note of them and work together to take full advantage of their benefits?

Firstly there’s Storage DRS (SDRS), in my opinion the most exciting new feature of vSphere 5. SDRS enables the initial placement and ongoing space and load balancing of VMs across datastores that are part of the same datastore cluster. Simply put, think of a datastore cluster as an aggregation of multiple datastores into a single object, with SDRS balancing the space and I/O load across it.

In the case of space utilization, balancing takes place by ensuring that a set threshold is not exceeded. So should a datastore reach, say, a 70% space utilization threshold, SDRS will move VMs via Storage vMotion to other datastores in the cluster to balance out the load.
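To make the space-threshold idea concrete, here is a minimal Python sketch. It is not how SDRS is implemented; the `Datastore` class, the `recommend_moves` function and the "move the smallest VM to the emptiest datastore" rule are all assumptions made purely for illustration, and the 70% figure is simply the example value used above.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Hypothetical model of a datastore in a datastore cluster."""
    name: str
    capacity_gb: float
    vms: dict = field(default_factory=dict)  # VM name -> provisioned size in GB

    @property
    def used_fraction(self) -> float:
        return sum(self.vms.values()) / self.capacity_gb


def recommend_moves(cluster, threshold=0.70):
    """Move VMs off any datastore above the threshold (illustrative policy only).

    Returns a list of (vm_name, source_name, target_name) tuples.
    """
    moves = []
    for source in cluster:
        while source.used_fraction > threshold and source.vms:
            others = [d for d in cluster if d is not source]
            if not others:
                break
            # Assumed policy: pick the least-utilised datastore as the target.
            target = min(others, key=lambda d: d.used_fraction)
            # Assumed policy: migrate the smallest VM first.
            vm, size = min(source.vms.items(), key=lambda kv: kv[1])
            projected = (sum(target.vms.values()) + size) / target.capacity_gb
            if projected > threshold:
                break  # the move would only shift the problem elsewhere
            del source.vms[vm]
            target.vms[vm] = size
            moves.append((vm, source.name, target.name))
    return moves
```

With one datastore at 75% and another at 10%, this sketch recommends a single migration and leaves both below the threshold.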

Storage DRS based on Space utilisation

The other balancing feature, load balancing based on I/O metrics, uses the vSphere feature Storage I/O Control (SIOC). In this instance SIOC is used to evaluate the datastores in the cluster by continuously monitoring how long it takes an I/O to do a round trip, and it then feeds this information to Storage DRS. If the latency value for a particular datastore stays above a set threshold for a period of time, SDRS will rebalance the VMs across the datastores in the cluster via Storage vMotion. With many Storage administrators operating ‘dynamic tiering’ or ‘fully automated tiering’ at the backend of their storage arrays, it’s vital that the design is agreed co-operatively between both teams to ensure that the right features are utilized at the right time.
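The "above a threshold for a period of time" check can be sketched in a few lines of Python. This is only an illustration of the idea, not the SIOC or SDRS algorithm: the `LatencyMonitor` class, the fixed-size sample window and the example threshold are all assumptions.

```python
from collections import deque


class LatencyMonitor:
    """Hypothetical per-datastore monitor of I/O round-trip latency."""

    def __init__(self, threshold_ms, window):
        self.threshold_ms = threshold_ms
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def needs_rebalance(self):
        """True only when a full window of samples all exceed the threshold."""
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.threshold_ms for s in self.samples)
```

Requiring a full window of breaches, rather than reacting to a single slow I/O, is what stops a short spike from triggering a round of Storage vMotions.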

Storage DRS based on I/O latency

While most are aware of vMotion’s capabilities of seamlessly migrating VMs across hosts, Storage vMotion is a slightly different feature that allows the migration of running VMs from one datastore to another without incurring any downtime. In vSphere 5.0, Storage vMotion has been improved by enabling the operation to take place a lot quicker.

It does this by using a new Mirror Driver mechanism that keeps blocks on the destination synchronized with any changes made to the source after they have been copied. The migration process does a single pass of the disk, copying all the blocks to the destination disk. If any blocks change during this copy, the mirror driver synchronises them from the source to the destination. It’s this single-pass block copy that enables Storage vMotion to complete a lot quicker, enabling the end user to reap the benefits immediately.
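A toy simulation helps show why one pass is enough. The Python below is a deliberately simplified model, not VMware's implementation: a disk is a list of blocks, guest writes arrive at given points during the copy, and the function name and the `writes_during_copy` structure are invented for the example.

```python
def storage_vmotion_single_pass(source, writes_during_copy):
    """Simulate a single-pass copy with mirrored writes (simplified model).

    source: list of block values; it is modified in place by guest writes.
    writes_during_copy: dict mapping a copy step to (block_index, new_value),
        meaning the guest writes that block just before that step runs.
    Returns (destination_blocks, number_of_mirrored_writes).
    """
    destination = [None] * len(source)
    copied_up_to = 0  # blocks [0, copied_up_to) are already on the destination
    mirrored = 0
    for step in range(len(source)):
        if step in writes_during_copy:
            block, value = writes_during_copy[step]
            source[block] = value
            if block < copied_up_to:
                # Block was already copied: mirror the write to the destination.
                destination[block] = value
                mirrored += 1
            # Otherwise the bulk copy will pick up the new value when it gets there.
        destination[step] = source[step]
        copied_up_to = step + 1
    return destination, mirrored
```

However many writes land during the copy, the destination matches the source at the end of the single pass, so no further catch-up passes over changed blocks are required.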

Storage vMotion & the new Mirror Driver

As for the new feature named VASA, its focus is on providing insight and information to the VM admin about the underlying storage. In its simplest terms, VASA is a new set of APIs that enables storage arrays to provide vCenter with visibility into the storage’s configuration, health status and functional capabilities. It allows the VM admin to see details such as the number of spindles for a volume, the number of expected IOPS or MB/s, the RAID levels, whether the LUNs are thick or thin provisioned, or even whether deduplication or compression is in use. SDRS can also use the information provided by VASA when making its recommendations on space and I/O load balancing. Basically VASA is a great feature that ensures VM admins can quickly provision VMs with the storage that is most applicable to them.

This leads onto the feature termed Profile Driven Storage, which enables you to select the correct datastore on which to deploy your VMs based on that datastore’s capabilities. A Storage Profile can be built in two ways: either the storage device has its capabilities associated automatically via VASA, or the capabilities are user-defined and manually associated.

VASA & Profile Driven Storage

With the user-defined option you can apply labels to your storage, such as Bronze, Silver and Gold, based on the capabilities of that Storage. So, for example, once a profile is created and the user-defined capabilities are added to a datastore, you can then use that profile to select the correct storage for a new VM. If the VM’s storage matches the capabilities in its profile, the VM is said to be compliant; if it does not, the VM is said to be non-compliant. So while VASA and Profile Driven Storage are still new features, their potential is immense, as storage admins can work alongside VM admins to help classify and tier their data.
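The matching logic behind profiles can be sketched with plain Python sets. This is not the vSphere API; the function names, the datastore names and the idea of representing capabilities as sets of labels are assumptions for illustration only.

```python
def compatible_datastores(profile_capabilities, datastores):
    """Return the names of datastores offering every capability in the profile.

    profile_capabilities: set of labels required by the profile, e.g. {"Gold"}.
    datastores: dict mapping datastore name -> set of capability labels.
    """
    return [name for name, caps in datastores.items()
            if profile_capabilities <= caps]


def is_compliant(profile_capabilities, vm_datastore, datastores):
    """A VM is compliant if the datastore it lives on satisfies its profile."""
    return profile_capabilities <= datastores[vm_datastore]
```

For example, with a hypothetical `ds-gold` labelled `{"Gold"}` and `ds-bronze` labelled `{"Bronze"}`, a VM carrying a Gold profile is offered only `ds-gold` at deployment time, and would be reported as non-compliant if it later ended up on `ds-bronze`.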

As mentioned before, Storage I/O Control or SIOC is a feature that enables you to configure rules and policies to help specify the business priority of each VM. It does this by dynamically allocating I/O resources to your critical application VMs whenever I/O congestion is detected. Furthermore, by enabling SIOC on a datastore you trigger the monitoring of device latency as observed by the hosts. As SIOC takes charge of I/O allocation to VMs, it also by default ignores Disk.SchedNumReqOutstanding (DSNRO). Typically it’s DSNRO that sets the Queue Depth at the hypervisor layer, but once SIOC is enabled it takes on this responsibility, basing its judgements on the I/O congestion and policy settings. This offloads a significant amount of performance design work from the admins but ultimately still requires the involvement of the Storage team to ensure that the I/O contention is not in fact caused by poorly configured Storage and highly congested LUNs.

SIOC ignores Disk.SchedNumReqOutstanding to set the Queue Depth at the hypervisor level

So while these new features are ideal for the SMB, they may still not be the sole answer to every Storage / VMware problem that comes with virtualizing mission critical applications. As with any new feature or technology, their success relies on correct planning, design and implementation, and for that to happen a siloed VM-only or Storage-only approach needs to be avoided.

Cisco's UCS - The Prime Choice for Cloud & Big Data Servers?

Posted by Archie Hendryx on Monday, March 19, 2012

Back in March 2009, when Cisco announced the launch of their UCS platform and subsequent intention to enter the world of server hardware, eyebrows were raised, including my own. There was never any disputing that the platform would be adopted by some customers, certainly after seeing how Cisco successfully gatecrashed the SAN market and initially knocked Brocade off their FC perch. We’d all witnessed how Cisco used its IP datacenter clout and its ability to propose deals that packaged both SAN MDS and IP switches with a single point of support to quickly take a lead in a new market. Indeed it was only after Brocade’s 2007 acquisition of McData, and when Cisco started to focus on FCoE, that Brocade regained their lead in FC SAN switch sales. Where my doubts, and those of others, lay was in whether the UCS was going to be good enough to compete with the already proven server platforms of HP, IBM and Dell. Well, roll on three years and the UCS now boasts 11,000 customers worldwide and an annual run rate of £822m, making it the fastest growing product in Cisco’s history. Amazingly Cisco is already third in worldwide blade server market share with 11%, closely behind HP and IBM. So now, with this week’s launch of the UCS’ third generation and its integration of the new Intel Xeon processor E5-2600, it’s time to accept that all doubts have been swiftly erased.

Unlike other server vendors, Cisco launched the UCS from a greenfield approach that recognized the industry’s shift towards server virtualization and consolidation. Not tied down by legacy architectures, Cisco entered the server market at the same time Intel launched their revolutionary Intel Xeon 5500 processors and immediately took advantage with their groundbreaking memory extension feature. By creating a way to map four distinct physical memory modules (DIMMs) to a single logical DIMM that would be seen by the processor’s memory channel, Cisco introduced a way to have 48 standard slots as opposed to the 12 found in normal servers. With the new B200 M3 blade server, there’s now support for up to 24 DIMM slots for memory running at up to 1600 MHz and up to 384 GB of total memory, as well as 80 Gbits per second of I/O bandwidth. This is even more impressive when you factor in that, with the Cisco UCS 5108 Chassis able to accommodate up to eight of these blades, scalability can go up to a remarkable 320 blades per Cisco Unified Computing System. Added to this, Cisco took convergence further by making FCoE the standard, with Fabric Interconnects that not only acted as the brains for their servers but also helped centralize management. With the ability to unite up to 320 servers as a single system, they also supported line-rate, low latency, lossless 10 Gigabit Ethernet as well as FCoE. This enabled a unified network connection for each blade server with just a wire-once 10 Gigabit Ethernet FCoE downlink, reducing cable clutter and centralizing network management via the UCS Manager GUI. Now with the newly launched UCS 6296UP, the Fabric Interconnect will double the switching capacity of the UCS fabric from 960Gbps to 1.92Tbps as well as the number of ports from 48 to 96.

Cisco UCS' Memory Extension

Other features such as FEX were introduced to ease management. FEX (Fabric Extenders) are platforms that act almost like remote line cards for the parent Cisco Nexus switches. Hence the Fabric Extenders don’t perform any switching and are managed as an extension of the fabric interconnects. This enables the UCS to scale to many chassis without increasing the number of switches, as switching is removed from the chassis. Furthermore there is no need for separate chassis management modules, as the fabric extenders alongside the fabric interconnects manage the chassis’ fans, power supplies etc. This means there’s no requirement to individually manage each FEX, as everything is inherited from the upstream switch, allowing you to simply plug and play a FEX for a rack of pre-cabled servers. Regardless of configured policies, upgrading or deploying new features simply requires a change on the upstream switch; because the FEX inherits from the parent switch, everything is automatically propagated across the racks of servers.

The UCS components

With the aforementioned B200 M3 blade, there are also two mezzanine I/O slots, one of which is used by the newly launched 1240 virtual interface card. The VIC 1240 provides 40 Gbps of capacity, which can of course be sliced up into virtual interfaces delivering flexible bandwidth to the UCS blades. Moreover, with a focus on virtualization and vSphere integration, the VIC 1240 implements Cisco’s VM-FEX and supports VMware's VMDirectPath with vMotion technology. The concept of VM-FEX is again centered on the key benefits of consolidation, this time around the management of both virtual and physical switches. With physical 10Gb links now standard, VM-FEX enables end users to move away from the complexity of managing standard vSwitches, which were designed and introduced when 1Gb links were the norm. It does this by providing VM virtual ports on the actual physical network switch, hence avoiding the hypervisor’s virtual switch. The VM’s I/O is therefore sent directly to the physical switch, making the VM’s identity and positioning information known to the physical switch and eliminating local switching from the hypervisor. Unlike the common situation where trunking of the physical ports was required to enable traffic between VMs on different physical hosts, the key point here is that the network configuration is now specific to that port. That means once you’ve assigned a VLAN to the physical interface, there is no need for trunking and you’ve also ensured network consistency across your ESX hosts. The VM-FEX feature also has two modes, the first being emulated mode, where the VM’s traffic is passed through the hypervisor kernel. The other, ‘high-performance’ mode utilizes VMDirectPath I/O and bypasses the hypervisor kernel, going directly to the hardware resource associated with the VM.

High Performance Mode utilises VMware's VMDirectPath I/O feature

Interestingly, the VMDirectPath I/O feature is another key vSphere technology that often gets overlooked but one that adds great benefit by allowing VMs to directly access hardware devices. First launched in vSphere 4.0, one of its limitations was that it didn’t allow you to vMotion the VM, which may explain its lack of adoption. Now though, with vSphere 5.0 and the UCS, vMotion is supported. Here the VIC sends the VM’s I/O directly to the UCS fabric interconnect, which then takes on the VM’s traffic switching and policy enforcement. By interoperating with VMDirectPath, the VIC transfers the I/O state of a VM as well as its network properties (VLAN, port security, rate limiting, QoS) to vCenter as it vMotions across ESX servers. So while you may not get an advantage on throughput, VMDirectPath I/O’s advantage lies in its ability to free up CPU cycles that were needed for VM switching, making it ideal for very high packet rate workloads that need to sustain their performance. Of course you can also now transition the device from one that is paravirtualized to one that is directly accessed and the other way around. VM-FEX basically merges the virtual access layer with the physical switch, empowering the admin to provision and monitor from a consolidated point.

As well as blade servers, Cisco is also serving up (excuse the pun) new rack servers which update their C-class range: the 1U C220 M3 and the 2U C240 M3. With the announcement that the UCS Manager software running in the Fabric Interconnect will now be able to manage both blade and rack servers as a common entity, there is also news that this will eventually scale out as a single management domain for thousands of servers. Currently under the moniker of “Multi-UCS Manager”, the plan is to expand the current management domain limit of 320 servers to up to 10,000 servers spread across data centers around the world, empowering server admins to centrally deploy templates, policies and profiles as well as manage and monitor all of their servers. This would of course bring huge dividends in terms of OPEX savings and improved automation and orchestration, setting the UCS up as a very hard to ignore option in any new Cloud environment.

A single management pane for up to 10,000 servers

As well as Cloud deployments, the UCS is also being set up to play a key role in the explosion of big data. With the recent announcement that Greenplum and Cisco are finally teaming together to utilize the C-class rack servers, there is already talk of pre-configured Hadoop stacks. With Greenplum’s MR Hadoop distribution integrating with Cisco's C-class rack servers, it’s pretty obvious that the C-class UCS servers will also quickly gain traction in the market much like their B-series counterparts.

Incredibly, it was not long ago that Cisco was just a networking company whose main competitor was Brocade. Fast forward to March 2012 and Brocade’s CEO Mike Klayko is stating "If you can run Cisco products then you can run ours" to justify Brocade's IP credentials. When their once great competitor inadvertently admits they’re entering the IP world as a reaction to Cisco rather than a perceived demand from the market, it really does showcase how far Cisco has come. It also speaks volumes that, by contrast, Cisco proactively entered the server world when no perceived demand existed within that market. Three years later, with 11% market share and groundbreaking features built for the Cloud and Big Data, Cisco has moved far beyond its networking competitors and is well placed to be a mainstay powerhouse in the server milieu.

Disaster Recovery Monitoring

Posted by Archie Hendryx on Saturday, March 10, 2012

The key to a Disaster Recovery investment is being able to test and fail over, i.e. check that it actually works. Hence it is vital that the SAN being used for this replication is optimized and provides an RTO that meets the business’ demands. While there are sufficient tools to monitor the IP or DWDM links for cross-site replication, it is still best practice to TAP the replication links of your Disaster Recovery infrastructure so that monitoring of the FC SAN is included.

Here's what was my final video for Virtual Instruments that quickly explains how you can proactively take out the "Disaster" from Disaster Recovery....

Understanding IOPS

Posted by Archie Hendryx on Friday, March 09, 2012

IOPS is commonly recognized as a standard measurement of performance, whether to measure the Storage Array’s backend drives or the performance of the SAN. In its most basic terms, IOPS is the number of operations issued per second, whether reads, writes or other, and admins will typically use their Storage Array tools or applications such as Iometer to monitor IOPS.

IOPS will vary depending on a number of factors that include a system’s balance of read and write operations, whether the traffic is sequential, random or mixed, the storage drivers, the OS background operations or even the I/O block size.

Block size is usually determined by the application with different applications using different block sizes for various circumstances. So for example Oracle will typically use block sizes of 2 KB or 4 KB for online transaction processing and larger block sizes of 8 KB, 16 KB, or 32 KB for decision support system workload environments. Exchange 2007 may use an 8KB Block size, SQL a minimum of 8KB and SAP 64KB or even more.

IOPS and MB/s both need to be considered

Additionally, it is standard practice that when IOPS is considered as a measurement of performance, the throughput, that is to say MB/sec, is also looked at. This is due to the different impact they have on performance. For example, an application with only 100 MB/sec of throughput but 20,000 IOPS may not cause bandwidth issues, but with so many small commands the storage array is put under significant exertion as its front end processors have an immense workload to deal with. Alternatively, if an application has a low number of IOPS but significant throughput, such as long sustained reads, then the exertion will fall upon the SAN’s links.
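The relationship between the two metrics is simple arithmetic: throughput is IOPS multiplied by the average I/O size. The short Python sketch below works through it; the function names are made up for the example and 1 MB is assumed to be 1024 KB.

```python
KB_PER_MB = 1024  # assumption: binary units


def average_io_size_kb(throughput_mb_s, iops):
    """Average I/O size implied by a given throughput and IOPS figure."""
    return throughput_mb_s * KB_PER_MB / iops


def throughput_mb_s(iops, io_size_kb):
    """Throughput produced by a given IOPS figure at a fixed I/O size."""
    return iops * io_size_kb / KB_PER_MB
```

Here `average_io_size_kb(100, 20000)` works out to 5.12 KB, i.e. lots of small commands for the array's front end to process, whereas `throughput_mb_s(500, 256)` gives 125 MB/sec from only 500 IOPS, which is a load on the links rather than on the processors.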

Despite this, MB/s and IOPS are still not a good enough measure of performance when you don’t take into consideration the Frames per second. To elaborate, referring back to the FC Frame, a standard FC Frame has a Data Payload of up to 2112 bytes, i.e. roughly a 2K payload. So in the example below, where an application has an 8K I/O block size, 4 FC Frames are required to carry that data portion. In this instance 1 I/O equates to 4 Frames, and 100 IOPS would equate to 400 Frames per second. Hence to get a true picture of utilization, looking at IOPS alone is not sufficient, because there can be orders of magnitude of difference between particular applications and their I/O sizes, with some ranging from 2K to even 256K and some applications such as backups having even larger I/O sizes and hence more Frames.
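The frame count per I/O is just a ceiling division of the I/O size by the frame payload. A minimal Python sketch, counting only the data-carrying frames (command, response and other control frames are deliberately ignored, and the function names are illustrative):

```python
FC_MAX_PAYLOAD_BYTES = 2112  # maximum data payload of a standard FC frame


def data_frames_per_io(io_size_bytes, payload_bytes=FC_MAX_PAYLOAD_BYTES):
    """Number of full or partial frames needed to carry one I/O's data."""
    return (io_size_bytes + payload_bytes - 1) // payload_bytes


def data_frames_per_second(iops, io_size_bytes):
    """Data frames per second generated by a workload of fixed I/O size."""
    return iops * data_frames_per_io(io_size_bytes)
```

An 8K I/O (8192 bytes) needs 4 frames, so 100 IOPS produces 400 frames per second, while the same 100 IOPS at a 256K I/O size produces 12,500 frames per second.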

Frames per second give you a better insight of demand and throughput

Looking at a metric such as the ratio of MB/sec to Frames/sec, as displayed below, we will actually get a better picture and understanding of the environment and its performance.

To elaborate, the MB/sec to Frames/sec ratio is different to the IOPS metric. So with reference to this graph of the MB/sec to Frames/sec ratio, the line should never fall below 0.2 on the y-axis, i.e. the 2K data payload.

If the ratio falls below this, say at the 0.1 level, we can identify that data is not being passed efficiently despite the throughput being maintained (MB/sec).
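The same check can be expressed as the average payload carried per frame. The Python sketch below does not try to reproduce the y-axis scale of the graph; instead it reports bytes per frame and compares that with a full 2112-byte payload. The function names and the 50% cut-off (mirroring the 0.1 versus 0.2 example) are assumptions for illustration.

```python
FULL_PAYLOAD_BYTES = 2112      # data payload of a full standard FC frame
BYTES_PER_MB = 1024 * 1024     # assumption: binary units


def average_payload_bytes(mb_per_sec, frames_per_sec):
    """Average number of data bytes carried by each frame."""
    if frames_per_sec == 0:
        return 0.0
    return mb_per_sec * BYTES_PER_MB / frames_per_sec


def payload_efficiency(mb_per_sec, frames_per_sec):
    """Fraction of a full frame payload that is actually being used."""
    return average_payload_bytes(mb_per_sec, frames_per_sec) / FULL_PAYLOAD_BYTES


def looks_inefficient(mb_per_sec, frames_per_sec, min_efficiency=0.5):
    """Flag links whose frames are, on average, less than half full."""
    return payload_efficiency(mb_per_sec, frames_per_sec) < min_efficiency
```

At 100 MB/sec, 50,000 frames per second means almost completely full frames, whereas 100,000 frames per second for the same throughput means each frame is on average less than half full and the link deserves a closer look.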

Given a situation where you have the common problem of slow draining devices, the case that MB/s and IOPS alone are not sufficient is even more compelling as you can actually be misled in terms of performance monitoring.

To explain, slow draining devices are devices that are requesting more information than they can consume and hence cannot cope with the incoming traffic in a timely manner. This is usually because devices such as the HBA have slower link rates than the rest of the environment, or because the server or device is overloaded in terms of CPU or memory and thus has difficulty dealing with the data requested. To avoid performance problems it is imperative to proactively identify such devices before they impact the application layer and consequently the business’ operations.

Slow Draining devices - requesting more information than they can consume

In such a situation, looking again at the MB/sec to Frames/sec ratio graph below, we can now see that the ratio is at the 0.1 level; in other words we are seeing high throughput but minimal payload. This enables you to proactively identify when a number of management frames are being passed instead of data, as they busily report on the physical device errors that are occurring.

Management Frames being passed can mislead

So to conclude, without taking Frames per second into consideration and having an insight into this ratio, it is an easy trap to falsely believe that everything is OK and data is being passed because you see lots of traffic as represented by MB/sec, when in actuality all you are seeing are management frames reporting a problem.

Here's an animated video to further explain the concept: