I'm delighted to have been asked by BrightTalk to present a webinar for their upcoming Enterprise Storage Summit. I hope you can join and look forward to your feedback!
If it wasn’t for the cost and complexity of Mainframes, would the industry have ever shifted towards Open Systems? The performance, maintenance and security advantages of Mainframes were never disputed, yet the complexity of running them required a unique expertise while the cost could rarely be justified except for high-scale data warehousing. But what if it was possible to apply the advantages of the Mainframe model to an x86 infrastructure? What if that cost and complexity could be replaced by simplicity and CAPEX & OPEX reduction? These are the questions that are being asked by businesses across the globe as they look to reassess how they approach their IT with decreasing budgets and increasing demands. The once common debates related to speeds and feeds of different components are now giving way to discussions on how IT can quickly deliver a suitable service to the business. It is this debate that is leading the industry to an inflection point and the consequent rise of the x86 Mainframe. Join Archie Hendryx as he discusses how a new approach to IT infrastructure is needed to solve the incumbent challenges that are being faced within the industry.
This last fortnight there’s been a cacophony
of hyperbole and at times marketing fluff from vendors and analysts with
regards to Reference Architectures and Converged Infrastructures. As IBM launched
PureSystems, NetApp & Cisco decided it was also a good time to reiterate
their strong partnership with FlexPod. In the midst of this, EMC decided
to release their new and rather salaciously titled VSPEX. From the remnants and
ashes of all these new product names and fancy launch conferences, the
resultant war blogs and Twitterati battles ensued. As I poignantly watched on from the trenches in an almost Siegfried Sassoon moment, it quickly became evident that there was now an even more ambiguous understanding of what distinguishes a Converged Infrastructure from a Reference Architecture, what its relation is to the Private Cloud and, more importantly, whether you, the end user, should even care.
IBM's PureSystems - Converged Infrastructure or a well marketed Single Stack?
There’s a huge and justified commotion in the
industry over Private Cloud because with lower costs, reduced complexity and
greater data center agility, the advantages are compelling for any business
looking to streamline and optimize its IT. In the pursuit of attaining such
benefits and ensuring a successful Private Cloud deployment, one of the most
critical components that need to be considered is that of the infrastructure
and its underlying resource pools. With resource pools being the foundation of
rapid elasticity and instantaneous provisioning, a Private Cloud’s success
ultimately depends on the stability, reliability, scalability and performance
of its infrastructure. With existing datacenters commonly accommodating legacy servers that require a refresh, or new multiprocessor servers entrenched within an old and insufficient network infrastructure, one of the main
challenges of a Private Cloud deployment is how to upgrade it without
introducing risk. With this challenge and the industry’s pressing need for an
economically viable answer, the solution was quickly conceived and baptized as
“Converged Infrastructure”. Sadly like all great ideas and concepts,
competition and marketing fluff quickly tainted the lucidity of such an obvious
solution by introducing other terms such as “Reference Architectures” and
“Single Stack Solutions”. Even more confusing was the launch of vendor products
that used such terms synonymously, together or as separate distinct entities.
So what exactly differentiates these terms and which is the best solution to
meet the infrastructure challenge of a Private Cloud deployment?
EMC's new VSPEX - a Reference Architecture with a variety of component options
Reference Architectures for all intents and
purposes are essentially just whitepaper-based solutions that are derived from
previously successful configurations. Using various vendor solutions and leveraging
their mutual partnerships & alliances, Reference Architectures are
typically integrated and validated platforms built from server, network and
storage components with an overlying hypervisor. NetApp’s FlexPod and EMC’s
VSPEX fall into this category and both invariably point to their flexibility as
a major benefit as they enable end users to mix and
match as long as there remains a resemblance to the reference. With open APIs
to various management tools, Reference Architectures are cleverly marketed as a
quick, easy to deploy and risk free infrastructure solution for Private Clouds.
Indeed Reference Architectures are a great solution for a low-budget SMB that is looking to introduce itself to the world of Cloud. As for a company that is either in or bordering on the Enterprise space and looking to seriously deploy its workloads onto a Private Cloud, it's important to remember that sometimes things that are great on paper can still end up being a horrible mess in reality – anyone who's watched Lynch's Dune can attest to that.
The difficulty with Reference Architectures is that fundamentally they still have no hardened solution configuration parameters, and ironically what they term an advantage, i.e. flexibility, is actually their main flaw, as their piece-by-piece approach of using solutions from many different vendors merely masks the same old problems. Being whitepaper solutions, the integration of specific components is only documented as a high-level overview, with component ‘a’ detailed as compatible with component ‘c’. As for the specifics of how these components integrate in detail, these are simply not available or realized until the Reference Architecture is cobbled together by the end user, who ultimately assumes all of the risk and financial obligation to ensure it not only works correctly but is also performing at optimum levels. This haphazard
trial and error approach is counterproductive to the accelerated,
pre-integrated, pretested and optimized model that is required by the
infrastructure of a Private Cloud.
Furthermore Reference Architectures are based on static deployments of sizing and architecture that typically have little relation to the end user's actual environment or needs, posing a problem whenever reconfiguration or resizing is required. With end users being left to resize
and consequently reconfigure & reintegrate their solution, they also have
to constantly find a way to integrate their existing toolsets with the open
APIs. This subsequently eliminates a lot of the benefits associated with “quick
time to value” as many deployment projects get caught up in the quagmire of
such triviality. Added to this, once you’ve begun resizing or customizing your
architecture, you’ve actually made changes that are
a deviation from the proposed standard and hence no longer recognizable to the
original reference. This leads to the other complication with Reference
Architectures, namely support issues.
With more than 90% of support calls being related to logical configuration issues (more often than not bugs or incompatibilities), a vendor that has no responsibility for or knowledge of that logical build, on the basis that they met your “requirements” to be flexible, leaves the situation no better than a traditional infrastructure deployment. Vendor finger pointing
is one of the most frustrating experiences you inevitably have to face when deploying an IT infrastructure in the traditional way. Being on a 4am conference call during a Priority 1 with the different organizational silos and the numerous vendors that make up the infrastructure is a painful experience I’ve personally had to face. It’s not a pretty sight when you’re impatiently waiting for a resolution while the networking company blames the firmware on the Storage and the Storage vendor blames the bugs with the servers, all while you sit there watching your CEO’s face turn into a tomato while the vein in his neck throbs incessantly. When you log a support call for
your reference architecture, who is actually responsible? Is it the company you
bought it from or one of the many manufacturers that you used to assemble your
self-built masterpiece? Furthermore which of those manufacturers or vendors
will take full responsibility when you’ve ended up building, implementing and
customizing the architecture yourself? Even at the point of
deployment, the Reference Architecture carries elements of ambiguity for the
end user ranging from which software and firmware releases to run to who is
responsible for the regression testing of the logical build. For instance what if you decide to proactively update to one of your
components’ latest firmware releases and then find out it’s not compatible with
another of your components? Who owns the risk? Also for example if you buy a
“flexible” Reference Architecture from vendor X, how will vendor X be able to
distinguish what it is you’ve actually deployed and how it’s configured without
having to spend an aeon on the phone doing a fact finding session, all while
your key applications are down? Reference Architectures are great for a test environment or a simple, cheap and cheerful solution, but using them as a platform to take key applications to the Cloud reeks of more 4am conference calls and exploding tomatoes.
Oracle Exalogic - Virtualization with OracleVM not VMware
Single Stack Infrastructures on the other hand, while also sometimes marketed as a Converged Infrastructure or a “flexible” Reference Architecture (or sometimes both!), are another completely distinct offering in the market. These solutions are typically marketed as “All-in-one” solutions and come in a number of guises. Products such as Oracle’s Exadata and Exalogic, Dell’s vStart, HP’s CloudSystem Matrix and IBM’s PureSystems are all examples of the Single Stack solution, where the vendors have tightly defined software stacks above the virtualization layer. Such solutions will also combine bundled infrastructure with service offerings, making them potential “Clouds in a Box”. While at the outset these seem ideal
and quick to deploy and manage, there are actually a number of challenges with
the Single Stack solution. The first challenge is that the Single Stack will always provide you with its own inherent components, regardless of whether they are inferior to other products in the market. So for example, instead of having network switches from the well established Cisco or Brocade, if you opt for the HP solution you’re looking at HP’s ProCurve, 3Com, H3C and TippingPoint. Worse still, if you go with the Oracle stack you’re condemned to OracleVM as opposed to the market leading and technically superior VMware.
Another challenge is that you’re also tied down to that one vendor and are now
a victim of vendor lock-in. Instead of just having infrastructure that will fit
your existing software toolset and service management, you will inevitably have
to rip and replace these with the Single Stack’s product set. Additionally
these complex and non-integrated software and hardware stacks require
significant time to deploy and integrate, reducing a considerable amount of the
value that comes from an accelerated deployment.
HP's CloudSystem Matrix - A Single Stack that will also bundle in HP's Service offerings with the Infrastructure
A true converged infrastructure is one that
is not only pretested and preconfigured but also and more importantly
pre-integrated; in other words it ships out as a single SKU and product to the
customer. While it may use different components from different vendors, they
are still components that are from market leaders and are well established in
the Enterprise space. Furthermore while it may not have the “flexibility” of a
Reference Architecture, it’s the rigidity and adherence to predefined standards
that make the Converged Infrastructure the ideal fit for serious contenders who
are looking for a robust, scalable, simply supported and accelerated Private
Cloud infrastructure. The only solution on the market that fits that
category is VCE's Vblock. By being built, tested, pre-integrated and configured
before being sent to the end user as a single product, the Converged
Infrastructure for the Amsterdam datacenter will be exactly the same as the
deployment in Bangalore, Shanghai, Dubai, New York and London. In this instance
the shipped Converged Infrastructure merely requires the end user to plug in
and supply network connectivity.
VCE's unique Vblock 700LX - A true Converged Infrastructure that ships out as a pre-tested & pre-integrated solution
With such a model, support issues are quickly
resolved and vendor finger-pointing is eliminated. For example the support call
is with one vendor (the Converged Infrastructure manufacturer) and they alone
are the owner of the ticket because the Converged Infrastructure is their
product. Moreover once a product model of a converged infrastructure has been
shipped out, problems that may potentially be faced by a customer in Madrid can
easily be replicated and tested on a like for like lab with the same product in
London, rapidly resolving performance issues or trouble tickets.
Deploying a preconfigured, pretested and
pre-integrated standardized model can also quickly eliminate issues with
firmware updates and patching. With traditional deployments, keeping patches
and firmware releases up to date with multiple vendors, components and devices can be
an operational role by itself. You would first have to assess the
criticality of each patch and relevance to each platform as well as validate
firmware compatibility with other components. Additionally you’d also need to
validate the patches by creating ‘mirrored’ Production Test Labs and then also
have to figure out what your rollback mechanism is if there are any issues. By
having a pre-integrated Converged Infrastructure all of this laborious and
tedious complication is removed. All patches and firmware can be pretested and
validated on standardized platforms in labs that are exactly the same as the
standardized platforms that reside in your datacenter. Instead of a multitude of
updates from a multitude of vendors each year, a converged infrastructure
offers the opportunity to have a single matrix that upgrades the infrastructure
as a whole, risk free.
A Converged Infrastructure offers a standardized model making patching & firmware upgrades seamless regardless of location or number
The other distinctive feature of a Converged
Infrastructure is its accelerated deployment. By being shipped to the customer
as a ready assembled, logically configured product and solution, typical
deployments can range from only 30-45 days i.e. from procurement to production.
In contrast other solutions such as Reference Architectures could take twice as
long if not longer as the staging, racking and logical build is still required
once delivered to the customer. It’s this speed of deployment which makes the Converged Infrastructure the ideal solution for Private Cloud deployments and delivers an immediate reduction in your total cost of ownership, especially when the business or application owners demand an instant platform for their new projects.
The other benefit of having a company that
continuously builds standardized and consistent infrastructures that are
configured and deployed for key applications such as Oracle, SAP or Exchange is
that you end up with an infrastructure that not only consolidates your
footprint and accelerates your time to deployment but also optimizes and in most
cases improves the performance of your key apps. I’ve recently seen a customer
gain a 300% performance improvement with their Oracle databases once they
decided to migrate them off their Enterprise Storage Arrays, SPARC servers and
SAN switches in favour of a Converged Infrastructure, i.e. the Vblock. Of
course there were a number of questions, head scratching and pontifications as
to what was seemingly inexplicable; “how could you provide such performance
when we’ve spent months optimizing our infrastructure?” The answer is
straightforward: regardless of how good an engineering team you have, it is rare that they are solely focused on building, on a daily basis, a standardized infrastructure that is customized for a key application and that factors in all of the components comprehensively.
To elaborate, typically customers will have an
in house engineering department where they’ll have a Storage team, a Server
team, a Network team, an Apps team, a SAN team etc. All of these silos then
need to share their expertise and somehow correlate them together prior to
building the infrastructure. Compare this to VCE and the Converged
Infrastructure approach, where instead there are dedicated engineering teams
for each step of the building process whose expertise is centred and focused
upon a single enabling platform, i.e. the Vblock. Firstly there’s the engineering team that
does the physical build (including thermals, power efficiency, cooling,
cabling, equipment layout for upgrade paths etc.). This is then passed on to
another dedicated engineering team that takes that infrastructure and certifies the software releases as well as tests the logical build configurations all the way through to the hypervisor. There’s then another engineering organization whose sole purpose is to test applications that are commonly deployed on these Vblock infrastructures such as Oracle, SAP, Exchange, VDI etc. This enables the
customer that orders for example an “Oracle Vblock” to have an infrastructure
that was specifically adapted both logically and physically to not only meet
the needs of their Oracle workloads but also optimize its performance. This is
just a glimpse of the pre-sales aspect; post sales you have a dedicated team
responsible for the product roadmap of the entire infrastructure ensuring that
software or component updates are checked and advised to customers once they
are deemed suitable for a production environment. The list of dedicated teams
goes on but the common denominator is that they are all part of a seamless
process that aims at delivering and supporting an infrastructure designed and
purpose built for mission critical application optimization.
So whether you’re feeling Pure, Flexy or
Spexy the key thing is to distinguish between Reference Architectures, Single
Stack Solutions and the Vblock i.e. a Converged Infrastructure and align the
right solution to the right business challenge. For fun and adventure I'd
always purchase a kit car over a factory built car. I'd have great fun building
it from all the components available to me and have it based on my Reference
handbook. I could even customize my kit car with a 20 inch exhaust pipe, Dr.
Dre hydraulics and fluffy dice because it's flexible just like a Reference
Architecture. Alternatively because I love Audi so much I could buy an Audi car
that has all of its components made by Audi. So that means ripping out the
Alpine CD player for an Audi one, the BOSE speakers for Audi ones and even
removing the Michelin tyres for some new Audi ones, regardless of whether
they're any good or if they’re just OEM’d from a budget manufacturer - just
like a Single Stack Solution. Ultimately if I'm serious about performance and
reliability I'll just buy a manufactured Audi S8 that's pre-integrated and
deployed from the factory with the best of breed components. Sure I can choose
the colour, I can decide on the interior etc. but it's still built to a
standard that's designed and engineered to perform. Much like a Converged
Infrastructure, while I may choose to have a certain amount of CPU for my
Server blades and a certain amount of IOPS and capacity for Storage, I still
have a standardized model that's designed and engineered to perform and scale
at optimum levels. For a Private or Hybrid Cloud infrastructure that successfully
hosts and optimizes critical applications as well as de-risks their
virtualization, the solution can only mean one thing - it's Converged.
System Admins were generally the early embracers and end users of VMware ESX as they immediately recognized the benefits of virtualization. Having been bogged down with the pains of running physical servers, such as downtime for maintenance, patching and upgrades, they were the natural adopters of the bare metal hypervisor. The once Windows 2003 system admin was soon
configuring virtual networks and VLANs as well as carving up Storage
datastores, quickly empowering them as the master of this new domain that was
revolutionizing the datacenter. As the industry matured in its understanding of
VMware, so did VMware’s recognition that the networking, security and storage
expertise should be broadened to those that had been involved in such work in
the physical world. Along came features such as the Nexus 1000v and VM vShield
that enabled the network and security teams to also plug into the ‘VM world’
enabling them to add their expertise and participate in the configuration of
the virtual environment. With vSphere 5, VMware took the initiative further by
bridging the Storage and VMware gap with new features that Storage teams could
also take advantage of. Despite this, terms such as SIOC, Storage DRS, VASA and Storage vMotion still seem to draw blanks from most Storage folk or are looked down upon as ‘a VMware thing’. So what exactly are these features, and why should Storage as well as VM admins take note of them and work together to take full advantage of their benefits?
Firstly there’s Storage DRS (SDRS), in my opinion the most exciting new feature of vSphere 5. SDRS enables the initial placement and ongoing space & load balancing of VMs across datastores that are part of the same datastore cluster. Simply put, think of a datastore cluster as an aggregation of multiple datastores into a single object, with SDRS balancing the space and I/O load across it. In the case of space utilization, this takes place by ensuring that a set threshold is not exceeded. So should a datastore reach, say, a 70% space utilization threshold, SDRS will move VMs via Storage vMotion to other datastores to balance out the load.
Storage DRS based on Space utilisation
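To make the space-balancing trigger concrete, here is a minimal Python sketch of the kind of decision SDRS makes; the threshold value, datastore names and the pick of the least-utilized target are illustrative assumptions, not VMware's actual algorithm.

```python
# Illustrative sketch of SDRS-style space balancing (not VMware's actual algorithm).
SPACE_THRESHOLD = 0.70  # e.g. the 70% utilization threshold mentioned above

datastore_cluster = {
    # datastore name: (used GB, capacity GB) -- hypothetical values
    "DS01": (720, 1000),
    "DS02": (350, 1000),
    "DS03": (410, 1000),
}

def utilization(ds):
    used, capacity = datastore_cluster[ds]
    return used / capacity

def recommend_space_moves():
    """Flag datastores over the threshold and suggest the least-utilized target."""
    recommendations = []
    for ds in datastore_cluster:
        if utilization(ds) > SPACE_THRESHOLD:
            target = min(datastore_cluster, key=utilization)
            recommendations.append(
                f"Storage vMotion one or more VMs from {ds} "
                f"({utilization(ds):.0%}) to {target} ({utilization(target):.0%})"
            )
    return recommendations

print(recommend_space_moves())
```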
The other balancing feature, load balancing based on I/O metrics, uses the vSphere feature Storage I/O Control (SIOC). In this instance SIOC is used to evaluate the datastores in the cluster by continuously monitoring how long it takes an I/O to do a round trip, and then feeds this information to Storage DRS. If the latency value for a particular datastore stays above a set threshold for a period of time, SDRS will rebalance the VMs across the datastores in the cluster via Storage vMotion. With many Storage administrators operating ‘dynamic tiering’ or ‘fully automated tiering’ at the backend of their storage arrays, it’s vital that a co-operative design decision is made to ensure that the right features are utilized at the right time.
Storage DRS based on I/O latency
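A rough sketch of that latency trigger is below; the latency samples, threshold value and the "sustained for a period of time" window are assumptions for illustration rather than the real SIOC/SDRS defaults.

```python
# Illustrative sketch of an I/O-latency trigger for rebalancing (values are assumptions).
LATENCY_THRESHOLD_MS = 15   # assumed threshold; actual defaults may differ
SUSTAINED_SAMPLES = 4       # must stay above threshold this many samples in a row

def needs_rebalance(latency_samples_ms):
    """Return True if observed round-trip latency stays above threshold long enough."""
    over = 0
    for sample in latency_samples_ms:
        over = over + 1 if sample > LATENCY_THRESHOLD_MS else 0
        if over >= SUSTAINED_SAMPLES:
            return True
    return False

# A datastore whose latency is persistently high would be rebalanced via Storage vMotion.
print(needs_rebalance([12, 18, 19, 22, 25]))  # True
print(needs_rebalance([12, 18, 9, 22, 11]))   # False
```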
While most are aware of vMotion’s capability of seamlessly migrating VMs across hosts, Storage vMotion is a slightly different feature that allows the migration of running VMs from one datastore to another without incurring any downtime. In vSphere 5.0, Storage vMotion has been improved so that the operation takes place a lot quicker. It does this by using a new Mirror Driver mechanism that keeps blocks on the destination synchronized with any changes made to the source after the initial copying. The migration process does a single pass of the disk, copying all the blocks to the destination disk. If any blocks change during this copy, the mirror driver synchronises them from the source to the destination. It’s this single-pass block copy that enables Storage vMotion to complete a lot quicker, enabling the end user to reap the benefits immediately.
Storage vMotion & the new Mirror Driver
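Below is a minimal Python sketch of the single-pass copy plus mirror-driver idea described above; the block layout, write stream and data values are invented for illustration and are not VMware's implementation.

```python
# Illustrative sketch of a single-pass block copy with write mirroring (not VMware code).
source = {i: f"block-{i}" for i in range(6)}   # hypothetical source disk blocks
destination = {}
copied = set()

def guest_write(block_id, data):
    """A guest write during migration: update the source and, if that block has
    already been copied, mirror the write synchronously to the destination."""
    source[block_id] = data
    if block_id in copied:
        destination[block_id] = data
    # Writes to not-yet-copied blocks need no mirroring; the single pass picks them up.

def copy_block(block_id):
    destination[block_id] = source[block_id]
    copied.add(block_id)

# Single pass with a couple of guest writes interleaved mid-migration.
copy_block(0)
copy_block(1)
guest_write(0, "block-0-updated")   # already copied, so it is mirrored immediately
guest_write(4, "block-4-updated")   # not yet copied, picked up later in the same pass
for block_id in range(2, 6):
    copy_block(block_id)

print(destination == source)  # True: one pass plus mirroring, no repeated delta passes
```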
As for the new feature named VASA (vSphere APIs for Storage Awareness), this is focused on providing insight and information to the VM admin about the underlying storage. In its simplest terms, VASA is a new set of APIs that enables storage arrays to provide vCenter with visibility into the storage’s configuration, health status and functional capabilities. VASA also allows the VM admin to see the features and capabilities of their underlying physical storage: details such as the number of spindles behind a volume, the number of expected IOPS or MB/s, the RAID levels, whether the LUNs are thick or thin provisioned, or even whether deduplication or compression is in play. SDRS can also leverage the information provided by VASA to make its recommendations on space and I/O load balancing. Basically VASA is a great feature that ensures VM admins can quickly provision the storage that is most applicable to their VMs.
This leads onto the feature termed Profile Driven Storage, which enables you to select the correct datastore on which to deploy your VMs based on that datastore’s capabilities. Building a Storage Profile can happen in two ways: either the storage device has its capabilities associated automatically via VASA, or the capabilities are user-defined and manually associated.
VASA & Profile Driven Storage
With the User-Defined option you can apply labels to your storage, such as Bronze, Silver & Gold, based on the capabilities of that Storage. So for example, once a profile is created and the user-defined capabilities are added to a datastore, you can then use that profile to select the correct storage for a new VM. If the datastore’s capabilities are compatible with the VM’s requirements, the VM is said to be compliant; if they are not, the VM is said to be non-compliant. So while VASA and Profile Driven Storage are still new features, their potential is immense, especially in the future, as storage admins can work alongside VM admins to help classify and tier their data.
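As a rough illustration of the compliance idea, here is a small Python sketch that checks a VM's required capabilities against a datastore's advertised (VASA or user-defined) capabilities; the capability names and tier labels are hypothetical, not VASA's actual schema.

```python
# Illustrative compliance check between a VM storage profile and datastore capabilities.
datastores = {
    "Gold-DS":   {"tier": "Gold",   "raid": "RAID10", "thin_provisioned": False},
    "Silver-DS": {"tier": "Silver", "raid": "RAID5",  "thin_provisioned": True},
}

vm_profile = {"tier": "Gold", "raid": "RAID10"}  # what the VM requires

def is_compliant(profile, capabilities):
    """A placement is compliant if every required capability is matched."""
    return all(capabilities.get(key) == value for key, value in profile.items())

for name, caps in datastores.items():
    state = "compliant" if is_compliant(vm_profile, caps) else "non-compliant"
    print(f"{name}: {state}")
# Gold-DS: compliant
# Silver-DS: non-compliant
```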
As mentioned before, Storage I/O Control (SIOC) is a feature that enables you to configure rules and policies to help specify the business priority of each VM. It does this by dynamically allocating I/O resources to your critical application VMs whenever I/O congestion is detected. Furthermore, by enabling SIOC on a datastore you trigger the monitoring of device latency as observed by the hosts. As SIOC takes charge of I/O allocation to VMs, it also by default ignores Disk.SchedNumReqOutstanding (DSNRO). Typically it’s DSNRO that sets the Queue Depth at the hypervisor layer, but once SIOC is enabled it takes on this responsibility, basing its judgements on the I/O congestion and policy settings. This offloads a significant amount of performance design work from the admins but ultimately still requires the involvement of the Storage team to ensure that I/O contention is not falsely coming from poorly configured Storage and highly congested LUNs.
SIOC ignores Disk.SchedNumReqOutstanding to set the Queue Depth at the hypervisor level
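To illustrate the idea of prioritizing VMs under congestion, here is a minimal sketch that divides a datastore's available device queue slots in proportion to per-VM shares; the share values and queue depth are assumptions, and this is not SIOC's actual scheduler.

```python
# Illustrative proportional-share allocation of device queue slots under congestion.
vm_shares = {"critical-db": 2000, "app-server": 1000, "test-vm": 500}
device_queue_depth = 64  # total slots assumed available on the datastore's device queue

def allocate_slots(shares, total_slots):
    """Split queue slots in proportion to each VM's shares, at least one slot each."""
    total_shares = sum(shares.values())
    return {vm: max(1, int(total_slots * s / total_shares)) for vm, s in shares.items()}

print(allocate_slots(vm_shares, device_queue_depth))
# e.g. {'critical-db': 36, 'app-server': 18, 'test-vm': 9}
```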
So while these new features are ideal for the SMB, they may still not be the sole answer to every Storage / VMware problem related to virtualizing mission critical applications. As with any new feature or technology, their success relies on correct planning, design and implementation, and for that to happen a siloed VM-only or Storage-only approach needs to be avoided.
Back in March 2009, when Cisco announced the launch of their UCS platform and subsequent intention to enter the world of server hardware, eyebrows were raised including my own. There was never any disputing that the platform would be adopted by some customers, certainly after seeing how Cisco successfully gatecrashed the SAN market and initially knocked Brocade off their FC perch. We’d all witnessed how Cisco used its IP datacenter clout and ability to propose deals that packaged both SAN MDS and IP switches with a consequent single point of support to quickly take a lead in a new market. Indeed it was only after Brocade’s 2007 acquisition of McData and when Cisco started to focus on FCoE that Brocade regained their lead in FC SAN switch sales. Where my own and others’ doubts lay was whether the UCS was going to be good enough to compete with the already proven server platforms of HP, IBM and Dell. Well, roll on three years and the UCS now boasts 11,000 customers worldwide and an annual run rate of £822m, making it the fastest growing product in Cisco’s history. Amazingly Cisco is already third in worldwide blade server market share with 11%, closely behind HP and IBM. So now with this week’s launch of the UCS’ third generation and its integration of the new Intel Xeon processor E5-2600, it’s time to accept that all doubts have been swiftly erased.
Unlike other server vendors, Cisco’s UCS launch was from a greenfield approach that recognized the industry’s shift towards server virtualization and consolidation. Not tied down by legacy architectures, Cisco entered the server market at the same time Intel launched their revolutionary Intel Xeon 5500 processors and immediately took advantage with their groundbreaking memory extension feature. By creating a way to map four distinct physical memory modules (DIMMs) to a single logical DIMM that would be seen by the processor’s memory channel, Cisco introduced a way to have 48 standard slots as opposed to the 12 found in normal servers. With the new B200 M3 blade server, there’s now support for up to 24 DIMM slots for memory running up to 1600 MHz and up to 384 GB of total memory, as well as 80 Gbits per second of I/O bandwidth. This is even more impressive when you factor in that, with the Cisco UCS 5108 Chassis also being able to accommodate up to eight of these blades, scalability can go up to a remarkable 320 blades per Cisco Unified Computing System.
Added to this Cisco took convergence further by making FCoE the standard with Fabric Interconnects that not only acted as the brains for their servers but also helped centralize management. With the ability to unite up to 320 servers as a single system, they also supported line-rate, low latency lossless 10 Gigabit Ethernet as well as FCoE. This enabled a unified network connection for each blade server with just a wire-once 10Gigabit Ethernet FCoE downlink, reducing cable clutter and centralizing network management via the UCS Manager GUI. Now with the newly launched UCS 6296UP, the Fabric Interconnect will double the switching capacity of the UCS fabric from 960Gbps to 1.92Tbps as well as the number of ports from 48 to 96.
Cisco UCS' Memory Extension
Other features such as FEX introduced the ability to ease management. FEX (Fabric Extenders) are platforms that act almost like remote line cards for the parent Cisco Nexus switches. Hence the Fabric Extenders don’t perform any switching and are managed as an extension of the fabric interconnects. This enables the UCS to scale to many chassis without increasing the amount of switches, as switching is removed from the chassis. Furthermore there is no need for separate chassis management modules as the fabric extenders alongside the fabric interconnects manage the chassis’ fans, power supplies etc. This means there’s no requirement to individually manage each FEX as everything is inherited from the upstream switch therefore allowing you to simply plug in and play a FEX for a rack of pre-cabled servers. Regardless of configured policies, upgrading or deploying of new features would simply require a change on the upstream switch because the FEX inherits from the parent switch, leading everything to be automatically propagated across the racks of servers.
The UCS components
With the aforementioned B200 M3 blade, there are also two mezzanine I/O slots, one of which is used by the newly launched 1240 virtual interface card. The VIC 1240 provides 40 Gbps of capacity which can of course be sliced up into virtual interfaces delivering flexible bandwidth to the UCS blades. Moreover, with a focus on virtualization and vSphere integration, the VIC 1240 implements Cisco’s VM-FEX and supports VMware's VMDirectPath with vMotion technology. The concept of VM-FEX is again centered on the key benefits of consolidation, this time around the management of both virtual and physical switches. With physical 10Gb links now standard, VM-FEX enables end users to move away from the complexity of managing standard vSwitches, a feature that was designed and introduced when 1Gb links were the norm. It does this by providing VM virtual ports on the actual physical network switch, hence avoiding the hypervisor’s virtual switch. The VM’s I/O is therefore sent directly to the physical switch, making the VM’s identity and positioning information known to the physical switch and eliminating local switching from the hypervisor. Unlike the common situation where trunking of the physical ports was a requirement to enable traffic between VMs on different physical hosts, the key point here is that the network configuration is now specific to that port. That means once you’ve assigned a VLAN to the physical interface, there is no need for trunking and you’ve also ensured network consistency across your ESX hosts. The VM-FEX feature also has two modes: the first, called emulated mode, passes the VM’s traffic through the hypervisor kernel, while the other ‘high-performance’ mode utilizes VMDirectPath I/O and bypasses the hypervisor kernel, going directly to the hardware resource associated with the VM.
High Performance Mode utilises VMware's VMDirectPath I/O feature
Interestingly the VMDirectPath I/O feature is another key vSphere technology that often gets overlooked but one that adds great benefit by allowing VMs to directly access hardware devices. First launched in vSphere 4.0, one of its limitations was that it didn’t allow you to vMotion the VM, which may explain its lack of adoption. Now though with vSphere 5.0 and the UCS, vMotion is supported. Here the VIC sends the VM’s I/O directly to the UCS fabric interconnect, which then offloads the VM’s traffic switching and policy enforcement. By interoperating with VMDirectPath the VIC transfers the I/O state of a VM as well as its network properties (VLAN, port security, rate limiting, QoS) to vCenter as it vMotions across ESX servers. So while you may not get an advantage on throughput, where VMDirectPath I/O’s advantage lies is in its ability to save on CPU workloads by freeing up CPU cycles that were needed for VM switching, making it ideal for very high packet rate workloads that need to sustain their performance. Of course you can also now transition the device from one that is paravirtualized to one that is directly accessed and the other way around. VM-FEX basically merges the virtual access layer with the physical switch, empowering the admin to now provision and monitor from a consolidated point.
As well as blade servers, Cisco are also serving up (excuse the pun) new rack servers which update their C-class range; the 1U C220 M3 and the 2U C240 M3 server. With the announcement that the UCS Manager software running in the Fabric Interconnect will now be able to manage both blade and rack servers as a common entity, there is also news that this will eventually scale out as a single management domain for thousands of servers. Currently under the moniker of “Multi-UCS Manager”, the plan is to expand the current management domain limit of 320 servers to up to 10,000 servers spread across data centers around the world, empowering server admin to centrally deploy templates, policies, and profiles as well as manage and monitor all of their servers. This would of course bring huge dividends in terms of OPEX savings, improved automation and orchestration setting the UCS up as a very hard to ignore option in any new Cloud environment.
A single management pane for up to 10,000 servers
As well as Cloud deployments, the UCS is also being set up to play a key role in the explosion of big data. With the recent announcement that Greenplum and Cisco are finally teaming together to utilize the C-class rack servers, there is already talk of pre-configured Hadoop stacks. With Greenplum’s MR Hadoop distribution integrating with Cisco's C-class rack servers, it’s pretty obvious that the C-class UCS servers will also quickly gain traction in the market much like their B-series counterparts.
Incredibly, it was not long ago that Cisco was just a networking company whose main competitor was Brocade. Fast forward to March 2012 and Brocade’s CEO Mike Klayko is stating "If you can run Cisco products then you can run ours" to justify Brocade's IP credentials. When their once great competitor inadvertently admits they’re entering the IP world as a reaction to Cisco rather than a perceived demand from the market, it really does showcase how far Cisco has come. It also speaks volumes that, alternatively, Cisco proactively entered the server world when no perceived demand existed within that market. Three years later, with 11% market share and groundbreaking features built for the Cloud and Big Data, Cisco has moved far beyond its networking competitors and is well placed to be a mainstay powerhouse in the server milieu.
The key to a Disaster Recovery investment is being able to test and fail over, i.e. check that it actually works. Hence it is vital that the SAN being used for this replication is optimized and provides an RTO that meets the business’ demands. While there are sufficient tools to monitor the IP or DWDM links for cross-site replication, it is still best practice to TAP the replication links so that your Disaster Recovery infrastructure incorporates monitoring of the FC SAN.
Here's my final video for Virtual Instruments, which quickly explains how you can proactively take the "Disaster" out of Disaster Recovery....
IOPS is commonly recognized as a standard measurement of performance, whether to measure the Storage Array’s backend drives or the performance of the SAN. In its most basic terms, IOPS is the number of operations issued per second, whether reads, writes or other, and admins will typically use their Storage Array tools or applications such as Iometer to monitor IOPS. IOPS will vary based on a number of factors that include a system’s balance of read and write operations, whether the traffic is sequential, random or mixed, the storage drivers, the OS background operations, or even the I/O block size.
Block size is usually determined by the application, with different applications using different block sizes in various circumstances. So for example, Oracle will typically use block sizes of 2 KB or 4 KB for online transaction processing and larger block sizes of 8 KB, 16 KB or 32 KB for decision support system workload environments. Exchange 2007 may use an 8 KB block size, SQL a minimum of 8 KB, and SAP 64 KB or even more.
IOPS and MB/s both need to be considered
Additionally, it is standard practice that when IOPS is considered as a measurement of performance, the throughput, that is to say MB/sec, is also looked at. This is due to the different impact each has on performance. For example, an application with only 100 MB/sec of throughput but 20,000 IOPS may not cause bandwidth issues, but with so many small commands the storage array is put under significant exertion as its front end processors have an immense workload to deal with. Alternatively, if an application has a low number of IOPS but significant throughput, such as long sustained reads, then the exertion will occur upon the SAN’s links.
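The relationship between IOPS, block size and throughput is simple arithmetic; the sketch below works through the block sizes and the 100 MB/sec / 20,000 IOPS example quoted above (treating 1 MB as 1,000,000 bytes for round numbers, which is an assumption).

```python
# Throughput (MB/s) = IOPS x average I/O size; here 1 MB is taken as 1,000,000 bytes.
def throughput_mb_per_s(iops, io_size_kb):
    return iops * io_size_kb * 1000 / 1_000_000

def avg_io_size_kb(throughput_mb_per_sec, iops):
    return throughput_mb_per_sec * 1_000_000 / iops / 1000

# Block sizes quoted above: Oracle OLTP 4 KB, SQL 8 KB, SAP 64 KB.
print(throughput_mb_per_s(20_000, 4))    # 80.0 MB/s  -> IOPS-heavy, lots of small commands
print(throughput_mb_per_s(2_000, 64))    # 128.0 MB/s -> fewer IOPS, bandwidth-heavy
# The example above: 100 MB/s at 20,000 IOPS implies roughly a 5 KB average I/O.
print(avg_io_size_kb(100, 20_000))       # 5.0 KB
```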
Despite this, MB/s and IOPS are still not a good enough measure of performance if you don’t take the Frames per second into consideration. To elaborate, referring back to the FC Frame, a standard FC Frame has a data payload of 2112 bytes, i.e. roughly a 2K payload. So in the example below, where an application has an 8K I/O block size, it will require 4 FC Frames to carry that data portion. In this instance one I/O would equate to 4 Frames, and consequently 100 IOPS in this example would equate to 400 Frames per second. Hence, to get a true picture of utilization, looking at IOPS alone is not sufficient because there is a magnitude of difference between particular applications and their I/O sizes, with some ranging from 2K to even 256K and some applications, such as backups, having even larger I/O sizes and hence more Frames.
Frames per second give you a better insight of demand and throughput
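As a quick sketch of that arithmetic, the Python below estimates data frames per second from IOPS and I/O size, assuming the full 2112-byte payload (about 2K) is used per frame; real fabrics add management and protocol frames on top, so treat it as a lower bound.

```python
import math

FC_PAYLOAD_BYTES = 2112  # standard FC frame data payload (~2K)

def frames_per_io(io_size_bytes):
    """Number of FC data frames needed to carry one I/O's data portion."""
    return math.ceil(io_size_bytes / FC_PAYLOAD_BYTES)

def frames_per_second(iops, io_size_bytes):
    return iops * frames_per_io(io_size_bytes)

# The example above: an 8K I/O needs 4 frames, so 100 IOPS equates to ~400 frames/sec.
print(frames_per_io(8 * 1024))            # 4
print(frames_per_second(100, 8 * 1024))   # 400
# A 256K backup-style I/O needs far more frames per operation.
print(frames_per_io(256 * 1024))          # 125
```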
Looking at a metric such as the ratio of MB/sec to frames/sec, as is displayed below, we actually get a better picture and understanding of the environment and its performance. To elaborate, the MB/sec to Frames/sec ratio is different to the IOPS metric. With reference to this graph of the MB/sec to Frames/sec ratio, the line should never fall below 0.2 on the y-axis, i.e. the 2K data payload. If the ratio falls below this, say at the 0.1 level, we can identify that data is not being passed efficiently despite the throughput (MB/sec) being maintained.
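A rough sketch of how that ratio could be computed and flagged is below; the exact y-axis scaling of the graph isn't given, so this version simply reports the average data payload per frame and flags anything well under the ~2 KB full payload as suspect.

```python
# Average payload per frame as a proxy for the MB/sec to frames/sec ratio.
FULL_PAYLOAD_BYTES = 2112              # ~2K data payload of a full FC frame
ALERT_BYTES = FULL_PAYLOAD_BYTES / 2   # assumed alert level, roughly the "0.1" case above

def avg_payload_per_frame(mb_per_sec, frames_per_sec):
    return (mb_per_sec * 1_000_000) / frames_per_sec

def check_link(mb_per_sec, frames_per_sec):
    payload = avg_payload_per_frame(mb_per_sec, frames_per_sec)
    if payload < ALERT_BYTES:
        return f"{payload:.0f} B/frame: mostly small/management frames, investigate"
    return f"{payload:.0f} B/frame: frames are carrying a healthy data payload"

print(check_link(100, 50_000))    # ~2000 B/frame -> healthy
print(check_link(100, 500_000))   # ~200 B/frame  -> throughput maintained, little data
```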
Given a situation where you have the common problem of slow draining devices, the case that MB/s and IOPS alone are not sufficient is even more compelling, as you can actually be misled in terms of performance monitoring.
To explain, slow draining devices are devices that are requesting more information than they can consume and hence cannot cope with the incoming traffic in a timely manner. This is usually because devices such as the HBA have slower link rates than the rest of the environment, or because the server or device is overloaded in terms of CPU or memory and thus has difficulty dealing with the data requested. To avoid performance problems it is imperative to proactively identify them before they impact the application layer and consequently the business’ operations.
Slow Draining devices - requesting more information than they can consume
In such a situation, looking again at the MB/sec to Frames/sec ratio graph below, we can now see that the ratio is at the 0.1 level; in other words we are seeing high throughput but minimal payload. This enables you to proactively identify whether a large number of management frames are being passed instead of data, as they busily report on the physical device errors that are occurring.
Management Frames being passed can mislead
So to conclude, without taking Frames per second into consideration and having an insight into this ratio, it is an easy trap to falsely believe that everything is OK and data is being passed, as you see lots of traffic represented by MB/s, when in actuality all you are seeing are management frames reporting a problem.
Here's an animated video to further explain the concept: