Getting a Perspective on SAN Volume Controller HyperSwap

What’s in a name? Well, when it comes to HyperSwap, the name is a complete giveaway. The whole game in the storage industry revolves around high availability (HA), and HyperSwap is an extension of that. As the name suggests, HyperSwap is a “quick swap”, a switch to another site in case of disaster. But before we look into it, let us cover some basics of SVC and HA.

1. SVC: SAN Volume Controller (SVC) virtualizes storage by allowing diverse kinds of storage boxes to be connected behind it.

2. I/O groups: An SVC cluster can have up to 8 nodes arranged in up to 4 I/O groups (2 nodes per I/O group). The idea behind an I/O group is to provide effective failover through the partner node. We will not go into the details of how failover is managed. It is important to know that the disks are visible through both nodes in an I/O group. In general, only one I/O group is mapped for access to a specific volume.

High availability solutions with SVC:

a. Metro Mirror: Used to synchronously copy the I/O coming to one volume onto another volume. A write is not acknowledged to the host until the copy to the second volume is successful. This happens entirely at the storage level. To the host, the second volume is presented as a read-only volume. In case of a disaster, access is provided through the second volume; for that to happen, the read-only property of the volume has to be changed.

b. Host-side multipathing: To the host, each node in the I/O group presents itself as a path to the storage device. So if there are 2 HBA ports on the host, the host will see 4 paths to the storage device (2 HBA ports times 2 nodes). This is illustrated in the diagram below. Understanding this is important because these paths, or connections to a specific node, play an important role in HyperSwap (from the storage perspective).

c. Host-side clustering: Two or more hosts form a failover relationship in which each disk is mapped to all the hosts. If a host fails, the redundant host takes over the application and continues the I/O.
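
The path arithmetic in item b can be sketched in a few lines of Python. This is only an illustration: the function name `count_paths`, its parameters, and the assumption that every host HBA port is zoned to the same number of ports on each node of the I/O group are hypothetical, not part of SVC. Real path counts depend on how the SAN is zoned.

```python
def count_paths(hba_ports: int, nodes_in_io_group: int = 2, ports_per_node: int = 1) -> int:
    """Return the number of paths a host sees to one volume.

    Assumption (hypothetical): every host HBA port is zoned to
    `ports_per_node` ports on each node of the I/O group.
    """
    if hba_ports < 0 or nodes_in_io_group < 0 or ports_per_node < 0:
        raise ValueError("counts must be non-negative")
    return hba_ports * nodes_in_io_group * ports_per_node


# 2 HBA ports, 2 nodes in the I/O group, 1 visible port per node
print(count_paths(hba_ports=2))  # 4
```

With a second I/O group also presenting paths to the same volume, as in HyperSwap, the same multiplication applies to each I/O group and the results add up.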

Disaster recovery site (DR site): In case everything fails at a site, an alternate site is maintained with more or less the same configuration. Using Metro Mirror or Global Mirror with change volumes, data is continuously replicated from the main site to the DR site. In the worst-case scenario where the whole site fails, the customer has the option to switch over to the DR site.

So how does the HA picture look as of now?

1. There is redundancy on the host side with multiple hosts for failover.

2. There is redundancy on the SAN side with multiple HBAs connecting to SAN.

3. There is redundancy on the storage side with multiple I/O groups and multiple nodes in the same I/O group providing failover.

4. There is even bigger redundancy on the site side, where we have another DR site that can be accessed in the event of total failure.

But in all of these cases there would still be some outage if a disaster happens on the storage side. That’s where another solution, “enhanced stretch cluster”, comes into the picture. It’s a precursor to HyperSwap.

Enhanced Stretch Cluster (ESC): In a stretch cluster, each SVC I/O group is split between two sites, with one node at each site. Hosts see preferred paths from the production site and non-preferred paths from the alternate site.

The challenge arises when the production site is gone and you are left with the DR site. In this case access is available through only one node, which means there is no redundancy at the DR site.

So the current status is that ESC provides only limited high availability. A solution is required that provides complete redundancy at the DR site as well. That’s where HyperSwap comes into the picture.

HyperSwap:

It uses the existing infrastructure of stretch cluster / Metro Mirror to great effect and provides I/O group node redundancy at the DR site as well. HyperSwap stretches the SVC cluster in a real sense by placing a full I/O group at each site (instead of placing individual nodes). Instead of the two nodes of one I/O group being stretched across two sites, 4 nodes (2 I/O groups) are now spread across the two sites, one full I/O group per site.
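
The difference in redundancy after a site failure can be made concrete with a small Python sketch. The layouts, node names, I/O group names, and helper functions below are hypothetical and model only the minimal case described here (one I/O group for ESC, one I/O group per site for HyperSwap); they do not reflect any SVC API.

```python
from collections import Counter

# Hypothetical layouts: node name -> (I/O group, site).
ESC_LAYOUT = {
    "node1": ("io_grp0", "production"),
    "node2": ("io_grp0", "dr"),
}
HYPERSWAP_LAYOUT = {
    "node1": ("io_grp0", "production"),
    "node2": ("io_grp0", "production"),
    "node3": ("io_grp1", "dr"),
    "node4": ("io_grp1", "dr"),
}


def surviving_nodes_per_io_group(layout, failed_site):
    """Count the nodes still online in each I/O group after a site is lost."""
    return dict(Counter(grp for grp, site in layout.values() if site != failed_site))


def has_node_redundancy(layout, failed_site):
    """True if some I/O group still has two nodes online after the failure."""
    survivors = surviving_nodes_per_io_group(layout, failed_site)
    return any(count >= 2 for count in survivors.values())


print(surviving_nodes_per_io_group(ESC_LAYOUT, "production"))        # {'io_grp0': 1}
print(surviving_nodes_per_io_group(HYPERSWAP_LAYOUT, "production"))  # {'io_grp1': 2}
```

In the ESC layout a single node survives, so there is no partner node to fail over to; in the HyperSwap layout the surviving site still has a complete I/O group.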

From the host perspective, two I/O groups are mapped to a single volume, with the paths to the main site as preferred paths and the paths to the DR site as non-preferred. The volume movement to the DR site is done using something called NDVM (non-disruptive volume movement).

But that’s not how it looks under the hood. Under the hood it’s another play altogether. In the backend there are two volumes, one at each site. As I/O goes to the preferred-path nodes, a synchronous copy is created on the DR site volume. The volume on the DR site is not mapped to the host.

So what happens if the primary I/O group becomes unavailable?

HyperSwap fails over to the other site within 30 seconds of a disaster, which is well within the application-layer timeout for most critical applications.

If for some reason the I/O group that holds the primary vdisk is gone, the paths from the second I/O group take over and SVC internally uses the secondary vdisk to process the data. For the host nothing changes. There is some added latency, but no I/O is lost.
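
The behaviour described above can be pictured with a toy Python model. It is a sketch under stated assumptions, not how SVC is implemented: the class `HyperSwapVolume`, the I/O group names `io_grp_main` and `io_grp_dr`, and the in-memory dictionaries standing in for vdisks are all hypothetical. The model only shows the two ideas from the text: writes reach both copies before they are acknowledged, and the host keeps using the same volume when the primary I/O group disappears.

```python
class HyperSwapVolume:
    """Toy model: one host-visible volume backed by a copy at each site."""

    def __init__(self):
        # Hypothetical names; each I/O group holds one copy of the data.
        self.copies = {"io_grp_main": {}, "io_grp_dr": {}}
        self.online = {"io_grp_main": True, "io_grp_dr": True}
        self.preferred = "io_grp_main"

    def active_io_group(self):
        """Use the preferred I/O group if it is online, else a surviving one."""
        if self.online[self.preferred]:
            return self.preferred
        for grp, up in self.online.items():
            if up:
                return grp
        raise RuntimeError("no I/O group available")

    def write(self, block, data):
        """Write to every online copy, then acknowledge (synchronous copy)."""
        served_by = self.active_io_group()
        for grp, up in self.online.items():
            if up:
                self.copies[grp][block] = data
        return served_by  # the I/O group that served this write

    def read(self, block):
        """Read from the copy behind the currently active I/O group."""
        return self.copies[self.active_io_group()][block]

    def fail(self, grp):
        """Mark an I/O group as lost."""
        self.online[grp] = False


vol = HyperSwapVolume()
vol.write(0, b"payload")
vol.fail("io_grp_main")       # the primary I/O group is lost
print(vol.active_io_group())  # io_grp_dr
print(vol.read(0))            # b'payload'
```

Because the write was copied to both sites before it was acknowledged, the read after the failure returns the same data through the surviving I/O group.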

To know more email: marketing@calsoftinc.com

Contributed by: Himanshu Sonkar|Calsoft Inc.

 