
RecoverPoint For VMs (RP4VMs) 5.1 Is here


Along with the (Classic) RecoverPoint 5.1 release, which you can read about here: https://xtremio.me/2017/07/06/recoverpoint-5-1-is-here-a-must-upgrade/

We have also just released RP4VMs 5.1. To me, this is where scale and simplicity made their way into the product. Here are some of the new features and changes.

Deployer Improvements

DEPLOYER – INSTALL A vRPA CLUSTER

You now get many more pre-automated validations done for you, so you know in advance, and before the actual deployment, what errors you might hit.

You also get a much simpler network settings screen, with a clear divide between the LAN and WAN settings.

If a previous installation failed, the repository volume is deleted as well, so it won’t prevent you from trying again.

It will also show you some basic instructions for troubleshooting.

Replicating Resource Reservations

An enhancement of the Replicate Hardware Changes option whereby the source VM reservations (memory or CPU) can be reflected on the replica VM.

Encompassed within the Hardware Changes option during the VM Protection flow (shown below), or in the RP4VMs plugin under Protection > VM > Hardware Settings.

Resource reservation replication is enabled while the replica VM is powered off.
Any CPU/memory reservation changes performed on the source VM are reflected on the replica VM only when it is in image access.

Shadowless SMURF

SMURF = Starting Machine Up at Remote Facility
Shadow – using a minimal-hardware VM for SMURF in order to reduce resource consumption at the remote site
Having a VM powered up at the remote site in order to access the VMDKs

It looks and sounds like a “contradiction in terms”, but in reality it means there is no more shadow VM .vmx file – the shadow and the replica will use the same .vmx file
The look and feel remains the same

Why are we doing this?

  • Minimizing ReconfigVM API calls – multiple operations in the system each independently made API reload or reconfig calls (the MoRef ID and VC UID are now maintained)
  • NSX with the network segregation feature: the NSX configuration was reset when transitioning between shadow and replica because the VC UID was momentarily wrong
  • Some cloud metering and monitoring systems were sensitive to the previous configuration
  • Better Storage vMotion support on both modes of the replica VM – when SvMotion was used on the shadow, the replica VM .vmx was not moved (or vice versa if SvMotion was run on the target)

Silent Image Access

This feature allows PP4VMs and RP4VMs users to perform Image Access without powering on the Replica VM
PP4VMs does not require the VM to be powered on to recover, and RP4VMs customers can choose to perform their own power-up sequence outside the scope of RP.
Supports both backup Data Domain local replica copies and RP4VMs Local and remote replica copies
The user can initiate Recover Production or Failover without powering on the replica VM.



If the user initiates a Failover, the plugin will display the warning message above
Note that when failing over, the replica VM will remain in a powered-down state and the CG will move to a “Paused” state

RE-IP OR START-UP SEQUENCE ENABLED

A validation warning will be displayed to the user requesting confirmation before finishing the “Test a Copy” Wizard
The system will block both production recovery and failover operations and return an error message

Failover Networks

This allows users to better view and change the networks of a replica VM which will be applied after failover. This feature arose from customer complaints about the network changing from the “Test Network” chosen when entering Image Access to an arbitrary “Failover Network” after failing over.

MODIFY FAILOVER NETWORKS IN THE PLUGIN

MODIFY FAILOVER NETWORKS IN THE FAILOVER FLOW

Scale & Performance

The 5.1 release aims to improve RP4VMs scalability by protecting the maximal scale with a minimal number of vRPA clusters.
The goal is to increase the RP4VMs scale limits: replicating up to 256 consistency groups on a single ‘Bronze+’ vRPA (2 vCPUs, 8 GB RAM) and above

NEW (and awesome!) COMPRESSION LIBRARY

There is now a single compression level in RP4VMs

RecoverPoint for VMs provides enhanced scale-out ability for a cluster of Bronze+ (2 vCPU/8GB RAM) vRPAs:

  • Protect up to 8,000 VMs per vCenter
  • Protect up to 1,000 VMs per vRPA cluster
  • Manage up to 256 consistency groups by a single vRPA cluster

RecoverPoint for VMs achieves 100 percent across-the-board improvements in performance

CONSISTENCY GROUP STATS IN PLUGIN
New option to choose the statistic time span


Below you can see some demos Idan Kentor (Corporate SE) @idankentor recorded

Here’s a demo showing the VM protection

 

and one showing more advanced protection options

and lastly, the orchestration and failover options



EMC Storage Analytics 4.3 Is GA


Hi,

We have just released an updated version of the ESA plugin for VMware vRealize Operations. New to this release is support for vROPS 6.6, new XtremIO metrics, and more.

As always, you can download the latest version from https://support.emc.com/search/?text=esa%204.3&facetResource=DOWN

The UI of vROPS 6.6 is different and it feels like the biggest overhaul since the very early days of vCOPS

 

Once you have installed the ESA adapter (one adapter for all the Dell EMC arrays), you can start gathering metrics. For example, here you can see the array CPU utilization at the storage controller level and the volume level. You can of course customize the dashboards to your liking, but we provide very good ones out of the box.

 

Here’s another report showing the array data reduction metrics

Another report you can see below is the “TOP-10” volumes IO activity


vSphere 6.5 Update 1 is out, here’s why you want to upgrade


Hi

VMware has just released the first major update to vSphere 6.5. Normally I don’t blog about these, but this update is so big and it fixes some really annoying bugs I hit using the GA version of vSphere 6.5. Thankfully, we worked hard with VMware support to overcome some of the issues I highlighted below, which was of course done for the greater good.

The release notes for ESXI 6.5 U1 can be seen here https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html and it can be downloaded from here https://my.vmware.com/web/vmware/details?downloadGroup=ESXI65U1&productId=614&rPId=17343

The release notes for vCenter 6.5 U1 can be seen here https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-vcenter-server-651-release-notes.html and it can be downloaded from here https://my.vmware.com/web/vmware/details?downloadGroup=VC65U1&productId=614&rPId=17343

Below you can see a partial list of the fixes that were close to my heart.

Storage Issues

  • Modification of IOPS limit of virtual disks with enabled Changed Block Tracking (CBT) fails with errors in the log files

    To define the storage I/O scheduling policy for a virtual machine, you can configure the I/O throughput for each virtual machine disk by modifying the IOPS limit. When you edit the IOPS limit and CBT is enabled for the virtual machine, the operation fails with an error The scheduling parameter change failed. Due to this problem, the scheduling policies of the virtual machine cannot be altered. The error message appears in the vSphere Recent Tasks pane.

    You can see the following errors in the /var/log/vmkernel.log file:

    2016-11-30T21:01:56.788Z cpu0:136101)VSCSI: 273: handle 8194(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000
    2016-11-30T21:01:56.788Z cpu0:136101)ScsiSched: 2760: Invalid Bandwidth Cap Configuration
    2016-11-30T21:01:56.788Z cpu0:136101)WARNING: VSCSI: 337: handle 8194(vscsi0:0):Failed to invert policy

    This issue is resolved in this release.

  • When you hot-add multiple VMware Paravirtual SCSI (PVSCSI) hard disks in a single operation, only one is visible for the guest OS

    When you hot-add two or more hard disks to a VMware PVSCSI controller in a single operation, the guest OS can see only one of them.

    This issue is resolved in this release.

  • An ESXi host might fail with a purple screen

    An ESXi host might fail with a purple screen because of a race condition when multiple multipathing plugins (MPPs) try to claim paths.

    This issue is resolved in this release.

  • Reverting from an error during a storage profile change operation, results in a corrupted profile ID

    If a VVol VASA Provider returns an error during a storage profile change operation, vSphere tries to undo the operation, but the profile ID gets corrupted in the process.

    This issue is resolved in this release.

  • Incorrect Read or Write latency displayed in vSphere Web Client for VVol datastores

    Per host Read or Write latency displayed for VVol datastores in the vSphere Web Client is incorrect.

    This issue is resolved in this release.

  • An ESXi host might fail with a purple screen during NFSCacheGetFreeEntry

    The NFS v3 client does not properly handle a case where NFS server returns an invalid filetype as part of File attributes, which causes the ESXi host to fail with a purple screen.

    This issue is resolved in this release.

  • The lsi_mr3 driver and hostd process might stop responding due to a memory allocation failure in ESXi 6.5

    The lsi_mr3 driver allocates memory from address space below 4GB. The vSAN disk serviceability plugin lsu-lsi-lsi-mr3-plugin and the lsi_mr3 driver communicate with each other. The driver might stop responding during the memory allocation when handling the IOCTL event from storelib. As a result, lsu-lsi-lsi-mr3-plugin might stop responding and the hostd process might also fail even after restart of hostd.

    This issue is resolved in this release with a code change in the lsu-lsi-lsi-mr3-plugin plugin of lsi_mr3 driver, setting a timeout value to 3 seconds to get the device information to avoid plugin and hostd failures.

  • When you hot-add an existing or new virtual disk to a CBT (Changed Block Tracking) enabled virtual machine (VM) residing on a VVol datastore, the guest operating system might stop responding

    When you hot-add an existing or new virtual disk to a CBT-enabled VM residing on a VVol datastore, the guest operating system might stop responding until the hot-add process completes. The VM unresponsiveness depends on the size of the virtual disk being added. The VM automatically recovers once the hot-add completes.

    This issue is resolved in this release.

  • When you use vSphere Storage vMotion, the UUID of a virtual disk might change

    When you use vSphere Storage vMotion on vSphere Virtual Volumes storage, the UUID of a virtual disk might change. The UUID identifies the virtual disk and a changed UUID makes the virtual disk appear as a new and different disk. The UUID is also visible to the guest OS and might cause drives to be misidentified.

    This issue is resolved in this release.

  • An ESXi host might stop responding if a LUN unmapping is made on the storage array side

    An ESXi host might stop responding if a LUN unmapping is made on the storage array side to those LUNs while connected to an ESXi host through Broadcom/Emulex fiber channel adapter (the driver is lpfc) and has I/O running.

    This issue is resolved in this release.

  • An ESXi host might become unresponsive if the VMFS-6 volume has no space for the journal

    When opening a VMFS-6 volume, it allocates a journal block. Upon successful allocation, a background thread is started. If there is no space on the volume for the journal, it is opened in read-only mode and no background thread is initiated. Any intent to close the volume, results in attempts to wake up a nonexistent thread. This results in the ESXi host failure.

    This issue is resolved in this release.

  • An ESXi host might fail with a purple screen if the virtual machines running on it have large capacity vRDMs and use the SPC4 feature

    When a virtual machine uses the SPC-4 feature with the Get LBA Status command to query the thin-provisioning attributes of large attached vRDMs, the processing of this command might run for a long time in the ESXi kernel without relinquishing the CPU. The high CPU usage can cause the CPU heartbeat watchdog to deem the process hung, and the ESXi host might stop responding.

    This issue is resolved in this release.

  • An ESXi host might fail with a purple screen if the VMFS6 datastore is mounted on multiple ESXi hosts, while the disk.vmdk has file blocks allocated from an increased portion on the same datastore

    A VMDK file might reside on a VMFS6 datastore that is mounted on multiple ESXi hosts (for example, 2 hosts, ESXi host1 and ESXi host2). If the VMFS6 datastore capacity is increased from ESXi host1 while it is also mounted on ESXi host2, and the disk.vmdk gets file blocks allocated from the increased portion of the VMFS6 datastore via ESXi host1, then when the disk.vmdk file is accessed from ESXi host2 and file blocks are allocated to it from ESXi host2, ESXi host2 might fail with a purple screen.

    This issue is resolved in this release.

  • After installation or upgrade certain multipathed LUNs will not be visible

    If the paths to a LUN have different LUN IDs in case of multipathing, the LUN will not be registered by PSA and end users will not see them.

    This issue is resolved in this release.

  • A virtual machine residing on NFS datastores might be failing the recompose operation through Horizon View

    The recompose operation in Horizon View might fail for desktop virtual machines residing on NFS datastores with stale NFS file handle errors, because of the way virtual disk descriptors are written to NFS datastores.

    This issue is resolved in this release.

  • An ESXi host might fail with a purple screen because of a CPU heartbeat failure

     An ESXi host might fail with a purple screen because of a CPU heartbeat failure only if the SEsparse is used for creating snapshots and clones of virtual machines. The use of SEsparse might lead to CPU lockups with the warning message in the VMkernel logs, followed by a purple screen:

    PCPU <cpu-num>  didn’t have a heartbeat for <seconds>  seconds; *may* be locked up.

    This issue is resolved in this release.

  • Disabled frequent lookup to an internal vSAN metadata directory (.upit) on virtual volume datastores. This metadata folder is not applicable to virtual volumes

    The frequent lookup to a vSAN metadata directory (.upit) on virtual volume datastores can impact its performance. The .upit directory is not applicable to virtual volume datastores. The change disables the lookup to the .upit directory.

    This issue is resolved in this release.

  • Performance issues on Windows Virtual Machine (VM) might occur after upgrading to VMware ESXi 6.5.0 P01 or 6.5 EP2 

    Performance issues might occur when the not aligned unmap requests are received from the Guest OS under certain conditions. Depending on the size and number of the not aligned unmaps, this might occur when a large number of small files (less than 1 MB in size) are deleted from the Guest OS.

    This issue is resolved in this release.

  • ESXi 5.5 and 6.x hosts stop responding after running for 85 days

    ESXi 5.5 and 6.x hosts stop responding after running for 85 days. In the /var/log/vmkernel log file you see entries similar to:

    YYYY-MM-DDTHH:MM:SS.833Z cpu58:34255)qlnativefc: vmhba2(5:0.0): Recieved a PUREX IOCB woh oo
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:34255)qlnativefc: vmhba2(5:0.0): Recieved the PUREX IOCB.
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)qlnativefc: vmhba2(5:0.0): sizeof(struct rdp_rsp_payload) = 0x88
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674qlnativefc: vmhba2(5:0.0): transceiver_codes[0] = 0x3
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)qlnativefc: vmhba2(5:0.0): transceiver_codes[0,1] = 0x3, 0x40
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)qlnativefc: vmhba2(5:0.0): Stats Mailbox successful.
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)qlnativefc: vmhba2(5:0.0): Sending the Response to the RDP packet
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674 0 1 2 3 4 5 6 7 8 9 Ah Bh Ch Dh Eh Fh
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)————————————————————–
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 53 01 00 00 00 00 00 00 00 00 04 00 01 00 00 10
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) c0 1d 13 00 00 00 18 00 01 fc ff 00 00 00 00 20
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 00 00 00 00 88 00 00 00 b0 d6 97 3c 01 00 00 00
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 0 1 2 3 4 5 6 7 8 9 Ah Bh Ch Dh Eh Fh
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)————————————————————–
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 02 00 00 00 00 00 00 80 00 00 00 01 00 00 00 04
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 18 00 00 00 00 01 00 00 00 00 00 0c 1e 94 86 08
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 0e 81 13 ec 0e 81 00 51 00 01 00 01 00 00 00 04
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 2c 00 04 00 00 01 00 02 00 00 00 1c 00 00 00 01
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 00 00 00 00 40 00 00 00 00 01 00 03 00 00 00 10
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674)50 01 43 80 23 18 a8 89 50 01 43 80 23 18 a8 88
    YYYY-MM-DDTHH:MM:SS.833Z cpu58:33674) 00 01 00 03 00 00 00 10 10 00 50 eb 1a da a1 8f

    This is a firmware problem and it is caused when Read Diagnostic Parameters (RDP) exchanges between the Fibre Channel (FC) switch and the Host Bus Adapter (HBA) fail 2048 times. The HBA adapter stops responding, and because of this the virtual machine and/or the ESXi host might fail. By default, the RDP routine is initiated by the FC switch and occurs once every hour, resulting in reaching the 2048 limit in approximately 85 days.

    This issue is resolved in this release.

  • Resolved the performance drop in Intel devices with a stripe size limitation

    Some Intel devices, for example the P3700 and P3600, have a vendor-specific limitation in their firmware or hardware. Due to this limitation, any I/O delivered to the NVMe device that crosses the stripe size (or boundary) can suffer a significant performance drop. This problem is resolved in the driver by checking all I/Os and splitting a command in case it crosses the stripe boundary on the device.

    This issue is resolved in this release.

  • Remove the redundant controller reset when starting controller

    The driver might reset the controller twice (disable, enable, disable and then finally enable it) when the controller starts. This is a workaround for the QEMU emulator for an early version, but it might delay the display of some controllers. According to the NVMe specifications, only one reset is needed, that is, disable and enable the controller. This upgrade removes the redundant controller reset when starting the controller.

    This issue is resolved in this release.

  • An ESXi host might fail with purple screen if the virtual machine with large virtual disks uses the SPC-4 feature

    An ESXi host might stop responding and fail with purple screen with entries similar to the following as a result of a CPU lockup.

    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]@BlueScreen: PCPU x: no heartbeat (x/x IPIs received)
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Code start: 0xxxxx VMK uptime: x:xx:xx:xx.xxx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Saved backtrace from: pcpu x Heartbeat NMI
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]MCSLockWithFlagsWork@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]PB3_Read@esx#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]PB3_AccessPBVMFS5@esx#nover+00xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Fil3FileOffsetToBlockAddrCommonVMFS5@esx#nover+0xx stack:0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Fil3_ResolveFileOffsetAndGetBlockTypeVMFS5@esx#nover+0xx stack:0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Fil3_GetExtentDescriptorVMFS5@esx#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Fil3_ScanExtentsBounded@esx#nover+0xx stack:0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Fil3GetFileMappingAndLabelInt@esx#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]Fil3_FileIoctl@esx#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]FSSVec_Ioctl@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]FSS_IoctlByFH@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]VSCSIFsEmulateCommand@vmkernel#nover+0xx stack: 0x0
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]VSCSI_FSCommand@vmkernel#nover+0xx stack: 0x1
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]VSCSI_IssueCommandBE@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]VSCSIExecuteCommandInt@vmkernel#nover+0xx stack: 0xb298e000
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]PVSCSIVmkProcessCmd@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]PVSCSIVmkProcessRequestRing@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]PVSCSI_ProcessRing@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]VMMVMKCall_Call@vmkernel#nover+0xx stack: 0xx
    0xnnnnnnnnnnnn:[0xnnnnnnnnnnnn]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x0

    This occurs if your virtual machine’s hardware version is 13 and uses SPC-4 feature for the large virtual disk.

    This issue is resolved in this release.

  • The Marvell Console device on the Marvell 9230 AHCI controller is not available

    According to the kernel log, the ATAPI device is exposed on one of the AHCI ports of the Marvell 9230 controller. This Marvell Console device is an interface for configuring RAID on the Marvell 9230 AHCI controller, which is used by some Marvell CLI tools.

    In the output of the esxcfg-scsidevs -l command, the host equipped with the Marvell 9230 controller cannot detect the SCSI device with the Local Marvell Processor display name.

    The information in the kernel log is:
    WARNING: vmw_ahci[XXXXXXXX]: scsiDiscover:the ATAPI device is not CD/DVD device 

    This issue is resolved in this release.

  • SSD congestion might cause multiple virtual machines to become unresponsive

    Depending on the workload and the number of virtual machines, diskgroups on the host might go into permanent device loss (PDL) state. This causes the diskgroups to not admit further IOs, rendering them unusable until manual intervention is performed.

    This issue is resolved in this release.

  • An ESXi host might fail with purple screen when running HBR + CBT on a datastore that supports unmap

    The ESXi functionality that allows unaligned unmap requests did not account for the fact that the unmap request may occur in a non-blocking context. If the unmap request is unaligned, and the requesting context is non-blocking, it could result in a purple screen. Common unaligned unmap requests in non-blocking context typically occur in HBR environments.

    This issue is resolved in this release.

  • An ESXi host might lose connectivity to VMFS datastore

    Due to a memory leak in the LVM module, you might see the LVM driver running out of memory on certain conditions, causing the ESXi host to lose access to the VMFS datastore.

    This issue is resolved in this release.


XCOPY Chunk Sizes – Revisited (and data reduction as well)


As we get very close to the XtremIO X2 GA, I wanted to compare some important metrics between the two platforms: XCOPY performance and data reduction ratio (DRR).

Let’s start with DRR; that’s a straightforward one to compare. I took a Windows 10 VM and cloned it to an X1 array.

On X1, that VM consumed 8.54 GB of physical capacity and 11.89 GB of logical capacity, for a total DRR of 1.4:1.

I then cloned it to X2. On X2, the same VM consumed 6.99 GB of physical capacity and 11.97 GB of logical capacity, for a total DRR of 1.7:1 – that’s DRR efficiency right there!
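For reference, the DRR here is simply the logical capacity divided by the physical capacity: 11.89 GB / 8.54 GB ≈ 1.39 (about 1.4:1) on X1, versus 11.97 GB / 6.99 GB ≈ 1.71 (about 1.7:1) on X2 for the exact same VM.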

Now, let’s compare XCOPY speed and potentially look to optimize it even further.

By default, the XCOPY chunk size is 4 MB; in the past we recommended changing it to 256 KB for X1, as that turned out to be the sweet spot between performance, time and latency. See a blog post I wrote here

So on X1, let’s change the XCOPY chunk size to 256 KB using the following command
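The command in question is a per-host ESXi advanced setting; assuming the standard DataMover/MaxHWTransferSize setting (which controls the VAAI XCOPY transfer size in KB), the change would look roughly like this on each host:

    # Check the current XCOPY transfer size (the default is 4096 KB, i.e. 4 MB)
    esxcli system settings advanced list --option /DataMover/MaxHWTransferSize

    # Set it to 256 KB for the X1 test
    esxcli system settings advanced set --int-value 256 --option /DataMover/MaxHWTransferSize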

We then ran the XCOPY operation using the 256 KB XCOPY parameter; the operation started at 8:55:05 and took 180 seconds.

Because we used a 256 KB chunk size for the operation, we can see that the “blocks” reporting highlights this as larger-than-1MB bandwidth.

Latency peaked at roughly 0.1 ms.

And the array (a single X1) CPU utilization peaked at 81% during the operation.

Now, on X2, let’s change (or verify) the XCOPY chunk size to 256 KB using the same command as above

We then re-ran the XCOPY operation using the 256 KB XCOPY parameter; the operation started at 7:58:30 and concluded at 8:00:20 – that’s 80 seconds.

Because we used a 256 KB chunk size for the operation, we can see that the “blocks” reporting highlights this as larger-than-1MB bandwidth.

Latency peaked at roughly 0.023 ms.

And the array (a single X2-S) CPU utilization peaked at 70% during the operation.

Lastly, let’s change the XCOPY chunk size to 4 MB using the following command
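Assuming the same DataMover/MaxHWTransferSize setting as above, 4 MB corresponds to an integer value of 4096:

    # Back to the default 4 MB XCOPY transfer size
    esxcli system settings advanced set --int-value 4096 --option /DataMover/MaxHWTransferSize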

We then re-ran the XCOPY operation using the 4 MB XCOPY parameter; the operation started at 7:29:05 and concluded at 7:30:20 – that’s 75 seconds.

Because we used a 4 MB chunk size for the operation, we can see that the “blocks” reporting highlights this as larger-than-1MB bandwidth.

Latency peaked at roughly 0.5 ms.

And the array (a single X2-S) CPU utilization peaked at 55% during the operation.

You can see a video I recorded showing it all here

So, to conclude, X2 cloning speed was 2.25x faster! And using the 4 MB XCOPY chunk size on X2, you could save an extra 5 seconds on that 100-VM cloning; obviously, the more VMs you clone, the larger the time gap will be!

XtremIO model and XCOPY block size    Time           Peak latency
X1 – 256 KB                           180 seconds    0.1 ms
X2 – 256 KB                           80 seconds     0.023 ms
X2 – 4096 KB (4 MB)                   75 seconds     0.46 ms

VMworld 2017 – DellEMC XtremIO X2 Goes GA



I’m super excited to announce that we have just GA’d XtremIO X2 during VMworld 2017 and that’s no coincidence!

XtremIO is the best AFA for your virtual workloads and it’s time to sum it all up in one post so here goes

The AFA vendors promised us that by moving to their box, we could consolidate our entire virtual workloads into one or two frames, but that is hardly the reality. Most of the dual-controller-based AFAs out there cannot cope with the load of some of the heavy virtualized apps, so you actually end up with more silos.

And as if the fact that your dual-controller AFA architecture cannot cope with the IOPS/latency wasn’t enough, you also have another issue: they don’t have enough RAM to hold the metadata for the actual data. Think of metadata as the DNA structure of your actual data; the faster it is, the faster you know how to retrieve the data. That was always the magic behind the core architecture of XtremIO.

Enter XtremIO X2.

If you haven’t read the series of blog posts we wrote on X2, it starts here

https://xtremio.me/2017/05/08/dellemc-xtremio-x2-part-1-the-project-and-the-people/

X2 design goals were

  • Performance – to provide you NVMe-like latency at the price of traditional SSDs

    We were able to achieve this by analyzing our call-home data, gathered from thousands of clusters out there, and finding that most of the I/Os are small.

    We could have just used whatever were the latest Intel CPUs available at the time, but we wanted to achieve much higher numbers than what was achievable with them, and so we came up with a new feature called “Write Boost” which improves both read and write latency by up to 4 times! You can find a much more in-depth explanation in a blog post + demo I wrote here

    https://xtremio.me/2017/05/08/dellemc-xtremio-x2-part-3-performance-galore-or-you-built-a-time-machine-out-of-a-delorean/

    You can also watch a video recording I did with our chief architect here

    Want to see the effect of Write Boost + scale-out in a virtual environment? Here’s a demo that explains it

  • Improving upon all the axes of the X1 success
  • Cost. In X2 we now support / have:

    Larger drives (1.92TB and 3.68TB).

    Denser DAE (from 23 drives to 72; start with 18 drives and grow in packs of 6).

    25% better compression (in some cases, it’s more than this).

    Both odd / even configurations are supported.

    Write Boost, which improves the $/IOPS.

    No more BBUs, which improves the usable capacity per floor space.

  • Management

    The new UI is HTML5-based, but that’s not its highlight; it’s the intelligence we put in the UI which is unique in the industry

    https://xtremio.me/2017/05/08/dellemc-xtremio-x2x1-management-part-2-troubleshooting/

    Things like weekly patterns (think VMware vCOPS)

    And block size per IOPS and histogram reporting, which are the holy grail when it comes to storage reporting

    See a demo here

    https://www.youtube.com/watch?v=w8dZ8pqkXnA#action=share

  • enterprise grade features
  • Native replication, which is going to change how you replicate data; think Avamar vs. Data Domain, where all the changes are calculated at the source. See a post and a demo here

    https://xtremio.me/2017/05/09/dellemc-xtremio-x2-tech-preview-2-native-replication/

    or just the demo here

    https://www.youtube.com/watch?v=vCplLcFEHdw&t=169s

  • Best VMware integration

    The VMware integration we have is second to none, and I’d like to think of it as pillars of integration points into VMware vSphere

    The 1st one is the “CORE”; this includes a unique VAAI XCOPY integration which clones your VMs in a matter of seconds instead of minutes, and all the data services of XtremIO are inline

    As of vSphere 6 U3, there is a native claim rule that detects that you are using XtremIO and changes the default path selection policy from “Fixed” to “Round Robin” (see the command sketch after this list)

    The 2nd one is “Manage & Monitor”; this one includes the VSI vCenter plugin and the ESA (vROPS) plugin. Both are free, and both give you the option to completely manage your XtremIO array from vCenter and to monitor it using VMware vRealize Operations, see demos here

    and here

    The 3rd one is “Protect”; it allows you to fully protect your datastores, VMs, and files from within the VMs, and to apply Copy Data Management (CDM) to them, see a demo of AppSync here

    The 4th one is “Automate”; this is where our customers are going, they want to automate everything, and as such they can use our free vRealize Orchestrator XtremIO plugin, see a demo of it here

    https://www.youtube.com/watch?v=Tx8NfIIQF3M
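To illustrate the multipathing point above, here is roughly what checking for (and, on hosts without a built-in rule, adding) a Round Robin claim rule for XtremIO looks like with esxcli. The rule attributes below follow commonly published XtremIO guidance and should be treated as a sketch rather than the definitive settings for every environment:

    # List any existing NMP SATP claim rules that match XtremIO
    esxcli storage nmp satp rule list | grep -i xtremio

    # On hosts without a built-in rule, a Round Robin rule can be added manually
    # (vendor/model strings and iops=1 are the commonly documented values -- verify
    # against the current host configuration guide before applying)
    esxcli storage nmp satp rule add -s VMW_SATP_DEFAULT_AA -P VMW_PSP_RR \
        -O iops=1 -c tpgs_off -V XtremIO -M XtremApp -e "XtremIO Active/Active"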


First Dell EMC XtremIO X2 White Papers


As part of the launch of Dell EMC XtremIO X2, we have just published the first batch of white papers. They discuss the core architecture of X2, provide a good introduction to the platform, and explain how HA works, how the unique RAID (XDP) works, and more.

Click the screenshots below to download them

 

 

 

 

 

 


The First XtremIO X2 VDI Reference Architecture is ready, download it while it’s hot!


Hi Everyone,

Carrying on with the crazy momentum we have around the launch of X2, I’m proud to share with you the first X2 Solutions Reference Architecture:

VDI based on VMware Horizon 7.2. This is one of the many RAs we have planned for the coming months; we also have an MS SQL and an Oracle one coming up in the next few weeks, as well as a generic VSI one.

As previously communicated, X2 “Write Boost” will change the dynamics of $/IOPS, as can be seen in the attached RA.

 Thanks to Gilad Kinor, who authored the paper!

 To download the white paper, simply click the screenshot below

A couple of demos that are relevant to this use case:

XtremIO X1 Vs X2, Comparing XCOPY (VMs Cloning) Time and Data Reduction

Booting Up 4000 Windows 10 Linked Clones VMs on a single Dell EMC XtremIO X2-S Array

 

 

 


RecoverPoint For VMs (RP4VMs) 5.1 Patch 1 Is Available


Hi

A new minor release of RP4VMs 5.1 is out; this release contains the following fixes:

The fixes are listed as issue number (product feature) – problem summary, followed by the found-in version, fixed-in version and KBA number.

  • 144676 (RP4VM) – When trying to replicate a virtual machine to an existing virtual machine, the following error is returned: “Failed to find suitable target VMs with identical disk configuration”. Found in 5.0.1.2, fixed in 5.1 P1, KBA 502485.
  • 145005 (RP4VM) – When the guestNicInfo network name is unset and the Eth network is a DVSwitch port, the DGUI will get 2 different names for the same VLAN, therefore the DGUI will fail to install. Found in 5.1, fixed in 5.1 P1, KBA WIP.
  • 145174, 145602 (RP4VM) – Network disconnections. Found in 5.1, fixed in 5.1 P1, KBA 504101.
  • 145535 (RP4VM) – On a VDS network, after upgrade to 5.1, the network could be disconnected. Found in 5.1, fixed in 5.1 P1, KBA 504096.
  • 145547 (RP4VM) – After upgrading to 5.1, the following event is issued in the Virtual Center on RecoverPoint for VMs shadow VMs that have EFI firmware configured: “This virtual machine has insufficient memory to boot with EFI firmware, which requires a minimum of 96 MB. Please increase the virtual machine’s memory and try again.” The virtual machine stayed powered off even though the Power on task succeeded. Found in 5.1, fixed in 5.1 P1, KBA 504098.
  • 145562 (RP4VM) – After upgrading to 5.1, the following event was issued in the Virtual Center on a RecoverPoint for VMs shadow VM: “No operating system was found. If you have an operating system installation disc, you can insert the disc into the system’s CD-ROM drive and restart the virtual machine.” The virtual machine continually restarted in order to find the OS, causing CPU spikes on the ESX. Found in 5.1, fixed in 5.1 P1, KBA 503392.
  • 145574 (RP4VM) – During migration to 5.1, the shadow networks were replaced with replica networks. Found in 5.1, fixed in 5.1 P1, KBA WIP.
  • 145623 (RP4VM) – When attempting to recover the production VM after it was deleted (missing from the vCenter inventory / deleted from the datastore), RecoverPoint fails to recreate the production VM to synchronize the changes. Found in 5.1, fixed in 5.1 P1, KBA 504106.
  • 145691 (RP4VM) – During the failover process, user prompts appeared on the web client stating: “Error: The imported network configuration cannot be applied. Dismiss this message if you want the startup sequence to continue. If you do not want the start-up sequence to continue, click the Close button to disable image access.” Found in 5.1, fixed in 5.1 P1, KBA 503632.
  • 145706, 145685 (RP4VM) – Floppy and boot order issues. The virtual machine repetitively restarted, and enable image access failed to load the replica VM. Found in 5.1, fixed in 5.1 P1, no KBA.
  • 145805 (RP4VM) – Reboot regulation due to many replication crashes with journal compression enabled when using snapshot consolidation. Found in 5.1, fixed in 5.1 P1, KBA 503581.

As always, the new release can be downloaded from below (click the screenshot)


The link to the RecoverPoint for VMs 5.1.0.1 release notes is: Release Notes

Allow some time for the content to be refreshed.




AppSync 3.1.0.3 IS Out


Hi

We have just released a minor update to Dell EMC AppSync. Here are the fixes listed in its release notes; all of them are fixed in version 3.1.0.3:

  • 110140 – Resolved an issue related to service plan mix-up that occurred when protecting multiple datastores that resided on VPLEX virtual volumes.
  • 110748 – Resolved an issue where RecoverPoint dynamic mount of XtremIO devices failed for physical machines.
  • 110803 – Resolved an issue related to VMAX3 where the snapshots created were not removed from the array even after a copy failure.
  • 110994 – Resolved an issue where Oracle no hot backup protection failed when both data and log files were on the same datastore.
  • 111189 – Resolved an issue where mount of VxVM logical volumes on EMC power devices failed.
  • 111219 – Resolved an issue where AppSync did not display the SuSE Linux OS patch level information.
  • 111260 – Improved the performance of device discovery while configuring storage groups for VMAX arrays.
  • 111361 – Resolved an issue that occurred during Oracle database recovery, if the production database had multiple archive destinations.
  • 111536 – Resolved an XtremIO connection issue that occurred when repurposing second generation copies.
  • 111686 – Resolved a RecoverPoint connection issue that occurred during mount failures.
  • 111804 – Resolved an issue where snapshots were not removed from the Export group after an AppSync unmount operation.
  • 111992 – Resolved an issue related to Oracle shutdown that occurred during an unmount operation.
  • 112051 – Resolved an issue related to unmount failures where a valid copy was marked as failed during mounted refresh.
  • 112096 – AppSync now supports creating multiple CLI sessions from the same host.
  • 112105 – Resolved an issue where Starter pack non-compliance events were generated even after applying the DPS and VSL licenses.
  • 112129 – Resolved an EPIC environment issue where unmount operations failed for some configurations.
  • 112160 – Resolved an issue where large BCT files caused failures during an Oracle service plan run.
  • 112201 – Resolved an issue where the AppSync agent was vulnerable to remote DOS attacks from unsolicited hosts. For more information, see the EMC AppSync Security Configuration Guide.
  • 112248 – Resolved an issue related to service plan failure that occurred when users migrated a LUN from one array to another array using SRDF/Metro.
  • 112265 – Resolved a DataGuard standby node issue where AppSync did not generate alerts when log apply was resumed.
  • 112435 – You can now select the Mount on Standalone Server and Create RMAN Catalog entry options when mounting the copy on demand from the AppSync GUI.
  • 112449 – Added the EMCSkipIndication flag provided by SMI-S 8.4.0.8 to improve performance when creating copies on VMAX2 arrays.
  • 112630 – Resolved an issue where mounting a copy of a datastore to an ESX cluster running ESXi 6.5 or later failed.



 

 

To download the new version, just click the screenshot below


A new IDC / XtremIO X2 White Paper


XtremIO X2 is gaining momentum, which is awesome. We have just released a new IDC white paper highlighting the benefits of X2 for the DB, CDM and VDI use cases.

Download the paper by clicking the screenshot below


New VMware Horizon + App Volumes On Dell EMC XtremIO White Paper


A few weeks ago, we published the first Dell EMC XtremIO X2 Reference Architecture for VMware Horizon (full and linked clones) on XtremIO; the paper can be accessed here

https://xtremio.me/2017/09/04/the-first-xtremio-x2-vdi-reference-architecture-is-ready-download-it-while-its-hot/

And while that was the first paper to get out of the door, we have just published a new one that is all about “Next Gen” VDI using Horizon Instant Clones and App Volumes.

If you are new to instant clones and app volumes, I wrote a very detailed blog post about it here

https://xtremio.me/2016/06/30/vmware-horizon-7-0-1-instant-clones-app-volumes-2-11-windows-10-office-2016-on-xtremio/

Ready? Know all there is to know about it and ready to read how these technologies work on XtremIO X2? Look no further, the white paper can be accessed below

And there is also a brand new demo we have just published

Thanks to Tomer Nahumi, who authored the paper.


RecoverPoint For VMS (RP4VMs) 5.1 SP1 Is Out


Today is a special day: we have finally released RP4VMs 5.1 SP1. To call it a service pack would be unfair; I think it’s the biggest release, or at least the biggest architecture change, we have made to RP4VMs since its “1.0” version (or should I say, its 4.x version).

So what’s new?

First, RP4VMs is based on the trusted, robust, physical RecoverPoint. We have 2 product lines: the classic RP, which is used to replicate data to and from physical Dell EMC and non-EMC arrays (for example, from XtremIO to Unity), and the relatively new RecoverPoint for VMs, which is used to replicate VMs and doesn’t rely on or require a specific storage array brand.

The beauty of it is that it works at the hypervisor level, meaning that you can set your RPO at the VM level, as opposed to the LUN level, which can host multiple VMs and where you can only select one RPO per LUN.

The use cases are

  • Fail-Over: your primary site is down and you want to recover your VMs at the remote site
  • Test Copy: you want to test your VMs at the remote site in an isolated network while your production-site VMs (and replication) are still running
  • Recover Production: you want to use the remote, replicated data to recover your VMs on the production site, meaning your production site is up but the content of its VM or VMs is corrupted, etc.

    From a very high-level architecture view, you deploy a vRPA (an OVA appliance) at the ESXi level; you probably want to deploy at least 2 for performance and HA. You then do the same at the “remote” cluster, which can be a remote site with a remote vCenter, or just a local but different ESXi cluster in case you want to protect your data locally. Each of the copies, in each of the cases I mentioned, should reside on different storage; it can be a “real” array or just DAS, vSAN, etc.

    So far, I haven’t described anything new to this release so why do I think it’s the biggest release ever?

    Because we no longer require the internal iSCSI stack for the vRPA to communicate with the ESXi kernel; all the internal and external communication is done over IP. That means you don’t need to mess about with ESXi software iSCSI adapters, multipathing them, and so on – what a godsend! As someone who works very closely with the product, I can’t even describe how much easier it is to work with the IP splitter vs. the iSCSI one, and I am very excited to see it finally GA.

    Some other components: just like in the past, you have your deployment manager (web based) and your vCenter plugin to manage your VMs. For the ESXi splitter, you can now use either the vSCSI one (the “old” iSCSI-based splitter for legacy customers who don’t want to upgrade to the new IP-based one) or the ESXi VAIO filter, which can now fully utilize the IP splitter.

    Some scalability numbers that have also improved across versions:

  • Up to 50 vRPAs per vCenter
  • Up to 8,000 protected VMs per vCenter
  • Up to 256 Consistency Groups per vRPA Cluster
  • Up to 1,000 protected VMs per vRPA Cluster
  • Up to 5 vCenters and ESXi Clusters registered with each vRPA Cluster

    Just like in the past, you can set the CGs’ boot-up priorities and the power-up priority of the VMs within each CG.

  • Starting with 5.0.1 MAC replication to remote Copies is enabled by default
  • By default, MAC replication to Local Copies is disabled
  • During the Protect VM wizard, the user will have the option to enable MAC replication for copies residing on the same vCenter (local copies and remote copies when RPVM clusters are sharing the same vCenter)
    • This can create a MAC conflict if the VM is protected back within the same VC/Network
    • Available for different networks and/or VCs hosting the Local Copy

    When enabled, the production VM network configuration is also preserved (so there’s no need to configure the Re-IP).

    New design for Re-IP (5.0.1)

    • Manage IP settings through UI
    • No scripts required
    • No manual operations
    • Single click to retrieve protected VM settings
    • Supports Microsoft Windows and Linux
    • Automatic MAC replication for remote copies

    You can either assign a new IP to your recovered, failed-over VM, or in the case of an L2 network, you can keep the source IP for the failed-over VM

    Pre-Defined Failover Network Configuration

    Define failover network per CG:

    During Protect VM wizard

    For already protected VMs

    During Failover wizard

    Testing Point-in-Time:

    User can choose the pre-defined failover network, or a dedicated isolated network

    Promote to failover flow:

    User can continue with current test network, or use the pre-defined failover network

    Recovery Operations – Status Reporting

    Expand Or Reduce CG Without Journal Loss

    • Provide flexibility for CG configuration
      • Add a new VM to the same CG
      • Add a VMDK to the protected VM
      • Remove a VM from an existing CG
      • Remove a VMDK from a protected VM
    • No impact to Journal history

    You can watch a video of the new IP-based deployment manager here. Thanks to Idan Kentor for producing it!

As always, you can download the new version from support.emc.com



Comparing the new X2 Compression to X1 on an Oracle DB


We have just released a new white paper comparing the improved data savings on the XtremIO X2 platform compared to X1.

The WP can be downloaded by clicking the screenshot below

Huge thanks to Maciej Przepiorka who worked on the paper!


Dell EMC XtremIO At Storage Field Day 2017


 

Hi Folks

Between November 8th and 11th, Dell EMC will present at Storage Field Day 14. As part of our appearance, we will have a long deep-dive session on X2 presented by our Chief Architect and our Field CTO; the materials will provide a deep dive into the XtremIO X2 product architecture plus never-before-seen demos of it! Do not miss it, it’s your chance to view and ask questions in real time!

http://techfieldday.com/companies/dell-emc/

 November 9, 2017: Dell EMC Presents Data Protection at Storage Field Day 14

 November 9, 2017: Dell EMC Presents Midrange Systems at Storage Field Day 14

 November 8, 2017: Dell EMC Presents High-End Systems at Storage Field Day 14

 


Enterprise Storage Analytics (ESA) version 5.0 is now available.


We have just released the new (5.0) and free version of our vROPS adapter, also known as ESA. It’s one of the first adapters that supports VMware vRealize Operations 8.0! If you are new to this, I highly suggest you start by reading about it here

https://xtremio.me/2014/12/09/emc-storage-analytics-esa-3-0-now-with-xtremio-support/

ESA Overview:

Dell EMC Enterprise Storage Analytics (ESA) for vROps (formerly EMC Storage Analytics) allows VMware vRealize Operations customers to monitor their Dell EMC storage within a single tool (vROps), reducing time to resolution and handling problems before they happen.

Using vROps with ESA, users can:

  • Monitor and view relationships/topologies from VMs to the underlying storage
  • View alerts and anomalies
  • Monitor storage capacity, including capacity planning
  • Use the out-of-the-box reports, dashboards and alerts, or customize their own
  • View performance, performance trends, analysis, etc.

Highlights of this Release:

  • Rebranding ScaleIO to VxFlex OS
  • Support for latest Dell EMC storage platforms
  • Support vROps 8.0 using the latest libraries and SDK

Installation is super easy: just upgrade your existing ESA or install a new one. The links can be found below


Once the installation is done, you will want to configure the Dell EMC storage array whose metrics you would like to display, by creating an account for it as shown below


And once that is done, you will immediately see ESA start collecting data from the array; give it a couple of minutes.



Once the collection is done, you can either use the generic vROPS dashboards or use our customized, storage-based reports, as seen below



Of course, the real value of vROPS with ESA is that it gives you end-to-end visibility, from the hosts -> VMs -> storage array, which provides very powerful insight!


Resources:


The CSI plugin 1.0 for Unity is now available



CSI Driver for Dell EMC Unity XT Family enables integration with Kubernetes open-source container orchestration infrastructure and delivers scalable persistent storage provisioning operations for Dell EMC Unity.

Highlights of this Release:

  • Dynamic persistent volume provisioning
  • Snapshot creation
  • Automated volume provisioning workflow on Fibre Channel arrays
  • Ease of volume identification
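As a minimal sketch of what the dynamic provisioning highlighted above looks like from the Kubernetes side, you define a StorageClass backed by the driver and then request volumes through ordinary PersistentVolumeClaims. The class name, provisioner string and parameters below are illustrative assumptions rather than values taken from the driver documentation:

    # unity-demo.yaml -- hypothetical StorageClass + PVC (names and provisioner are assumptions)
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: unity-demo                      # hypothetical class name
    provisioner: csi-unity.dellemc.com      # assumed driver name, verify in the product guide
    reclaimPolicy: Delete
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: demo-pvc
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: unity-demo
      resources:
        requests:
          storage: 10Gi

    # Apply the manifests and watch the claim bind once the driver provisions a volume on the array
    kubectl apply -f unity-demo.yaml
    kubectl get pvc demo-pvc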

Software Support:

  • Supports CSI 1.1
  • Supports Kubernetes version 1.14
  • Supports Unity XT All-Flash and Hybrid Unified arrays based on Dell EMC Unity OE v.5.x

Operating Systems:  

  • Supports Red Hat Enterprise Linux 7.5/7.6
  • Supports CentOS 7.6

Resources:




Below you can see a quick demo of using the plugin

Isilon – The Challenge of Files at Scale


I get a lot of requests to post about Isilon, so I hooked up with Ron Steinke, a technical staff member of the Isilon software engineering team, to write some guest posts. I would really appreciate feedback on whether you would like us to write more about Isilon.

Scalability is a harder problem for stateful, feature rich solutions. Distributed filesystems are a prime example of this, as coordinating namespace and metadata updates between multiple head nodes presents a challenge not found in block or object storage.

The key is to remember that this challenge must be viewed in the context of the simplification it brings to application development. Application developers choose to use files to simplify the development process, with less concern about what this means for the ultimate deployment of the application at scale. For a process limited by the application development lifecycle, or dependent on third party applications, the tradeoff of utilizing a more complex storage solution is often the right one.

Part of the challenge of file scalability is fully replicating the typical file environment. Any scalability solution which imposes restrictions which aren’t present in the development environment is likely to run against assumptions built into applications. This leads to major headaches, and the burden of solving them usually lands on the storage administrator. A few of the common workarounds for a scalable flat file namespace illustrate these kinds of limitations.

One approach is to have a single node in the storage cluster managing the namespace, with scalability only for file data storage. While this approach may provide some scalability in other kinds of storage, it’s fairly easy to saturate the namespace node with a file workload.

A good example of this approach is default Apache HDFS implementation. While the data is distributed across many nodes, all namespace work (file creation, deletion, rename) is done by a single name node. This is great if you want to read through the contents of a large subset of your data, perform analytics, and aggregate the statistics. It’s less great if your workload is creating a lot of files and moving them around.

Another approach is namespace aggregation, where different parts of the storage array service different parts of the filesystem. This is effectively taking UNIX mount points to their logical conclusion. While this is mostly transparent to applications, it requires administrators to predict how much storage space each individual mount point will require. With dozens or hundreds of individual mount points, this quickly becomes a massive administration headache.

Worse is what happens when you want to reorganize your storage. The storage allocations that were originally made typically reflect the team structure of the organization at the time the storage was purchased. Organizations being what they are, the human structure is going to change. Changing the data within a single mount point involves renaming a few directories. Changes across mount points, or the creation of new mount points, involve data rewrites that will take longer and longer as the scale of your data grows.

Clearly these approaches will work for certain kinds of workflows. Sadly, most storage administrators don’t have control of their users’ workflows, or even good documentation of what those workflows will be. The combination of arbitrary workflows and future scaling requirements ultimately pushes many organizations away from limited solutions.

The alternative is a scale-out filesystem, which looks like a single machine both from the users’ and administrators’ perspective. A scale-out system isolates the logical layout of the filesystem namespace from the physical layout of where the data is stored. All nodes in a scale-out system are peers, avoiding specials roles that may make a particular node a choke point. This parallel architecture also allows each scale-out cluster to grow to meet the users’ needs, allowing storage sizes far larger than any other filesystem platform.

There are four main requirements to provide the transparency of scale-out:

  • A single flat namespace, available from and serviced by all protocol heads. This removes the scaling limitation of a single namespace node, by allowing the capacity for namespace work to scale with the number of nodes in the system.
  • Flat allocation of storage space across the namespace. While the data may ultimately be stored in different cost/performance tiers, these should not be tied to artificial boundaries in the namespace.
  • The ability to add space to the global pool by adding new storage nodes to the existing system. Hiding this from the way applications access the system greatly simplifies future capacity planning.
  • Fail-in-place, the ability of the system to continue operating if drives or nodes fail or are removed from the system. This removal will necessarily reduce the available storage capacity, but should not prevent the system from continuing to function.

All of these capabilities are necessary to take full advantage of the power of a scale-out filesystem. Future posts will discuss some of the benefits and challenges this kind of scalable system brings. In my next post, we’ll see how the last two elements in the list help to enhance the long-term upgradability of the system.

XtremIO 6.3 is here, Sync, Scale & Protect!


We have just released the new XtremIO 6.3 version with some big enhancements, so let’s dive into each one of them!

XtremIO Remote Protection

XtremIO Metadata-Aware Replication

XtremIO Metadata-Aware Asynchronous Replication leverages the XtremIO architecture to provide the most efficient replication that reduces the bandwidth consumption. XtremIO Content-Aware Storage (CAS) architecture and in-memory metadata allow the replication to transfer only unique data blocks to the target array. Every data block that is written in XtremIO is identified by a fingerprint which is kept in the data block’s metadata information.

  • If the fingerprint is unique, the data block is physically written and the metadata points to the physical block.
  • If the fingerprint is not unique, it is kept in the metadata and points to an existing physical block.

A non-unique data block, which already exists on the target array, is not sent again (deduplicated). Instead, only the block metadata is replicated and updated at the target array.

The transferred unique data blocks are sent compressed over the wire.

XtremIO Asynchronous Replication is based on a snapshot-shipping method that allows XtremIO to transfer only the changes, by comparing two subsequent snapshots and benefiting from write-folding.

This efficient replication is not limited per volume, per replication session or per single source array, but is a global deduplication technology across all volumes and all source arrays.

In a fan-in environment, replicating from four sites to a single target site, as displayed in Figure 14, overall storage capacity requirements (in all primary sites and the target site) are reduced by up to 38 percent[1], providing the customers with considerable cost savings.

Figure 14. Global Data Reduction with XtremIO Metadata-Aware Replication

XtremIO Synchronous Replication (new to 6.3)

XtremIO enables you to protect the data both asynchronously and synchronously when a ‘zero data loss’ data protection policy is required.

XtremIO Synchronous replication is fully integrated with Asynchronous replication, in-memory snapshots and iCDM capabilities, which makes it the most efficient.

The challenge with Synchronous replication arises when the source and target are out of sync. This is true during the initial sync phase as well as when a disconnection occurs due to link failure or a user-initiated operation (for example, pausing the replication or performing failover).

Synchronous replication is highly efficient as a result of these unique capabilities:

  • Metadata-aware replication – For the initial synchronization phase, and whenever the target gets out of sync, the replication uses metadata-aware replication to replicate the data to the target quickly and efficiently. The replication runs multiple cycles until the gap is minimal and then switches to synchronous replication. This keeps the impact on production to a minimum and accelerates the sync time.
  • Recovery snapshots – To avoid the need for a full copy, or even a full metadata copy, XtremIO leverages its in-memory snapshot capabilities. Every few minutes, recovery snapshots are created on both sides, and they serve as a baseline in case a disconnection occurs. When the connection is resumed, the system only needs to replicate the changes made since the most recent recovery snapshot taken before the disconnection (see the sketch after this list).
  • Prioritization – To ensure the best performance for applications using Sync replication, XtremIO automatically prioritizes Sync replication I/O over Async replication I/O. Everything is done automatically; no tuning or special configuration is required.
  • Auto-recovery from link disconnection – The replication resumes automatically when the link returns to normal.
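
Here is a minimal sketch, under the assumption that recovery snapshots are simply timestamps, of how a resynchronization baseline would be chosen after a disconnection; it illustrates the concept, not the array's actual logic.

```python
# Illustrative sketch only: when the link drops, resynchronization restarts from
# the newest recovery snapshot that was taken before the disconnection, so only
# the writes made after that baseline have to be replicated.
from datetime import datetime

def resync_baseline(recovery_snapshots, disconnect_time):
    candidates = [ts for ts in recovery_snapshots if ts <= disconnect_time]
    if not candidates:
        raise RuntimeError("no usable recovery snapshot - a metadata-aware full sync is needed")
    return max(candidates)

# Recovery snapshots taken every few minutes; the 10:12 snapshot becomes the baseline.
snaps = [datetime(2019, 10, 1, 10, m) for m in (0, 4, 8, 12)]
print(resync_baseline(snaps, datetime(2019, 10, 1, 10, 13)))   # 2019-10-01 10:12:00
```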



XtremIO Synchronous replication is managed from the same place as Asynchronous replication, and it supports all of the same Disaster Recovery operations.

Switching between Async and Sync is performed with a single command or via the UI, as shown below.



Once changed, you can also see the new replication mode in the remote session view.


Best Protection

XtremIO replication efficiency allows XtremIO to support replication for All-Flash Array (AFA) high-performance workloads. Replication is supported in both Synchronous and Asynchronous modes, with an RPO as low as 30 seconds and up to 500 retained PITs[2]. XtremIO offers simple operations and workflows for managing replication and its integration with iCDM, for both Synchronous and Asynchronous replication:

  • Test a Copy (current or specific PIT) at the remote host
    Testing a copy does not impact production or the replication, which continues to replicate changes to the target array, and is not limited by time. The "Test Copy" operation uses the same SCSI identity for the target volumes as will be used in case of failover.
  • Failover
    Using the failover command, it is possible to select the current or any PIT at the target and promote it to the remote host. Promoting a PIT is instantaneous (see the sketch after this list).
  • Failback
    Fails back from the target array to the source array.
  • Repurposing Copies
    XtremIO offers a simple command to create a new environment from any of the replication PITs.
  • Refresh a Repurposing Copy
    With a single command, a repurposed copy can be refreshed from any replication PIT. This is very useful when refreshing data from production to a test environment that resides on a different cluster, or when refreshing the DEV environment from any of the existing build versions.
  • Create a Bookmark on Demand
    An ad-hoc PIT can be created when needed. This option is very useful when an application-aware PIT is required, or before performing maintenance or upgrades.
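
As an illustration of the point-in-time selection mentioned above (a sketch only, not the XtremIO implementation), the helper below picks the newest PIT at or before a requested recovery point, or simply the most recent PIT when no specific point is requested.

```python
# Illustrative sketch only: failover can promote either the current image or any
# retained PIT. This helper picks the newest PIT at or before a requested
# recovery point, or the latest PIT when none is requested.
from datetime import datetime

def select_pit(pits, requested=None):
    if requested is None:
        return max(pits)                      # fail over to the most recent image
    eligible = [p for p in pits if p <= requested]
    if not eligible:
        raise ValueError("no PIT exists at or before the requested recovery point")
    return max(eligible)

pits = [datetime(2019, 10, 1, 9, 0), datetime(2019, 10, 1, 9, 30), datetime(2019, 10, 1, 10, 0)]
print(select_pit(pits))                                   # 2019-10-01 10:00:00
print(select_pit(pits, datetime(2019, 10, 1, 9, 45)))     # 2019-10-01 09:30:00
```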

Unified View for Local and Remote Protection

A dedicated Data Protection tab exists in the XtremIO GUI for managing XtremIO local and remote protection (Sync and Async replication). The Data Protection Overview screen displays a high-level view of the status of all Local and Remote protections, as shown in Figure 15.

Figure 15: Data Protection Overview Screen

The Overview section includes:

  • The minimum RPO compliance across all Local and Remote protection sessions (see the sketch after this list)
  • Protection sessions status chart
  • Connectivity information between peer clusters
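
As a rough illustration of what RPO compliance means (a sketch only, not the product's calculation), a session can be considered compliant when its newest fully replicated PIT is no older than the configured RPO:

```python
# Illustrative sketch only: a session is RPO-compliant when the newest PIT that
# has fully reached the target is no older than the configured RPO.
from datetime import datetime, timedelta

def is_rpo_compliant(last_replicated_pit, rpo, now):
    return (now - last_replicated_pit) <= rpo

print(is_rpo_compliant(datetime(2019, 10, 1, 10, 0, 0),
                       timedelta(seconds=30),
                       datetime(2019, 10, 1, 10, 0, 25)))   # True: lag is 25s against a 30s RPO
```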

From the Overview screen, it is easy to drill down to a specific session.

Consistency Group Protection View

With the new unified protection approach, a single view makes it easy to understand the protection of a Consistency Group. The Topology View pane displays the local and remote protection topology of the Consistency Group, as shown in Figure 16. Clicking each of the targets displays the detailed information in the Information pane.

Figure 16: Consistency Group protection topology view


Secured Snapshots

The purpose of this feature is to allow a customer to protect snapshots created by a “Local Protection Session” against accidental user deletion:

  • Once a “Local Protection Session” creates the snapshot, it is automatically marked as “Secured”
  • The snapshot’s protection will expire once the snapshot is due for deletion by its retention policy

Once a “Local Protection Session” is set to create secured snapshots:

  • It cannot be set back to create non-secured snapshots

Once a snapshot is set as “Secured”:

  • The snapshot cannot be deleted
  • The snapshot-set containing this snapshot cannot be deleted
  • The contents of this snapshot cannot be restored or refreshed (see the sketch after this list)
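
The rules above can be pictured with a small behavioral sketch (purely illustrative, not the array's implementation): any delete, restore or refresh attempt is refused while the snapshot is still within its secured retention window.

```python
# Illustrative sketch only (not the array's implementation): a guard that models
# the secured-snapshot rules - delete, restore and refresh are refused until the
# retention policy expires the snapshot.
from datetime import datetime

class SecuredSnapshot:
    def __init__(self, name, secured_until):
        self.name = name
        self.secured_until = secured_until   # driven by the session's retention policy

    def _guard(self, operation, now):
        if now < self.secured_until:
            raise PermissionError(
                f"{operation} blocked: snapshot '{self.name}' is secured until {self.secured_until}")

    def delete(self, now):
        self._guard("delete", now)

    def restore(self, now):
        self._guard("restore/refresh", now)

snap = SecuredSnapshot("cg1.snap.0800", datetime(2019, 10, 2, 8, 0))
snap.delete(datetime(2019, 10, 1, 12, 0))   # raises PermissionError while still secured
```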

Secured Snapshots – Override

  • To remove the “Secured” flag, a formal ticket must be filed with Dell EMC – this is a legal obligation.
  • Once the ticket is filed, a technician-level user account can use the new “remove-secured-snap-flag” XMCLI command to release the “Secured” flag.

Secured Snapshots – Examples – Create a Protected Local Protection Session

The following output displays the creation of a new “Local Protection Session” with the “Secured” flag setting.

The following output displays the modification of an existing “Local Protection Session” so that it starts creating snapshots with the “Secured” flag setting.

Secured Snapshots – Examples – Snapshot Query & Release

The first output displays a query of the properties of a “Secured” snapshot.

The second output displays an attempt to delete a “Secured” snapshot.

The third output displays how to release the “Secured” flag (using a tech-level user account).

Below, you can also see what a protection policy on a consistency group looks like when you choose not to allow deletion of the snapshots.

And this is the error you will get if you (or someone else) try to remove this volume.


IPv6 Dual Stack

  • Dual IP versions (4 & 6)
  • XMS Management
    • External: IPv4 & IPv6
    • Internal: IPv4 or IPv6
  • Storage Controller
    • iSCSI: IPv4 & IPv6
  • Native Replication: IPv4 only

This feature allows a customer to assign multiple IP addresses (IPv4 and IPv6) to a single interface.

Let's discuss the different network interfaces used in XtremIO and see what has changed.

XMS management traffic is used to allow a user to connect to the various management interfaces (such as the WebUI, XMCLI, RestAPI, etc.). Previously, the user could configure either an IPv4 or an IPv6 address; the new feature allows assigning two IP addresses – one of each version.

XMS management traffic is also used internally to connect to clusters (SYM and PM). The behavior of this interface has not changed – all managed clusters should use the same IP version.

The Storage Controllers' iSCSI traffic allows external host connectivity. Previously, the user could only configure IP addresses of a single IP version; the new feature allows assigning multiple IP addresses of different versions.

Native Replication behavior remains the same – these interfaces are limited to IPv4 addresses.

IPv6 Dual Stack – XMS Management – New XMCLI Commands

Multiple parameters were added to the “show-xms” XMCLI command to support this feature:

  • Two new parameters that determine the IP versions
  • The “Primary IP Address” parameter names remain as-is (to conform with backward-compatibility requirements)
  • Various new parameters that describe the new “Secondary IP Address”

Two additional XMCLI commands were introduced to support this feature:

  • The “add-xms-secondary-ip-address” XMCLI command sets the XMS management interface “Secondary IP Address” and “Secondary IP Gateway”
  • The “remove-xms-secondary-ip-address” XMCLI command removes the XMS management interface “Secondary IP Address” and “Secondary IP Gateway”

Note that there is no “modify” command – to modify the secondary IP address, remove it and set it again with its corrected values.

IPv6 Dual Stack – Storage Controller Management – Interfaces

To support the changes made to the Storage Controllers' iSCSI interface settings, the following changes were implemented:

  • iSCSI Portals – The user can now configure multiple iSCSI portals with different IP versions on the same interface (see the sketch below)
  • iSCSI Routes – The user can now configure multiple IPv4 and IPv6 iSCSI routes

As explained earlier, the Storage Controller Native Replication interface behavior remains as-is – these interfaces only allow IPv4 addresses.
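
As a small illustration of the dual-stack idea (a sketch only, using made-up addresses), the snippet below groups portal addresses by IP version with Python's standard ipaddress module, showing that a single interface can legitimately carry a mix of IPv4 and IPv6 portals.

```python
# Illustrative sketch only: with dual stack, the same iSCSI interface can expose
# both IPv4 and IPv6 portals. Python's ipaddress module is used here just to
# group portal addresses by IP version.
import ipaddress

def portals_by_ip_version(portals):
    grouped = {4: [], 6: []}
    for portal in portals:
        grouped[ipaddress.ip_interface(portal).version].append(portal)
    return grouped

# One interface serving hosts over both stacks (addresses are made up):
print(portals_by_ip_version(["192.168.10.21/24", "fd00:10::21/64"]))
# {4: ['192.168.10.21/24'], 6: ['fd00:10::21/64']}
```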


Scalability

Scalability Increase

The new 6.3.0 release supports an increased number of objects:

  • The number of volumes and copies (per cluster) was increased to 32K
  • The number of SCSI-3 registrations and reservations (per cluster) was increased to 32K

Below you can see a demo showing how to configure Sync replication from the GUI.

And here you can see how Sync replication works with VMware SRM, in conjunction with our unique point-in-time failover, for cases where you don't want to fail over to the last point in time.

     

The Kubernetes CSI Driver for Dell EMC Isilon v1.0 is now available



Isilon, the de-facto scale-out NAS platform, has so many petabytes in use all over the world that it's no wonder I'm getting a lot of requests to have a Kubernetes CSI plugin available for it. Well, now you have it!

Product Overview:

The CSI Driver for Dell EMC Isilon enables integration with Kubernetes, the open-source container orchestrator, and delivers scalable persistent storage provisioning operations for the Dell EMC Isilon storage system.

Highlights of this Release:

The CSI driver for Dell EMC Isilon enables the use of Isilon as persistent storage in Kubernetes clusters. The driver enables fully automated workflows for dynamic and static persistent volume (PV) provisioning and snapshot creation, and it uses SmartQuotas to limit volume size.

​  

Software Support:

This introductory release of CSI Driver for Dell EMC Isilon supports the following features:

  • Supports CSI 1.1
  • Supports Kubernetes version 1.14.x
  • Supports Red Hat Enterprise Linux 7.6 host operating system
  • Persistent Volume (PV) capabilities:
    • Create from scratch
    • Create from snapshot
    • Delete
  • Dynamic PV provisioning
  • Volume mount as NFS export
  • HELM charts installer
  • Access modes:
    • SINGLE_NODE_WRITER
    • MULTI_NODE_READER_ONLY
    • MULTI_NODE_MULTI_WRITER
  • Snapshot capabilities:
    • Create
    • Delete

Note:
Volume Snapshots is an Alpha feature in Kubernetes. It is recommended for use only in short-lived testing clusters, as features in the Alpha stage have an increased risk of bugs and a lack of long-term support. See Kubernetes documentation for more information about feature stages.
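
To show how the driver is typically consumed once installed, here is a minimal sketch using the official Kubernetes Python client to request a ReadWriteMany volume; the StorageClass name "isilon" is an assumption rather than something defined by the driver documentation, so substitute the name your Helm installation created.

```python
# A minimal sketch, not taken from the driver documentation: requesting a
# ReadWriteMany volume from the Isilon CSI driver with the official Kubernetes
# Python client. The StorageClass name "isilon" is an assumption.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() when running in a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="isilon-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],     # multi-node access over the NFS export
        storage_class_name="isilon",        # assumed StorageClass name
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```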

Refer to Dell EMC Simple Support Matrix for the latest product version qualifications.

Resources:

CSI Driver for Dell EMC Isilon files and documentation are available for download on:

Below you can see a demo of how it all works:

The Kubernetes CSI Driver for Dell EMC PowerMax v1.1 is now available


 

Product Overview:

 
 

The CSI Driver for PowerMax enables integration with Kubernetes, the open-source container orchestration infrastructure, and delivers scalable persistent storage provisioning operations for PowerMax and All Flash arrays.

 
 

Highlights of this Release:

 
 

The CSI Driver for Dell EMC PowerMax has the following features:

  • Supports CSI 1.1
  • Supports Kubernetes version 1.13, and 1.14
  • Supports Unisphere for PowerMax 9.1
  • Supports Fibre Channel
  • Supports Red Hat Enterprise Linux 7.6 host operating system
  • Supports PowerMax – 5978.444.444 and 5978.221.221
  • Supports Linux native multipathing
  • Persistent Volume (PV) capabilities:
    • Create
    • Delete
  • Dynamic and Static PV provisioning
  • Volume mount as ext4 or xfs file system on the worker node
  • Volume prefix for easier LUN identification in Unisphere
  • HELM charts installer
  • Access modes:
    • SINGLE_NODE_WRITER
    • SINGLE_NODE_READER_ONLY
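
Similarly to the Isilon example earlier, here is a minimal sketch of requesting a single-node volume from the PowerMax driver with the Kubernetes Python client; the StorageClass name "powermax" is an assumption, and the ext4/xfs choice is configured on the StorageClass rather than on the claim.

```python
# A minimal sketch, not taken from the driver documentation: a single-node claim
# against the PowerMax CSI driver using the Kubernetes Python client. The
# StorageClass name "powermax" is an assumption.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="powermax-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],     # maps to the SINGLE_NODE_WRITER capability
        storage_class_name="powermax",      # assumed StorageClass name
        resources=client.V1ResourceRequirements(requests={"storage": "20Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```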

        
       

Software Support:

  • Supports Kubernetes v1.14
  • Supports PowerMax – 5978.221.221 (ELM SR), 5978.444.444 (Foxtail)
  • CSI v1.1 compliant

     
     

Operating Systems:  

  • Supports CentOS 7.3 and 7.5
  • Supports Red Hat Enterprise Linux 7.6

 
 

 
 

Resources:
