
Taking FlashArray Snapshots with Veeam

In April 2018, Veeam released the Universal Storage API, which enables storage vendors like Pure Storage to create integrations between their storage systems and Veeam. At a high level, this functionality allows Veeam to leverage storage system snapshots when performing backups, as well as take snapshots of volumes for instant restore of VMs or granular file restoration.

In the initial release of the Pure Storage FlashArray plugin, Veeam cannot see or utilize existing snapshots on the FlashArray. Additionally, it’s not currently possible for Veeam to take snapshots of all the volumes associated with a Protection Group. Joint customers have expressed the desire for this functionality, but development takes time.

In the meantime, I created a script that lets customers have Veeam create snapshots of all the volumes in a FlashArray Protection Group. The script is designed to be run automatically using Windows Task Scheduler; however, you can also run it from a PowerShell prompt for a quick, one-time use.
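If you go the Task Scheduler route, a minimal sketch for registering the task from PowerShell is below. The script path, task name, schedule, and account are placeholders rather than anything taken from the actual script, so adjust them for your environment (and use a proper service account instead of a literal password).

# Hypothetical example: run the snapshot script daily at 3 AM via Task Scheduler.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\New-PfaPGroupSnapshot.ps1"'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'FlashArray PGroup Snapshots' -Action $action -Trigger $trigger `
    -User 'DOMAIN\svc-veeam' -Password 'P@ssw0rd' -RunLevel Highest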

The most significant use case I created this for was recovering file shares faster after they’ve been encrypted by a malware attack. It’s totally possible to immediately remediate the most extreme case, where the whole file share is encrypted, by overwriting the volume from a storage snapshot, but what if it’s just a user’s home directory or a small subset of the file share?
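For context, that whole-volume rollback is a single copy operation on the array. A minimal sketch with the Pure Storage PowerShell SDK might look like the following; the array address, API token, volume name, and snapshot name are all hypothetical:

# Hypothetical sketch using the Pure Storage PowerShell SDK (v1).
Import-Module PureStoragePowerShellSDK
$fa = New-PfaArray -EndPoint 'flasharray01.example.com' -ApiToken '<api-token>' -IgnoreCertificateError

# Overwrite the encrypted volume with the contents of a known-good snapshot.
# Protection Group snapshots are named <pgroup>.<suffix>.<volume>.
New-PfaVolume -Array $fa -VolumeName 'fileshare-vol01' -Source 'veeam-pg.daily-01.fileshare-vol01' -Overwrite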

In the following example, I have snapshots on the FlashArray that were taken by Veeam:

[Screenshot: Veeam snapshots on the FlashArray]

From Veeam’s view:

[Screenshot: the same snapshots as seen in Veeam]

When selecting a snapshot, you can see each VM protected by that snapshot:

[Screenshot: Veeam recovery options for the selected snapshot]

This integration is extremely powerful, as it provides instant VM, guest file, and application item recovery directly from FlashArray snapshots instead of from backups.

In a sample test, I recovered a single Windows Server 2016 VM in just over a minute:

[Screenshot: VM recovery in progress]

Veeam performs this operation much like it does when restoring from a backup, except that it creates a volume on the FlashArray from the snapshot, presents it to the applicable host, rescans the host’s HBAs, mounts the volume, and adds the VM to vCenter.

[Screenshot: the recovered VM]
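For the curious, here is a rough sketch of the storage-side portion of that workflow using the Pure Storage PowerShell SDK and PowerCLI. All of the names are hypothetical, the cmdlet usage is illustrative rather than a peek at what Veeam actually calls, and Veeam handles the VMFS resignature and VM registration steps for you:

# Hypothetical sketch of the storage-side steps Veeam automates.
Import-Module PureStoragePowerShellSDK
Import-Module VMware.VimAutomation.Core

$fa = New-PfaArray -EndPoint 'flasharray01.example.com' -ApiToken '<api-token>' -IgnoreCertificateError

# 1. Create a new volume from the snapshot (the snapshot itself stays untouched).
New-PfaVolume -Array $fa -VolumeName 'restore-temp-vol' -Source 'veeam-pg.daily-01.vmfs-vol01'

# 2. Present the new volume to the ESXi host.
New-PfaHostVolumeConnection -Array $fa -HostName 'esx01' -VolumeName 'restore-temp-vol'

# 3. Rescan the host's HBAs so the new device shows up.
Connect-VIServer -Server 'vcenter.example.com'
Get-VMHostStorage -VMHost (Get-VMHost 'esx01.example.com') -RescanAllHba | Out-Null

# 4. From here, Veeam resignatures the VMFS copy, mounts it, and registers the VM in vCenter.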

Known Limitations

The first version of this script only supports volume-based Protection Groups; if your Protection Group’s members are hosts or host groups, the script will not work. I anticipate fixing this in an upcoming release, as well as adding the ability to specify a volume instead of a Protection Group. Additionally, this script doesn’t limit the number of snapshots taken, so please monitor your usage. A future version will address this as well.
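If you’re not sure whether a given Protection Group qualifies, you can check what kind of members it has before running the script. Here’s a rough sketch with the Pure Storage PowerShell SDK; the array details and Protection Group name are hypothetical, and I’m assuming the group object exposes its members via the volumes, hosts, and hgroups properties as the REST API does.

# Hypothetical check: volume-based Protection Groups list members under .volumes,
# while host- or host-group-based groups populate .hosts or .hgroups instead.
Import-Module PureStoragePowerShellSDK
$fa = New-PfaArray -EndPoint 'flasharray01.example.com' -ApiToken '<api-token>' -IgnoreCertificateError
$pg = Get-PfaProtectionGroup -Array $fa -Name 'veeam-pg'

if ($pg.hosts -or $pg.hgroups) {
    Write-Warning "Protection Group $($pg.name) is host- or host-group-based; the script won't work with it yet."
}
else {
    Write-Host "Volumes in $($pg.name):" ($pg.volumes -join ', ')
}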

If you have questions about installing and configuring the Pure Storage FlashArray plugin for Veeam, check out Stephen Owens’ blog posts:

 

To download the script, head over to the script’s repository on my GitHub page.

Use Case for Invalidating PernixData Read Cache

When sizing an acceleration resource for PernixData FVP, we look to recommend a resource that can contain the working set of data for the VMs using that resource. The majority of the time this isn’t a problem. However, we recently came across a use case where the working set of the VM was so large that the VM consumed the whole flash device and triggered garbage collection after just 3 days. Let’s take a closer look at what’s going on and how we found a feasible solution for the customer.

The workload is a batch processing application that runs once a day and churns through a lot of data during its processing time. Before implementing FVP, a run was taking approximately 2.25 hours. The first day it ran after installing FVP, the processing time dropped a noticeable amount, as FVP accelerates every write operation in write back but caches blocks for read acceleration only as the VM requests them. By the 2nd or 3rd day, batch processing time had been cut by nearly 50% and the customer was thrilled. But by the 4th day, processing time was creeping back up toward 2.25 hours. It turned out that even with an 800 GB SSD, the drive filled up and suffered frequent write amplification and garbage collection in order to take on newly read blocks and service the writes. So what about stringing some SSDs together in RAID 0?

This is a very common question that we get, so let’s investigate SSDs and RAID a bit.

First, it’s important to understand that SSDs don’t fail like HDDs. As SSDs near failure, they might continue to service IO, but at extremely high and variable latency, whereas a healthy SSD typically delivers consistent performance. In turn, a single failing SSD will drag down the performance of the entire RAID array because the array can only perform at the latency and throughput of its slowest SSD.

As a side note, FVP has adaptive resource management. As soon as FVP determines that going to a given flash device is slower than simply using the backend SAN, it will correctly deactivate that flash device from the FVP cluster. This is technology built from the ground up to deal with the failure characteristics of SSDs (i.e. good SSD vs. slow SSD vs. failed SSD). RAID was never built to make a distinction between a slow SSD and a failed SSD. Being highly available with an impractical-to-use flash device is pointless; that is actually worse than not being available at all.

The customer is also using blade servers, so a larger-capacity PCIe card isn’t an option; they already have the largest capacity available in an SSD form factor. What they needed was a way to consistently achieve the reduction in batch processing time.

What can we do?

PernixData FVP ships with some very robust PowerShell cmdlets that allow us to manage the environment and automate all the things. We first considered using PowerShell to remove the VM from the FVP cluster every few days to force the cache to drop, but that would also throw away all of the VM’s performance graphs. Instead, our solution was to use PowerShell to blacklist the VM every 3 days. The customer is happy with this approach because even on the first batch run after the cache is invalidated, FVP still delivers better run times than going straight to the array.
What does this look like?
#Setup some variables
$fvp_server = "fvpservername"
$fvp_username = "domain\username"
#TODO: This isn't very scalable for large environments. I know there's a better way.
$vms = "vm01", "vm02"
#A file we're going to use to store the password in an encrypted format
$passwordfile = "c:\path\fvp_enc.txt"

import-module prnxcli

# Authentication and connection strings removed for brevity. 

#Loop through the list of VMs: blacklist each to invalidate its cache, pause briefly, then add it back to the cluster in write back
foreach ($vm in $vms)
{
     write-host "Blacklisting $vm to invalidate cache"
     Set-PrnxProperty $vm -Name cachePolicy -value 1
     Start-Sleep -seconds 10
     write-host "Adding $vm back to FVP cluster in write back"
     Set-PrnxProperty $vm -Name cachePolicy -value 3
     Set-PrnxAccelerationPolicy -WaitTimeSeconds 30 -name $vm -WB -NumWBPeers 1 -NumWBExternalPeers 0
}

Disconnect-PrnxServer

This scenario isn’t very typical but with large data sets, sometimes you need to get creative!

Clone this repo on GitHub to get the full script or fork and make it fancy! https://github.com/bdwill/prnx-invalidate-cache