FlexGroup volumes are NetApp ONTAP's scale-out NAS solution, providing a blend of capacity, performance, and simplicity by leveraging what has always worked for ONTAP – FlexVol volumes.
While the use cases for FlexGroup volumes are numerous, we've generally touted FlexGroup volumes as a way to enhance high-file-count environments with high rates of data ingest. Most of that is centered on the extreme performance you can achieve with a FlexGroup volume, as evidenced in our SPEC SFS®2014_swbuild submissions:
- Oops! FlexGroup Volumes Did It Again: Record Shattering SPEC SFS®2014_swbuild Test Results on the AFF A800!
- Keep on Scalin’: How FlexGroup Volumes Scale Out Performance
Naturally, when you do something really well, you want to talk about it!
Recently, we had a customer interested in another use case for FlexGroup volumes – a repository for MSSQL backups over SMB. This use case was interesting because our performance numbers have generally focused on NFSv3. Now, we get to see how well an SMB workload can do on a FlexGroup volume.
Given the architecture of the feature, I expected it to fare well. But you never know until it’s proven out. Luckily, the customer’s account team was gracious enough to let me “borrow” their testing data.
Testing goals and config
In this environment, the customer wanted to perform compressed backups of 5,000 MSSQL servers over SMB. The test was done with around 200 servers to vet the solution, with a slow ramp-up planned over the course of a few months.
But this won’t just be a backup target – it will also be replicated to a DR site via SnapMirror for extra data protection.
Each site would have a 6-node FAS8200 cluster running ONTAP 9.4 with 6TB encrypted nearline SAS drives. Each cluster would hold 3PB of usable capacity and would use 30 FlexGroup volumes, leveraging qtrees within the volumes for data organization.
The FlexGroup volumes themselves were 64TB each and the member volumes were 2.6TB apiece, with 4 members per node across 6 nodes (24 total members per FlexGroup volume).
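As a quick sanity check on that geometry, the member-volume size follows from dividing the FlexGroup capacity by the member count. A minimal sketch of the arithmetic (variable names are mine, not from the deployment):

```python
# Back-of-envelope check of the FlexGroup geometry described above.
NODES = 6
MEMBERS_PER_NODE = 4
FLEXGROUP_SIZE_TB = 64

members = NODES * MEMBERS_PER_NODE            # 24 member volumes total
member_size_tb = FLEXGROUP_SIZE_TB / members  # ~2.67 TB, quoted above as ~2.6 TB

print(members, round(member_size_tb, 2))
```

ONTAP handles this distribution for you at creation time; the point is simply that a 64TB FlexGroup spread across 24 members lands at roughly the 2.6TB member size quoted above.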
The environment would look something like this:
This customer needed a single namespace that could collect ~150TB worth of MSSQL backup data over a 12-hour period. That's ~12.5TB/hour, or roughly 3.5GB/sec.
Testing utilized 222 servers at site A and 171 servers at site B. During the test, each cluster's CPU was at 95% utilization (which meant we were doing work!), and the backup jobs (sequential writes) achieved around 8.4GB/sec – a whopping ~2.4x the throughput the job needed. That means the backups would complete in around 5 hours, rather than the full 12-hour window. Additionally, this SMB workload was able to drive around 120,000 IOPS. With more clients added to this workload, we'd expect the throughput to max out at around 9GB/sec.
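Those window numbers are easy to verify. A small sketch of the arithmetic, assuming decimal units (1TB = 10^12 bytes):

```python
# Backup-window arithmetic for the figures quoted above (decimal units assumed).
DATASET_TB = 150
WINDOW_HOURS = 12
ACHIEVED_GBPS = 8.4  # measured sequential-write throughput

required_gbps = DATASET_TB * 1e12 / (WINDOW_HOURS * 3600) / 1e9  # ~3.47 GB/s
actual_hours = DATASET_TB * 1e12 / (ACHIEVED_GBPS * 1e9) / 3600  # ~4.96 hours
headroom = ACHIEVED_GBPS / required_gbps                         # ~2.4x

print(round(required_gbps, 2), round(actual_hours, 1), round(headroom, 1))
```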
The screen capture below shows the throughput and total operations achieved during the test runs:
In addition to the performance seen on the FlexGroup volumes for the production workload, this customer was also able to achieve a high transfer rate for the SnapMirror relationships between sites: 8.4GB/sec. At that rate, the initial replication of the 150TB dataset would take about 5.5 hours. After that, the incremental deltas should complete well within the required transfer window, providing a solid disaster recovery plan for these MSSQL backups.
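One note on the ~5.5-hour estimate: at 8.4GB/sec, 150TB in decimal units finishes closer to 5 hours, while reading the dataset size in binary units (150 TiB) lands almost exactly on 5.5 hours. This unit interpretation is my assumption, not something stated in the original figures, but it shows how sensitive these windows are to TB-vs-TiB accounting:

```python
# Initial SnapMirror transfer window, decimal vs. binary interpretation of 150 "TB".
RATE_GBPS = 8.4  # GB/s, as measured

hours_decimal = 150 * 1e12 / (RATE_GBPS * 1e9) / 3600   # ~5.0 hours (150 TB)
hours_binary = 150 * 2**40 / (RATE_GBPS * 1e9) / 3600   # ~5.5 hours (150 TiB)

print(round(hours_decimal, 2), round(hours_binary, 2))
```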
Scale-out performance
This 6-node cluster was able to push over 8.4GB/sec to a FlexGroup volume. In our Customer Proof of Concept labs, we've seen near-linear performance gains by adding nodes to a cluster. The following graphs show throughput results for a single-node AFF A700 and a 2-node AFF A700.
So if we wanted to add even more performance to our backup workload, we could add more nodes!
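To illustrate what that could look like, here's a naive linear projection from the per-node throughput observed in this test. This is a rough model of my own, not a measured result – real clusters eventually hit other limits, as the ~9GB/sec client-side ceiling mentioned above suggests:

```python
# Naive linear scale-out projection from the measured 6-node result.
# Assumes perfectly linear scaling, which is optimistic in practice.
MEASURED_GBPS = 8.4
MEASURED_NODES = 6

per_node = MEASURED_GBPS / MEASURED_NODES  # 1.4 GB/s per node in this test

for nodes in (6, 8, 12):
    print(nodes, "nodes ->", round(per_node * nodes, 1), "GB/s (projected)")
```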
Not only is a FlexGroup volume great for small-file, high-file-count workloads such as EDA and software builds – it can also handle high throughput requirements for larger streaming files. By scaling out storage across multiple nodes and leveraging all of your cluster's resources, FlexGroup volumes offer a way to reduce backup windows and deliver this level of performance even on spinning disk.
Be sure to check out the following blogs and Technical Reports for more information on FlexGroup volumes: