When ONTAP first boots up, each FlexVol volume gets mapped to a CPU thread. Think of it as an “arranged marriage,” of sorts.
The reason volumes get mapped to CPUs is to avoid processing WAFL messages serially; parallel operations let ONTAP make much better use of the hardware. ONTAP was originally created for the single-CPU systems of 1994. There was some parallelism (each subsystem in ONTAP was mapped to a domain), but all of WAFL was serialized. Systems now have many CPUs. The NetApp A800, for example, sports 2 CPUs per node (4 per chassis) with 24 cores per CPU! With that much CPU available, ONTAP had to evolve to keep up.
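To picture the "arranged marriage," here's a minimal sketch (plain Python, not ONTAP internals) of the core idea: each volume is pinned to one affinity, which is a message queue serviced by a CPU thread, so work for a single volume stays ordered while work for different volumes proceeds in parallel. All class and variable names here are illustrative.

```python
from collections import defaultdict

class AffinityScheduler:
    def __init__(self, num_affinities):
        self.num_affinities = num_affinities
        self.assigned = {}               # volume id -> affinity id
        self.queues = defaultdict(list)  # affinity id -> pending WAFL messages

    def affinity_for(self, volume_id):
        # First-come round-robin: each new volume gets the next affinity,
        # so volumes spread evenly and a volume's work stays on one queue.
        if volume_id not in self.assigned:
            self.assigned[volume_id] = len(self.assigned) % self.num_affinities
        return self.assigned[volume_id]

    def submit(self, volume_id, message):
        self.queues[self.affinity_for(volume_id)].append(message)

sched = AffinityScheduler(num_affinities=8)
sched.submit("vol1", "write block 42")
sched.submit("vol2", "read block 7")
# vol1 and vol2 land on different affinities, so their messages can be
# processed in parallel by different CPU threads.
```

The important property is the pinning itself: once a volume is married to an affinity, no cross-volume locking is needed for its serialized work.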
Classical Waffinity (2006)
In 2006, ONTAP started to leverage something called “file stripes” that rotated over a set of message queues called “stripe affinities.” An affinity scheduler would dynamically assign affinities to CPU threads, while a serial affinity would process work outside of file stripes.
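A hypothetical sketch of that routing decision, under assumed stripe widths and affinity counts (the real values and mechanism are internal to WAFL): a file's blocks are divided into fixed-size stripes, each stripe is dispatched round-robin to a stripe affinity, and anything that isn't stripe-local work falls back to the single serial affinity.

```python
STRIPE_BLOCKS = 2048       # assumed stripe width, for illustration only
NUM_STRIPE_AFFINITIES = 4  # assumed affinity count, for illustration only

def route(file_block=None):
    """Return the affinity that should process this message."""
    if file_block is None:
        return "serial"  # non-striped work (e.g. metadata) is serialized
    stripe = file_block // STRIPE_BLOCKS
    return f"stripe-{stripe % NUM_STRIPE_AFFINITIES}"

print(route(file_block=0))     # stripe-0
print(route(file_block=2048))  # stripe-1
print(route())                 # serial
```

Note the last case: every message with no stripe to hash on funnels into one queue, which is exactly the bottleneck described next.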
However, any serial processing is going to eventually be a bottleneck, so ONTAP had to evolve again.
Hierarchical Waffinity (2011)
In 2011, ONTAP evolved again with a version of waffinity that was fine-tuned for WAFL and removed the serial processing left over from classical waffinity. Rather than mapping affinities only to file stripes, ONTAP now also mapped affinities to volumes, and aggregates got affinities of their own. Parallelism was no longer limited to reads and writes; metadata operations were parallelized too, by way of the volume-to-CPU mappings. A metadata-heavy workload that would have spent around half of its time in serial processing under classical waffinity could now run those operations on volume affinities, increasing throughput by 23%:
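A toy model (again, not ONTAP internals) of the hierarchical idea: affinities form a tree of aggregate, then volume, then file stripe, and each message runs at the level matching its scope. Volume-level metadata work that classical waffinity serialized now runs on that volume's own affinity, in parallel with metadata for other volumes.

```python
def route(aggregate, volume=None, stripe=None):
    """Pick the affinity whose scope matches the message."""
    if volume is None:
        return f"aggr:{aggregate}"               # aggregate-wide work
    if stripe is None:
        return f"aggr:{aggregate}/vol:{volume}"  # per-volume metadata
    # Per-stripe reads/writes, as in classical waffinity.
    return f"aggr:{aggregate}/vol:{volume}/stripe:{stripe}"

# Metadata for two volumes no longer shares one serial queue:
print(route("aggr1", "vol_a"))  # aggr:aggr1/vol:vol_a
print(route("aggr1", "vol_b"))  # aggr:aggr1/vol:vol_b
```

The tree shape also implies the ordering guarantee: work at a parent level excludes work at its children, so correctness is preserved without a global lock.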
In addition, this change allowed workloads to use more CPU, which is a good thing – you don't want to buy a bunch of CPU and have most of it sitting around twiddling its thumbs. With the hierarchical waffinity changes, we saw 95% average CPU occupancy:
While this was a pretty substantial jump, this is NetApp – we’re never satisfied with “good enough.” We saw room for improvement, especially since not all workloads operate the same.
Hybrid Waffinity (2016)
Around the ONTAP 9 timeframe, some pretty major changes took place for ONTAP. For one, we started selling all-flash ONTAP systems and were optimizing these systems for flash, which added great performance gains. We also split the consistency points between aggregates to prevent hot aggregates from causing performance issues across the entire platform. Another area that got a makeover was waffinity.
The new waffinity model considered workloads that might access the same file blocks from different contexts, such as databases. Hybrid waffinity essentially takes the hierarchical waffinity concept and combines it with fine-grained locking: particular blocks are protected by locks that can be taken from multiple affinities, which also allows for incremental development.
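Here's a simplified sketch of that combination, using Python threads as stand-ins for affinity threads (the real WAFL locking is far more sophisticated): the hierarchy stays, but threads from different affinities can safely touch the same block by taking a per-block lock, instead of escalating the work to a coarser, more serial affinity.

```python
import threading

_locks_guard = threading.Lock()
block_locks = {}  # (volume, block number) -> per-block lock
blocks = {}       # simulated block contents

def lock_for(key):
    # Create per-block locks on demand; the guard protects only the
    # lock table itself, not the data path.
    with _locks_guard:
        return block_locks.setdefault(key, threading.Lock())

def write_block(volume, block_no, data):
    # Threads from different affinities may call this concurrently; the
    # per-block lock is the only serialization point, rather than a
    # single volume- or aggregate-level queue.
    with lock_for((volume, block_no)):
        blocks[(volume, block_no)] = data

threads = [threading.Thread(target=write_block, args=("vol1", 42, f"v{i}"))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The design trade-off is classic: fine-grained locks cost a little per access but eliminate the coarse queue that serialized everyone.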
Hybrid waffinity parallelized even further, bringing more cores to bear on a workload and providing a 91% throughput increase over the older hierarchical waffinity for sequential overwrite operations.
These changes to waffinity were intended to improve a smaller and smaller set of workloads that were not already optimized. Essentially, ONTAP was closing gaps.
Expect more in the way of waffinity improvements in the future, but in the meantime, ONTAP also provided a clever feature that took more advantage of these waffinity improvements within a single namespace.
Enter the FlexGroup
Because FlexVols each map to a single affinity, you could always squeeze more performance out of a system by creating multiple FlexVol volumes per workload. However, this approach created administrative headaches: the FlexVol volumes all appeared as separate folders to clients, required independent export policies or SMB shares, and didn't scale easily. NetApp engineering took the creative approach the field was already using (just use more FlexVols!) and made it into a simpler, easier-to-consume feature called a FlexGroup volume.
The FlexGroup volume takes multiple FlexVol member volumes and abstracts them behind a single container; ONTAP handles balancing and redirecting file creations across the member volumes. Not only does this help parallelize workloads that were still operating in serial (such as write metadata operations in Electronic Design Automation), it also delivered true scale-out, busted through the 100TB FlexVol limit, and provided an efficient way to handle very large numbers of files in a file system.
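A simplified sketch of what that balancing looks like conceptually (the placement heuristic here is assumed; NetApp's actual ingest algorithm weighs free space, inode counts, and recent traffic): new files are routed to whichever member is least loaded, and the single namespace hides which member holds each file.

```python
class FlexGroup:
    def __init__(self, member_names):
        self.members = {m: [] for m in member_names}  # member -> files

    def create(self, filename):
        # Illustrative heuristic: place on the least-loaded member.
        # File count stands in for the richer signals real ONTAP uses.
        target = min(self.members, key=lambda m: len(self.members[m]))
        self.members[target].append(filename)
        return target

fg = FlexGroup([f"member{i}" for i in range(8)])
for i in range(16):
    fg.create(f"file{i}")
print({m: len(files) for m, files in fg.members.items()})
# Files end up evenly spread: 2 per member.
```

Because each member keeps its own volume affinity, spreading files this way spreads the CPU work too, which is exactly the waffinity payoff described above.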
For more information on the FlexGroup volume, see:
- NetApp FlexGroup Volumes: An Evolution of NAS
- NetApp FlexGroup Volume: Best Practices and Implementation Guide
FlexGroup volumes take the concept of volume affinities and use it to the storage administrator's benefit.
What’s better than volume affinities? More volume affinities!
Because of the resounding success of FlexGroup volumes and the very warm reception by ONTAP users/NetApp customers, the feature has become a focal point for NAS workloads at NetApp. Because waffinity has served us so well over the past 12 years and because FlexGroup volumes use affinities so well, it was time to take them to the next level.
Larger systems mean more CPU cores, so the most logical step was to simply increase the number of volume affinities in a system. In lower-end nodes (prior to ONTAP 9.4), there were 8 affinities available per node (4 per aggregate). As a result, our FlexGroup volume best practice was set to 8 member FlexVol volumes per node. While it doesn't necessarily hurt to have more than 8 per node, it also didn't show much benefit.
After ONTAP 9.4, larger platforms (such as the A700 and A800 series) offer 16 affinities per node (8 per aggregate). In some of our internal testing (and soon-to-be external-facing performance results), we're seeing significant benefits from increasing the number of available volume affinities. As such, we're also bumping up our best practice recommendation for EDA workloads to 16 member volumes per node.
To get there, we need 2 aggregates per node. With AFF systems, we had normally recommended 1 aggregate per node due to the cost of disk (for instance, we need 3 disks for dual parity). But with Advanced Disk Partitioning and RAID-TEC, the cost of splitting into multiple aggregates per node is considerably lower, and the affinity benefit of a second aggregate changes the aggregate best practice for all systems using FlexGroup volumes to 2 per node. We'll be adjusting the best practice guide (TR-4571) to reflect this for the next ONTAP release.
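The sizing arithmetic above can be captured in a small helper (hypothetical, not a NetApp tool): member count per node should match the node's available volume affinities, and affinities per node are simply affinities per aggregate times aggregates per node.

```python
def flexgroup_layout(nodes, affinities_per_aggr, aggrs_per_node):
    """Compute a FlexGroup member layout from per-node affinity counts."""
    members_per_node = affinities_per_aggr * aggrs_per_node
    return {
        "members_per_node": members_per_node,
        "total_members": members_per_node * nodes,
    }

# ONTAP 9.4+ large platform: 8 affinities per aggregate, 2 aggregates
# per node, 2-node cluster -> 16 members per node, 32 total.
layout = flexgroup_layout(nodes=2, affinities_per_aggr=8, aggrs_per_node=2)
print(layout)  # {'members_per_node': 16, 'total_members': 32}
```

Dropping to 1 aggregate per node halves the result, which is why the second aggregate matters even though the disks themselves could live in one.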
Stay tuned for more information on FlexGroup volumes, as well as some potential new blazing fast test results!