While FabricPool is not brand new, NetApp has been pushing it harder recently as part of its emphasis on cloud. As more and more people become cloud-aware, whether public or private, keeping the cost of performance-tier storage manageable is always a relevant conversation.
I should start off by saying FabricPool isn’t for everyone, as it only works on all-SSD aggregates. You can enable it on a FAS or an AFF but the aggregate can’t have any spinning drives on it. If you’ve got an all-SSD aggregate and you’re putting all this data in it because it has to be “performance-tiered,” a lot of times, you’re wasting that space because so much of that data is inactive. If your environment fits into this category, one of the things you can do instead of buying additional shelves for your NetApp storage is to use FabricPool.
Basically, what FabricPool does is it takes all of the cold data off of the performance tier and moves it to the “capacity tier” that lives out in Azure, AWS, or in a StorageGRID appliance. It could save you the hassle of having to buy shelves over and over again. It also saves you money in the long run for environments with high growth in the performance tier.
Here are a few things to think about if you’re considering FabricPool.
Defining your own policies
There are four tiering policies, applied per volume, that determine when data gets moved from the performance tier to the capacity tier. First is Snapshot-only. If a Snapshot block that is not part of the active file system hasn’t been accessed for two days, it gets tiered out to the cloud. Next, there’s Auto, which is going to be the most likely policy for most environments. Auto covers cold blocks in the active file system as well as in Snapshot copies, and its default is 31 days: when a block hasn’t been accessed for 31 days, it gets moved off. You can adjust the timing to anywhere between 2 and 63 days. Next is the backup policy, which is primarily used for DR (more on that later). For the backup policy, you need to have a non-production SnapVault or SnapMirror destination volume. There’s no temperature scan required for this policy; everything just gets sent to the capacity tier. If you want existing data moved to the capacity tier immediately, you need to do a vol move. Finally, the fourth policy is none. This is applied to a volume that you don’t want tiered at all.
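To make the timing rules concrete, here is a minimal Python sketch of the cooling-period check for each policy. It only models the day counts described above and ignores which blocks each policy looks at. The function and constant names are made up for illustration and are not part of ONTAP.

```python
# Cooling periods taken from the paragraph above. All names here are
# hypothetical; this models the timing rule only, not ONTAP's internals.
SNAPSHOT_ONLY_COOLING_DAYS = 2
AUTO_DEFAULT_COOLING_DAYS = 31
AUTO_MIN_DAYS = 2
AUTO_MAX_DAYS = 63


def is_cold_enough(days_since_access, policy,
                   auto_cooling_days=AUTO_DEFAULT_COOLING_DAYS):
    """Return True if a block has cooled long enough to be tiered."""
    if policy == "none":
        # Volumes with this policy are never tiered.
        return False
    if policy == "backup":
        # No temperature scan: everything is sent to the capacity tier.
        return True
    if policy == "snapshot-only":
        return days_since_access >= SNAPSHOT_ONLY_COOLING_DAYS
    if policy == "auto":
        if not AUTO_MIN_DAYS <= auto_cooling_days <= AUTO_MAX_DAYS:
            raise ValueError("auto cooling period must be 2 to 63 days")
        return days_since_access >= auto_cooling_days
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("snapshot-only", "auto", "backup", "none"):
        print(policy, is_cold_enough(10, policy))
```

With the defaults, a block untouched for 10 days qualifies under Snapshot-only and backup but not under Auto, unless you shorten the Auto cooling period to 10 days or less.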
Using FabricPool for DR
In most cases, if you have a 25TB system in production, you’d have at least a 25TB system in DR, if not more, so you can retain more data in DR. But the data in a DR environment is rarely ever used. With FabricPool, you could have a less than 1:1 ratio. You can have 25TB in production but only 10TB in DR and the rest of it is just tiered off to capacity because DR data is all cold. Instead of managing a DR box in a datacenter that has 25 shelves in it, you could use FabricPool for the DR system, then you would only need 5. That’s less rack space, cooling, and ultimately, less cost and maintenance.
But what happens when I lose connection?
If you’re running your performance tier and you lose your internet connection, or AWS, Azure, or StorageGRID goes down, what happens? If the application is only requesting blocks from the performance tier, no one would be the wiser. If, for some reason, someone is trying to access a cold block, the answer is: it depends. Depending on the protocol they’re using, they could end up with a “status: internal error” message. Then, once the internet comes back on, or the connection to the capacity tier gets re-established, the files will be accessible again. For NFS, it depends on the version (NFSv3 and NFSv4 return different errors).
Block protocols would probably give people the most pain in the event of an outage. If a connection gets cut off and an app or user needs to access an inactive, cold block, then they’ll get an “unrecoverable read/write” error. It’s most likely that they’ll have to restart the entire application before it starts working again. You definitely don’t want that to happen, so decide up front whether that risk is acceptable for your block workloads.
Some important considerations on capacity usage
No aggregate can exceed a 1:20 ratio of performance tier to capacity tier. If you have an 800TB aggregate in the performance tier, you can’t go beyond 16PB in the capacity tier. Once it hits that 20X ratio, FabricPool will basically say you’ve got too much data tiered and cut you off.
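The arithmetic behind that limit is simple enough to sketch in a few lines of Python. The helper names below are hypothetical, and the conversion assumes decimal units (1PB = 1,000TB).

```python
MAX_CAPACITY_RATIO = 20  # the 1:20 limit described above


def max_capacity_tier_tb(performance_tier_tb):
    """Largest capacity tier (in TB) allowed for a given performance tier."""
    return performance_tier_tb * MAX_CAPACITY_RATIO


def can_tier_more(performance_tier_tb, capacity_tier_used_tb):
    """True while the capacity tier is still under the 20X ceiling."""
    return capacity_tier_used_tb < max_capacity_tier_tb(performance_tier_tb)


if __name__ == "__main__":
    limit_tb = max_capacity_tier_tb(800)
    print(f"{limit_tb} TB = {limit_tb / 1000} PB")  # 16000 TB = 16.0 PB
    print(can_tier_more(800, 15_500))  # True
    print(can_tier_more(800, 16_000))  # False
```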
When you want to pull something back down to your performance tier, FabricPool doesn’t need the whole object. Once a block is read and marked as active in the performance tier, its copy on the capacity tier is marked as inactive because it’s no longer cold. If a 4MB object is fragmented this way, for example, you may only be using 2MB of it, but the whole object is still taking up space in your S3 bucket that you’re getting charged for. Once that 4MB object has a certain amount of unused space, FabricPool rewrites it to save you space. You can adjust that threshold to whatever you want, and it’s worth tuning depending on how often you’re pulling data down.
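As a rough illustration of that rewrite decision, the sketch below computes how much of an object is no longer in use and compares it to a threshold. The function names and the 40% threshold are hypothetical values picked for the example, not ONTAP defaults.

```python
def unused_fraction(object_size_mb, in_use_mb):
    """Fraction of an object on the capacity tier that is no longer referenced."""
    if object_size_mb <= 0 or not 0 <= in_use_mb <= object_size_mb:
        raise ValueError("in_use_mb must be between 0 and object_size_mb")
    return (object_size_mb - in_use_mb) / object_size_mb


def should_rewrite(object_size_mb, in_use_mb, threshold):
    """True once the unused share of the object reaches the chosen threshold."""
    return unused_fraction(object_size_mb, in_use_mb) >= threshold


if __name__ == "__main__":
    # The 4MB object from the example above, with only 2MB still in use.
    print(unused_fraction(4, 2))            # 0.5
    # Hypothetical 40% threshold, chosen only for this illustration.
    print(should_rewrite(4, 2, threshold=0.4))  # True
```

A lower threshold means objects get rewritten sooner, which trims what you pay for in the bucket at the cost of more rewrite activity.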
If your performance tier is 70% used, a cold block that gets read out of the capacity tier is not written back and marked as active in the aggregate on the performance tier. Unless you change your policy to none and forcefully bring the data back, once the performance tier hits 70%, those blocks are stuck in the cloud until you free up space, whether by adding another SSD shelf or by clearing out that aggregate. Only then does it start bringing active blocks back. If you notice a performance issue on some blocks, it could be because they’re stuck: every time you access one of those blocks, the read has to go out to the cloud because your aggregate usage is over 70%.
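Here is a small Python sketch of that behavior: where a cold block ends up living after it has been read, given how full the performance tier is. The names are hypothetical, and treating exactly 70% as "full" is an assumption based on the wording above.

```python
WRITE_BACK_CUTOFF_PERCENT = 70  # threshold described above


def tier_after_read(block_is_cold, used_percent):
    """Return which tier a block lives on after it has been read."""
    if not block_is_cold:
        # Hot blocks are already on the performance tier.
        return "performance"
    if used_percent < WRITE_BACK_CUTOFF_PERCENT:
        # Enough headroom: the block is written back and marked active.
        return "performance"
    # At or above the cutoff (assumed boundary), the block stays in the
    # capacity tier, so every later read goes out to the cloud again.
    return "capacity"


if __name__ == "__main__":
    for used in (55, 70, 85):
        print(used, tier_after_read(True, used))
```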
A word about licensing
If a destination target is StorageGRID, you don’t need a license because it’s a NetApp product. You only need licensing if you’re running Azure or AWS. Then, the first 10TB are free, and that offer is good until Q4FY19. The rest is quoted out on a per-terabyte basis.
Typically, with NetApp, you get a license key for any of the software features that you want to enable. With FabricPool, it’s a little different. It’s licensed per cluster, so you have to set up the cluster first. You can purchase it, but they won’t give you a license until you set up the cluster. Then once you have the cluster running, you just do a “cluster identity show,” get the UUID, and then they will give you a license file that’s particular to that cluster.
NetApp is the only storage provider that does anything like this, and they’re giving away the first 10TB for free right now. If you’re buying anything new, you can tag this on as a new purchase for free. It’s definitely a differentiating play for NetApp.