Notes on ZFS snapshot retention

There are roughly two camps for ZFS snapshot retention. One is “snap everything, keep forever, the dedup will save you.” It will not. The other is “snap on a schedule, prune by age.” That one works.

My current zfs-auto-snapshot config:

frequent   keep=4   every 15 minutes
hourly     keep=24
daily      keep=14
weekly     keep=8
monthly    keep=6

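Worst case, that is 4 + 24 + 14 + 8 + 6 = 56 snapshots per dataset. A snapshot costs almost nothing when it is taken; it only starts to hold space once blocks it references are overwritten or deleted in the live dataset. With recordsize=128K and reasonable churn, that’s a few hundred MB of overhead per snapshot, not a few GB. The trap is the frequent tier: if you have a noisy temp directory inside a snapped dataset, each 15-minute snap pins whatever temp data exists at that moment, and that space is not freed until every snapshot referencing it has expired.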
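The arithmetic is easy to redo when you change a tier. A minimal sketch in POSIX sh, assuming the tiers are written as label:keep pairs (that format and the total_snapshots helper are my own illustration, not something zfs-auto-snapshot reads):

```shell
#!/bin/sh
# Retention tiers from the config above, as "label:keep" pairs.
# This string format is only for this sketch.
tiers="frequent:4 hourly:24 daily:14 weekly:8 monthly:6"

# Sum the keep values to get the worst-case snapshot count per dataset.
total_snapshots() {
    total=0
    for tier in $1; do
        keep=${tier#*:}          # strip everything up to and including the colon
        total=$((total + keep))
    done
    echo "$total"
}

total_snapshots "$tiers"   # prints 56
```

To compare the estimate with what snapshots actually hold on disk, zfs list -o name,used,usedbysnapshots tank/home shows the space that would be freed if every snapshot of that dataset were destroyed (tank/home is a placeholder here).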

The fix is to run zfs set com.sun:auto-snapshot=false on tank/var/cache, tank/var/log (if you don’t need it snapped), tank/tmp, and any build-output dataset. It’s a per-dataset user property, not a global exclude list: you can flip it on any dataset, child datasets inherit it unless they override it, and zfs-auto-snapshot respects it.
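If there are more than a couple of datasets to exclude, a small loop keeps it repeatable. This is a sketch, not a finished tool: the exclude_from_autosnap helper and the DRY_RUN switch are hypothetical names of mine, and the dataset list is just the one from the paragraph above.

```shell
#!/bin/sh
# Datasets to exclude from automatic snapshots; adjust to your pool layout.
EXCLUDE="tank/var/cache tank/var/log tank/tmp"

# Print (dry run) or run the zfs set command for each dataset given.
# DRY_RUN defaults to 1, so nothing changes until you opt in with DRY_RUN=0.
exclude_from_autosnap() {
    for ds in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "zfs set com.sun:auto-snapshot=false $ds"
        else
            zfs set com.sun:auto-snapshot=false "$ds"
        fi
    done
}

# Unquoted on purpose: the list is split on spaces into separate arguments.
exclude_from_autosnap $EXCLUDE
```

Afterwards, zfs get -r -s local com.sun:auto-snapshot tank lists only the datasets where the property was set explicitly, which is a quick way to check that nothing was excluded by accident.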

The rule of thumb: a snapshot you cannot identify in 10 seconds (“what state of the world is this?”) is not a backup, it’s clutter. Keep fewer, keep them on a schedule, and always name your manual ones (zfs snapshot -r tank/home@before-upgrade-2024-11-23).
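To make the naming habit automatic, something like the following works. It is only a sketch: manual_snap_name is a hypothetical helper, and the reason-plus-ISO-date format simply mirrors the example above.

```shell
#!/bin/sh
# Build a snapshot name such as tank/home@before-upgrade-2024-11-23.
# Arguments: dataset, short reason. Today's date is appended as YYYY-MM-DD.
manual_snap_name() {
    echo "$1@$2-$(date +%Y-%m-%d)"
}

# Print the command for review; drop the leading echo to take the snapshot.
echo zfs snapshot -r "$(manual_snap_name tank/home before-upgrade)"
```

To find manual snapshots later, zfs list -H -t snapshot -o name -r tank/home | grep -v '@zfs-auto-snap_' filters out the automatic ones, assuming zfs-auto-snapshot is running with its default zfs-auto-snap prefix.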