I struggle with this too.
For something like a NAS there are several layers, each with multiple factors, and I don't yet understand them all.
For most of the benchmarks I run I use pv, iotop and iftop. I often use pv to test sequential read/write on drives and within filesystems, which helps establish the maximum capability of drives/RAID arrays/filesystems/etc., even remotely over various protocols like SMB/NFS. top, iotop and iftop let you monitor what's happening while the tests run, which provides some insight as well. I usually follow this up with a more real-world test, like moving a large number of files via rsync.
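Roughly, the pv workflow looks like this (a sketch, assuming pv is installed and `/mnt/pool` stands in for whatever drive, array or mount you're testing; the paths and sizes are just placeholders):

```shell
# Sequential write: stream zeros through pv into the filesystem under test.
# pv reports live throughput; -S -s 1G stops after 1 GiB.
pv -S -s 1G /dev/zero > /mnt/pool/testfile

# Sequential read: stream the file back out to /dev/null.
# Use a file larger than RAM (or drop caches first) so you're not
# just measuring the page cache.
pv /mnt/pool/testfile > /dev/null

# Real-world follow-up: copy a directory full of small files and
# compare rsync's reported rate against the pv numbers above.
rsync -a --info=progress2 /mnt/pool/some_dir/ /mnt/other/some_dir/

# Cleanup.
rm /mnt/pool/testfile
```

Running iotop and iftop in other terminals while these go lets you see whether the disks or the network are the ones saturating.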
The part I struggle with is how to actually improve things, or figure out what exactly the bottleneck is. My real-world speeds tend to be less than half the top benchmark, likely due to filesystem overhead and random vs. sequential read/write. I haven't come up with a good way to quantify how those factors interact, or to improve them much.