@mcowger
Created October 18, 2018 15:33

Start with the second half of this article to understand how they differ:

https://www.ontrack.com/blog/2018/02/22/the-evolution-of-storage-file-storage-vs-block-storage-vs-object-storage-part-1/

When it comes to how to evaluate them, it varies by system:

Block: (Sometimes called SAN, though that's not strictly correct.) Usually people here will focus on a couple of things:

  • Raw Performance: How many operations per second can I execute? Usually called 'IOPS', these range from a couple thousand for low-end home-type systems all the way up to millions of operations per second for high-end systems. How many you need depends on your application.
  • Latency: How long does each request take, on average and at the 95th percentile? This is dominated by the access method (direct attach, iSCSI, Fibre Channel, NVMe, etc.) and the type of media (spinning drives, traditional flash, NVMe flash, etc.). Usually we are talking about 20ms on the poor-performance end, down to 200us (200 microseconds, or 0.2ms) for the very fastest systems.
  • Protection Options: Are we using clever techniques to protect data? Standard RAID methods that simply smear data across the physical devices, or something more clever (but also slower) like erasure coding? Do we have the ability to replicate the data to another independent system? With what kind of guarantees? Do we guarantee that all writes are captured before acknowledging, or not? Do we allow for a 5-second loss of data? 5 minutes? 5 hours?
  • Features: Do we support compression of the data? Deduplication? Clever integrations with host software/databases/etc. to ensure workloads are safe? Do we support snapshots of the data? Can we offload those to cheaper storage (like spinning disk or S3)?
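The protection trade-off above (simple replication vs. erasure coding) comes down to capacity overhead. Here's a quick sketch with illustrative numbers of my own (not from the original) comparing the raw bytes stored per logical byte:

```python
# Illustrative sketch: raw-capacity overhead of full replication
# versus a k-data / m-parity erasure code. Numbers are examples, not vendor specs.

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte with N full copies."""
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte for a k+m erasure code:
    k data fragments plus m parity fragments."""
    return (k + m) / k

print(replication_overhead(3))   # 3.0 -> 3x raw capacity, tolerates 2 lost copies
print(erasure_overhead(10, 4))   # 1.4 -> tolerates 4 lost fragments at 1.4x capacity
```

This is the "more clever (but also slower)" part: erasure coding tolerates more failures for far less capacity, but reads/rebuilds must touch many fragments, which costs latency.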

When it comes to a focus on performance, block is almost always the winner. Generally these systems are 'transactional' - meaning data are modified block by block, not entire volumes at a time. Usually the most $$$/GB. This is where you run your high-end database. Common examples of these systems: Dell's SC, VMAX/PowerMax, and VNX systems, Pure Storage FlashArray, VSAN (sort of), ScaleIO, etc.
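The IOPS and latency figures above are linked: to sustain a given IOPS rate at a given per-request latency, you need enough I/Os in flight (Little's Law). A minimal sketch, with example numbers of my own choosing:

```python
# Little's Law applied to storage: in-flight IOs = IOPS x latency.
# Example figures are illustrative, matching the ranges discussed above.

def required_queue_depth(iops: float, latency_s: float) -> float:
    """Average number of concurrent in-flight IOs needed to sustain
    `iops` operations/sec at a per-request latency of `latency_s` seconds."""
    return iops * latency_s

# A high-end array: 1,000,000 IOPS at 200 microseconds per request
print(required_queue_depth(1_000_000, 200e-6))   # ~200 in-flight IOs

# A low-end system: 5,000 IOPS at 20 ms per request
print(required_queue_depth(5_000, 20e-3))        # ~100 in-flight IOs
```

This is why a "millions of IOPS" number is only achievable if the application can actually keep hundreds of requests outstanding at once.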

File/NAS:

In this model, the storage system itself manages the filesystem, so we have a different focus.

  • Compatibility: Can we use it with NFS? SMB? SMB3? AFS?
  • Metadata: Can we index or otherwise evaluate the contents of the files on the system? Can we use that to make decisions about how to protect and store the data?
  • Performance: Usually these systems focus less on pure latency and more on throughput - how many GB/sec can I pump through it? Latencies are often in the 3-200ms range.
  • Scale: Can the system scale beyond the basic 2-controller design? Can I get to a petabyte of data? Exabytes? What happens to performance when I do?
  • Protection: See above for block - how are the data protected?
  • Features: Compression, automatic snapshots, metadata indexing, duplicate file detection, etc are all common features here.

These systems may be transactional, and usually support modifying just part of a file if needed. This is where you store user files, or maybe even more critical data that doesn't need sub-millisecond access times. Examples include NetApp Filers, Isilon, etc.
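The throughput-over-latency focus above is easy to see with a simple model: total transfer time is roughly one round-trip of latency plus the streaming time. A sketch with illustrative numbers (mine, not the author's):

```python
# Why NAS buyers care about GB/sec more than milliseconds:
# for large files, streaming time dwarfs per-request latency.

def transfer_time_s(size_bytes: float, latency_s: float, throughput_bps: float) -> float:
    """Rough time to read a file: one latency round-trip plus size/throughput."""
    return latency_s + size_bytes / throughput_bps

GB = 1e9
# A 10 GB file over a 1 GB/s NAS link with 10 ms of latency:
t = transfer_time_s(10 * GB, 10e-3, 1 * GB)
print(f"{t:.3f} s")  # -> 10.010 s; the 10 ms latency is ~0.1% of the total
```

Shaving latency from 10ms to 1ms here saves almost nothing; doubling throughput halves the job.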

Object:

In this model, we usually don't allow for transactional work (e.g. you pull down the whole object, modify it, and re-upload). These systems are usually focused on scale, durability, and ease of access/management first, with performance a distant second.

  • Compatibility: Usually a minimum of compatibility with the S3 protocol (HTTP based), but they might also support others like OpenStack Swift, or even a 'shim' layer like NFS or SMB support.
  • Metadata: Usually very rich, configurable metadata options are available. It's usually possible to set arbitrary HTTP-header-like tags on objects, so you can query them without actually downloading the data.
  • Scale: Anything in this space that couldn't scale to at least 1PB would be a joke, whereas 1PB for file and block is at the top end of the range.
  • Durability: Systems like this generally provide something on the order of a 10+ 9's durability rating - it's vanishingly unlikely you'll lose data short of total destruction of the system itself. These systems are also really good at global distribution of data to prevent that from even being a problem.
  • Performance: Comparatively, these systems are slow. Think 100ms minimum access times, all the way up to even 1 second.
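The metadata point above is worth a sketch. This is a toy in-memory illustration of my own (not an actual S3 API): tags ride alongside each object, so you can query them without ever reading object bodies.

```python
# Toy object store: each object is (body, tags). Queries touch only the tags,
# never the (potentially huge) bodies -- the core idea behind object metadata.

store = {}  # key -> (body_bytes, tags_dict)

def put(key, body, **tags):
    """Store an object with arbitrary key/value tags, S3-tag style."""
    store[key] = (body, tags)

def find_keys(**wanted):
    """Return keys whose tags match every requested tag. Bodies are never read."""
    return [k for k, (_, tags) in store.items()
            if all(tags.get(t) == v for t, v in wanted.items())]

put("xray-001.dcm", b"...pixels...", patient="A12", modality="xray")
put("xray-002.dcm", b"...pixels...", patient="B34", modality="xray")
print(find_keys(patient="A12"))    # ['xray-001.dcm']
print(find_keys(modality="xray"))  # both keys
```

Real systems index these tags at scale, which is how you find one X-ray among billions of objects without a filesystem tree.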

These tend to be used for bulk data with low IO intensity. Think of things like your Google Photos library, X-ray images in the hospital, your Facebook profile photo, etc. For those systems, 100ms is MORE than fast enough, and so we prefer the scale. Usually the least $$$/GB.
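As for where those "10+ 9's" of durability come from: a back-of-the-envelope sketch, using an assumed independent-failure model and made-up failure rates (real systems use erasure coding and correlated-failure math, so treat this as intuition only):

```python
# If each of n independent replicas has annual loss probability p, losing the
# object requires losing all n copies: p**n. The number of leading zeros in
# that probability is the "nines" of durability.

import math

def durability_nines(p_annual_loss: float, replicas: int) -> float:
    """Nines of annual durability under an independent-replica failure model."""
    loss = p_annual_loss ** replicas     # probability all replicas fail
    return -math.log10(loss)

print(durability_nines(0.01, 3))   # 3 replicas at 1% annual loss -> ~6 nines
print(durability_nines(0.01, 6))   # 6 replicas -> ~12 nines
```

Multiplying small independent probabilities is why object stores can credibly quote durability figures no single disk could ever approach.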
