sg-s/data-falsehoods.md

## data-falsehoods.md

### The data exists

If data is split across multiple sequentially numbered files, it is foolish to believe that every file exists.
Some files can go missing, or be corrupted.
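
A minimal sketch of how you might check for gaps in a numbered sequence before trusting it. The file names (`rec_001.dat` and so on) and the function name `missing_indices` are made up for illustration; the only assumption is that each name contains its sequence number as the first run of digits.

```python
import re

def missing_indices(names, pattern=r"(\d+)"):
    """Return the sequence numbers absent between the lowest and highest found."""
    found = set()
    for name in names:
        match = re.search(pattern, name)
        if match:
            found.add(int(match.group(1)))
    if not found:
        return []
    return [i for i in range(min(found), max(found) + 1) if i not in found]

# Hypothetical file names: number 3 is missing.
names = ["rec_001.dat", "rec_002.dat", "rec_004.dat", "rec_005.dat"]
print(missing_indices(names))  # [3]
```

This only catches holes in the middle. It cannot tell you that files are missing from the start or the end of the run, and it says nothing about files that exist but are corrupted.
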
### Files with the same name and the same size contain the same data

I learnt the hard way that one version can be corrupted while the other is fine, and that it's all too easy to replace the good version with the corrupted one.
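
One way to avoid trusting name and size is to compare a hash of the contents before overwriting anything. This is a sketch using Python's standard `hashlib`; the helper names `sha256_of` and `same_contents` and the demo file names are invented for the example.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(path_a, path_b):
    """True only if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)

# Demo: two files with the same size but different bytes.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.dat")
    bad = os.path.join(tmp, "bad.dat")
    with open(good, "wb") as f:
        f.write(b"\x01\x02\x03\x04")
    with open(bad, "wb") as f:
        f.write(b"\x01\x02\x00\x04")
    print(os.path.getsize(good) == os.path.getsize(bad))  # True
    print(same_contents(good, bad))  # False
```

A mismatch tells you the two copies differ, not which one is the good one. You still have to work that out before deleting either.
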
### The sampling frequency is going to be the same across a data set

Yeah, it can change, because a fixed sampling rate would make your analysis too easy.
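
If the data comes with per-sample timestamps (an assumption, not every data set has them), a rough way to spot a rate change is to group the gaps between consecutive samples into runs. The function name `sampling_runs` and the tolerance are illustrative choices.

```python
def sampling_runs(timestamps, rel_tol=1e-3):
    """Return (start_index, interval) for each run of roughly constant spacing."""
    runs = []
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        # Start a new run when the gap differs from the current run's interval.
        if not runs or abs(dt - runs[-1][1]) > rel_tol * runs[-1][1]:
            runs.append((i - 1, dt))
    return runs

# Made-up timestamps: spacing of 1.0 that switches to 0.5 partway through.
timestamps = [0.0, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5]
print(sampling_runs(timestamps))  # [(0, 1.0), (3, 0.5)]
```

More than one run means the sampling rate is not constant, and anything downstream that assumes a single rate needs another look.
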
### The sampling frequency is going to be some round number

Did I expect the sampling interval to be a whole number of microseconds? Yes. Was it, in reality? No. In some random subset of the data, it can be 1.0004 microseconds.
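
Rather than assuming the nominal value, you can measure the interval from the timestamps and see how far off it is. A sketch, assuming timestamps in microseconds and a nominal interval of 1 microsecond; `interval_deviation` is a made-up helper name.

```python
from statistics import median

def interval_deviation(timestamps, nominal):
    """Return (measured median interval, relative deviation from nominal)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    measured = median(gaps)
    return measured, (measured - nominal) / nominal

# Synthetic timestamps in microseconds, spaced 1.0004 apart instead of 1.
timestamps = [i * 1.0004 for i in range(1000)]
measured, deviation = interval_deviation(timestamps, nominal=1.0)
print(f"measured interval: {measured:.4f} us, off by {deviation:.2%}")
```

The median is used so that a handful of dropped samples do not skew the estimate. An error of 0.04% looks small, but it accumulates over a long recording if you reconstruct time from sample counts.
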
### Metadata exists

Ha ha, joke's on you. Who needs metadata?
### Metadata is accurate

If you find a channel that is labelled "temperature", do not assume that it measures temperature. It could be literally anything.
### Channels have a well-defined number

Just for fun, the data collector can decide to add some data channels. Or throw some away.
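
A cheap guard is to count the channels in every file and flag the ones that disagree with the majority. The sketch below assumes you have already read the channel count of each file into a dict; the file names and the helper name `channel_count_outliers` are hypothetical.

```python
from collections import Counter

def channel_count_outliers(channel_counts):
    """Return (most common count, {file: count} for files that differ from it)."""
    if not channel_counts:
        return None, {}
    expected, _ = Counter(channel_counts.values()).most_common(1)[0]
    outliers = {name: n for name, n in channel_counts.items() if n != expected}
    return expected, outliers

# Made-up counts: one file has gained an extra channel.
counts = {"rec_001.dat": 8, "rec_002.dat": 8, "rec_003.dat": 9, "rec_004.dat": 8}
print(channel_count_outliers(counts))  # (8, {'rec_003.dat': 9})
```

The most common count is only a guess at what is "normal". If the channel layout changed for good halfway through the data set, the minority may be the files that are right.
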
### Channels have a well-defined name

Remember that channel called "temperature"? It can be accurate in the first half of the data, but wrong in the second half.
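
One rough way to catch a label that stops being true partway through is to check, chunk by chunk, how many samples fall inside a range that is plausible for the label. The range of -50 to 150 for a temperature channel is an assumption picked for the example, as are the sample values and the helper name `plausible_fraction_by_chunk`.

```python
def plausible_fraction_by_chunk(values, low, high, chunk_size):
    """Fraction of samples inside [low, high] for each consecutive chunk."""
    fractions = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        inside = sum(low <= v <= high for v in chunk)
        fractions.append(inside / len(chunk))
    return fractions

# Made-up "temperature" channel: believable at first, then clearly something else.
values = [20.1, 20.3, 20.2, 20.4, 512.0, 498.0, 505.0, 501.0]
print(plausible_fraction_by_chunk(values, low=-50, high=150, chunk_size=4))  # [1.0, 0.0]
```

A fraction that drops sharply from one chunk to the next is a hint that the channel changed meaning. It proves nothing on its own, since a mislabelled channel can also happen to sit inside the plausible range.
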