Data Transfer Notes

Lessons learned from transferring roughly 14TB of data onto four 4TB HDs.

  • Avoid direct data transfers across mixed platforms using hardware solutions
  • Use NTFS as the common file system across a mixed-OS environment
  • Use Windows to format exFAT volumes
  • Don’t let Windows touch a Mac or Linux formatted exFAT volume
  • It takes roughly 3 hours to transfer 1TB over USB 3.0 from HFS+ to exFAT using a HD caddy
  • Create a data transfer plan before you start the process

Avoid direct data transfers across mixed platforms using hardware solutions

Use software solutions like shared volumes on NAS to transfer data across different platforms if you can; hardware solutions are problem prone. But such is life; we’ve gotta play the hand we’re dealt.

Use NTFS as the common file system across a mixed-OS environment

Modern Mac OS X can read NTFS out of the box (writing typically needs a third-party driver), and many Linux distros support NTFS as well. That makes NTFS a good choice as the common file system across operating systems. Use Windows to format NTFS volumes if possible: NTFS drivers are more reliable than, say, exFAT drivers, but formatting on Windows still avoids complications.

Use Windows to format exFAT volumes

Microsoft designed the exFAT file system. Based on personal experience, corroborated by Google search results, exFAT volumes created by Windows can be read by Windows, Linux, and Mac OS with appropriate drivers. However, in that same experience, exFAT volumes created by another OS cannot be read by Windows. This is yet another example (like CSV and Unicode) where Microsoft follows the Standard when no one else does. That is, Microsoft did the theoretically right thing but the de facto wrong thing.

Don’t let Windows touch a Mac or Linux formatted exFAT volume

Windows will corrupt Linux-formatted exFAT volumes. Initially the exFAT volume mounts fine on Linux. After unmounting it, an attempt to mount it on Windows fails. The volume then can no longer be mounted on Linux either and fails with an “invalid VBR checksum” error. The only HD that still mounts on Linux is also the only one untouched by Windows.

The solution (to be verified when we complete data transfer) is to exploit the first 24 sectors of the exFAT structure. Sectors 0 to 11 form the main boot region, with Sector 11 holding the checksum info, and Sectors 12 to 23 hold a backup copy of Sectors 0 to 11. Somehow Windows corrupted the first 12 sectors of each affected exFAT volume, but it’s rare that all 24 sectors are corrupted.

http://www.sans.org/reading-room/whitepapers/forensics/reverse-engineering-microsoft-exfat-file-system-33274

First, make a copy of the first 24 sectors:

$ sudo dd if=/dev/sdb of=exfat_sec_24 bs=512 count=24

Examining the copy in a hex editor shows that, on an affected volume, the first 12 sectors differ from the second 12 sectors. For unaffected exFAT volumes, the first 12 sectors are identical to the second 12 sectors.
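The same comparison can be scripted instead of eyeballed in a hex editor. The following is a minimal sketch, not a vetted repair tool: it assumes 512-byte sectors (matching bs=512 above) and a checksum routine that follows my reading of Microsoft's published exFAT specification. The function names are made up for illustration.

```python
SECTOR = 512           # assumed sector size, matching bs=512 in the dd command
REGION = 12 * SECTOR   # one boot region spans 12 sectors


def split_regions(image):
    """Split a 24-sector dump into (main, backup) boot regions."""
    if len(image) < 2 * REGION:
        raise ValueError("expected a dump of at least 24 sectors")
    return image[:REGION], image[REGION:2 * REGION]


def boot_checksum(region):
    """Recompute the 32-bit checksum over sectors 0-10 of one region.

    Bytes 106, 107 and 112 of sector 0 (volume flags and percent-in-use)
    are skipped because they may change while the volume is in use.
    """
    checksum = 0
    for index in range(11 * SECTOR):
        if index in (106, 107, 112):
            continue
        rotated = ((checksum & 1) << 31) | (checksum >> 1)  # rotate right by 1
        checksum = (rotated + region[index]) & 0xFFFFFFFF
    return checksum


def stored_checksum(region):
    """Read the checksum stored at the start of sector 11 (little-endian)."""
    start = 11 * SECTOR
    return int.from_bytes(region[start:start + 4], "little")


def region_is_consistent(region):
    """True when the recomputed checksum matches the stored one."""
    return boot_checksum(region) == stored_checksum(region)
```

With the dump made above, `main, backup = split_regions(open("exfat_sec_24", "rb").read())` gives the two halves. `main == backup` reports whether they are identical, and `region_is_consistent(backup)` is a reasonable sanity check that the backup itself is intact before it is copied over the main region.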

Overwrite the first 12 sectors with the second 12 sectors:

$ sudo dd of=/dev/sdb if=exfat_sec_24 bs=512 count=12 skip=12

After overwriting the first 12 sectors with data from the second 12 sectors, the corrupted exFAT volume is “repaired” and mounted automatically. Unmount and then mount the affected volume if necessary. No more errors.

It takes roughly 3 hours to transfer 1TB over USB 3.0 from HFS+ to exFAT using a HD caddy

The theoretical throughput of USB 3.0 is 5 Gb/s; SATA II is 3 Gb/s and SATA III is 6 Gb/s. 3 hours for 1 TB works out to roughly 93 MB/s, or about 0.74 Gb/s, which is between one-sixth and one-seventh of 5 Gb/s. You’d expect to get at least half of the ly’n cheat’n claims of the Standard. No such luck. Your mileage will vary with HD caddy design and quality assurance.
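The arithmetic behind that estimate is easy to redo for other drive sizes or transfer times. This is a small sketch using decimal units (1 TB = 10^12 bytes); the function name and the constant are illustrative only.

```python
USB3_GBIT_PER_S = 5.0  # nominal USB 3.0 signalling rate


def throughput(bytes_moved, seconds):
    """Return (MB/s, Gb/s) for a transfer, using decimal units."""
    mb_per_s = bytes_moved / seconds / 1e6
    gbit_per_s = bytes_moved * 8 / seconds / 1e9
    return mb_per_s, gbit_per_s


mb_per_s, gbit_per_s = throughput(1e12, 3 * 3600)  # 1 TB in 3 hours
fraction_of_usb3 = gbit_per_s / USB3_GBIT_PER_S

print(f"{mb_per_s:.0f} MB/s, {gbit_per_s:.2f} Gb/s, "
      f"{fraction_of_usb3:.0%} of the nominal USB 3.0 rate")
```

Counting a TB as 2^40 bytes instead nudges the fraction up slightly, which is why the result lands near one-sixth under one convention and nearer one-seventh under the other.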

Create a data transfer plan before you start the process

As a rule of thumb, transfer the larger files first and then fill up the remaining space with small files; do not try to minimize the number of times you swap HDs in and out. Although we have four 4TB HDs and “only” 14 TB of data, I realized while copying to the second HD that it would be possible to run out of storage space if I were not careful about allocating files to HDs. There are two reasons for this. First, each HD has 4 ly’n cheat’n TBs (HD manufacturers define a KB as 1000 bytes, not 1024 bytes), and after formatting it’s only about 3.6 to 3.7 TB. Second, among the files transferred there are three 2TB files. Do the math and you’ll realize that no two of them fit on one formatted HD, so these three files must go on three separate HDs. Again, plan ahead before you start.
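A plan like this can be drafted before any HD is plugged in. The sketch below uses a simple largest-first, first-fit allocation, which is one way to follow the rule of thumb and not the procedure actually used for this transfer. The file names, the sixteen 500 GB sets, and the 3600 GB usable capacity are all hypothetical.

```python
def usable_tib(advertised_tb):
    """Convert an advertised decimal-TB capacity to binary TiB."""
    return advertised_tb * 1e12 / 2**40


def plan_transfer(files, n_drives, capacity):
    """Assign files to drives, largest first (first-fit decreasing).

    files: dict of name -> size; capacity: usable size per drive (same unit).
    Returns a list of {"free": ..., "files": [...]} dicts, one per drive.
    """
    drives = [{"free": capacity, "files": []} for _ in range(n_drives)]
    by_size = sorted(files.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        for drive in drives:
            if size <= drive["free"]:
                drive["files"].append(name)
                drive["free"] -= size
                break
        else:  # no drive had room for this file
            raise ValueError(f"no drive has room for {name} ({size})")
    return drives


# Hypothetical inventory in GB: three 2 TB files plus sixteen 500 GB sets.
files = {"big_a": 2000, "big_b": 2000, "big_c": 2000}
files.update({f"set_{i:02d}": 500 for i in range(16)})

# 3600 GB usable per drive is an assumption, not a measured figure.
for number, drive in enumerate(plan_transfer(files, 4, 3600), start=1):
    print(number, drive["free"], drive["files"])
```

Here `usable_tib(4)` comes to about 3.64, before any file system overhead. The planner raises an error as soon as a file has no home, so a tight fit shows up on paper and not halfway through the third HD.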
