@mrdaemon
Last active June 25, 2024 15:09
Archiving CDROMs on Linux, A Quick and Dirty Guide

A quick and dirty guide to dumping images of your CDs on Linux

https://untrusted.website/@mr_daemon

This roughly describes the process I use to dump my old CDs to image files while attempting to retain as much of the original data as possible.

This mostly covers data CDs such as games and software. Results are mixed with some copy protection schemes, even if you dump subchannel data.

Some old protection schemes like SafeDisc and SecuROM are not really usable from image formats most things expect, so YMMV.

All the formats below are intended to be lossless and are suitable for use in things like DOSBox or 86Box, or even to be written back to a CD one day, if you find new old stock.

Hardware

I'm using a cheap TSSTcorp USB DVD writer attached to my Linux laptop. This is the same controller in every chinesium drive available on Amazon for the price of a fancy sandwich, and it will biodegrade if you expose it to direct sunlight. If you have better, feel free to use it.

Software

All of this is likely available in your distribution repositories. Sometimes some are packaged together, even.

Hint: on Debian-likes, apt-cache search should help you find them.

  • disktype for identifying the type of CD
  • cdrdao for dumping hybrid images or anything with a weird layout
  • readom for dumping straightforward ISOs, as it does checks
  • ddrescue for attempting to salvage degraded or otherwise beat up CDs
  • cdparanoia for dumping audio tracks as files (optional)
  • flac for losslessly compressing the audio tracks (optional)
  • sha256sum for checksum generation/validation (optional)
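
Before starting, it can save some head scratching to check which of these are actually installed. A minimal sketch; the `missing_tools` helper name is made up for this example, and `toc2cue` is included because it is used later in this guide:

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check everything this guide uses; prints nothing if it is all there.
missing_tools disktype cdrdao toc2cue readom ddrescue cdparanoia flac sha256sum
```

Anything it prints is a package you still need to find in your distribution.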

If you're on Windows or macOS, all of this probably works but you'll be on your own to figure out the exact semantics of accessing your optical drive from Cygwin or whatnot. There are likely better guides and nice frontends for this too.

Identifying the type of CD

Assuming your optical drive presents as /dev/sr0, you can use disktype to determine the type of CD you're dealing with. For standard stuff with just a data track, even if it has both Apple HFS and ISO9660 sections, you can just straight up dump the ISO.

For anything that is mixed mode, has a weird layout, or especially has audio tracks with a lead out that extends past the data track, you'll need to use cdrdao and also save a TOC or CUE file that documents how to piece it together.

Example of a traditional data CD that can be just dumped with readom:

$ disktype /dev/sr0

--- /dev/sr0
Block device, size 241.9 MiB (253599744 bytes)
CD-ROM, 1 track, CDDB disk ID 02067301
Track 1: Data track, 241.9 MiB (253599744 bytes)
  ISO9660 file system
    Volume name "AFTERLIFE"
    Preparer    "OPTICAL MEDIA QUICKTOPIX 2.20"
    Data size 241.3 MiB (252981248 bytes, 123526 blocks of 2 KiB)

Example below of a hybrid CD that supports both Mac and Windows. This can also just be straight dumped with readom and will absolutely contain everything.

$ disktype /dev/sr0

--- /dev/sr0
Block device, size 598.1 MiB (627159040 bytes)
CD-ROM, 1 track, CDDB disk ID 020FF301
Track 1: Data track, 598.1 MiB (627159040 bytes)
  Apple partition map, 2 entries
  Partition 1: 1 KiB (1024 bytes, 2 sectors from 1)
    Type "Apple_partition_map"
  Partition 2: 384.9 MiB (403588608 bytes, 788259 sectors from 436051)
    Type "Apple_HFS"
    HFS file system
      Volume name "Adobe<A8> Photoshop<A8> 5.0 LE"
      Volume size 384.9 MiB (403578880 bytes, 49265 blocks of 8 KiB)
  ISO9660 file system
    Volume name "PHOTOSLE"
    Application "TOAST ISO 9660 BUILDER COPYRIGHT (C) 1997 ADAPTEC, INC. - HAVE A NICE DAY"
    Data size 597.8 MiB (626847744 bytes, 306078 blocks of 2 KiB)
    Joliet extension, volume name "PHOTOSLE"

Example of a mixed mode CD that needs cdrdao since it contains audio tracks:

$ disktype /dev/sr0

--- /dev/sr0
Block device, size 624.8 MiB (655173632 bytes)
CD-ROM, 11 tracks, CDDB disk ID A010A90B
Track 1: Data track, 106.6 MiB (111755264 bytes)
  ISO9660 file system
    Volume name "QUAKE101"
    Data size 105.7 MiB (110837760 bytes, 54120 blocks of 2 KiB)
Track 2: Audio track, 51.92 MiB (54437040 bytes),   5 min 08 sec
Track 3: Audio track, 24.56 MiB (25756752 bytes),   2 min 26 sec
Track 4: Audio track, 84.16 MiB (88244688 bytes),   8 min 20 sec
Track 5: Audio track, 61.47 MiB (64454208 bytes),   6 min 05 sec
Track 6: Audio track, 74.79 MiB (78425088 bytes),   7 min 24 sec
Track 7: Audio track, 87.18 MiB (91415184 bytes),   8 min 38 sec
Track 8: Audio track, 56.51 MiB (59256288 bytes),   5 min 35 sec
Track 9: Audio track, 65.43 MiB (68605488 bytes),   6 min 28 sec
Track 10: Audio track, 35.76 MiB (37495584 bytes),   3 min 32 sec
Track 11: Audio track, 53.40 MiB (55991712 bytes),   5 min 17 sec
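
The decision shown by the three examples above can be roughly automated. This is only a sketch: `pick_dump_tool` is a made-up helper name, and it assumes disktype prints one `Track N:` line per track and labels audio tracks with `Audio track`, as in the outputs above. It will not notice every weird layout, so read the disktype output yourself when in doubt.

```shell
# Read disktype output on stdin and print the tool this guide would reach for:
# cdrdao if there is any audio track or more than one track, readom otherwise.
pick_dump_tool() {
    awk '
        /^Track [0-9]+:/ { tracks++ }
        /Audio track/    { audio = 1 }
        END {
            if (audio || tracks > 1) print "cdrdao"
            else print "readom"
        }
    '
}

# Usage (needs a disc in the drive):
#   disktype /dev/sr0 | pick_dump_tool
```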

Simple ISO dump

For a straightforward ISO dump, you can use readom:

$ readom retries=4 dev=/dev/sr0 f=imagename.iso

This will save the entire CD to imagename.iso, with minimal error correction and retries. You can try to increase the number of retries, but it is unlikely to help much. If this fails early due to read errors, you can try ddrescue.
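
A cheap sanity check after the dump: a data-track image should be a whole number of 2048-byte sectors. The sketch below uses a made-up helper name, `iso_size_ok`, and only catches truncated or obviously odd files. It says nothing about whether the contents were read correctly.

```shell
# Succeed if FILE's size is a non-zero multiple of 2048 bytes (one data sector).
# This is only a plausibility check, not proof that the dump is good.
iso_size_ok() {
    size=$(wc -c < "$1") || return 1
    [ "$size" -gt 0 ] && [ $((size % 2048)) -eq 0 ]
}

# Usage:
#   iso_size_ok imagename.iso && echo "size looks plausible"
```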

Dumping a degraded or damaged CD

$ ddrescue -b 2048 -r4 -v /dev/sr0 imagename.iso imagename.map

We read using 2048 byte blocks, which is standard for CDs, with 4 retries. The peculiarity here is the map file, which keeps track of what was read and what wasn't. This allows you to try to read only the missing bits again, either after cleaning the media, or in a different drive.

To do another pass, simply run it again with the same iso and map file.

There are a few options you can twiddle to do further passes, but this is beyond the scope of this document.
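
To decide whether another pass is worth it, you can total up what the map file still lists as not rescued. This is a sketch with a made-up helper name, `unrescued_bytes`. It assumes the usual map layout: `#` comment lines, one status line, then `pos size status` rows with hexadecimal values, where `+` means rescued. If your ddrescue version writes something different, this will give wrong numbers.

```shell
# Sum the bytes in a ddrescue map file that are not yet marked rescued ("+").
unrescued_bytes() {
    total=0
    seen_status_line=0
    while read -r pos size status rest; do
        # Skip blank lines and comments.
        case $pos in ''|'#'*) continue ;; esac
        # The first non-comment line is the current position/status, not a block.
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1
            continue
        fi
        # $size is hexadecimal (0x...), which shell arithmetic understands.
        [ "$status" = "+" ] || total=$((total + $size))
    done < "$1"
    echo "$total"
}

# Usage:
#   unrescued_bytes imagename.map
```

A result of 0 would mean every block in the map is marked as rescued.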

Mixed mode CD dump (data + audio tracks)

For a mixed mode CD, you'll need to use cdrdao, which dumps the data track and the audio tracks into a single image file. It will also generate a TOC file for you that describes the layout of the tracks, so it can be reproduced if you ever write it back to a CD.

$ cdrdao read-cd --read-raw --driver generic-mmc:0x20000 --device /dev/sr0 --datafile imagename.bin imagename.toc

Since TOC files, while great, are not really supported by anything of interest these days beyond cdrdao itself, you can also use the toc2cue tool to convert the TOC file to a CUE file that is more widely supported. This tool is part of the cdrdao package and so you should have it already by this point.

$ toc2cue imagename.toc imagename.cue

I would recommend keeping both the TOC and CUE files, as the TOC file is more accurate and complete, but the CUE file is more widely supported.
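
If you rename or move the image afterwards, the CUE file can end up pointing at a file that is no longer there. A small sketch for catching that; `cue_missing_files` is a made-up helper name, and it assumes the CUE sheet names its files on lines like `FILE "imagename.bin" BINARY` with the name in double quotes.

```shell
# Print every file named on a FILE line of a CUE sheet that does not exist
# next to the sheet. Prints nothing when all references resolve.
cue_missing_files() {
    dir=$(dirname "$1")
    sed -n 's/^[[:space:]]*FILE[[:space:]]*"\(.*\)".*$/\1/p' "$1" |
    while IFS= read -r name; do
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage:
#   cue_missing_files imagename.cue
```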

IMPORTANT: By default, the audio tracks will be saved with some whack[1] byte order that is also not really understood by anything either except cdrdao, again. Make sure to pay special attention to the --driver option and especially the 0x20000 parameter, otherwise the audio tracks will come out as static garbage when played back or written from the CUE file. You could omit the parameter to store an image that is very accurate but also of very limited use, practically. Unless your thing eats TOC files, this is probably not what you want.
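
To make "byte order" concrete: audio samples are 16 bits wide, and the two conventions differ only in which of the two bytes comes first. The sketch below is an illustration only, with a made-up helper name, `swap16`. Do not run it over a whole .bin from a mixed mode disc, since it would scramble the data track along with the audio.

```shell
# Swap each pair of bytes on stdin, i.e. flip the byte order of 16-bit samples.
swap16() {
    dd conv=swab 2>/dev/null
}

# Four bytes in, the same four bytes out with each pair reversed.
printf '\001\002\003\004' | swap16 | od -An -tx1
```

The bytes 01 02 03 04 come out as 02 01 04 03. Played back with the wrong assumption, every sample is off like this, which is where the static comes from.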

Dumping Audio Tracks as Files

If you want to dump the audio tracks as files, you can use cdparanoia. I tend to do this to also keep these as useful files for listening to or whenever a modern source port of a game can use them directly for the soundtrack.

Just make sure to not dump the first track, which contains data. We can do this by specifying the range like so:

$ mkdir audio
$ cd audio/
$ cdparanoia -B "2-"

You'll end up with a directory full of WAV files. If you want, you can losslessly compress them to flac (or something else) and keep that instead.

$ flac *.wav
$ rm -f *.wav
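
If you would rather not delete a WAV whose encode failed, a slightly more careful variant is sketched below. `encode_tracks` is a made-up helper name. It relies on flac's `--verify` option, which decodes the result and compares it against the input, and it only removes a WAV when flac exits successfully.

```shell
# Encode every WAV in the current directory and delete each one only after
# flac has encoded and verified it successfully.
encode_tracks() {
    for wav in *.wav; do
        [ -e "$wav" ] || continue    # no WAV files: the glob stayed literal
        flac -8 --verify --silent "$wav" && rm -f -- "$wav"
    done
}

# Usage, from inside the audio/ directory:
#   encode_tracks
```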

Saving checksums

To ensure the dumps remain fresh and uncorrupted, I would recommend saving checksums of the images. I use SHA256 because we're in the modern era.

$ sha256sum * | tee SHA256
fa723eabc28fd8fdc3333034b70c7a9f459608b40af119b137dd64f6bceadc57  quake106.bin
a78aa83f731affbb886ae831598016e5369cc28586baf1235c0f55dda09c1496  quake106.cue
422b90fc21a30f9b0ba2a57f8866122070643bbf4fd1eaaff656dcfbc6695570  quake106.toc

This can be later on verified like so:

$ sha256sum -c SHA256
quake106.bin: OK
quake106.cue: OK
quake106.toc: OK
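
One wrinkle with `sha256sum *`: it trips over subdirectories such as audio/, and on a rerun it can end up hashing the SHA256 file itself. A sketch that sidesteps both, using a made-up helper name, `write_checksums`:

```shell
# Write checksums for every regular file under the current directory into
# SHA256, skipping the SHA256 file itself so reruns do not hash the manifest.
write_checksums() {
    # Collect everything first, so the manifest is only written afterwards.
    sums=$(find . -type f ! -name SHA256 -exec sha256sum {} + | sort -k 2) || return 1
    [ -n "$sums" ] || return 1       # nothing to checksum
    printf '%s\n' "$sums" | tee SHA256
}

# Usage, from the directory holding the image files:
#   write_checksums
#   sha256sum -c SHA256
```

Paths are recorded relative to the current directory, so run the verification from the same place.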

All in one extravaganza

$ cdrdao read-cd --read-raw --driver generic-mmc:0x20000 --device /dev/sr0 --datafile imagefile.bin imagefile.toc && toc2cue imagefile.toc imagefile.cue && mkdir audio && pushd audio && cdparanoia -B "2-" && flac -8 track*.wav && rm -f *.wav && popd && (sha256sum imagefile.* audio/*.flac | tee SHA256) && eject

Hope this helps! Don't forget to upload cool weird stuff you find to the Internet Archive if it isn't already there. Someone's trash driver CD is another's treasure.

Footnotes

  1. By default, it's Big-Endian, just like how physical CDs present.
