CD and DVD imaging and quality control notes
Beware that these notes were written primarily for personal use, to keep track of my tests with various imaging tools. They are also pretty unorganised and need cleaning up at some point. All tools mentioned were tested under Linux (Mint); many of them can also be used under Windows through Cygwin.
Find the path to the CD drive
/dev/sda1 on / type ext4 (rw,errors=remount-ro)
/dev/sr0 on /media/johan/REBELS_0 type iso9660 (ro,nosuid,nodev,uid=1000,gid=1000,iocharset=utf8,mode=0400,dmode=0500,uhelper=udisks2)
So, the CD drive is /dev/sr0 (the device that is mounted as iso9660).
Path on Windows (Cygwin)
In my case /dev/scd0 worked fine (internal drive); /dev/scd1 for the external USB drive.
Information about the CD / DVD
lsblk. E.g. to get only the label name:
lsblk /dev/sr0 -n -o LABEL
Get size in bytes:
lsblk /dev/sr0 -n -o SIZE -b
Image CD-rom with dd
This is the simplest tool, mainly suitable for CDs/DVDs that are not damaged. Command-line:
dd if=/dev/sr0 of=mydisk.iso
1236416+0 records in
1236416+0 records out
633044992 bytes (633 MB) copied, 255,874 s, 2,5 MB/s
Image CD-rom with readom
readom dev=/dev/sr0 f=mydisk.iso
Error trying to open /dev/sr0 exclusively (Device or resource busy)... retrying in 1 second.
Then repeat. Works now! Output:
Read speed: 4234 kB/s (CD 24x, DVD 3x).
Write speed: 0 kB/s (CD 0x, DVD 0x).
Capacity: 309104 Blocks = 618208 kBytes = 603 MBytes = 633 prMB
Sectorsize: 2048 Bytes
Copy from SCSI (10,0,0) disk to file 'gordi_virussen_readom.iso'
end: 309104
addr: 309104 cnt: 44
Time total: 259.287sec
Read 618208.00 kB at 2384.3 kB/sec.
Setting the retries option to a lower value may be needed to avoid excessive processing times in case of damaged blocks (the default is 128).
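As a quick sanity check, the byte count reported by dd and the block count reported by readom above can be compared. dd uses 512-byte records by default, whereas readom reports 2048-byte sectors. A minimal sketch of that arithmetic (the helper name is made up for illustration):

```python
def bytes_from_records(records: int, record_size: int) -> int:
    """Total number of bytes for a given count of fixed-size records."""
    return records * record_size


# dd reported 1236416 records; its default record size is 512 bytes.
dd_bytes = bytes_from_records(1236416, 512)

# readom reported 309104 blocks with a sector size of 2048 bytes.
readom_bytes = bytes_from_records(309104, 2048)

# Both tools should account for the same number of bytes.
print(dd_bytes, readom_bytes, dd_bytes == readom_bytes)
```

Both figures come out at 633044992 bytes, which is also the size dd printed.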
Image CD-rom with ddrescue
Following Blood Report (page 11):
ddrescue -b 2048 -r4 -v /dev/sr0 REBELS_0.iso REBELS_0.log
-b 2048: block size 2048 bytes
-r4: retry bad sectors up to four times
-v: verbose output mode
From the online help:
Exit status 0 for a normal exit, 1 for environmental problems (file not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid input file, 3 for an internal consistency error (eg, bug) which caused ddrescue to panic.
In production environment, command line should be added to metadata (PREMIS event).
# Rescue Logfile. Created by GNU ddrescue version 1.17
# Command line: ddrescue -b 2048 -r4 -v /dev/sr0 REBELS_0.iso REBELS_0.log
# current_pos  current_status
0x28AF0000     +
#      pos        size  status
0x00000000  0x28AFA800  +
The log file structure is explained here. So in this case everything went fine. In production environment, contents of log file should be added to metadata.
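For automated quality control it may help to check the log file programmatically. The sketch below assumes the layout shown above: comment lines start with #, the first remaining line is the status line, and every following line holds a position, a size and a status character, where + marks a finished block. The function names are hypothetical and not part of ddrescue.

```python
from typing import List, Tuple


def parse_mapfile(text: str) -> List[Tuple[int, int, str]]:
    """Return (pos, size, status) for every data block in a ddrescue log file."""
    blocks = []
    status_line_seen = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not status_line_seen:
            # The first non-comment line holds current_pos and current_status.
            status_line_seen = True
            continue
        pos, size, status = line.split()[:3]
        blocks.append((int(pos, 16), int(size, 16), status))
    return blocks


def is_complete(blocks: List[Tuple[int, int, str]]) -> bool:
    """True if there is at least one block and all blocks are marked '+'."""
    return bool(blocks) and all(status == "+" for _, _, status in blocks)
```

With the log file above, parse_mapfile returns a single block that starts at position 0, and is_complete reports True for it.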
Image CD-rom with dcfldd
Alternatively, use dcfldd:
dcfldd bs=2048 if=/dev/sr0 of=Handbook_dcfldd.iso errlog=Handbook_dcfldd.log hashlog=Handbook_dcfldd.md5
In my tests, activating the errlog option gave a segmentation fault at the end of the imaging process (the tool does write an intact ISO file, but both the log file and the hash log are empty). Looks like a bug. (Using version 1.3.4-1, which is from 2006; apparently no more recent versions exist?!)
Also, from the Precautions section of forensicswiki.org (link here):
- dcfldd is based on an extremely old version of dd: it's known that dcfldd will misalign the data in the image after a faulty sector is encountered on the source drive (see the NIST report), and this kind of bug (wrong offset calculation when seeking over a bad block) was fixed for dd in 2003 (see the fix in the mailing list);
- similarly, dcfldd can enter an infinite loop when a faulty sector is encountered on the source drive, thus writing to the image over and over again until there is no free space left.
So this may not be such a great option after all ...
Roundup of image tool results
All of the above tools are usable, but readom is probably the best option here, followed by ddrescue. The potential advantage of readom is that it uses a library that was specifically written for dealing with optical media, so it might be a bit "smarter" in some respects than ddrescue, which is more generic. On the other hand, ddrescue may often be a better choice for CD-ROMs that are damaged (in which case readom gives up easily).
Verify ISO against physical CD
Run checksums on both CD and image, e.g.:
md5sum REBELS_0.iso
md5sum /dev/sr0
Note that readom already does this (dd doesn't). Actually I don't think readom does this either! Needs reference.
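The comparison can also be scripted. The following is a minimal sketch, not a tested production tool: it hashes the image in chunks and then hashes the same number of bytes from the source device. Limiting the device read to the image size is my own assumption (it ignores anything the drive returns beyond the end of the image); the function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, limit=None, chunk_size=2048 * 512):
    """MD5 hex digest of a file or device, optionally of the first `limit` bytes only."""
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as stream:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = stream.read(want)
            if not chunk:
                break  # end of file or device reached
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def image_matches_source(image_path, source_path):
    """Compare the image against the first len(image) bytes of the source."""
    size = os.path.getsize(image_path)
    return md5_of(image_path) == md5_of(source_path, limit=size)
```

A call such as image_matches_source("REBELS_0.iso", "/dev/sr0") would then return True if both checksums agree.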
Verify ISO image with isovfy
Part of isoinfo. Command line:
Root at extent 13, 2048 bytes
[0 0]
No errors found
So everything OK. Here's an example where the verification fails on an incomplete ISO file:
Root at extent 16, 2048 bytes
[0 0]
isovfy: Short read on old image
In a production environment, both isovfy command line and its output should be added to metadata (PREMIS event).
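Capturing the command line and the output of a tool can be done with a small wrapper. The sketch below runs an arbitrary command and collects what would go into such an event record. The dictionary keys are hypothetical placeholders and do not follow the actual PREMIS schema.

```python
import datetime
import shlex
import subprocess


def run_and_record(args):
    """Run a command and return its command line, exit status and output."""
    started = datetime.datetime.now(datetime.timezone.utc).isoformat()
    result = subprocess.run(args, capture_output=True, text=True)
    return {
        "eventDateTime": started,            # hypothetical key names,
        "commandLine": shlex.join(args),     # not real PREMIS elements
        "exitStatus": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
```

For example, the ddrescue call used earlier in these notes could be wrapped as run_and_record(["ddrescue", "-b", "2048", "-r4", "-v", "/dev/sr0", "REBELS_0.iso", "REBELS_0.log"]). Keeping stderr separate matters because some tools (cdrdao, for instance) write all their output there.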
Detecting incomplete or truncated ISO files (experimental)
The isovfy documentation isn't very clear about which specific checks it performs. In some of my tests I encountered broken (incomplete) ISO images that were not detected by isovfy. More info here:
This prompted me to write a simple verification script that calculates the expected file size of an ISO from its Primary Volume Descriptor, and compares this against the actual size of the file. It is available here:
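The script itself is not reproduced in these notes. As a rough, independent sketch of the same idea (not the script referred to above): in ISO 9660 the Primary Volume Descriptor stores the volume space size at byte offset 80 and the logical block size at byte offset 128, and their product gives the expected size. The sketch assumes the Primary Volume Descriptor is the first descriptor at sector 16, and it ignores anything a hybrid image may store outside the ISO 9660 volume.

```python
import os
import struct

SECTOR_SIZE = 2048               # size of the sectors holding the volume descriptors
PVD_OFFSET = 16 * SECTOR_SIZE    # the volume descriptor set starts at sector 16


def expected_iso_size(image_path):
    """Expected size in bytes according to the Primary Volume Descriptor."""
    with open(image_path, "rb") as f:
        f.seek(PVD_OFFSET)
        pvd = f.read(SECTOR_SIZE)
    # Descriptor type 1 plus the identifier 'CD001' marks a Primary Volume Descriptor.
    if len(pvd) < SECTOR_SIZE or pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("no Primary Volume Descriptor found at sector 16")
    # Both fields are stored in both byte orders; the little-endian copy comes first.
    volume_space_size = struct.unpack_from("<I", pvd, 80)[0]
    logical_block_size = struct.unpack_from("<H", pvd, 128)[0]
    return volume_space_size * logical_block_size


def is_truncated(image_path):
    """True if the file is smaller than the size declared in its descriptor."""
    return os.path.getsize(image_path) < expected_iso_size(image_path)
```

A file that is larger than expected is not flagged here, since trailing padding appears to be fairly common in practice.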
Get information about ISO image with isoinfo
Isoinfo has quite a number of options; here are the ones that look most useful.
Information from the primary volume descriptor
isoinfo -d -i REBELS_0.iso
CD-ROM is in ISO 9660 format
System id:
Volume id: REBELS_0
Volume set id:
Publisher id:
Data preparer id:
Application id: NERO - BURNING ROM
Copyright File id:
Abstract File id:
Bibliographic File id:
Volume set size is: 1
Volume set sequence number is: 1
Logical block size is: 2048
Volume size is: 333151
Joliet with UCS level 3 found
NO Rock Ridge present
The same information can also be extracted directly from the CD-rom (i.e. prior to the imaging process) using:
isoinfo -d -i /dev/sr0
This could be useful for automating some of the ddrescue input, such as block size and volume id (which can be used as a base name for the iso file).
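A minimal sketch of how that automation might look: parse the "key: value" lines of the isoinfo -d output and use the block size and volume id to build the ddrescue command line used earlier. The function names and the "unnamed" fallback for discs without a volume id are my own placeholders.

```python
def parse_isoinfo(text):
    """Turn the 'key: value' lines of isoinfo -d output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. 'NO Rock Ridge present') are skipped
            fields[key.strip()] = value.strip()
    return fields


def ddrescue_args(fields, device="/dev/sr0"):
    """Build a ddrescue argument list from the parsed isoinfo fields."""
    block_size = int(fields["Logical block size is"])
    base_name = fields["Volume id"] or "unnamed"  # hypothetical fallback name
    return ["ddrescue", "-b", str(block_size), "-r4", "-v",
            device, base_name + ".iso", base_name + ".log"]
```

For the disc above this yields the same command line as the one used in the ddrescue section.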
Get label (AKA volume ID) from CD/DVD
blkid -o value -s LABEL /dev/sr0
(Note that this will also output the device name, but this is sent to stderr).
isoinfo -f -i REBELS_0.iso
/AUTORUN.EXE;1
/AUTORUN.INF;1
/DISK0
/LICENSE2.TXT;1
/LICENSEF.TXT;1
/LICENSEU.TXT;1
/SETUP.EXE;1
/DISK0/CONTROLS.CFG;1
/DISK0/DISK0;1
::
::
etc
It looks like all items that are followed by ;1 are files, and those that aren't are directories. Also, the -l option gives a detailed directory listing that includes additional file attributes (size, date, etc.).
How to verify integrity of image file (comparison against source medium)?
Audio CD, readom
readom dev=/dev/sr0 f=hooked.bin -clone
The -clone option "retrieve(s) the full TOC and all data".
Read speed: 4234 kB/s (CD 24x, DVD 3x).
Write speed: 0 kB/s (CD 0x, DVD 0x).
TOC len: 180. First Session: 1 Last Session: 1.
01 10 00 A0 00 00 00 00 01 00 00
01 10 00 A1 00 00 00 00 0D 00 00
01 10 00 A2 00 00 00 00 47 13 21
01 10 00 01 00 00 00 00 00 02 00
01 10 00 02 00 00 00 00 05 08 0A
01 10 00 03 00 00 00 00 0B 32 2D
01 10 00 04 00 00 00 00 11 35 0A
01 10 00 05 00 00 00 00 17 33 2D
01 10 00 06 00 00 00 00 1D 15 05
01 10 00 07 00 00 00 00 22 21 0F
01 10 00 08 00 00 00 00 26 2A 3F
01 10 00 09 00 00 00 00 2B 08 28
01 10 00 0A 00 00 00 00 30 06 46
01 10 00 0B 00 00 00 00 35 38 0A
01 10 00 0C 00 00 00 00 3A 05 21
01 10 00 0D 00 00 00 00 40 2F 46
Lead out 1: 320808
Capacity: 320808 Blocks = 641616 kBytes = 626 MBytes = 657 prMB
Sectorsize: 2048 Bytes
Errno: 5 (Input/output error), mode select g1 scsi sendcmd: no error
CDB: 55 10 00 00 00 00 00 00 14 00
status: 0x2 (CHECK CONDITION)
Sense Bytes: 70 00 05 00 00 00 00 0A 00 00 00 00 26 00 00 00
Sense Key: 0x5 Illegal Request, Segment 0
Sense Code: 0x26 Qual 0x00 (invalid field in parameter list) Fru 0x0
Sense flags: Blk 0 (not valid)
cmd finished after 0.002s timeout 40s
Copy from SCSI (10,0,0) disk to file 'hooked.bin'
end: 320808
addr: 320808 cnt: 80
Time total: 264.775sec
Read 766931.62 kB at 2896.5 kB/sec.
Result: 1 toc file + 1 bin file. Optionally convert toc to cue format:
cueconvert hooked.bin.toc hooked.bin.cue
Doesn't work, apparently no standard toc file?
Audio CD, cdrdao
CD-I (Green Book), cdrdao
Bin file seems to work:
cdrdao read-cd --read-raw --datafile histopathologie_cd_i.bin --device /dev/sr0 --driver generic-mmc-raw histopathologie_cd_i.toc
Reading toc and track data...
Track   Mode    Flags  Start                Length
------------------------------------------------------------
 1      DATA    4      00:00:00(     0)     48:39:00(218925)
Leadout DATA    4      48:39:00(218925)
PQ sub-channel reading (data track) is supported, data format is BCD.
Raw P-W sub-channel reading (data track) is supported.
Cooked R-W sub-channel reading (data track) is supported.
Copying data track 1 (MODE2_RAW): start 00:00:00, length 48:39:00 to "histopathologie_cd_i.bin"...
Reading of toc and track data finished successfully
The resulting image is slightly larger than the size of the CD according to the lsblk command. But how to use it? Readom and ddrescue both fail to image these discs at all.
Multi-track / mixed content CDs
These are CDs with one data track and multiple audio tracks.
Make bin / cue file
Tried to use cdrdao to create bin / cue files, following these instructions. Before doing anything with cdrdao we first need to unmount the disc using[^1]:
Then run cdrdao with the following command line:
cdrdao read-cd --read-raw --datafile no.bin --device /dev/sr0 --driver generic-mmc-raw no.toc
Result: image successfully created; the output is a .toc and a .bin file. The .toc file looks like this:
CD_DA

// Track 1
TRACK AUDIO
NO COPY
NO PRE_EMPHASIS
TWO_CHANNEL_AUDIO
ISRC "USIR70200001"
FILE "no.bin" 0 02:10:53

// Track 2
TRACK AUDIO
NO COPY
NO PRE_EMPHASIS
TWO_CHANNEL_AUDIO
ISRC "USIR70200002"
FILE "no.bin" 02:10:53 02:17:34
::
etc
But closer inspection shows that only the audio tracks were copied, not the data track! E.g. compare the above output with the example below from the cdrdao documentation:
CD_ROM
TRACK MODE1
DATAFILE "data_1"
ZERO 00:02:00 // post-gap

TRACK AUDIO
SILENCE 00:02:00 // pre-gap
START
FILE "data_2.wav" 0

TRACK AUDIO
FILE "data_3.wav" 0
In particular I would expect the .toc file to start with CD_ROM, and I would also expect one TRACK MODE1 item. So why is this?
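This kind of check is easy to script. The sketch below reads a TOC file, takes the first non-comment line as the header (CD_DA, CD_ROM, ...) and collects the mode of every TRACK statement, so that a missing data track can be flagged automatically. It only looks at the statements shown in the examples above and is not a full parser of the cdrdao TOC format; the function names are hypothetical.

```python
import re


def summarise_toc(text):
    """Return (header, list of track modes) for the text of a cdrdao TOC file."""
    header = None
    track_modes = []
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop // comments
        if not line:
            continue
        if header is None:
            header = line  # first statement, e.g. CD_DA or CD_ROM
            continue
        match = re.match(r"TRACK\s+(\S+)", line)
        if match:
            track_modes.append(match.group(1))
    return header, track_modes


def has_data_track(text):
    """True if at least one track is something other than AUDIO."""
    _, modes = summarise_toc(text)
    return any(mode != "AUDIO" for mode in modes)
```

For the TOC file produced above this gives the header CD_DA with only AUDIO tracks, whereas the documentation example gives CD_ROM with one MODE1 track followed by two AUDIO tracks.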
Also, cdrdao's output included this message (all output sent to stderr, not stdout):
Found 61 Q sub-channels with CRC errors.
Reading of toc and track data finished successfully.
I assume this refers to cyclic redundancy check errors (which are similar to checksum errors), which would imply some data errors. Not quite sure how serious this is. In any case, for (non-mixed) audio CDs cdparanoia looks like a better choice.
Conversion between CUE and TOC
The cdrdao man page mentions that it can read CUE files, whereas it actually writes TOC files. Most emulation-related refs only mention CUE, which appears to be more common. Conversion between the two can be done with the cuetools package. To install it:
sudo apt-get install cuetools
Then convert with the cueconvert tool. To convert a TOC file to CUE:
cueconvert no.toc no.cue
The resulting CUE file looks like this:
FILE "no.bin" WAVE
TRACK 01 AUDIO
ISRC USIR70200001
INDEX 01 00:00:00
TRACK 02 AUDIO
ISRC USIR70200002
INDEX 01 02:10:53
::
etc
Get info about CD/DVD
This is also useful (again, it only works after unmounting the disc first):
cdrdao disk-info --device /dev/sr0
CD-RW : no
Total Capacity : n/a
CD-R medium : n/a
Recording Speed : n/a
CD-R empty : no
Toc Type : CD-DA or CD-ROM
Sessions : 2
Last Track : 18
Appendable : no
Result in case of CD-I:
CD-RW : no
Total Capacity : n/a
CD-R medium : n/a
Recording Speed : n/a
CD-R empty : no
Toc Type : CD-I
Sessions : 1
Last Track : 1
Appendable : no
Question: what happens if we do this with a DVD?
Finally I tried to image the same CD using ddrescue, just to see what happens. I had to interrupt ddrescue manually because it stopped making any progress after some time. Interestingly, isovfy didn't find any errors in the resulting image, whereas isoinfo -d resulted in:
CD-ROM is NOT in ISO 9660 format
Which makes me wonder how thorough the check by isovfy really is!
How to make imaging of mixed content CDs work correctly with cdrdao?
How to render a BIN/CUE image in e.g. an emulated environment?
Isoinfo on audio CD:
isoinfo -d -i /dev/sr0
isoinfo: Input/output error. Read error on old image
cdrdao disk-info --device /dev/sr0
CD-RW : no
Total Capacity : n/a
CD-R medium : n/a
Recording Speed : n/a
CD-R empty : no
Toc Type : CD-DA or CD-ROM
Sessions : 1
Last Track : 13
Appendable : no
Rip audio CD with cdparanoia
sudo apt-get install cdparanoia
Rip CD in batch mode, each track to a separate WAV file:
cdparanoia -B -L
cdparanoia -B -l
The -L switch results in the generation of a detailed log file; -l produces a summary log (named cdparanoia.log). Strangely, a parse error occurs when I specify user-defined file names here. Also, it seems the summary file gives more detailed info than the detailed one? Needs a more in-depth look!
- How are the ripped files expected to be rendered (play the CD as a whole vs. separate tracks)? This also needs metadata on track order, perhaps media player playlists.
Automatic disc type detection
From StackOverflow - Determine optical media type (Audio CD, DVD, blu-ray) by using UDEV and scripts.
First install package cdtool:
sudo apt-get install cdtool
cdir -d /dev/sr0
Output for cd-rom:
unknown cd - 34:32 in 1 tracks
34:30.35 1 [DATA]
Output for audio CD:
unknown cd - 71:19 in 13 tracks
5:06.10 1
6:42.35 2
::
etc
Output for mixed content audio/data CD:
unknown cd - 37:52 in 18 tracks
2:10.53 1
2:17.56 2
::
::
5:12.36 17
1:29.63 18 [DATA]
Output for DVD:
So the output of the tool makes it possible to differentiate between these 4 disc types. This suggests that it would be feasible to build a workflow that automatically picks the best imaging sub-workflow, based on whether the inserted disc is a cd-rom, audio CD, mixed content CD or DVD.
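A first step towards such a workflow could be a small classifier for the cdir output. The sketch below uses only the two features visible in the examples above: the track count in the header line and the presence of a [DATA] marker. The DVD case is left out because no DVD output is recorded in these notes, so a DVD may well end up in one of the other categories; the category names are my own.

```python
import re


def classify_cdir_output(text):
    """Classify cdir output as 'cd-rom', 'audio', 'mixed' or 'unknown'."""
    match = re.search(r"in\s+(\d+)\s+tracks", text)
    if match is None:
        return "unknown"  # no header line found, e.g. no disc or unsupported medium
    track_count = int(match.group(1))
    has_data = "[DATA]" in text
    if not has_data:
        return "audio"    # no data track at all
    if track_count == 1:
        return "cd-rom"   # a single track, and it is a data track
    return "mixed"        # several tracks, at least one of them data
```

The result could then be used to pick the tool: for instance readom or ddrescue for "cd-rom" and cdparanoia for "audio", with "mixed" still being an open question.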
Rescuing a CD-ROM / DVD with read errors
In this case use ddrescue as it is much more resilient in case of read errors. Also, even if a CD-ROM gives read errors, it may be possible to recover additional data by re-running ddrescue using multiple CD drives. The ddrescue manual gives an example:
Example 3: Rescue a CD-ROM in /dev/cdrom using two CD drives from two different computers, writing the image into an USB drive mounted on /mnt/mem.
ddrescue -n -b2048 /dev/cdrom /mnt/mem/cdimage /mnt/mem/mapfile
ddrescue -d -r1 -b2048 /dev/cdrom /mnt/mem/cdimage /mnt/mem/mapfile
(umount the USB drive and move both USB drive and CD-ROM to second computer)
ddrescue -d -r1 -b2048 /dev/cdrom /mnt/mem/cdimage /mnt/mem/mapfile
(if errsize is zero, /mnt/mem/cdimage now contains a complete image of the CD-ROM and you can write it to a blank CD-ROM)
[^1]: If you don't do this you will get the error message: ERROR: Unable to open SCSI device /dev/sr0: Device or resource busy.