klange/PDF ISO Hybrid.md

Making a PDF + Bootable ISO Hybrid Polyglot

If you've not seen my résumé, that is what this post is about, and you should probably look at it first to get an idea of what's happening.
A Tale of Two Files

PDF and ISO9660 are two very different file formats. PDF is actually a surprisingly human-readable (but, sadly, not very human-writable) plain-text format, technically descended from PostScript. PDF is not very forgiving of changes, as it stores tables of offsets to various elements, so it is very difficult to insert or remove content manually without going through a robust PDF manipulation library.
ISO9660 is a disk filesystem, meant to be written once and read many times on many different kinds of hardware. As it is sector-based, it expects data structures to be at particular absolute offsets.
tl;dr

We're going to take an ISO, chop off the front, embed the remainder into a normal PDF, then mess with the version string / initial comment at the top of the PDF to add a jump into what's left of the first stage of a GRUB bootloader.
To make a CD, you must first make a CD

Our first task in crafting a PDF-ISO hybrid is to build our initial ISO. It's important that we do this first because we need to know how many decimal digits its length has (remember that PDFs are human-readable plain text, so they store integer values as plain-text ASCII decimal strings, and a longer number pushes everything after it further into the file). For my résumé, I'm using a standard ToaruOS Live CD build, additionally customized to include a PDF reader as well as an additional copy of the normal PDF version of my résumé and a modified GRUB menu. To ensure you have the same bootloader as me, and thus the same sizes and offsets, you'll want to use GRUB2's grub-mkrescue tool with xorriso.
Fun with LaTeX

Once we have our ISO crafted, we can use it for our next step, which is figuring out how much PDF is going to come before our CD bits. Our PDF will be built from a LaTeX source. I use pdflatex, and I suggest you do the same if you want similar results, as I can't be sure how much other LaTeX packages/tools will differ.
Let's look at a very basic PDF with an embedded file:
\documentclass{article}

\usepackage{embedfile} % The real magic happens here.
\pdfcompresslevel=0    % PDFs can compress embedded files, but we need
\pdfobjcompresslevel=0 % to make sure this doesn't happen to our ISO.

\begin{document}

\embedfile{iso.dat}    % Embed our target file.

Hello world!

\end{document}
If we copy our ISO over to iso.dat for this step, we'll get a PDF that looks something like this:
%PDF-1.5
%���
1 0 obj
<<
/Type/EmbeddedFile/Params<</ModDate(D:20160906085329+09'00')/Size 39276544/CheckSum<F0E03DC64A8961790A59EE55FB832773>>>
/Length 39276544  
>>
stream
(garbage here)
(more actual PDF content)
We need to make space for this data in our ISO, so let's see what the first 0x200 bytes of an ISO with a GRUB bootloader look like:
00000000: eb63 9090 9090 9090 9090 9090 9090 9090  .c..............
00000010: 9090 9090 9090 9090 9090 eb49 2412 0f09  ...........I$...
00000020: 0052 be1b 7c31 c0cd 1346 8a0c 84c9 7510  .R..|1...F....u.
00000030: be39 7ce8 7401 e93d 0146 6c6f 7070 7900  .9|.t..=.Floppy.
00000040: bb00 708e c331 dbb8 0102 b500 b600 cd13  ..p..1..........
00000050: 72d3 b601 b54f e998 0000 0080 0000 0000  r....O..........
00000060: 0000 0000 fffa eb05 f6c2 8074 05f6 c270  ...........t...p
00000070: 7402 b280 ea79 7c00 0031 c08e d88e d0bc  t....y|..1......
00000080: 0020 fba0 647c 3cff 7402 88c2 52be 807d  . ..d|<.t...R..}
00000090: e817 01be bf7d b441 bbaa 55cd 135a 5272  .....}.A..U..ZRr
000000a0: 3d81 fb55 aa75 3783 e101 7432 31c0 8944  =..U.u7...t21..D
000000b0: 0440 8844 ff89 4402 c704 1000 668b 1eb0  .@.D..D.....f...
000000c0: 7d66 895c 0866 8b1e b47d 6689 5c0c c744  }f.\.f...}f.\..D
000000d0: 0600 70b4 42cd 1372 05bb 0070 eb76 b408  ..p.B..r...p.v..
000000e0: cd13 730d 5a84 d20f 8336 ffbe 8b7d e982  ..s.Z....6...}..
000000f0: 0066 0fb6 c688 64ff 4066 8944 040f b6d1  .f....d.@f.D....
00000100: c1e2 0288 e888 f440 8944 080f b6c2 c0e8  .......@.D......
00000110: 0266 8904 66a1 b47d 6609 c075 4e66 a1b0  .f..f..}f..uNf..
00000120: 7d66 31d2 66f7 3488 d131 d266 f774 043b  }f1.f.4..1.f.t.;
00000130: 4408 7d37 fec1 88c5 30c0 c1e8 0208 c188  D.}7....0.......
00000140: d05a 88c6 bb00 708e c331 dbb8 0102 cd13  .Z....p..1......
00000150: 721e 8cc3 601e b900 018e db31 f6bf 0080  r...`......1....
00000160: 8ec6 fcf3 a51f 61ff 265a 7cbe 867d eb03  ......a.&Z|..}..
00000170: be95 7de8 3400 be9a 7de8 2e00 cd18 ebfe  ..}.4...}.......
00000180: 4752 5542 2000 4765 6f6d 0048 6172 6420  GRUB .Geom.Hard 
00000190: 4469 736b 0052 6561 6400 2045 7272 6f72  Disk.Read. Error
000001a0: 0d0a 00bb 0100 b40e cd10 ac3c 0075 f4c3  ...........<.u..
000001b0: b815 0000 0000 0000 0000 0000 0000 8000  ................
000001c0: 0200 cd1d 0425 0100 0000 a32b 0100 0000  .....%.....+....
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.

That's the GRUB first stage, and it's actually pretty big - we don't have enough room for all of it. Thankfully, we don't need all of it, as we can skip the error reporting. First, we'll chop off those useless bytes at the front - that's a jump and a bunch of no-ops.
tail -c +27 base.iso > iso.dat.1 # Get rid of the no-ops.
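
If you want to check where the 27 comes from, here is a minimal sketch that counts the leading jump and no-ops. The function name is made up for illustration, and the sample bytes are copied from the start of the dump above rather than read from a real ISO.

```python
def leading_jump_and_nops(sector: bytes) -> int:
    """Return how many bytes the initial short jump plus NOP padding occupy."""
    if len(sector) < 2 or sector[0] != 0xEB:  # 0xEB = x86 short jump opcode
        return 0
    end = 2  # the jump itself is two bytes: opcode + 8-bit displacement
    while end < len(sector) and sector[end] == 0x90:  # 0x90 = NOP
        end += 1
    return end

# First 28 bytes of the dump above: jump, 24 no-ops, then the next jump.
head = bytes.fromhex("eb63" + "90" * 24 + "eb49")
prefix = leading_jump_and_nops(head)
print(prefix, prefix + 1)  # 26 27
```

The prefix is 26 bytes long, and tail -c +N starts output at byte N (counting from 1), so +27 drops exactly those 26 bytes.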

Now we'll take just the first 2048 bytes of what's left and set that aside - this gets us the important bytes of the bootloader first stage, though when we reassemble it some of it will be over the 512 byte border and not get loaded. 2048 was chosen arbitrarily; just make sure you have the same number of bytes here as you remove from the next step.
head -c 2048 iso.dat.1 > iso.dat.a # The important parts of the bootloader.

Next we'll take the rest of the data and chop off some space to ensure it is aligned. Remember how we said we needed to know how long the length of our ISO was - as well as the rest of the metadata from the PDF above? This is where that comes into play. After we've chopped the front of the ISO off, we need to align the rest of it so it shows up in our PDF in the same place it did in the original ISO. We do this by chopping off the difference in alignment between where this section would end up in the PDF and where it should be to be aligned at the right place. Basically, we chop off however long that metadata from earlier was.
tail -c +2049 iso.dat.1 > iso.dat.b # The rest of the ISO.
tail -c +150 iso.dat.b > iso.dat.c  # Now aligned
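
One way to see where the 150 comes from is to add up the PDF bytes that sit in front of the stream data. This is only a sketch under a few assumptions: the header looks exactly like the listing earlier, every line ends in a single newline, and the binary comment line is a % followed by four non-ASCII bytes (the placeholder bytes below are stand-ins, not the real ones).

```python
# Header lines as shown in the listing above; each is followed by one '\n'.
header_lines = [
    b"%PDF-1.5",
    b"%" + b"\x80" * 4,  # assumption: '%' plus four non-ASCII bytes
    b"1 0 obj",
    b"<<",
    b"/Type/EmbeddedFile/Params<</ModDate(D:20160906085329+09'00')"
    b"/Size 39276544/CheckSum<F0E03DC64A8961790A59EE55FB832773>>>",
    b"/Length 39276544  ",  # two trailing spaces, as in the listing
    b">>",
    b"stream",
]
header_len = sum(len(line) + 1 for line in header_lines)  # +1 per newline
already_removed = 26  # the jump and no-ops dropped by the first tail

print(header_len, header_len - already_removed)  # 175 149
```

Under those assumptions the header is 175 bytes; 26 were already removed from the front, which leaves 149 to drop, hence tail -c +150. The size shows up twice in that header (/Size and /Length), so an ISO whose length has one more digit would presumably shift things by two bytes, which is why the digit count matters.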

Don't worry about those 149 bytes we're getting rid of - the interesting parts of the ISO don't start until 0x8000 bytes in. Now we'll stick the bootloader first stage back into the file:
cat iso.dat.a iso.dat.c > iso.dat
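
For reference, the same carving can be sketched in Python. This is an illustration rather than the script used for the résumé: carve_iso and its parameter names are invented here, the defaults are the numbers from the commands above, and the "ISO" is synthetic data whose bytes encode their own offsets so the alignment is easy to check.

```python
def carve_iso(iso: bytes, skip: int = 26, keep: int = 2048, header: int = 175) -> bytes:
    """Mirror the tail/head/cat steps: drop `skip` bytes, keep `keep`, then realign."""
    rest = iso[skip:]               # tail -c +27
    boot = rest[:keep]              # head -c 2048
    tail = rest[keep:]              # tail -c +2049
    aligned = tail[header - skip:]  # tail -c +150 (175 - 26 = 149)
    return boot + aligned           # cat iso.dat.a iso.dat.c

# Synthetic stand-in for an ISO: every byte encodes its own offset (mod 251).
iso = bytes(i % 251 for i in range(0x9000))
pdf = b"H" * 175 + carve_iso(iso)   # 175 placeholder bytes play the PDF header

print(len(pdf) == len(iso))             # True
print(pdf[0x8000:] == iso[0x8000:])     # True
```

Everything from offset 175 + 2048 onward ends up at the same absolute position it had in the original image, which is what the filesystem cares about.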

Now we're ready to at least make a basic ISO. If we rebuild our PDF, we can then mount it and see that the ISO9660 file system is still working:
klange@luka ~/S/test# mount -t iso9660 -o loop test.pdf /mnt
klange@luka ~/S/test# ls /mnt
boot  boot.catalog  kernel  mod  ramdisk.img.gz  wallpaper.png
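
If you can't mount the file, a rough check is to look for the ISO9660 volume descriptor directly. ISO9660 puts its first volume descriptor at sector 16 (offset 0x8000), and bytes 1 to 5 of it are the identifier CD001. The helper name below is made up, and the image is a synthetic buffer; for a real file you would read the bytes from test.pdf in binary mode.

```python
def has_iso9660_descriptor(image: bytes) -> bool:
    """Check for the 'CD001' identifier of the first volume descriptor (sector 16)."""
    offset = 16 * 2048  # 0x8000; the first 16 sectors are the system area
    return image[offset + 1:offset + 6] == b"CD001"

# Synthetic image: zeros everywhere except a descriptor header at 0x8000.
image = bytearray(0x8800)
image[0x8000] = 1                 # descriptor type 1 = primary volume descriptor
image[0x8001:0x8006] = b"CD001"   # standard identifier

print(has_iso9660_descriptor(bytes(image)))   # True
print(has_iso9660_descriptor(bytes(0x8800)))  # False
```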

But we're not done, as our bootsector has been rendered non-functional. We'll need to write a trampoline to get it working again. Luckily, we have plenty of space up there in the PDF version header!
PDF's version header looks something like this:
%PDF-1.5
%<GARBAGE>
These lines start with % because that is the comment character in a PDF. The garbage on the second line comes from the PDF spec, which says the first several bytes of your file must contain non-ASCII bytes so that FTP and similar tools don't try to send PDFs as text files and convert line endings. But which bytes you use isn't actually defined, and they don't need to be on the second line, so we can use our trampoline to fulfill this requirement. The version number is also superfluous: all a real PDF needs is %PDF-. As luck would have it, %PDF- translates to a perfectly acceptable set of x86 opcodes, so we can put our jump after it, replacing the 1.5\n%... bytes. Let's write a patch tool:
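import os

filename = 'out.pdf'

# Overwrite the start of the header: '%PDF-' followed by three short jumps.
# The file is opened in binary mode, so we write a bytes literal.
with open(filename,'r+b') as f:
    f.seek(0)
    f.write(b'%PDF-\xEB\xF3\xEB\xF1\xEB\xEF')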
Now we have a trampoline to jump us into the bootloader first stage - we try a couple of different relative jumps just in case we botched the instruction alignment. But our ISO still won't boot. When we moved the first stage forward to get the important parts into the embed region, we also moved the special bytes that mark this as a boot sector (0x55 0xAA), so we'll need to patch those in. Remember how we said that not all of the boot sector was needed? These bytes will overwrite some error handling functions:
    # Boot signature goes in the last two bytes of the first 512-byte sector.
    f.seek(0x1FE)
    f.write(b'\x55\xAA')
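
A quick sanity check after patching might look like the sketch below. It only confirms the two things the patch is supposed to guarantee (the file still starts with %PDF- and the first sector ends in 0x55 0xAA); it says nothing about whether the jumps land in the right place. The function name is hypothetical and the sector here is synthetic.

```python
def looks_bootable_pdf(sector: bytes) -> bool:
    """True if a 512-byte sector starts like a PDF and ends like a boot sector."""
    return (
        len(sector) >= 512
        and sector.startswith(b"%PDF-")
        and sector[0x1FE:0x200] == b"\x55\xAA"
    )

# Synthetic sector for illustration; for the real file, read the first
# 512 bytes of out.pdf in binary mode instead.
sector = bytearray(512)
sector[0:11] = b"%PDF-\xEB\xF3\xEB\xF1\xEB\xEF"
sector[0x1FE:0x200] = b"\x55\xAA"

print(looks_bootable_pdf(bytes(sector)))  # True
```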
Now our PDF should boot in QEMU - but VirtualBox will likely give us trouble for two reasons.

Our PDF isn't a multiple of the CD sector size (2048 bytes). Mine came out to 36510599 bytes.
Our PDF has the .pdf file extension!

Let's add another step to our fix-up script to pad appropriately:
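size = os.stat(filename).st_size

# Count the zero bytes needed to reach the next multiple of 2048.
pad = 0
while (size+pad) % 2048 != 0:
    pad += 1

# Append the padding; binary mode needs a bytes literal.
with open(filename,'r+b') as f:
    f.seek(size)
    f.write(b'\x00'*pad)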
This yields a file of 36511744 bytes, which is a multiple of 2048. Finally, we'll rename our PDF to have a .iso extension, and it should boot in VirtualBox. Yay!
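
The padding can also be worked out in closed form, which makes the numbers above easy to double-check. The helper name is made up for this sketch; the sizes are the ones quoted in the text.

```python
SECTOR = 2048  # CD sector size in bytes

def padding_needed(size: int, sector: int = SECTOR) -> int:
    """Bytes to append so that size becomes a multiple of the sector size."""
    return -size % sector

pad = padding_needed(36510599)
print(pad, 36510599 + pad)  # 1145 36511744
```

Python's % always returns a non-negative result for a positive modulus, so -size % sector gives 0 when the file is already aligned.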

Bonus Points

A fun fact about ISO9660: The sector offsets for files are all based on the start of the disk, so you can make a file with offset 0 and length equal to the size of the CD (in sectors) and get a self-referential file. Maybe call it self.pdf?