joonas-fi/understanding-linux-full-disk-encryption.md

Understanding Linux full disk encryption

This article tries to explain how full disk encryption works in Linux.
There are many ways to do it, but we'll focus on the LUKS way.
LUKS (Linux Unified Key Setup) is a standard specifying how full disk
(or full volume) encryption should be implemented.
On Linux, LUKS is implemented with:

cryptsetup (the frontend, i.e. the user-space binaries, which read and write the LUKS header) and
dm-crypt (a kernel module, part of the device mapper infrastructure, which does the actual encryption and decryption of blocks)

First, let's start with a great summary from an
AskUbuntu user:

Luks is an encryption layer on a block device, so it operates on a particular block device, and exposes a new block device which is the decrypted version. Access to this device will trigger transparent encryption/decryption while it's in use.
It's typically used on either a disk partition, or a LVM physical volume which would allow multiple partitions in the same encrypted container.
LUKS stores a bunch of metadata at the start of the device. It has slots for multiple passphrases. Each slot has a 256 bit salt that is shown in the clear along with an encrypted message. When entering a passphrase LUKS combines it with each of the salts in turn, hashing the result and tries to use the result as keys to decrypt an encrypted message in each slot. This message consists of some known text, and a copy of the master key. If it works for any one of the slots, because the known text matches, the master key is now known and you can decrypt the entire container. The master key must remain unencrypted in RAM while the container is in use.
Knowing the master key allows you access to all the data in the container, but doesn't reveal the passwords in the password slots, so one user cannot see the passwords of other users.
The system is not designed for users to be able to see the master key while in operation, and this key can't be changed without re-encrypting. The use of password slots, however, means that passwords can be changed without re-encrypting the entire container, and allows for use of multiple passwords.

What my partition hierarchy looks like

I installed Linux Mint with "disk encryption" ticked, and ended up with this:
$ lsblk -o name,size,fstype,label,mountpoint
NAME                    SIZE FSTYPE      LABEL MOUNTPOINT
sda                   119,2G                   
├─sda2                  488M ext2              /boot
├─sda3                118,3G crypto_LUKS       
│ └─sda3_crypt        118,3G LVM2_member       
│   ├─mint--vg-root   110,4G ext4              /
│   └─mint--vg-swap_1   7,9G swap              [SWAP]
└─sda1                  512M vfat              /boot/efi

sda is my only physical disk.
It has three partitions:

(unencrypted) EFI system partition (never mind this one, it's required by UEFI, a kind of next-gen BIOS)
(unencrypted) Boot partition
(encrypted) crypto_LUKS partition; once unlocked, it exposes a virtual block device (sda3_crypt) that can host child volumes, all of which are encrypted

In my case the crypto_LUKS partition contains an LVM2_member (because I set up LVM),
which is another virtual device that in turn contains the actual filesystems:
ext4 (/) and swap. You don't need LVM to use encryption, but I used it because
it lets you do cool stuff like extending a volume across two hard disks, taking
snapshots, etc.
You can think of crypto_LUKS and LVM2_member as filters: they take something as
input and give something as output. crypto_LUKS takes the raw encrypted blocks
from the hard disk and gives the decrypted blocks as output. LVM is another
filter (and another level of indirection) that adds extra features.
Here is the same hierarchy drawn as a diagram:
               +-----------+
               |           |
               | Hard disk |
               |           |
               +-+---+---+-+
                 |   |   |
        +--------+   |   +--------+
        |            |            |
        |            |            |
+-------v---+  +-----v----+ +-----v------------------+
|           |  |          | |                        |
|  Volume 1 |  | Volume 2 | | Volume 3               |
|  EFI      |  | /boot    | | crypto_LUKS            |
|           |  |          | | (virtual block device) |
+-----------+  +----------+ |                        |
                            +------+-----------------+
                                   |
                                   |
                            +------v------+
                            |             |
                            | Volume 1    |
                            | LVM2_member |
                            |             |
                            +--+--------+-+
                               |        |
                               |        |
                          +----v-----+ +v-----+
                          |          | |      |
                          | / (root) | | swap |
                          | ext4     | |      |
                          |          | +------+
                          +----------+

How the system boots up

Since the encryption is implemented in software (a Linux kernel module), the system has to
read some things from the hard drive before it can even show a prompt asking for the
passphrase that recovers the encryption key. So how can you say my data is secret if the
computer can read data off the disk without my passphrase?
Good question! Basically it boils down to this: the boot partition (/boot) is
unencrypted, and it contains only the kernel and the most important modules needed
before the machine can boot into a usable state; none of your own data lives there.
These things are required pre-boot:

Kernel (/boot/vmlinuz-4.8.0-53-generic)
Initial RAM disk (/boot/initrd.img-4.8.0-53-generic)
Other things in /boot, like the configuration of GRUB (the bootloader)

So the boot process goes like this (this is the classic BIOS/MBR flow; on a UEFI machine like the one above, the firmware instead loads GRUB as an EFI executable from the EFI system partition mounted at /boot/efi, skipping the MBR and stage 1.5 steps, and from GRUB onwards things work the same):

You press the power button.
The CPU starts and jumps to the hardcoded address 0xfffffff0, which the motherboard/hardware maps to the BIOS ROM.
The BIOS does a bunch of stuff, but for our purposes it's enough to say that it holds a setting for which disk to boot from.
The BIOS reads the MBR (Master Boot Record) from the chosen disk and hands off execution to it. From now on the BIOS is no longer in control (although GRUB may still call BIOS services to read the disk).
The MBR (GRUB's stage 1 loader) is so small that it can't contain the logic to read the /boot filesystem, so it loads GRUB stage 1.5 from the DOS compatibility area of the disk (the gap between the MBR and the first partition).
Stage 1.5 is still too small to contain all of GRUB, but it can read /boot, so it loads and executes the rest of GRUB ("stage 2").
GRUB reads the kernel and the initrd from /boot into RAM and hands off booting to the kernel.
The kernel starts booting and sets up the initrd (a small root filesystem in RAM) so it can use the pre-boot utilities and drivers needed to bring the system into a usable state.
The kernel can now load the modules needed for crypto_LUKS (dm-crypt) and LVM, but it can't read what's inside crypto_LUKS yet, because it doesn't know the master key.
cryptsetup (running from the initrd) now asks for your passphrase, which is used to recover the master key from one of the key slots (explained later).
If you gave the correct passphrase, cryptsetup hands the master key to dm-crypt, so the kernel can decrypt the data inside crypto_LUKS and start mounting and reading the LVM volumes!
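
The unlock steps above can be sketched as a short shell script. This is a hypothetical, simplified sketch: the device and volume names come from the lsblk output earlier, the helper names run and unlock_stack are made up here, and a real initrd (e.g. the initramfs-tools scripts that Mint uses) does considerably more. It only prints the commands unless DRY_RUN=0 is set, because running them for real needs root and only makes sense inside an initrd.

```shell
#!/bin/sh
# Hypothetical, simplified sketch of the unlock steps an initrd performs
# for the disk layout above (names taken from the lsblk output).
# Dry run by default: commands are only printed. DRY_RUN=0 executes them,
# which needs root and only makes sense inside an initrd.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"      # dry run: show the command instead
    else
        "$@"
    fi
}

unlock_stack() {
    # Prompt for the passphrase, recover the master key from a key slot and
    # hand it to dm-crypt, which then exposes /dev/mapper/sda3_crypt.
    run cryptsetup open --type luks /dev/sda3 sda3_crypt

    # Activate the LVM volume group found on the decrypted device. lsblk
    # shows "mint--vg" because LVM doubles dashes in device-mapper names.
    run vgchange -ay mint-vg

    # Mount the real root filesystem; the initrd would then hand control to
    # /sbin/init on it (e.g. via switch_root, which must run as PID 1).
    run mount /dev/mapper/mint--vg-root /root
}

unlock_stack
```

Run as-is, it prints the three commands in order, each prefixed with "+".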

How does the encryption work?

Let's dive right into this by asking what metadata the LUKS system has on that hard disk:
$ cryptsetup luksDump /dev/sda3
LUKS header information for /dev/sda3

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha256
Payload offset: 4096
MK bits:        512
MK digest:      0e 28 66 97 5f e3 57 54 49 e1 92 95 11 f8 13 4f 0a 2d 21 0f 
MK salt:        b1 2f a4 ad 8a 4c 50 28 e1 b7 30 5b 6e 72 b3 b1 
                a8 40 0a 59 1f eb 49 8d c4 41 36 e7 21 10 ae 8b 
MK iterations:  63125
UUID:           c8286408-de03-40f5-93ef-274cf534563f

Key Slot 0: ENABLED
	Iterations:              512000
	Salt:                    fc ae 72 8e 9b 71 5c 2e 77 8a 8b 23 da e4 0f 2c 
	                         be a0 b9 5a 74 0b 9b d5 5e 67 d5 90 2e f2 7b b8 
	Key material offset:     8
	AF stripes:              4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Things to note in the output above:

Lines with "MK" refer to the master key
There are 8 key slots, of which I only use the first one
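
The numbers in this dump fit together with a bit of arithmetic. The sketch below assumes the LUKS1 conventions that offsets are counted in 512-byte sectors and that each key slot stores the master key expanded by the anti-forensic (AF) splitter to (key size x AF stripes) bytes; the variable names are made up for the example.

```shell
#!/bin/sh
# Rough LUKS1 header arithmetic using the values from the luksDump output.
# Assumes LUKS1 conventions: offsets are in 512-byte sectors, and each key
# slot stores the master key expanded to (key bytes x AF stripes) bytes.

SECTOR=512
MK_BITS=512              # "MK bits"
AF_STRIPES=4000          # "AF stripes" of key slot 0
PAYLOAD_OFFSET=4096      # "Payload offset", in sectors
KEY_MATERIAL_OFFSET=8    # "Key material offset" of slot 0, in sectors

mk_bytes=$((MK_BITS / 8))
slot_bytes=$((mk_bytes * AF_STRIPES))
slot_sectors=$(( (slot_bytes + SECTOR - 1) / SECTOR ))   # round up to whole sectors
header_bytes=$((PAYLOAD_OFFSET * SECTOR))

echo "master key:            $mk_bytes bytes"
echo "slot 0 key material:   $slot_bytes bytes ($slot_sectors sectors), from byte $((KEY_MATERIAL_OFFSET * SECTOR))"
echo "encrypted data starts: byte $header_bytes ($((header_bytes / 1024 / 1024)) MiB)"
```

With these values each slot needs about 500 sectors, so all eight slots fit in front of the data, which starts 2 MiB into sda3.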

Why key slots? They add a level of indirection, but the added complexity pulls its
own weight! If the passphrase were the encryption key (or the key were derived from it),
changing the passphrase would mean changing the encryption key, so you'd have to
re-encrypt the whole disk. Not good!
Instead, LUKS uses a neat trick: you don't choose the master key; the machine generates
it randomly, so it has far more entropy (and is thus more secure) than the passphrases
people come up with.
The master key is stored encrypted with your passphrase (know the passphrase => know the
master key). So when you change your passphrase, all LUKS has to do is re-encrypt the
master key with the new passphrase. The master key stays the same; only your passphrase
(and the master key's encrypted form) changes.
Additionally, LUKS supports multiple passphrases (by having multiple "key slots")
if you want to have multiple people use the same computer (or possibly for key
rotation if two passphrases have to overlap for a short while).
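
In practice these key slot operations map onto cryptsetup subcommands. The sketch below uses real subcommands (luksAddKey, luksChangeKey, luksKillSlot) on the /dev/sda3 device from this article, but the wrapper names run and manage_slots are hypothetical. It only prints the commands unless DRY_RUN=0 is set, because each of them rewrites the LUKS header; none of them touches the encrypted data itself.

```shell
#!/bin/sh
# Hypothetical sketch: managing passphrases through key slots. Each command
# only rewrites key slots in the header; the bulk data is not re-encrypted.
# Dry run by default (prints commands); DRY_RUN=0 executes them, which needs
# root and a real LUKS device.

DEVICE=${DEVICE:-/dev/sda3}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

manage_slots() {
    # Add another passphrase in a free slot (asks for an existing one first).
    run cryptsetup luksAddKey "$DEVICE"
    # Change a passphrase: the master key gets encrypted under the new one.
    run cryptsetup luksChangeKey "$DEVICE"
    # Wipe key slot 1, e.g. once a rotated-out passphrase is no longer needed.
    run cryptsetup luksKillSlot "$DEVICE" 1
}

manage_slots
```

These are three independent operations listed together, not a sequence you'd normally run in one go.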
Moving on! Let's say the master key is hunter2 (in reality it's 512 random bits rather
than something simple and human-readable like this, but this makes it easier to explain).
The master key is not stored on disk; only its encrypted form is. Recall the LUKS
metadata we looked at before, specifically:

MK salt
MK digest

MK digest could be defined like this (this is all pseudocode; in reality LUKS uses
the more secure, brute-force-resistant PBKDF2):
masterKeyDigest = sha256(masterKey + masterKeySalt)
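
For comparison, the LUKS1 on-disk format specification defines the digest with PBKDF2 instead of a single hash and keeps only 20 bytes of its output, which matches the 20-byte MK digest in the luksDump above. Roughly, using the dump's Hash spec (sha256):

$$
\text{MK digest} = \operatorname{PBKDF2}_{\text{HMAC-SHA256}}\bigl(\text{master key},\ \text{MK salt},\ \text{MK iterations},\ 20\ \text{bytes}\bigr)
$$

Here MK salt is the 32-byte value from the dump and MK iterations is 63125. Both are public; the iteration count is what makes each guess expensive.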

masterKeySalt is not private information (it's stored in plaintext
in the unencrypted portion of the disk). Let's say the salt is salty salt.
Plugging those details in to calculate the digest:
$ echo -n "hunter2salty salt" | sha256sum
ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65

=> Our digest is the ffa479b8... string.
Now, the volume header would contain these public details:
MK digest:      ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65
MK salt:        salty salt

Now, how about the key slots? Remember, they store the master key encrypted with
the passphrase of your choosing.
Let's say our passphrase for slot 0 is supersecret (and the master key was hunter2).
Salt for key slot 0 will be slot0salt (salt is public knowledge).
Therefore, our key slot 0 encryption key (= slot 0 passphrase and salt
concatenated) will be supersecretslot0salt.
Let's now encrypt the master key (with Blowfish as an example; in reality LUKS
derives the slot key from the passphrase and salt with PBKDF2, which makes
brute-forcing slow, and then encrypts the master key with the volume's own cipher):
$ echo 'hunter2' | openssl bf -a -k supersecretslot0salt
U2FsdGVkX1/odbGQLAZ95E8uA+THJLMq0Zy3H5H87R0=

Here is all the public metadata needed for the slot-based encryption of the master key
to work:
MK digest:      ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65
MK salt:        salty salt

Key Slot 0: ENABLED
	Salt:                  slot0salt
	Master key encrypted:  U2FsdGVkX1/odbGQLAZ95E8uA+THJLMq0Zy3H5H87R0=

Key Slot 1: DISABLED
...
Key Slot 7: DISABLED
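
In the real LUKS1 format (following its on-disk specification), the toy "passphrase + salt" concatenation and the Blowfish step are replaced roughly as follows, with the slot 0 values from the earlier luksDump (512000 iterations, 4000 AF stripes, a 64-byte key for aes-xts-plain64):

$$
\begin{aligned}
\text{slot key} &= \operatorname{PBKDF2}_{\text{HMAC-SHA256}}\bigl(\text{passphrase},\ \text{slot salt},\ 512000,\ 64\ \text{bytes}\bigr)\\
\text{key material} &= \operatorname{Encrypt}_{\text{aes-xts-plain64}}\bigl(\text{slot key},\ \operatorname{AFsplit}(\text{master key},\ 4000)\bigr)
\end{aligned}
$$

Unlocking runs this backwards (PBKDF2, then decrypt, then AF merge) and checks the candidate master key against the MK digest, just as the next section does with the toy values.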

Putting it all together

So, let's stitch this all together to see how the system boots up!
Your machine can only read the unencrypted /boot, which contains the kernel and supporting
stuff to get the booting process started. The system knows that there are encrypted partitions,
knows the metadata we listed in the previous heading, and will ask you for the passphrase.
You enter the correct passphrase for key slot 0, supersecret.
Now the master key is decrypted from slot 0 using the passphrase combined with the slot's salt:
$ echo U2FsdGVkX1/odbGQLAZ95E8uA+THJLMq0Zy3H5H87R0= | openssl bf -d -a -k supersecretslot0salt
hunter2

But decryption alone can't tell whether it worked: with a wrong passphrase, decryption
simply produces a wrong master key. Let's imagine that we gave the passphrase wrongpw and
the master key was thus decoded as wrongmasterkey.
This is where the master key digest and salt step in. Let's try digesting the wrongmasterkey:
$ echo -n "wrongmasterkeysalty salt" | sha256sum
c7bbe305ba3092ac42e6503e54022f0102a97141e79b4568690face6dda04150

The digest should've been ffa479b8..., so the passphrase was wrong for that slot.
By default, when you enter a passphrase, LUKS doesn't know (or ask) which slot it belongs
to; it simply iterates over all the enabled slots (0..7) and tries to decrypt the master
key with the process outlined above. If a try is unsuccessful, it moves on to the next
slot. If every slot fails, the passphrase was wrong.
Now let's try it again with the correct master key decrypted with the correct slot 0 passphrase:
$ echo -n "hunter2salty salt" | sha256sum
ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65

That matches the metadata (MK digest), so we've successfully recovered the master key
from slot 0 by knowing slot 0's correct passphrase.
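
The slot-trying loop described above can be sketched as a small shell script. It is a toy model under clearly non-LUKS assumptions: the slot key is the passphrase and slot salt concatenated and fed to openssl enc -pbkdf2, AES-256-CBC stands in for the article's Blowfish (OpenSSL 3 only ships Blowfish in its legacy provider), and the digest is the same plain SHA-256 used above. The names mk_digest, seal, unlock and SLOTS are made up for this sketch; it needs OpenSSL 1.1.1 or newer and GNU coreutils' sha256sum.

```shell
#!/bin/sh
# Toy model of LUKS-style unlocking (NOT the real on-disk format).
# Assumptions: slot key = passphrase + slot salt, fed to `openssl enc -pbkdf2`
# with AES-256-CBC (swapped in for Blowfish); MK digest = sha256(MK + MK salt).

MK_SALT='salty salt'

mk_digest() {    # toy MK digest of candidate master key $1
    printf '%s%s' "$1" "$MK_SALT" | sha256sum | cut -d' ' -f1
}

seal() {         # encrypt master key $1 with slot key $2 (base64 output)
    printf '%s' "$1" | openssl enc -aes-256-cbc -pbkdf2 -a -pass "pass:$2"
}

# "Formatting" with master key hunter2 and passphrase supersecret, as in the
# text; only MK_DIGEST and SLOTS would end up in the public header.
MK_DIGEST=$(mk_digest 'hunter2')
SLOTS="slot0salt:$(seal 'hunter2' 'supersecretslot0salt')"  # one "salt:key" line per ENABLED slot

unlock() {       # print the master key if passphrase $1 opens any slot
    while IFS=: read -r slot_salt slot_key; do
        [ -n "$slot_key" ] || continue
        candidate=$(printf '%s\n' "$slot_key" |
            openssl enc -d -aes-256-cbc -pbkdf2 -a \
                -pass "pass:$1$slot_salt" 2>/dev/null) || continue
        # Decrypting without an error proves nothing; the digest decides.
        if [ "$(mk_digest "$candidate")" = "$MK_DIGEST" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done <<EOF
$SLOTS
EOF
    return 1     # no slot accepted this passphrase: wrong passphrase
}

unlock supersecret                          # prints hunter2
unlock wrongpw || echo 'wrong passphrase'   # prints wrong passphrase
```

The digest check is what actually rejects a wrong passphrase: a wrong key can occasionally decrypt to garbage without an error, and only the digest mismatch catches it.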
Parting words

Hopefully I managed to shed some light on how Linux full disk encryption works
(particularly the LUKS kind, with cryptsetup and dm-crypt), and by extension on how
other systems do it, since many of them use a similar design.
You can certainly use encryption without knowing how it's implemented, but you'll
have far more confidence in it if it isn't just "black magic" to you and you understand
at least the basics of what happens under the hood.
Further reading


How does GNU GRUB work (technical internals)
Inspecting the Content of an Initrd File
Android disk encryption is similar: Revisiting Android disk encryption
Dissecting LUKS (from a crypto standpoint)