@joonas-fi
Created August 16, 2019 10:54
Unfinished blog post on Linux full disk encryption

Understanding Linux full disk encryption

This article tries to explain how full disk encryption works in Linux.

There are many ways to do it, but we'll explain the LUKS way of doing it. LUKS (Linux Unified Key Setup) is a standard specifying how full disk (or full volume) encryption should be implemented.

LUKS is implemented with:

  • cryptsetup (frontend, i.e. the binaries) and
  • dm-crypt (kernel module, part of the device mapper infrastructure, see diagram)

First, let's start with a great summary from an AskUbuntu user:

Luks is an encryption layer on a block device, so it operates on a particular block device, and exposes a new block device which is the decrypted version. Access to this device will trigger transparent encryption/decryption while it's in use.

It's typically used on either a disk partition, or a LVM physical volume which would allow multiple partitions in the same encrypted container.

LUKs stores a bunch of metadata at the start of the device. It has slots for multiple passphrases. Each slot has a 256 bit salt that is shown in the clear along with an encrypted message. When entering a passphrase LUKS combines it with each of the salts in turn, hashing the result and tries to use the result as keys to decrypt an encrypted message in each slot. This message consists of some known text, and a copy of the master key. If it works for any one of the slots, because the known text matches, the master key is now known and you can decrypt the entire container. The master key must remain unencrypted in RAM while the container is in use.

Knowing the master key allows you access to all the data in the container, but doesn't reveal the passwords in the password slots, so one user cannot see the passwords of other users.

The system is not designed for users to be able to see the master key while in operation, and this key can't be changed without re-encrypting. The use of password slots, however, means that passwords can be changed without re-encrypting the entire container, and allows for use of multiple passwords.

What my partition hierarchy looks like

I installed Linux Mint with "disk encryption" ticked, and ended up with this:

$ lsblk -o name,size,fstype,label,mountpoint
NAME                    SIZE FSTYPE      LABEL MOUNTPOINT
sda                   119,2G                   
├─sda2                  488M ext2              /boot
├─sda3                118,3G crypto_LUKS       
│ └─sda3_crypt        118,3G LVM2_member       
│   ├─mint--vg-root   110,4G ext4              /
│   └─mint--vg-swap_1   7,9G swap              [SWAP]
└─sda1                  512M vfat              /boot/efi

sda is my only physical disk.

It has three partitions:

  1. (unencrypted) EFI stuff (never mind this, it's required by UEFI - kind of a next-gen BIOS)
  2. (unencrypted) Boot partition
  3. (encrypted) crypto_LUKS partition. Once unlocked, it is exposed as a virtual block device (sda3_crypt above) which can host child volumes (all of which will be encrypted)

In my case the unlocked crypto_LUKS device contains an LVM2_member (because I set up LVM), which is another virtual device that in turn contains the concrete filesystems: ext4 (/) and swap. You don't have to use LVM to get encryption, but I used it because you can do cool stuff like extending a volume across two hard disks, taking snapshots etc.

You can think of crypto_LUKS and LVM2_member as filters: they take something as input and give something as output. crypto_LUKS takes the raw encrypted blocks from the hard disk and gives the decrypted blocks as output. LVM is another filter (and another indirection) which gives additional features.

The above textual graph in a more visual way:

               +-----------+
               |           |
               | Hard disk |
               |           |
               +-+---+---+-+
                 |   |   |
        +--------+   |   +--------+
        |            |            |
        |            |            |
+-------v---+  +-----v----+ +-----v------------------+
|           |  |          | |                        |
|  Volume 1 |  | Volume 2 | | Volume 3               |
|  EFI      |  | /boot    | | crypto_LUKS            |
|           |  |          | | (virtual block device) |
+-----------+  +----------+ |                        |
                            +------+-----------------+
                                   |
                                   |
                            +------v------+
                            |             |
                            | Volume 1    |
                            | LVM2_member |
                            |             |
                            +--+--------+-+
                               |        |
                               |        |
                          +----v-----+ +v-----+
                          |          | |      |
                          | / (root) | | swap |
                          | ext4     | |      |
                          |          | +------+
                          +----------+
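
The "filter" idea can be sketched in a few lines of Python. This is only a toy model, not how dm-crypt or LVM are implemented: the class names (RamDisk, ToyCryptLayer, ToyVolumeLayer) are made up for illustration, and XOR stands in for the real cipher purely to keep the example dependency-free.

```python
class RamDisk:
    """Stands in for the physical disk: a flat array of fixed-size blocks."""
    BLOCK_SIZE = 16

    def __init__(self, num_blocks):
        self.blocks = [bytes(self.BLOCK_SIZE)] * num_blocks

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == self.BLOCK_SIZE
        self.blocks[n] = bytes(data)


class ToyCryptLayer:
    """Exposes a decrypted view of a lower device (XOR is NOT real encryption)."""

    def __init__(self, lower, key):
        self.lower = lower
        self.key = key

    def _xor(self, data):
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))

    def read_block(self, n):
        # Encrypted blocks come in from below, decrypted blocks go out.
        return self._xor(self.lower.read_block(n))

    def write_block(self, n, data):
        # Plaintext comes in from above, only ciphertext reaches the disk.
        self.lower.write_block(n, self._xor(data))


class ToyVolumeLayer:
    """Maps a logical volume onto a slice of the lower device, loosely like LVM."""

    def __init__(self, lower, first_block, num_blocks):
        self.lower = lower
        self.first_block = first_block
        self.num_blocks = num_blocks

    def _check(self, n):
        if not 0 <= n < self.num_blocks:
            raise IndexError("block outside of this volume")

    def read_block(self, n):
        self._check(n)
        return self.lower.read_block(self.first_block + n)

    def write_block(self, n, data):
        self._check(n)
        self.lower.write_block(self.first_block + n, data)


disk = RamDisk(8)                        # "sda3"
crypt = ToyCryptLayer(disk, b"toy-key")  # "sda3_crypt"
root = ToyVolumeLayer(crypt, 0, 6)       # "mint--vg-root"
swap = ToyVolumeLayer(crypt, 6, 2)       # "mint--vg-swap_1"

root.write_block(0, b"hello, root fs!!")
print(root.read_block(0))  # plaintext, as seen by the filesystem
print(disk.read_block(0))  # scrambled bytes, as stored on the disk
```

Each layer only talks to the layer directly below it, which is why the filesystem on top doesn't need to know that encryption is happening at all.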

How the system boots up

Since the encryption is implemented in software (a Linux kernel module), the system has to read some data from the hard drive before it can even show a prompt asking for the passphrase that unlocks the encryption key. So how can my data be secret if the computer can read data off the disk without my passphrase?

Good question! It boils down to this: the boot partition (/boot) is unencrypted, and it contains the kernel and the most important modules needed to bring the machine into a usable state. These things are required pre-boot:

  • Kernel (/boot/vmlinuz-4.8.0-53-generic)
  • Initial RAM disk (/boot/initrd.img-4.8.0-53-generic, read more)
  • Other files from /boot, such as the configuration of GRUB (the bootloader)

So the boot process goes like this:

  • You press the power button
  • The CPU starts, and jumps to a hardcoded address of 0xfffffff0, which is mapped to BIOS ROM by the motherboard/hardware.
  • BIOS does a bunch of stuff, but for our purposes it is sufficient to say that it holds a setting for which disk to boot from. (This walkthrough follows the legacy BIOS/MBR path; a UEFI system instead loads the bootloader from the EFI partition, but the later steps are the same.)
  • BIOS digs up the MBR (Master Boot Record) from the chosen disk and hands off execution to it. BIOS is not involved from now on.
  • The MBR (GRUB's stage 1 loader) is so small that it can't contain the logic to read the /boot filesystem, so it loads GRUB stage 1.5 from the DOS compatibility area of the disk. Stage 1.5 (still too small to contain the entire GRUB) can read /boot and load/execute the rest of GRUB ("stage 2").
  • GRUB now reads the kernel and the initrd from /boot into RAM, and hands off booting to the kernel.
  • The kernel starts booting and mounts the initrd so it can access pre-boot utilities/drivers/etc. to actually begin booting the system into a usable state.
  • The kernel can now load modules and recognize partitions like crypto_LUKS and LVM2_member, but will not be able to read them (because it doesn't know the master key).
  • cryptsetup will now ask for your passphrase, which is used to uncover the master key from one of the key slots (explained later).
  • If you gave the correct passphrase, the kernel can now decrypt stuff from under the crypto_LUKS and start mounting and reading from the LVM volumes!

How does the encryption work?

Let's dive right into this by asking what metadata the LUKS system has on that hard disk:

$ cryptsetup luksDump /dev/sda3
LUKS header information for /dev/sda3

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha256
Payload offset: 4096
MK bits:        512
MK digest:      0e 28 66 97 5f e3 57 54 49 e1 92 95 11 f8 13 4f 0a 2d 21 0f 
MK salt:        b1 2f a4 ad 8a 4c 50 28 e1 b7 30 5b 6e 72 b3 b1 
                a8 40 0a 59 1f eb 49 8d c4 41 36 e7 21 10 ae 8b 
MK iterations:  63125
UUID:           c8286408-de03-40f5-93ef-274cf534563f

Key Slot 0: ENABLED
	Iterations:              512000
	Salt:                    fc ae 72 8e 9b 71 5c 2e 77 8a 8b 23 da e4 0f 2c 
	                         be a0 b9 5a 74 0b 9b d5 5e 67 d5 90 2e f2 7b b8 
	Key material offset:     8
	AF stripes:              4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Note from the above output:

  • Lines with "MK" refer to the master key
  • There are 8 key slots, of which I use only the first one.
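
To make the metadata less abstract, here is a rough Python sketch of reading the same fields straight from the start of the partition. The field order and sizes are my understanding of the LUKS version 1 on-disk header (a 208-byte fixed part followed by eight 48-byte key slot records, all big-endian), so treat the layout as an assumption and verify it against the specification before relying on it. The function name parse_luks1_header is made up, and a LUKS2 volume uses a different header that this will not parse.

```python
import struct

LUKS_MAGIC = b"LUKS\xba\xbe"
# magic, version, cipher name, cipher mode, hash spec, payload offset,
# key bytes, MK digest, MK salt, MK iterations, UUID
PHDR = struct.Struct(">6sH32s32s32sII20s32sI40s")
# active flag, iterations, salt, key material offset, AF stripes
SLOT = struct.Struct(">II32sII")
SLOT_ENABLED = 0x00AC71F3


def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_luks1_header(buf):
    (magic, version, cipher, mode, hash_spec, payload_offset, key_bytes,
     mk_digest, mk_salt, mk_iterations, uuid) = PHDR.unpack_from(buf, 0)
    if magic != LUKS_MAGIC or version != 1:
        raise ValueError("not a LUKS1 header")

    slots = []
    for i in range(8):
        active, iterations, salt, km_offset, stripes = SLOT.unpack_from(
            buf, PHDR.size + i * SLOT.size)
        slots.append({
            "enabled": active == SLOT_ENABLED,
            "iterations": iterations,
            "salt": salt.hex(),
            "key_material_offset": km_offset,
            "af_stripes": stripes,
        })

    return {
        "cipher": _text(cipher),
        "mode": _text(mode),
        "hash": _text(hash_spec),
        "payload_offset": payload_offset,
        "mk_bits": key_bytes * 8,
        "mk_digest": mk_digest.hex(),
        "mk_salt": mk_salt.hex(),
        "mk_iterations": mk_iterations,
        "uuid": _text(uuid),
        "slots": slots,
    }


# Usage (needs root, and /dev/sda3 is just the device from this article):
#
#   with open("/dev/sda3", "rb") as dev:
#       info = parse_luks1_header(dev.read(PHDR.size + 8 * SLOT.size))
#   print(info["cipher"], info["mode"], info["mk_bits"])
```

Note that everything parsed here is stored in the clear: none of it requires the passphrase.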

Why key slots? They add a level of indirection, but the added complexity pulls its own weight! If the passphrase were the encryption key (or the key were directly derived from it), changing the passphrase would mean changing the encryption key, so you would have to re-encrypt the whole disk - not good!

Instead, LUKS uses a neat trick: the master key is not something you specify. It is machine generated, so it contains more entropy and is more secure than passphrases that people come up with. The master key is stored encrypted with a key derived from your passphrase (know your passphrase => know the encryption key). That means that when you change your passphrase, all LUKS has to do is re-encrypt the master key with your new passphrase. The master key remains unchanged; only your passphrase (and the master key's encrypted form) changes.

Additionally, LUKS supports multiple passphrases (by having multiple "key slots") if you want to have multiple people use the same computer (or possibly for key rotation if two passphrases have to overlap for a short while).
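
The key slot trick can be sketched in Python like this. It is a toy under loud assumptions: the helper names (make_slot, open_slot) are invented, the iteration count is far too low for real use, and XOR with a PBKDF2-derived key stands in for the real cipher only so that the example runs with nothing but the standard library.

```python
import hashlib
import os


def derive_slot_key(passphrase, salt, iterations, length):
    """Stretch a passphrase + public salt into a key of the wanted length."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                               iterations, dklen=length)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def make_slot(master_key, passphrase, iterations=1000):
    """Wrap the master key under a passphrase (toy cipher: XOR)."""
    salt = os.urandom(32)
    key = derive_slot_key(passphrase, salt, iterations, len(master_key))
    return {"salt": salt, "iterations": iterations,
            "wrapped": xor(master_key, key)}


def open_slot(slot, passphrase):
    """Unwrap a slot. A wrong passphrase silently yields garbage bytes."""
    key = derive_slot_key(passphrase, slot["salt"], slot["iterations"],
                          len(slot["wrapped"]))
    return xor(slot["wrapped"], key)


master_key = os.urandom(64)  # machine generated, 512 bits
slots = [make_slot(master_key, "supersecret")]

# Changing the passphrase: unwrap with the old one, wrap with the new one.
# The master key (and therefore all data on the disk) stays untouched.
recovered = open_slot(slots[0], "supersecret")
slots[0] = make_slot(recovered, "newpassphrase")

# A second passphrase is just a second wrapped copy of the same master key.
slots.append(make_slot(master_key, "second user"))

print(open_slot(slots[0], "newpassphrase") == master_key)
print(open_slot(slots[1], "second user") == master_key)
```

Only the small wrapped copies change when passphrases are added or replaced, which is why these operations are fast regardless of disk size.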

Moving on! Let's say the master key is hunter2 (in reality it isn't human readable and simple like this but this is easier to explain with).

The master key is not stored on disk - only its encrypted form is. Let's remember the LUKS metadata we looked at before, specifically:

  • MK salt
  • MK digest

MK digest could be defined like this (this is pseudocode; the real implementation uses PBKDF2, which is more resistant to brute forcing):

masterKeyDigest = sha256(masterKey + masterKeySalt)

masterKeySalt is not private information (it's stored in plaintext in the unencrypted portion of the disk). Let's say the salt is salty salt.

Therefore, plugging those details in to calculate the digest:

$ echo -n "hunter2salty salt" | sha256sum
ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65

=> Our digest is the ffa479b8... string.
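
For comparison, here is roughly what the PBKDF2 variant of that digest looks like in Python. I'm assuming the sha256 hash spec, the 63125 iteration count and the 20-byte digest length seen in the luksDump output earlier; the key and salt are still the toy values from this example, and mk_digest is just a name I picked.

```python
import hashlib


def mk_digest(master_key: bytes, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256 over the master key, truncated to 20 bytes."""
    return hashlib.pbkdf2_hmac("sha256", master_key, salt, iterations, dklen=20)


digest = mk_digest(b"hunter2", b"salty salt", 63125)
print(digest.hex())
```

The iteration count is what makes this slower to brute-force than a single sha256 call: every guess costs tens of thousands of hash operations instead of one.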

Now, the volume header would contain these public details:

MK digest:      ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65
MK salt:        salty salt

Now, how about the key slots? Remember, they store the master key encrypted with the passphrase of your choosing.

Let's say our passphrase for slot 0 is supersecret (and the master key was hunter2).

Salt for key slot 0 will be slot0salt (salt is public knowledge).

Therefore, our key slot 0 encryption key (= slot 0 passphrase and salt concatenated) will be supersecretslot0salt.

Let's now encrypt the master key (with Blowfish as an example; in reality LUKS first runs the passphrase and salt through PBKDF2 to derive the slot key, which slows down brute-forcing, and then encrypts the master key with the volume's cipher, here AES):

$ echo 'hunter2' | openssl bf -a -k supersecretslot0salt
U2FsdGVkX1/odbGQLAZ95E8uA+THJLMq0Zy3H5H87R0=

Now, here is all the public metadata needed for the slot-based master key encryption to work:

MK digest:      ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65
MK salt:        salty salt

Key Slot 0: ENABLED
	Salt:                  slot0salt
	Master key encrypted:  U2FsdGVkX1/odbGQLAZ95E8uA+THJLMq0Zy3H5H87R0=

Key Slot 1: DISABLED
...
Key Slot 7: DISABLED

Putting it all together

So, let's stitch this all together to see how the system boots up!

Your machine can only read the unencrypted /boot, which contains the kernel and supporting stuff to get the booting process started. The system knows that there are encrypted partitions, knows the metadata we listed in the previous heading, and will ask you for the passphrase.

You enter the correct passphrase for key slot 0, supersecret.

Now we decrypt the master key from slot 0 with a key made by combining the passphrase and the slot's salt:

$ echo U2FsdGVkX1/odbGQLAZ95E8uA+THJLMq0Zy3H5H87R0= | openssl bf -d -a -k supersecretslot0salt
hunter2

Now, if the passphrase was wrong, decryption still produces some bytes, and the system can't tell from them alone whether the master key was decrypted correctly. Let's imagine that we gave the passphrase wrongpw and thus the master key was decoded as wrongmasterkey.

This is where the master key digest and salt step in. Let's try digesting the wrongmasterkey:

$ echo -n "wrongmasterkeysalty salt" | sha256sum
c7bbe305ba3092ac42e6503e54022f0102a97141e79b4568690face6dda04150

The digest should've been ffa479b8..., so the passphrase was wrong for that slot. When you enter a passphrase, LUKS doesn't know (or ask) which slot it belongs to; it just iterates over all the slots (0..7) and tries to decrypt the master key with the process outlined above. If a try is unsuccessful, it moves on to the next slot. If the tries fail for all slots, the passphrase was wrong.
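
The slot iteration can be sketched in Python as follows. As with the rest of this walkthrough it is a simplification: format_volume and unlock are hypothetical helper names, the iteration counts are toy values, and XOR with a PBKDF2-derived key stands in for the real cipher so the example needs only the standard library.

```python
import hashlib
import hmac
import os

MK_ITERATIONS = 1000    # toy value, real headers store much larger counts
SLOT_ITERATIONS = 1000  # toy value


def kdf(secret, salt, iterations, length):
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=length)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def format_volume(passphrases):
    """Create a toy header with one key slot per passphrase (max 8)."""
    master_key = os.urandom(64)
    mk_salt = os.urandom(32)
    header = {
        "mk_salt": mk_salt,
        "mk_digest": kdf(master_key, mk_salt, MK_ITERATIONS, 20),
        "slots": [None] * 8,  # None = DISABLED
    }
    for index, passphrase in enumerate(passphrases):
        salt = os.urandom(32)
        slot_key = kdf(passphrase.encode(), salt, SLOT_ITERATIONS, len(master_key))
        header["slots"][index] = {"salt": salt,
                                  "wrapped": xor(master_key, slot_key)}
    return header, master_key


def unlock(header, passphrase):
    """Try the passphrase against every enabled slot, in order."""
    for index, slot in enumerate(header["slots"]):
        if slot is None:
            continue
        slot_key = kdf(passphrase.encode(), slot["salt"], SLOT_ITERATIONS,
                       len(slot["wrapped"]))
        candidate = xor(slot["wrapped"], slot_key)
        # The candidate is only accepted if its digest matches the header.
        digest = kdf(candidate, header["mk_salt"], MK_ITERATIONS, 20)
        if hmac.compare_digest(digest, header["mk_digest"]):
            return index, candidate
    return None  # no slot matched: wrong passphrase


header, master_key = format_volume(["supersecret", "second user"])
print(unlock(header, "second user") == (1, master_key))
print(unlock(header, "wrongpw") is None)
```

Because every enabled slot costs a full key derivation, a wrong passphrase takes noticeably longer to reject when many slots are in use.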

Now let's try it again with the correct master key decrypted with the correct slot 0 passphrase:

$ echo -n "hunter2salty salt" | sha256sum
ffa479b8cbc9f526c12fb50acadd27d665155fe0b0729440a405a60398da4b65

That matched the metadata (MK digest), so now we have successfully uncovered the master key from slot 0 by knowing slot 0's correct passphrase.

Parting words

Hopefully I managed to shed some light on how Linux full-disk encryption works (particularly, the LUKS kind with cryptsetup and dm-crypt), and by extension now you know how other systems achieve full disk encryption, because they're somewhat similar anyway.

Even though you can most certainly use encryption without understanding how it's implemented under the hood, you get way more confidence if it's not just "black magic" to you, but rather you actually understand at least the basics of what happens under the hood.

Further reading

@cipri-tom

Very nice explanation! Thank you!

Do you know if we would be able to encrypt an external hard drive before partitioning it? I am interested in having a single key for the whole disk, while still being able to split it into different partitions.

@joonas-fi
Author

Glad to hear it helped! :)

Yes, that should be fully supported, and it's easier than having different keys for each partition. I think separate keys would require multiple crypto_LUKS partitions (with one regular partition inside each).

Since crypto_LUKS presents a new virtual block device, and block devices can be partitioned, it should be quite trivial to partition. In my case I had an LVM member partition, which also supports easy fiddling with partitions if one needs many of them. But if you don't need LVM, of course it's easier to set up just plain old regular partitions.

You can do this from the command line or with GUI partitioning tools, but I can't recommend any GUIs since I've always partitioned during Linux OS installation or from the command line.
