Skip to content

Instantly share code, notes, and snippets.

@joepie91
Last active April 12, 2024 16:08
Show Gist options
  • Save joepie91/26579e2f73ad903144dd5d75e2f03d83 to your computer and use it in GitHub Desktop.
Save joepie91/26579e2f73ad903144dd5d75e2f03d83 to your computer and use it in GitHub Desktop.
Prefix codes (explained simply)

A "prefix code" is a type of encoding mechanism ("code"). For something to be a prefix code, the entire set of possible encoded values ("codewords") must not contain any values that start with any other value in the set.

For example: [3, 11, 22] is a prefix code, because none of the values start with ("have a prefix of") any of the other values. However, [1, 12, 33] is not a prefix code, because one of the values (12) starts with another of the values (1).

Prefix codes are useful because, if you have a complete and accurate sequence of values, you can pick out each value without needing to know where one value starts and ends.

For example, let's say we have the following codewords: [1, 2, 33, 34, 50, 61]. And let's say that the sequence of numbers we've received looks like this:

1611333425012

We can simply start from the left, until we have the first value:

1 611333425012

It couldn't have been any value other than 1, because by definition of what a prefix code is, if we have a 1 codeword, none of the other codewords can start with a 1.

Next, we just do the same thing again, with the numbers that are left:

1 61 1333425012

Again, it could only have been 61 - because in a prefix code, none of the other codewords would have been allowed to start with 61.

Let's try it again for the next number:

1 61 1 333425012

Same story, it could only have been a 1. And again:

1 61 1 33 3425012

Remember, our set of possible codewords is [1, 2, 33, 34, 50, 61].

In this case, it could only have been a 33, because again, nothing else in the set of codewords was allowed to start with 33. It couldn't have been 34 either - even though it also starts with a 3 (like 33 does), the lack of a 4 as the second digit excludes it as an option.

You can simply keep repeating this until there are no numbers left:

1 61 1 33 34 2 50 1 2

... and now we've 'decoded' the sequence of numbers, even though the sequence didn't contain any information on where one number starts and the next number ends.

Note how the fact that both 33 and 34 start with a 3 didn't matter; shared prefixes are fine, so long as one value isn't in its entirety used as a prefix of another value. So while [33, 34] is fine (it only shares the 3, neither of the numbers in its entirety is a prefix of the other), [33, 334] would not be fine, since 33 is a prefix of 334 in its entirety (33 followed by 4).

This only works if you can be certain that you got the entire message accurately, though; for example, consider the following sequence of numbers:

11333425012

(Note how this is just the last part of 16 11333425012 )

Now, let's look at the first number - it starts with a 1. However, we don't know what came before, so is it part of a 61, or is it just a single, independent 1? There's no way to know for sure, so we can't split up this message.

It doesn't work if you violate the "can't start with another value" rule, either; for example, let's say that our codewords are [1, 3, 12, 23], and we want to decode the following sequence:

12323

Let's start with the first number. It starts with a 1, so it could be either 1 or 12. We have no way to know! In this particular example we can't figure it out from the numbers after it, either, as there are two different ways to decode this sequence:

1 23 23

12 3 23

And that's why a prefix code is useful, if you want to distinguish values in a sequence that doesn't have explicit 'markers' of where a value starts and ends.

@legowerewolf
Copy link

Thank you! I just popped off a Tom Scott video talking about British phone numbers and yeah, this is really clever.

@shubham-patidar-roostify

This is very neatly explained. Thanks.

Copy link

ghost commented Apr 6, 2021

Thanks!

@Sofiullah-Iqbal-Kiron
Copy link

Sofiullah-Iqbal-Kiron commented Jan 31, 2022

Thank You, Sir.
It let's me get a better understanding about Prefix Code.

@longy2k
Copy link

longy2k commented Sep 19, 2022

Awesome tutorial!

@Moiz-Afzal
Copy link

It helps a lot. Thanks for the tutorial.

@maxwells4299
Copy link

That was awesome

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment