A "prefix code" is a type of encoding mechanism ("code"). For something to be a prefix code, the entire set of possible encoded values ("codewords") must not contain any values that start with any other value in the set.
[3, 11, 22] is a prefix code, because none of the values start with ("have a prefix of") any of the other values. However,
[1, 12, 33] is not a prefix code, because one of the values (12) starts with another of the values (1).
Prefix codes are useful because, if you have a complete and accurate sequence of values, you can pick out each value without needing to know where one value starts and ends.
For example, let's say we have the following codewords:
[1, 2, 33, 34, 50, 61]. And let's say that the sequence of numbers we've received looks like this:
We can simply start from the left, until we have the first value:
It couldn't have been any value other than
1, because by definition of what a prefix code is, if we have a
1 codeword, none of the other codewords can start with a
Next, we just do the same thing again, with the numbers that are left:
1 61 1333425012
Again, it could only have been
61 - because in a prefix code, none of the other codewords would have been allowed to start with
Let's try it again for the next number:
1 61 1 333425012
Same story, it could only have been a
1. And again:
1 61 1 33 3425012
Remember, our set of possible codewords is
[1, 2, 33, 34, 50, 61].
In this case, it could only have been a
33, because again, nothing else in the set of codewords was allowed to start with
33. It couldn't have been
34 either - even though it also starts with a
33 does), the lack of a
4 as the second digit excludes it as an option.
You can simply keep repeating this until there are no numbers left:
1 61 1 33 34 2 50 1 2
... and now we've 'decoded' the sequence of numbers, even though the sequence didn't contain any information on where one number starts and the next number ends.
Note how the fact that both
34 start with a
3 didn't matter; shared prefixes are fine, so long as one value isn't in its entirety used as a prefix of another value. So while
[33, 34] is fine (it only shares the
3, neither of the numbers in its entirety is a prefix of the other),
[33, 334] would not be fine, since
33 is a prefix of
334 in its entirety (
33 followed by
This only works if you can be certain that you got the entire message accurately, though; for example, consider the following sequence of numbers:
(Note how this is just the last part of 16 11333425012 )
Now, let's look at the first number - it starts with a
1. However, we don't know what came before, so is it part of a
61, or is it just a single, independent
1? There's no way to know for sure, so we can't split up this message.
It doesn't work if you violate the "can't start with another value" rule, either; for example, let's say that our codewords are
[1, 3, 12, 23], and we want to decode the following sequence:
Let's start with the first number. It starts with a
1, so it could be either
12. We have no way to know! In this particular example we can't figure it out from the numbers after it, either, as there are two different ways to decode this sequence:
1 23 23
12 3 23
And that's why a prefix code is useful, if you want to distinguish values in a sequence that doesn't have explicit 'markers' of where a value starts and ends.