Networks must be able to transfer data from one device to another with acceptable accuracy. For most applications, a system must guarantee that the data received are identical to the data transmitted. However, data can be corrupted during transmission. Thus, applications must have a mechanism for detecting and correcting errors.
- Single-bit error. Happens when only one (1) bit of a given data unit (byte, character, packet) is changed from 0 to 1 or 1 to 0.
- Burst error Happens when two or more bits of a given data unit is changed. This kind of error is more likely to happen because most noises don’t last fast enough to affect only one byte. For example, if one is transmitting data at 1 kbps and a noise occurs at 1/100 s, 10 bits will be affected. If one is transmitting at 1 Mbps, the same noise affects 10000 bits.
To detect errors, we need to send some extra bits with our data. The process of sending additional bits along with the data is called redundancy.
Sender { message = some_bit_value encoded_message = generator(message) send(encoded_message) }
Receiver{ receive(encoded_message) isAccepted = checker(encoded_message) if(isAccepted) message = extract(encoded_message) else discardMessage() }
Redundancy is achieved through various coding schemes. The sender adds redundant bits through a process that creates a relationship between the redundant bits and the actual data bits. The receiver checks the relationships between the two sets of bits to detect or correct the errors. The ratio of redundant bits to the data bits and the robustness of the process are important factors in any coding scheme.
In block coding, we divide our message into blocks, each of kkk bits, called datawords. We add rrr redundant bits to each block to make the length n=k+rn = k + rn=k+r. The resulting n-bit blocks are called codewords.
In this model, we have a set of datawords, each of size kkk, to create a combination of 2k2^k2k datawords, and a set of codewords, each of size n=k+rn = k + rn=k+r, to create a set of 2k+r2^{k+r}2k+r codewords. The block coding process is one-to-one; the same datawords is always encoded as the same codeword. Since n>kn > kn>k, we have 2n−2k2^n - 2 ^ k2n−2k codewords that are not used.
For example, let us assume that k=2k = 2k=2 and n=3n = 3n=3. The table below shows the list of datawords and codewords.
Datawords | Codewords |
---|---|
00 | 000 |
01 | 011 |
10 | 101 |
11 | 110 |
Assume that the sender encodes datawork 01
as 011
and sends it to the receiver. Consider the following cases:
- The receiver receives
011
. It is a valid codeword. The receiver extracts01
. - The codeword is corrupted.
111
is received. This is not a valid codeword and it is discarded. - The code word is corrupted.
000
is received. This is a valid codeword and is accepted. The receiver incorrectly extracts00
. The corruption is undetected.
The Hamming distance is one of the central concepts in error control. The Hamming distance is the distance between two words of the same size.
The Hamming distance between the received codewords and the sent codewords is the number of bits corrupted during transmission.
If our code is to detect up to sss errors, the minimum distance between valid codewords must be s+1s + 1s+1.
Let’s assume a code
Datawords | Codewords |
---|---|
00 | 000 |
01 | 011 |
10 | 101 |
11 | 110 |
d(000,011)=2d(000, 011) = 2d(000,011)=2
d(000,101)=2d(000, 101) = 2d(000,101)=2
d(000,110)=2d(000, 110) = 2d(000,110)=2
d(011,101)=2d(011, 101) = 2d(011,101)=2
d(011,110)=2d(011, 110) = 2d(011,110)=2
d(101,110)=2d(101, 110) = 2d(101,110)=2
dmin=2d_{min}=2dmin=2
This code has a minimum Hamming distance of 222. And could therefore guarantee to catch an error of size 111. Since s+1=dmins + 1 = d_{min}s+1=dmin.
A code win which the exclusive OR of two valid codewords creates another valid codeword.
The minimum Hamming distance of a linear block code is the number of 1s on the nonzero valid codeword with the smallest number of 1s.
Our first code is an example of a linear block code
Datawords | Codewords |
---|---|
00 | 000 |
01 | 011 |
10 | 101 |
11 | 110 |
In this code, a k-bit dataword is changed to an n-bit codeword where n = k + 1. This extra bit, called the parity bit, is selected to make the total number of 1s even.
Datawords | Codewords |
---|---|
0000 | 00000 |
0001 | 00011 |
0010 | 00101 |
0011 | 00110 |
0100 | 01001 |
0101 | 01010 |
0110 | 01100 |
0111 | 01111 |
1000 | 10001 |
1001 | 10010 |
1010 | 10100 |
1011 | 10111 |
1100 | 11000 |
1101 | 11011 |
1110 | 11101 |
1111 | 11110 |
The parity bit is generated by adding the four bits in the datawords. The result is the parity bit.
r0=a3+a2+a1+a0(modulo2)r_0 = a_3 + a_2 + a_1 + a_0 (modulo2)r0=a3+a2+a1+a0(modulo2)
The receiver adds all the bits in the received codeword. The result is the syndrome. It’s 000 when the number of 1s is even.
s0=b3+b2+b1+b0+q0(modulo2)s_0=b_3+b_2+b_1+b_0+q_0 (modulo2)s0=b3+b2+b1+b0+q0(modulo2)
Let’s send 1011. We’ll encode it as 10111. There are 5 possible outcomes.
- No error occurs. The received codewords is 10111. The syndrome is 0. Dataword 1011 is received.
- A single-bit error occurs. 10011. The syndrome is 1. Rejected.
- A single-bit error occurs. 10110 is received. The syndrome is 1. Rejected.
- Burst error. 00110 is received. Syndrome is 0. Dataword 011 is received.
- Burst error. 01011. Syndrome is 1. Rejected.
In this code, the minimum hamming distance is 3. If m = 3, then n = 7 and k = 4.
Redundant bits are generated and verified by performing three parity checks on a2+a1+a0a_2+a_1+a_0a2+a1+a0, a3+a2+a1a_3+a_2+a_1a3+a2+a1, and a1+a0+a3a_1+a_0+a_3a1+a0+a3.
Suppose our data is a list of 4-bit numbers. In addition to these numbers, we send the sum of the numbers. For example, let’s send [7,11,12,0,6][7, 11, 12, 0 ,6][7,11,12,0,6], we will have to include [36][36][36], resulting in a final transmission of [7,11,12,0,6,36][7, 11, 12, 0, 6, 36][7,11,12,0,6,36]. The receiver adds the five numbers and compares it to the sum.
Sometimes the inverse of the sum is sent. If we use our previous example, we’d be sending [7,11,12,0,6,−36][7, 11, 12, 0, 6, -36][7,11,12,0,6,−36]. The receiver adds all the numbers. If the result is 000, there were no errors.
Behrouz A. Forouzan & Sophia Chung Fegan. Data Communications and Networking [4th Ed.], 2007.
Samuel Zheler. Crates [picture]. Available: unsplash.com
EMI. Trade ad for Beatles’ 1964 Grammys. — This is a version with just the Beatles isolated from the ad, 1965. Available: commons.wikimedia.org