noahdominic/error detection.md

## error detection.md

      
    Raw
  

              error detection.md
            
          
    Error detection

Networks must be able to transfer data from one device to another with acceptable accuracy. For most applications, a system must guarantee that the data received are identical to the data transmitted. However, data can be corrupted during transmission. Thus, applications must have a mechanism for detecting and correcting errors.
Introduction

Types of Errors


Single-bit error. Happens when only one (1) bit of a given data unit (byte, character, packet) is changed from 0 to 1 or 1 to 0.
Burst error Happens when two or more bits of a given data unit is changed. This kind of error is more likely to happen because most noises don’t last fast enough to affect only one byte. For example, if one is transmitting data at 1 kbps and a noise occurs at 1/100 s, 10 bits will be affected. If one is transmitting at 1 Mbps, the same noise affects 10000 bits.

Redundancy

To detect errors, we need to send some extra bits with our data. The process of sending additional bits along with the data is called redundancy.
Sender {
    message = some_bit_value
    encoded_message = generator(message)
    send(encoded_message)
}
Receiver{
receive(encoded_message)
isAccepted = checker(encoded_message)
if(isAccepted)
message = extract(encoded_message)
else
discardMessage()
}

Coding

Redundancy is achieved through various coding schemes. The sender adds redundant bits through a process that creates a relationship between the redundant bits and the actual data bits. The receiver checks the relationships between the two sets of bits to detect or correct the errors. The ratio of redundant bits to the data bits and the robustness of the process are important factors in any coding scheme.
Block Coding

In block coding, we divide our message into blocks, each of kkk bits, called datawords. We add rrr redundant bits to each block to make the length n=k+rn = k + rn=k+r. The resulting n-bit blocks are called codewords.
In this model, we have a set of datawords, each of size kkk, to create a combination of 2k2^k2k datawords, and a set of codewords, each of size n=k+rn = k + rn=k+r, to create a set of 2k+r2^{k+r}2k+r codewords. The block coding process is one-to-one; the same datawords is always encoded as the same codeword. Since n&gt;kn &gt; kn>k, we have 2n−2k2^n - 2 ^ k2n−2k codewords that are not used.
Example

For example, let us assume that k=2k = 2k=2 and n=3n = 3n=3. The table below shows the list of datawords and codewords.


Datawords
Codewords


00
000


01
011


10
101


11
110


Assume that the sender encodes datawork 01 as 011 and sends it to the receiver. Consider the following cases:

The receiver receives 011. It is a valid codeword. The receiver extracts 01.
The codeword is corrupted.  111 is received. This is not a valid codeword and it is discarded.
The code word is corrupted.  000 is received. This is a valid codeword and is accepted. The receiver incorrectly extracts  00. The corruption is undetected.

Hamming Distance

The Hamming distance is one of the central concepts in error control. The Hamming distance is the distance between two words of the same size.
Minimum Hamming Distance

The Hamming distance between the received codewords and the sent codewords is the number of bits corrupted during transmission.

If our code is to detect up to sss errors, the minimum distance between valid codewords must be s+1s + 1s+1.
Let’s assume a code


Datawords
Codewords


00
000


01
011


10
101


11
110


d(000,011)=2d(000, 011) = 2d(000,011)=2

d(000,101)=2d(000, 101) = 2d(000,101)=2

d(000,110)=2d(000, 110) = 2d(000,110)=2

d(011,101)=2d(011, 101) = 2d(011,101)=2

d(011,110)=2d(011, 110) = 2d(011,110)=2

d(101,110)=2d(101, 110) = 2d(101,110)=2

dmin=2d_{min}=2dmin=2

This code has a minimum Hamming distance of 222. And could therefore guarantee to catch an error of size 111. Since s+1=dmins + 1 = d_{min}s+1=dmin.
Linear Block codes

A code win which the exclusive OR of two valid codewords creates another valid codeword.

The minimum Hamming distance of a linear block code is the number of 1s on the nonzero valid codeword with the smallest number of 1s.
Our first code is an example of a linear block code


Datawords
Codewords


00
000


01
011


10
101


11
110


Simple parity check

In this code, a k-bit dataword is changed to an n-bit codeword where n = k + 1. This extra bit, called the parity bit, is selected to make the total number of 1s even.


Datawords
Codewords


0000
00000


0001
00011


0010
00101


0011
00110


0100
01001


0101
01010


0110
01100


0111
01111


1000
10001


1001
10010


1010
10100


1011
10111


1100
11000


1101
11011


1110
11101


1111
11110


The parity bit is generated by adding the four bits in the datawords. The result is the parity bit.

r0=a3+a2+a1+a0(modulo2)r_0 = a_3 + a_2 + a_1 + a_0  (modulo2)r0=a3+a2+a1+a0(modulo2)
The receiver adds all the bits in the received codeword. The result is the syndrome. It’s 000 when the number of 1s is even.

s0=b3+b2+b1+b0+q0(modulo2)s_0=b_3+b_2+b_1+b_0+q_0  (modulo2)s0=b3+b2+b1+b0+q0(modulo2)
Example

Let’s send 1011. We’ll encode it as 10111. There are 5 possible outcomes.

No error occurs. The received codewords is 10111. The syndrome is 0. Dataword 1011 is received.
A single-bit error occurs. 10011. The syndrome is 1. Rejected.
A single-bit error occurs. 10110 is received.  The syndrome is 1. Rejected.
Burst error. 00110 is received.  Syndrome is 0. Dataword 011 is received.
Burst error. 01011. Syndrome is 1. Rejected.

Hamming codes

In this code, the minimum hamming distance is 3.  If m = 3, then n = 7 and k = 4.
Redundant bits are generated and verified by performing three parity checks on a2+a1+a0a_2+a_1+a_0a2+a1+a0, a3+a2+a1a_3+a_2+a_1a3+a2+a1, and a1+a0+a3a_1+a_0+a_3a1+a0+a3.
Checksum

Suppose our data is a list of 4-bit numbers. In addition to these numbers, we send the sum of the numbers. For example, let’s send [7,11,12,0,6][7, 11, 12, 0 ,6][7,11,12,0,6], we will have to include [36][36][36], resulting in a final transmission of [7,11,12,0,6,36][7, 11, 12, 0, 6, 36][7,11,12,0,6,36]. The receiver adds the five numbers and compares it to the sum.
Sometimes the inverse of the sum is sent. If we use our previous example, we’d be sending [7,11,12,0,6,−36][7, 11, 12, 0, 6, -36][7,11,12,0,6,−36]. The receiver adds all the numbers. If the result is 000, there were no errors.
References

Behrouz A. Forouzan & Sophia Chung Fegan. Data Communications and Networking [4th Ed.], 2007.

Samuel Zheler. Crates [picture]. Available: unsplash.com

EMI. Trade ad for Beatles’ 1964 Grammys. — This is a version with just the Beatles isolated from the ad, 1965. Available: commons.wikimedia.org
Datawords	Codewords
0000	00000
0001	00011
0010	00101
0011	00110
0100	01001
0101	01010
0110	01100
0111	01111
1000	10001
1001	10010
1010	10100
1011	10111
1100	11000
1101	11011
1110	11101
1111	11110