YektaDev/Summary 3, 4, 5 - Computer Networking (A Top-down Approach).md

## Summary 3, 4, 5 - Computer Networking (A Top-down Approach).md

      
    Raw
  

              Summary 3, 4, 5 - Computer Networking (A Top-down Approach).md
            
          
    "Computer Networking: A Top-down Approach" by Jim Kurose & Keith W. Ross (8th edition)
A Partial (and Informal!) Summary of Chapters 3, 4, and 5
Ali Khaleqi Yekta - Jun 2023

📗 Chapter 3 - Transport Layer

Sockets are in App layer.
When they reach Transport layer, it merges (multiplexes) all of them & adds its own header (headers are responsible to find the correct socket).
Packet Segments:

src & dest IP
src & dest Port
transport layer

UDP Packets w same dest IP&PT & different src all go to the same socket.
  | UDP only cares about dest Port
 /| TCP cares about both src&dest Port & IP
<-Same concepts.
 \| UDP Socket diffs by:         DestPort
  | TCP Socket diffs by: SrcPort,DestPort,SrcIP,DestIP


TCP provides a byte stream abstraction.
UDP provides a message abstraction.

(de)mux happens at all layers!
UDP Segment format

Spoiler: It's way simpler than what TCP uses!
     32-bit
/-------|--------\
==================
|SRC PT | DEST PT|
------------------
| Len   | CHCKSM |
------------------
|     Payload    |
==================

Checksumming!


The same checksum in both sides isn't a PROOF that the packet is not changed.
Sender treats the content of UDP as a stream of 16-bit ints.
Add carry to the result if there's any.
sum => then "check-sum": Invert bits of the sum.
One flipped bit (in one of the two numbers) always results in a changed checksum.

UDP is no frills, meaning:


😶‍🌫️ Packets can get lost.
🔀 Packets can arrive out of order.
🤲 It's best effort (Send and hope for the best!).


All that can go wrong by packets:

😶‍🌫️ loss
🔀 reordering
🥀 corruption

RDT

OK           BROKE         OK     OK        OK
NOTHING     CORRUPTION    SEQ#  NAKFREE    LOSS
1.0       ->  2.0      -> 2.1 ->  2.2   -> 3.0

RDT 1.0: Reliable channel!

FSM: sender: 1 state      receiver: 1 state
RDT 2.0: Bit Errors => Stop&Wait + ACK + NAK + Checksum

FSM: sender: 2 states     receiver: 1 state
PROBLEM: ACK & NAK can be corrupted too! => Solution: Add seq#

RDT 2.1

FSM: sender: 4 states     receiver: 2 states
RDT 2.2: NAK-free (similar to TCP)

RDT 3.0: Add loss => Timeouts

This is a reliable & stable level BUT, the utilization is a nightmare!
Increase RDT Performance

Pipelining

x-packet pipelining increases utilization x times!
Go-Back-N

Sender


Window of up to N
Cumulative ACK (TCP uses cumulative ACK)
timeout(n): GO BACK N! Retransmit from n...

Receiver


Always send ACK of the latest in order received seq #
Only needs to remember rcv_base (The first needed seq #)
On out-of-order: buffer or don't buffer, I don't care about your imp!

Selective Repeat

Window starts at latest not ACKed/not Received packet.
Sender


Timer for each unACKed pkt

Receiver


Individual ACK for each pkt
Buffers out-of-order pkts

A FATAL FLAW: If all acks of the window are sent but not received, sender sends 012, but receiver thinks they're 345!

=> It can corrupt data!!!

The only solution is to always use a seq# space that is >= 2x window size

TCP


Point-to-point
Reliable
Byte stream (no msg boundary)
Bi-directional (Full-duplex)
MSS: Max Segment Size
Cumulative ACKs
Pipelining
Connection-oriented
Flow Control

seq # is byte stream #
ACKs seq # is the next EXPECTED byte, not the one we have!
TCP didn't specified handling out-of-order segments (again, they don't care about imp!)
Fundamentally Go-Back-N
Formulas:

EstimatedRTT = (α-1)*EstimatedRTT + α*SampleRTT
DevRTT [SKIPPED!]
TimeoutInterval = EstimatedRTT + 4*DevRTT (4 * DevRTT is just a safety margin)

TCP Fast retransmit (An optimization applied to TCP)

When sender sends, it waits to receive ACK and if it times out, it sends again. Now, if the sender is sending multiple pkts, and gets 3 duplicate ACKs, it doesn't wait for the timeout, because it KNOWS 3 duplicate ACKs indicate a loss, so it retransmits sooner instead of waiting for the timeout.
Flow Control (Layer 2 - Transport)


App layer asynchronously reads from transport buffer since it might be slower than the transport buffer
Transport buffer is limited and can be filled if sender keeps sending!
Flow control means avoiding this scenario, which is done by "Advertising" the buffer size of receiver to the sender to let it adjust itself.

2-way Handshake

Half-open connection

Client says I_WANNA_TALK
...
[X] Client says I_WANNA_TALK
Server says OK_LETS_TALK
[They talk and say goodbye...]
[X] Server says OK_LETS_TALK
But they already talked and the client is gone. So there'll be a half-open connection!

→ OR
Client says I_WANNA_TALK
...
[X] Client says I_WANNA_TALK
Server says OK_LETS_TALK
Client says I_LIKE_YOU
[Y] Client says I_LIKE_YOU
[They talk and say goodbye...]
[X] Server says OK_LETS_TALK
But they already talked and the client is gone. So, there'll be a half-open connection!
[Y] Server receives I_LIKE_YOU
+ Like the prev example, but there's also duplicate data!


So, problems with 2-way handshake:


Half-open connection
Dup data

This is why TCP uses:
3-way handshake

Opening a connection


Client sends SYN pckt
Server sends SYNACK [up until now it's a 2-way handshake]
Client sends ACK for SYNACK


👱‍♀️ On belay?
🧔‍♂️ Belay on!
👱‍♀️ Climbing...

Closing a connection


A sends FIN
B sends FINACK
A sends ACK for FINACK

Congestion & Congestion Control (Layer 3 - Network)

Congestion means a busy network.
Cost of Congestion: More Congestion!!!
Congestion Control Approaches


End-End congestion control: Transport knows nothing of the congestion since network is a different layer; but, the least it can do is infer congestion in network by observations (loss, delay)


Direct communication of network and transport to indicate congestion (IMPLEMENTED IN NON-IP NETWORKS).
It was complex, so it wasn't baked into the internet. BUT, there's an ECN (Explicit Congestion Notification) flag in TCP that does a similar thing.


[SKIPPED_STUFF!]
TCP Challenges

TCP is old and it wasn't made with some of the current world's situations in mind:

WIFI (loss over WIFI is also treated as congestion!)
Low latency environments like data centers
High latency connections like satellites
Encryption
...

📙 Chapter 4 (Data Plane) & 5 (Control Plane) - Network Layer

It is consisted of control plane & data plane.
This layers and below layers all exist in every single internet device.
Functions


Routing (Routers) => Determine route of pkts from src to dst (Like planning for a trip)
Forwarding (Routers & Switches) => Move pkt from input link to output link

Data plane: per-router, determines how pkt goes from input port to output port

Control plane: network-wide, determines how pkt is routed from src to dst

Two approaches:


Traditional routing algorithms: Implemented in routers
SDN: Implemented in servers

In traditional approach, a router includes both a data and a control plane.
In SDN, a remote controller computes and installs forwarding tables in routers.
There was different service models. IP was best-effort and it survived for the internet. Because of:

Cost
Complexity

Routers


All ports are most likely both an input and an output port. But we don't consider this for simplicity.
Queuing must happen somewhere, either at input ports or at output ports.
Queuing in output ports is better, because it allows more utilization by avoiding HOL blocking at inputs.

Lookup table and forwarding mechanism are implemented at hardware level to enable working at high speed (1 clock cycle). They leverage TCAMs for this purpose (Longest Prefix Matching) which is a tiny but expensive piece of hardware allowing the lookup to be done against all table data concurrently (hence being able to operate at a single clock cycle).
Content addressable: Presents address to TCAM
Forwarding

It is done in 2 ways:
Destination-based (Traditional): Only based on dst ip address.


Uses Longest Prefix Matching

Generalized: Based on any of headers.

Switching Fabrics

There are 3 types:

Memory: Cheap, built like computers. Each pkt crosses the memory bus twice (they get copied into memory).
Bus: One pkt at any given time, gets hot, needs lots of power, expensive.
Interconnection network: Like bus, but multiple pkts can pass in parallel, hence doesn't need that high clock rate. More complex.
Examples: Crossbar, Multistage Switch

Large queues are not good, they cause "Bufferbloat", which causes high latency & jitter, and decreases throughput.
Buffering

Output better than input.

Marking for congestion control happens in buffer control.

Average Formula Recommended by RFC3439: [SKIPPED!]

More Recent Formula: [SKIPPED!]

Pkt scheduling


FCFS
Priority
Round Robin
Weighted Fair Queueing (Combines the idea of Priority & RR)

Switches are layer-2 devices! Meaning they don't have IP address & they're transparent to the network layer.
Subnets

Device interfaces that can directly talk to each other without needing to pass a router are all in a subnet.
Requirement: Sharing the prefix of the IP address (Same subnet mask)
Each isolated network is called a subnet.
How do we get an IP upon each connection to the Internet? DHCP.

A normal day for a DHCP Server:
   🧙‍♂️ DHCP Server   👴 CLIENT
        |                 |
        |            I want an IP.
        |                 |
Do you want X?            |
        |                 |
        |       YES, That's it. Give me that X!
        |                 |
X Is now yours. Enjoy!    |
        |                 |


A DHCP request may contain the IP.
The transaction ID in a DHCP request message is used to identify the client.
A DHCP request message is sent broadcast.

[SKIPPED_STUFF!]


NAT is transparent.


NATs purpose is to make up for the shortage of IPv4 addresses.


It has a translation table.


WAN side addr <=> LAN side addr


NAT is layer 3 but also manipulates layer 4, which causes problems. HOWEVER, NAT is everywhere👀


IPv6


Initial motivation: To have more space than 32-bit IPv4 (128-bit)
Fixed len of 40-bytes for headers.
Has flows

vs IPv4


NO CHECKSUM
NO FRAGMENTATION
NO OPTIONS

Why isn't IPv6 everywhere?

Some use IPv4, some use IPv6, some routers support both!
When IPv6 needs to communicate with IPv4, Tunneling comes into play. The entire IPv6 packet becomes IPv4s payload.

Tunneling reduces MTU (maximum transmission unit)

SDN & OpenFlow

Flow table uses exact match, not longest prefix match! This section is deliberately excluded.
Routing Algorithms

Link State, Distance Vector
Link State (Dijkstra's Link-state routing algorithm)


Centralized, network topology, link costs known to all nodes
Distributed with "link state broadcast"
Generates the shortest path for one node to all other nodes
If it ties in algorithm, it doesn't matter what you choose.
O(n^2) =[More efficient implementation]=> O(nlogn)
Message complexity: O(n^2)
It's a bad idea to include the congestion in link cost, because congestion also depends on the routing and this causes "route oscillations". The congestion flops between two states!

Example of the table for executing the algorithm:
| Step | N' | D(v),p(v) | D(w),p(w) | D(x),p(x) | D(y),p(y) | D(z),p(z) |
|   0  | u  | 2,u       | 5,u       | 1,u       | Inf       | Inf       |
|   1  | ux | 2,u       | 4,x       |           | 2,x       | Inf       |
   ...   ..   ...         ...         ...         ...         ...       


Cx,y = Cost of x to neighbor y, else Inf
D(v) = Total lowest cost up until a particular step
p(v) = The neighbor that has the path with the lowest estimated cost up until a particular step
N' = Explored nodes

Distance Vector (DV)


Decentralized, only requires knowledge of the neighbors
Based on Bellman-Ford equation (DP)
The algorithm needs the exact number of iterations as the networks diameter.
Network diameter: Size of shortest path to get to the furthest node.

Algorithm:

Dx(y) = min of in each neighbor (called v) of x { Cx,v + Dv(y) }


Good news travels fast, bad news travels slow! DISTRIBUTED ALGORITHMS ARE TRIKY!
Bad news takes a lot more iterations than good news.

[SKIPPED_STUFF!]
🛣️ Types of Routing

inter-AS: Between ASes - Policy is the main focus

intra-AS: Inside ASes - Performance is the main focus


Decentralized routing is iterative.
Centralized routing allows all routers to have the complete topology and link costs.
Routes change slower in static routing.
Routes change faster in dynamic routing.

intra-AS Routing Protocols

☠️ RIP


Uses Distance Vector (hence it's distributed)
☠️ Dead simple, each router is a hop, all costs are 1.
Advertisement every 30s
If no advertisement after 180s => neighbor declared dead
Infinite distance = 16 hops

OSPF


Uses Link State (hence it's centralized)

Advanced features that are not in RIP


Security
Multiple same-cost paths allowed (RIP allows 1)
For each link, multiple cost metrics for different TOS
Hierarchical OSPF in large domains

Two-level hierarchy backbone -> local area
Contains Boundary routers, Backbone routers, Area routers


IS-IS


Nearly identical to OSPF 😂

inter-AS Routing Protocols

BGP


The de facto inter-domain routing protocol
"Glue that holds the Internet together."
Origined from DV
It's more about Policy than about Performance.
BGP's TCP connection is semi-permanent.
Transit traffic is not profitable. ASes try to avoid routing transit traffic.

BGP Messages


OPEN:

👋 To open a connection


UPDATE:

✏️ To add or delete a reachable node


KEEPALIVE:

👻 To say "HEEY! I'M STILL BREATHING!"
👍 To ACK an OPEN


NOTIFICATION:

🔚 To close a connection
❌ To send an error


It gives each AS the following:
eBGP

To see neighbor’s reachable subnets.
iBGP

To spread neighbors’ reachable subnets info to internal routers.
Determine good routes to other networks based on: reachability, policy

BGP route selection


Policy
Shortest AS-PATH
Closest NEXT-HOP (Hot potato)
Additional Criteria 😕

[SKIPPED_STUFF_LIKE_BROADCAST_AND_MULTICAST!] 👽