Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save YektaDev/74dda9372812ea475cfe12870acca65b to your computer and use it in GitHub Desktop.
Save YektaDev/74dda9372812ea475cfe12870acca65b to your computer and use it in GitHub Desktop.
A partial (and informal!) summary of chapters 3, 4, and 5 of "Computer Networking: A Top-down Approach" by Jim Kurose & Keith W. Ross
"Computer Networking: A Top-down Approach" by Jim Kurose & Keith W. Ross (8th edition)
A Partial (and Informal!) Summary of Chapters 3, 4, and 5
Ali Khaleqi Yekta - Jun 2023

📗 Chapter 3 - Transport Layer

Sockets are in App layer. When they reach Transport layer, it merges (multiplexes) all of them & adds its own header (headers are responsible to find the correct socket).

Packet Segments:

  • src & dest IP
  • src & dest Port
  • transport layer

UDP Packets w same dest IP&PT & different src all go to the same socket.

  | UDP only cares about dest Port
 /| TCP cares about both src&dest Port & IP
<-Same concepts.
 \| UDP Socket diffs by:         DestPort
  | TCP Socket diffs by: SrcPort,DestPort,SrcIP,DestIP
  • TCP provides a byte stream abstraction.
  • UDP provides a message abstraction.

(de)mux happens at all layers!

UDP Segment format

Spoiler: It's way simpler than what TCP uses!

     32-bit
/-------|--------\
==================
|SRC PT | DEST PT|
------------------
| Len   | CHCKSM |
------------------
|     Payload    |
==================

Checksumming!

  • The same checksum in both sides isn't a PROOF that the packet is not changed.
  • Sender treats the content of UDP as a stream of 16-bit ints.
  • Add carry to the result if there's any.
  • sum => then "check-sum": Invert bits of the sum.
  • One flipped bit (in one of the two numbers) always results in a changed checksum.

UDP is no frills, meaning:

  • 😶‍🌫️ Packets can get lost.
  • 🔀 Packets can arrive out of order.
  • 🤲 It's best effort (Send and hope for the best!).

All that can go wrong by packets:

  • 😶‍🌫️ loss
  • 🔀 reordering
  • 🥀 corruption

RDT

OK           BROKE         OK     OK        OK
NOTHING     CORRUPTION    SEQ#  NAKFREE    LOSS
1.0       ->  2.0      -> 2.1 ->  2.2   -> 3.0

RDT 1.0: Reliable channel!

FSM: sender: 1 state receiver: 1 state

RDT 2.0: Bit Errors => Stop&Wait + ACK + NAK + Checksum

FSM: sender: 2 states receiver: 1 state

PROBLEM: ACK & NAK can be corrupted too! => Solution: Add seq#

RDT 2.1

FSM: sender: 4 states receiver: 2 states

RDT 2.2: NAK-free (similar to TCP)

RDT 3.0: Add loss => Timeouts

This is a reliable & stable level BUT, the utilization is a nightmare!

Increase RDT Performance

Pipelining

x-packet pipelining increases utilization x times!

Go-Back-N

Sender

  • Window of up to N
  • Cumulative ACK (TCP uses cumulative ACK)
  • timeout(n): GO BACK N! Retransmit from n...

Receiver

  • Always send ACK of the latest in order received seq #
  • Only needs to remember rcv_base (The first needed seq #)
  • On out-of-order: buffer or don't buffer, I don't care about your imp!

Selective Repeat

Window starts at latest not ACKed/not Received packet.

Sender

  • Timer for each unACKed pkt

Receiver

  • Individual ACK for each pkt
  • Buffers out-of-order pkts

A FATAL FLAW: If all acks of the window are sent but not received, sender sends 012, but receiver thinks they're 345!

=> It can corrupt data!!!

The only solution is to always use a seq# space that is >= 2x window size

TCP

  • Point-to-point
  • Reliable
  • Byte stream (no msg boundary)
  • Bi-directional (Full-duplex)
  • MSS: Max Segment Size
  • Cumulative ACKs
  • Pipelining
  • Connection-oriented
  • Flow Control

seq # is byte stream # ACKs seq # is the next EXPECTED byte, not the one we have! TCP didn't specified handling out-of-order segments (again, they don't care about imp!) Fundamentally Go-Back-N

Formulas:

  • EstimatedRTT = (α-1)*EstimatedRTT + α*SampleRTT
  • DevRTT [SKIPPED!]
  • TimeoutInterval = EstimatedRTT + 4*DevRTT (4 * DevRTT is just a safety margin)

TCP Fast retransmit (An optimization applied to TCP)

When sender sends, it waits to receive ACK and if it times out, it sends again. Now, if the sender is sending multiple pkts, and gets 3 duplicate ACKs, it doesn't wait for the timeout, because it KNOWS 3 duplicate ACKs indicate a loss, so it retransmits sooner instead of waiting for the timeout.

Flow Control (Layer 2 - Transport)

  • App layer asynchronously reads from transport buffer since it might be slower than the transport buffer
  • Transport buffer is limited and can be filled if sender keeps sending!
  • Flow control means avoiding this scenario, which is done by "Advertising" the buffer size of receiver to the sender to let it adjust itself.

2-way Handshake

Half-open connection

Client says I_WANNA_TALK
...
[X] Client says I_WANNA_TALK
Server says OK_LETS_TALK
[They talk and say goodbye...]
[X] Server says OK_LETS_TALK
But they already talked and the client is gone. So there'll be a half-open connection!

→ OR

Client says I_WANNA_TALK
...
[X] Client says I_WANNA_TALK
Server says OK_LETS_TALK
Client says I_LIKE_YOU
[Y] Client says I_LIKE_YOU
[They talk and say goodbye...]
[X] Server says OK_LETS_TALK
But they already talked and the client is gone. So, there'll be a half-open connection!
[Y] Server receives I_LIKE_YOU
+ Like the prev example, but there's also duplicate data!

So, problems with 2-way handshake:

  • Half-open connection
  • Dup data

This is why TCP uses:

3-way handshake

Opening a connection

  1. Client sends SYN pckt
  2. Server sends SYNACK [up until now it's a 2-way handshake]
  3. Client sends ACK for SYNACK

👱‍♀️ On belay?

🧔‍♂️ Belay on!

👱‍♀️ Climbing...

Closing a connection

  1. A sends FIN
  2. B sends FINACK
  3. A sends ACK for FINACK

Congestion & Congestion Control (Layer 3 - Network)

Congestion means a busy network.

Cost of Congestion: More Congestion!!!

Congestion Control Approaches

  • End-End congestion control: Transport knows nothing of the congestion since network is a different layer; but, the least it can do is infer congestion in network by observations (loss, delay)

  • Direct communication of network and transport to indicate congestion (IMPLEMENTED IN NON-IP NETWORKS). It was complex, so it wasn't baked into the internet. BUT, there's an ECN (Explicit Congestion Notification) flag in TCP that does a similar thing.

[SKIPPED_STUFF!]

TCP Challenges

TCP is old and it wasn't made with some of the current world's situations in mind:

  • WIFI (loss over WIFI is also treated as congestion!)
  • Low latency environments like data centers
  • High latency connections like satellites
  • Encryption ...

📙 Chapter 4 (Data Plane) & 5 (Control Plane) - Network Layer

It is consisted of control plane & data plane. This layers and below layers all exist in every single internet device.

Functions

  • Routing (Routers) => Determine route of pkts from src to dst (Like planning for a trip)
  • Forwarding (Routers & Switches) => Move pkt from input link to output link

Data plane: per-router, determines how pkt goes from input port to output port

Control plane: network-wide, determines how pkt is routed from src to dst

Two approaches:

  • Traditional routing algorithms: Implemented in routers
  • SDN: Implemented in servers

In traditional approach, a router includes both a data and a control plane. In SDN, a remote controller computes and installs forwarding tables in routers.

There was different service models. IP was best-effort and it survived for the internet. Because of:

  • Cost
  • Complexity

Routers

  • All ports are most likely both an input and an output port. But we don't consider this for simplicity.
  • Queuing must happen somewhere, either at input ports or at output ports.
  • Queuing in output ports is better, because it allows more utilization by avoiding HOL blocking at inputs.

Lookup table and forwarding mechanism are implemented at hardware level to enable working at high speed (1 clock cycle). They leverage TCAMs for this purpose (Longest Prefix Matching) which is a tiny but expensive piece of hardware allowing the lookup to be done against all table data concurrently (hence being able to operate at a single clock cycle).

Content addressable: Presents address to TCAM

Forwarding

It is done in 2 ways:

Destination-based (Traditional): Only based on dst ip address.

  • Uses Longest Prefix Matching

Generalized: Based on any of headers.

Switching Fabrics

There are 3 types:

  • Memory: Cheap, built like computers. Each pkt crosses the memory bus twice (they get copied into memory).
  • Bus: One pkt at any given time, gets hot, needs lots of power, expensive.
  • Interconnection network: Like bus, but multiple pkts can pass in parallel, hence doesn't need that high clock rate. More complex. Examples: Crossbar, Multistage Switch

Large queues are not good, they cause "Bufferbloat", which causes high latency & jitter, and decreases throughput.

Buffering

Output better than input.

  • Marking for congestion control happens in buffer control.

Average Formula Recommended by RFC3439: [SKIPPED!]

More Recent Formula: [SKIPPED!]

Pkt scheduling

  • FCFS
  • Priority
  • Round Robin
  • Weighted Fair Queueing (Combines the idea of Priority & RR)

Switches are layer-2 devices! Meaning they don't have IP address & they're transparent to the network layer.

Subnets

Device interfaces that can directly talk to each other without needing to pass a router are all in a subnet. Requirement: Sharing the prefix of the IP address (Same subnet mask)

Each isolated network is called a subnet.

How do we get an IP upon each connection to the Internet? DHCP.

A normal day for a DHCP Server:

   🧙‍♂️ DHCP Server   👴 CLIENT
        |                 |
        |            I want an IP.
        |                 |
Do you want X?            |
        |                 |
        |       YES, That's it. Give me that X!
        |                 |
X Is now yours. Enjoy!    |
        |                 |
  • A DHCP request may contain the IP.
  • The transaction ID in a DHCP request message is used to identify the client.
  • A DHCP request message is sent broadcast.

[SKIPPED_STUFF!]

  • NAT is transparent.

  • NATs purpose is to make up for the shortage of IPv4 addresses.

  • It has a translation table.

  • WAN side addr <=> LAN side addr

  • NAT is layer 3 but also manipulates layer 4, which causes problems. HOWEVER, NAT is everywhere👀

IPv6

  • Initial motivation: To have more space than 32-bit IPv4 (128-bit)
  • Fixed len of 40-bytes for headers.
  • Has flows

vs IPv4

  • NO CHECKSUM
  • NO FRAGMENTATION
  • NO OPTIONS

Why isn't IPv6 everywhere?

Some use IPv4, some use IPv6, some routers support both!

When IPv6 needs to communicate with IPv4, Tunneling comes into play. The entire IPv6 packet becomes IPv4s payload.

  • Tunneling reduces MTU (maximum transmission unit)

SDN & OpenFlow

Flow table uses exact match, not longest prefix match! This section is deliberately excluded.

Routing Algorithms

Link State, Distance Vector

Link State (Dijkstra's Link-state routing algorithm)

  • Centralized, network topology, link costs known to all nodes
  • Distributed with "link state broadcast"
  • Generates the shortest path for one node to all other nodes
  • If it ties in algorithm, it doesn't matter what you choose.
  • O(n^2) =[More efficient implementation]=> O(nlogn)
  • Message complexity: O(n^2)
  • It's a bad idea to include the congestion in link cost, because congestion also depends on the routing and this causes "route oscillations". The congestion flops between two states!

Example of the table for executing the algorithm:

| Step | N' | D(v),p(v) | D(w),p(w) | D(x),p(x) | D(y),p(y) | D(z),p(z) |
|   0  | u  | 2,u       | 5,u       | 1,u       | Inf       | Inf       |
|   1  | ux | 2,u       | 4,x       |           | 2,x       | Inf       |
   ...   ..   ...         ...         ...         ...         ...       
  • Cx,y = Cost of x to neighbor y, else Inf
  • D(v) = Total lowest cost up until a particular step
  • p(v) = The neighbor that has the path with the lowest estimated cost up until a particular step
  • N' = Explored nodes

Distance Vector (DV)

  • Decentralized, only requires knowledge of the neighbors
  • Based on Bellman-Ford equation (DP)
  • The algorithm needs the exact number of iterations as the networks diameter.
  • Network diameter: Size of shortest path to get to the furthest node.

Algorithm:

Dx(y) = min of in each neighbor (called v) of x { Cx,v + Dv(y) }
  • Good news travels fast, bad news travels slow! DISTRIBUTED ALGORITHMS ARE TRIKY!
  • Bad news takes a lot more iterations than good news.

[SKIPPED_STUFF!]

🛣️ Types of Routing

inter-AS: Between ASes - Policy is the main focus

intra-AS: Inside ASes - Performance is the main focus

  • Decentralized routing is iterative.
  • Centralized routing allows all routers to have the complete topology and link costs.
  • Routes change slower in static routing.
  • Routes change faster in dynamic routing.

intra-AS Routing Protocols

☠️ RIP

  • Uses Distance Vector (hence it's distributed)
  • ☠️ Dead simple, each router is a hop, all costs are 1.
  • Advertisement every 30s
  • If no advertisement after 180s => neighbor declared dead
  • Infinite distance = 16 hops

OSPF

  • Uses Link State (hence it's centralized)

Advanced features that are not in RIP

  • Security
  • Multiple same-cost paths allowed (RIP allows 1)
  • For each link, multiple cost metrics for different TOS
  • Hierarchical OSPF in large domains
    • Two-level hierarchy backbone -> local area
    • Contains Boundary routers, Backbone routers, Area routers

IS-IS

  • Nearly identical to OSPF 😂

inter-AS Routing Protocols

BGP

  • The de facto inter-domain routing protocol
  • "Glue that holds the Internet together."
  • Origined from DV
  • It's more about Policy than about Performance.
  • BGP's TCP connection is semi-permanent.
  • Transit traffic is not profitable. ASes try to avoid routing transit traffic.

BGP Messages

  • OPEN:
    • 👋 To open a connection
  • UPDATE:
    • ✏️ To add or delete a reachable node
  • KEEPALIVE:
    • 👻 To say "HEEY! I'M STILL BREATHING!"
    • 👍 To ACK an OPEN
  • NOTIFICATION:
    • 🔚 To close a connection
    • ❌ To send an error

It gives each AS the following:

eBGP

To see neighbor’s reachable subnets.

iBGP

To spread neighbors’ reachable subnets info to internal routers.

Determine good routes to other networks based on: reachability, policy

BGP route selection

  1. Policy
  2. Shortest AS-PATH
  3. Closest NEXT-HOP (Hot potato)
  4. Additional Criteria 😕

[SKIPPED_STUFF_LIKE_BROADCAST_AND_MULTICAST!] 👽

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment