Number of servers supported by a two-tier Clos built from N-port switches = N^2/2
Number of switches needed for that Clos = N + N/2 (N leaves plus N/2 spines)
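The two formulas above can be checked with a short sketch. It assumes a two-tier (leaf-spine) Clos built from identical N-port switches, where each leaf splits its ports evenly between servers and spines; the function name `two_tier_clos` is only illustrative.

```python
def two_tier_clos(ports_per_switch: int) -> dict:
    """Capacity of a two-tier Clos built from identical N-port switches."""
    n = ports_per_switch
    if n < 2 or n % 2:
        raise ValueError("port count must be a positive even number")
    spines = n // 2               # each leaf uses half of its ports as uplinks
    leaves = n                    # each spine port connects to one leaf
    servers = leaves * (n // 2)   # the other half of each leaf faces servers
    return {
        "servers": servers,            # N^2/2
        "leaves": leaves,
        "spines": spines,
        "switches": leaves + spines,   # N + N/2
    }


print(two_tier_clos(64))
```

With 64-port switches this gives 2048 servers and 96 switches.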
- Virtual chassis - Uniform latency
- Pod based - 2 types of latency (within pod vs. inter pod)

Virtual chassis designs are better suited to homogeneous applications, such as those run by Facebook. Pod-based Clos designs are better suited to hyperscale cloud service providers (CSPs).
Open-source cabling verification can be done using Prescriptive Topology Manager (PTM).
Drawbacks:
- Hashing is done on the outer header. The source port of the outer header is filled with a hash of the inner header's flow fields. LISP and VXLAN follow this approach.
- Additional processing at the NIC level
- Increased MTU
- Lack of visibility (for example, with traceroute)
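The source-port trick mentioned in the first drawback can be sketched as follows. This is only an illustration: CRC32 stands in for whatever hash a real VTEP uses, the function name `outer_udp_source_port` is hypothetical, and the 49152-65535 range is the dynamic port range that RFC 7348 recommends for VXLAN.

```python
import zlib


def outer_udp_source_port(src_ip, dst_ip, proto, src_port, dst_port,
                          low=49152, high=65535):
    """Derive the outer UDP source port from the inner flow's 5-tuple.

    Packets of the same inner flow always map to the same outer source
    port, so ECMP hashing on the outer header keeps a flow on one path.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    flow_hash = zlib.crc32(key)   # stand-in hash, not what hardware uses
    return low + flow_hash % (high - low + 1)


flow_a = ("10.1.1.10", "10.1.2.20", "tcp", 33000, 443)
flow_b = ("10.1.1.10", "10.1.2.20", "tcp", 33001, 443)
print(outer_udp_source_port(*flow_a))
print(outer_udp_source_port(*flow_b))
```

Because the underlay only sees the outer header, this source port is the main source of per-flow entropy for load balancing.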
A process running on the Linux kernel is associated with several types of namespaces. In the case of containers, these namespaces are virtualized.
Types of namespaces:
- cgroups: /proc/pid/cgroup, /proc/pid/mountinfo, etc. are virtualized so that the process view is abstracted.
- Network: Creates network interfaces and provides the ability to connect to the outside world; covers interfaces, sockets, routing table, MAC table, etc.
- PID: Makes a process within the container see itself as a regular process running on the kernel; by default, a PID 1 is created within the container.
- User: Controls which users can access or execute processes within the container; a container can have its own root user.
- IPC: Enables communication between processes within the container through standard mechanisms such as POSIX message queues, shared memory, and semaphores.
- Mount: Virtualizes the filesystem mounts, comparable to what chroot provides.
- UTS: Virtualizes the hostname and domain name for the container.
Docker creates interfaces within the network namespace (netns) using virtual Ethernet interfaces (veth). veths are created in pairs, with each end of the pair placed in a different namespace to enable communication between containers or with the outside world. For example, communication between 2 namespaces within a host:

NS1 (veth1) ---------- (veth2) NS2
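The NS1/NS2 layout above can be reproduced by hand with iproute2. The sketch below only builds and prints the command list; it does not run anything. The namespace names, interface names, and 10.0.0.0/24 addresses are assumptions chosen for illustration, and executing the commands (for example via `subprocess.run(argv, check=True)`) would require root on a Linux host.

```python
import shlex


def veth_pair_commands(ns1="ns1", ns2="ns2", if1="veth1", if2="veth2",
                       ip1="10.0.0.1/24", ip2="10.0.0.2/24"):
    """Return iproute2 commands that join two namespaces with a veth pair."""
    cmds = [
        f"ip netns add {ns1}",
        f"ip netns add {ns2}",
        # Create both ends of the veth pair in one step.
        f"ip link add {if1} type veth peer name {if2}",
        # Move each end into its own namespace.
        f"ip link set {if1} netns {ns1}",
        f"ip link set {if2} netns {ns2}",
        # Address each end and bring it up inside its namespace.
        f"ip netns exec {ns1} ip addr add {ip1} dev {if1}",
        f"ip netns exec {ns2} ip addr add {ip2} dev {if2}",
        f"ip netns exec {ns1} ip link set {if1} up",
        f"ip netns exec {ns2} ip link set {if2} up",
    ]
    return [shlex.split(c) for c in cmds]


for argv in veth_pair_commands():
    print(" ".join(argv))
```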
- No network
- Host network
- Single-host network
- Multihost network
Docker creates a bridge called docker0. In single-host network mode, every container created on the host gets a veth pair, with one end connected to docker0 and the other placed in the container's network namespace.
Docker uses a default subnet of 172.17.0.0/16 for the docker0 bridge and assigns 172.17.0.1 to the bridge itself. Any communication from docker0 to the outside world undergoes NAT using iptables.
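The addressing described above can be explored with Python's standard `ipaddress` module. This is a minimal sketch: the helper names are hypothetical, and `leaves_bridge` only models the idea that traffic to destinations outside the bridge subnet is what gets NATed; it does not inspect real iptables rules.

```python
import ipaddress

# Default docker0 subnet, as stated in the text.
BRIDGE_NET = ipaddress.ip_network("172.17.0.0/16")


def bridge_gateway(net=BRIDGE_NET):
    """First usable address, which Docker assigns to the bridge itself."""
    return next(net.hosts())


def leaves_bridge(dst, net=BRIDGE_NET):
    """True if dst lies outside the bridge subnet (traffic that would be NATed)."""
    return ipaddress.ip_address(dst) not in net


print(bridge_gateway())                # 172.17.0.1
print(BRIDGE_NET.num_addresses - 2)    # 65534 usable addresses
print(leaves_bridge("172.17.0.5"))     # False
print(leaves_bridge("8.8.8.8"))        # True
```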
Another way to give containers on a single host connectivity to the outside world is macvlan. In this mode, each container is assigned a virtual MAC address with the OUI 02:42:ac and obtains an IP address using DHCP, like a regular Ethernet interface. However, communication between a macvlan interface and the host requires hairpinning of traffic. With Docker, communication between macvlan interfaces can happen without traffic hairpinning. There is no NAT in this model.
L2 or L3 communication between containers across hosts has implications for IPAM across hosts.
This creates 2 bridges in the host. One is for VTEP communication across the hosts, which gives the impression that the containers reside within the same L2 network. The new bridge, in addition to docker0, is called docker_gwbridge.
Disable NAT and run a routing protocol instance using FRR (OSPF/BGP) to advertise the Docker subnet across hosts. Calico follows this model.
- Open
- Update
- Keepalive
- Notification
- Route Refresh
Tweaks are needed to adopt BGP in DC compared to how BGP is traditionally used in ISPs
- Advertisement interval --> DCs prefer this to be 0 instead of the default value of 30 seconds
- Keepalive & Hold timers --> Can be reduced to 1 & 3 seconds instead of 60 & 180; BFD can also be enabled
- Connect timer --> Can be reduced to 10 seconds from 60 seconds
Unnumbered interfaces for physical links are obtained through the following steps.
- Use the IPv6 link-local address (LLA) of an interface as its IP address. FRR sends and expects the BGP connection over this LLA.
- IPv6 router advertisements (RA) take care of neighbor discovery.
- Use the RFC 5549 capability in BGP, i.e. advertise IPv4 NLRI over an IPv6 BGP session with an IPv6 next hop. This capability is called "Extended next hop".
- With this capability, the MAC address of the neighbor is learned automatically from the RA message, and packet forwarding works with just the IPv4 address space.
- Show commands display the IPv6 next hop as an IPv4 next hop from the 169.254.0.0/16 subnet, backed by a static ARP entry.
RD is an eight-byte value that is added to every virtual network address to keep the address globally unique. There are 3 different types of RDs. The one used in EVPN is 64 bits long and has the format below. Although a VNI is normally 3 bytes long, only 2 bytes are allotted to it here, on the assumption that no device carries more than about 64,000 VNIs and no silicon supports that many.
Type (2bytes) | Device Loopback (4 bytes) | VNI ID (2 bytes)
Because the RD includes the device's loopback IP, no 2 devices in a virtual network are expected to have the same RD. The RD is encoded as part of the NLRI in the MP_REACH_NLRI and MP_UNREACH_NLRI attributes.
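The RD layout above can be sketched with `struct`. The sketch assumes the type field carries the value 1 (the "IPv4 address : number" RD type) and that the VNI fits in 2 bytes, as the text assumes; `encode_rd` and `decode_rd` are illustrative names.

```python
import ipaddress
import struct

RD_TYPE_IPV4 = 1  # assumed: type 1 = IPv4 address followed by a 2-byte number


def encode_rd(loopback: str, vni: int) -> bytes:
    """Pack Type (2 bytes) | Device Loopback (4 bytes) | VNI (2 bytes)."""
    if not 0 <= vni <= 0xFFFF:
        raise ValueError("VNI must fit in 2 bytes for this RD format")
    addr = ipaddress.IPv4Address(loopback).packed
    return struct.pack("!H4sH", RD_TYPE_IPV4, addr, vni)


def decode_rd(rd: bytes):
    """Unpack an 8-byte RD into (type, loopback, vni)."""
    rd_type, addr, vni = struct.unpack("!H4sH", rd)
    return rd_type, str(ipaddress.IPv4Address(addr)), vni


rd = encode_rd("10.0.0.1", 100)
print(rd.hex())       # 00010a0000010064
print(decode_rd(rd))  # (1, '10.0.0.1', 100)
```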
RT encodes the virtual network that a prefix belongs to. The advertising router attaches a specific RT, called the 'export RT'. A BGP speaker that receives the advertisement uses the RT to decide which local virtual network to add the routes to; this is called the 'import RT'.
The RT format is as follows:
ASN (2 bytes) | A (bit) | Type (3 bit) | Domain ID (4 bit) | Service ID (3 byte)
- A - Auto or manually derived
- Type - VLAN (0) or VXLAN (1)
- Domain ID - Typically 0; used to resolve conflicts in case of any overlap in VXLAN IDs within the administrative domain.
FRR supports auto derivation of RT via 'route-target import auto'
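The RT bit layout shown above can be sketched as a 6-byte encoder. This is an illustration of the layout only, not FRR's implementation: the 2-byte extended-community type/sub-type header that precedes these fields on the wire is omitted, the function name is hypothetical, and the A bit value of 0 is assumed to mean auto-derived, as in RFC 8365.

```python
import struct


def encode_rt(asn: int, service_id: int, a_bit: int = 0,
              rt_type: int = 1, domain_id: int = 0) -> bytes:
    """Pack ASN (2 bytes) | A (1 bit) | Type (3 bits) | Domain ID (4 bits) | Service ID (3 bytes)."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("ASN must fit in 2 bytes")
    if a_bit not in (0, 1):
        raise ValueError("A is a single bit")
    if not 0 <= rt_type <= 0x7:
        raise ValueError("Type is 3 bits")
    if not 0 <= domain_id <= 0xF:
        raise ValueError("Domain ID is 4 bits")
    if not 0 <= service_id <= 0xFFFFFF:
        raise ValueError("Service ID (e.g. the VNI) must fit in 3 bytes")
    # A, Type and Domain ID share a single byte.
    flags = (a_bit << 7) | (rt_type << 4) | domain_id
    return struct.pack("!HB", asn, flags) + service_id.to_bytes(3, "big")


# ASN 65001, VXLAN type (1), domain 0, VNI 10100
print(encode_rt(65001, 10100).hex())  # fde910002774
```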
Non-IPv4 route types are typically advertised via the MP_REACH_NLRI and MP_UNREACH_NLRI attributes. For most AFI/SAFI combinations, the structure and contents carried in the UPDATE message are the same within that AFI/SAFI. This is not the case with EVPN, where there is a need to advertise MAC addresses, IP prefixes, unicast or multicast information, and so on. The EVPN NLRI therefore consists of different route types.
- RT1 - Ethernet segment auto discovery -- Supports multihomed endpoints (MLAG alternative).
- RT4 - Designated forwarder -- Ensures only a single VTEP forwards BUM traffic to multihomed endpoints.
- RT2 - MAC, VNI, IP -- Advertises reachability to a specific MAC address in a virtual network, along with its IP address.
- RT3 - VNI/VTEP association -- Advertises a VTEP's interest in virtual networks.
- RT5 - IP prefix, VRF -- Advertises IP prefixes and the VRF associated with each prefix.
- RT6 - Mcast group membership -- Contains information about the multicast groups a VTEP is interested in.
Figure 4
![Capture](https://user-images.githubusercontent.com/7877753/81032246-ac8a7500-8e7e-11ea-8097-87f607904f1c.JPG)