@tsunghanlin
Last active March 12, 2022 09:15
Kernel Networking related Notes

Structure Note

  • include/linux/skbuff.h: struct sk_buff
    • Timing of creation:
      • When an application passes data through a socket
      • When a packet arrives at the NIC (dev_alloc_skb())
    • Holds metadata about the packet plus a pointer to the actual data
    • Used by all the network layers
    • When a packet passes UP through the layers, old headers are not removed.
      • Only the data pointer is moved to the current layer's header
    • real packet content/data contains:
      • pointers to headers and data
      • skb_shared_info structure
      • head, data, tail, end: pointers to the real packet/data area
        • head to data: headroom
        • data to tail: actual data
        • tail to end: tailroom
      • struct skb_shared_info
        • appended at the end of each real data area
        • includes dataref, a reference count that is incremented by skb_clone().
        • frag_list: fragmented IP packets, kept as chained sk_buff objects.
        • skb_frag_t: used to store packet data that is not allocated contiguously.
          • Needs support from the NIC and its driver.
          • page: the page that stores the data
          • offset: the offset within that page at which the data starts.
  • net/core/dev.c
    • adapter independent
  • include/linux/netdevice.h: struct net_device
    • abstracted information for the adapter(a network device)
    • contains both hardware and software configurations for the device
    • using alloc_netdev() to allocate an instance
    • using register_netdev() to register a network device
      • which in turn calls netdev_register_kobject() to create sysfs entries
    • dev_hard_start_xmit(): takes a packet from the wait queue and starts transmitting it
  • per-CPU wait queue: struct softnet_data(include/linux/netdevice.h)
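
The layout described above (a descriptor with four pointers, and a shared-info block sitting right after the data area) can be mimicked in userspace. This is only a toy sketch, not the kernel's code: toy_skb, toy_shared_info and every toy_* function are hypothetical names, and the 8-byte rounding is an assumption made here to keep the trailing struct aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct skb_shared_info: stored right after the data area. */
struct toy_shared_info {
    int dataref;            /* how many descriptors share this data area */
};

/* Toy stand-in for struct sk_buff: a descriptor pointing into a data area. */
struct toy_skb {
    unsigned char *head;    /* start of the allocated area */
    unsigned char *data;    /* start of the current header/payload */
    unsigned char *tail;    /* end of the current payload */
    unsigned char *end;     /* end of the area; shared info starts here */
};

struct toy_shared_info *toy_shinfo(const struct toy_skb *skb)
{
    return (struct toy_shared_info *)skb->end;
}

struct toy_skb *toy_alloc(size_t size)
{
    struct toy_skb *skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    size = (size + 7) & ~(size_t)7;   /* assumption: round up for alignment */
    skb->head = malloc(size + sizeof(struct toy_shared_info));
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    toy_shinfo(skb)->dataref = 1;
    return skb;
}

/* Like a clone: copy the descriptor only, share the data, bump dataref. */
struct toy_skb *toy_clone(const struct toy_skb *skb)
{
    struct toy_skb *c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    *c = *skb;
    toy_shinfo(c)->dataref++;
    return c;
}

/* Free the descriptor; free the data area only when the last user is gone. */
void toy_free(struct toy_skb *skb)
{
    if (--toy_shinfo(skb)->dataref == 0)
        free(skb->head);
    free(skb);
}

size_t toy_headroom(const struct toy_skb *s) { return (size_t)(s->data - s->head); }
size_t toy_len(const struct toy_skb *s)      { return (size_t)(s->tail - s->data); }
size_t toy_tailroom(const struct toy_skb *s) { return (size_t)(s->end - s->tail); }

/* Returns 0 if a clone shares the data area and dataref tracks both users. */
int toy_clone_demo(void)
{
    struct toy_skb *a = toy_alloc(128);
    if (!a)
        return -1;
    struct toy_skb *b = toy_clone(a);
    if (!b) {
        toy_free(a);
        return -1;
    }
    int ok = (a->head == b->head) && toy_shinfo(a)->dataref == 2;
    toy_free(b);
    ok = ok && toy_shinfo(a)->dataref == 1;
    toy_free(a);
    return ok ? 0 : 1;
}
```

Calling toy_clone_demo() from a small main() is enough to try it. Two descriptors end up pointing at one data area, which is why the reference count lives with the data rather than in the descriptor.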

Work Flow and APIs

sk_buff creation

  1. Use alloc_skb() to create sk_buff
  2. Then use skb_reserve() to reserve headroom for protocol headers. Each network layer uses it to reserve space for its header information; device drivers also use it to align the IP header of an incoming frame. (Advances both the data and tail pointers to make room for headers.)
  3. Use skb_put() to make room for the real data. (Advances only the tail pointer.)
  4. memcpy() the real data into the data area.
  5. skb_push(): add new header information. (Decreases the data pointer, because the added header will be treated as normal data at the lower network layer.)
  6. Send out the packet: dev_queue_xmit() (net/core/dev.c)
  7. skb_pull(): remove a header, as done when a received packet moves up a layer. (Advances the data pointer.)
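
The pointer movements in the steps above can be followed with a small userspace model. This is a hedged sketch, not kernel code: toy_buf and the buf_* helpers are hypothetical names that only imitate what skb_reserve(), skb_put(), skb_push() and skb_pull() do to the data and tail pointers, and the fixed 256-byte area is an assumption for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer: a fixed area plus the four pointers from the notes. */
struct toy_buf {
    unsigned char area[256];
    unsigned char *head, *data, *tail, *end;
};

void buf_init(struct toy_buf *b)
{
    b->head = b->data = b->tail = b->area;
    b->end = b->area + sizeof(b->area);
}

/* Like skb_reserve(): move data and tail forward to create headroom. */
void buf_reserve(struct toy_buf *b, size_t n)
{
    assert((size_t)(b->end - b->tail) >= n);
    b->data += n;
    b->tail += n;
}

/* Like skb_put(): grow the payload at the tail, return where to write. */
unsigned char *buf_put(struct toy_buf *b, size_t n)
{
    unsigned char *old_tail = b->tail;
    assert((size_t)(b->end - b->tail) >= n);
    b->tail += n;
    return old_tail;
}

/* Like skb_push(): take n bytes of headroom for a new header. */
unsigned char *buf_push(struct toy_buf *b, size_t n)
{
    assert((size_t)(b->data - b->head) >= n);
    b->data -= n;
    return b->data;
}

/* Like skb_pull(): strip n bytes from the front. */
unsigned char *buf_pull(struct toy_buf *b, size_t n)
{
    assert((size_t)(b->tail - b->data) >= n);
    b->data += n;
    return b->data;
}

/* Builds payload plus one 8-byte header; returns the length to be sent. */
size_t buf_demo(void)
{
    struct toy_buf b;
    buf_init(&b);
    buf_reserve(&b, 16);                 /* headroom for later headers */
    memcpy(buf_put(&b, 5), "hello", 5);  /* payload: only tail advances */
    memset(buf_push(&b, 8), 0xAA, 8);    /* header: data moves backward */
    return (size_t)(b.tail - b.data);    /* 8 header bytes + 5 payload bytes */
}
```

Reserving headroom first is what lets each lower layer prepend its header without copying the payload.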

Packet Transmission in Network Access Layer

Traditional Way - Interrupt

  1. device driver registers interrupt handler: request_irq(xxx, INTERRUPT_HANDLER);
  2. Inside the interrupt handler, a device-specific net_rx() is called to create the sk_buff (netdev_alloc_skb())
  3. netif_rx() (net/core/dev.c) is then called. It is the interface between the hardware-specific part and the device-independent part. It also puts the packet on a per-CPU wait queue.
  4. The soft-irq is handled in net_rx_action() (net/core/dev.c)
  5. netif_receive_skb(): deliver to higher layer
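
The hand-off in the steps above (interrupt context only enqueues, the soft-irq drains and delivers) can be sketched as a toy model. Everything here is hypothetical naming: toy_softnet stands in for the per-CPU queue, toy_netif_rx() and toy_net_rx_action() only imitate the roles of their namesakes, and the tiny fixed queue size is an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BACKLOG_MAX 4   /* assumption: tiny limit for the example */

struct toy_pkt {
    int id;
};

/* Toy per-CPU wait queue: a ring of packet pointers plus a pending flag. */
struct toy_softnet {
    struct toy_pkt *queue[TOY_BACKLOG_MAX];
    size_t head;
    size_t count;
    int softirq_pending;
};

int toy_delivered_id_sum;   /* filled in by toy_deliver(), for inspection */

/* Stand-in for delivery to the higher layer. */
void toy_deliver(struct toy_pkt *p)
{
    toy_delivered_id_sum += p->id;
}

/* Interrupt side: enqueue and request the soft-irq. Returns -1 on drop. */
int toy_netif_rx(struct toy_softnet *sd, struct toy_pkt *p)
{
    if (sd->count == TOY_BACKLOG_MAX)
        return -1;                       /* queue full: packet dropped */
    sd->queue[(sd->head + sd->count) % TOY_BACKLOG_MAX] = p;
    sd->count++;
    sd->softirq_pending = 1;
    return 0;
}

/* Soft-irq side: drain the queue and deliver each packet upward. */
int toy_net_rx_action(struct toy_softnet *sd, void (*deliver)(struct toy_pkt *))
{
    int handled = 0;
    while (sd->count > 0) {
        struct toy_pkt *p = sd->queue[sd->head];
        sd->head = (sd->head + 1) % TOY_BACKLOG_MAX;
        sd->count--;
        deliver(p);
        handled++;
    }
    sd->softirq_pending = 0;
    return handled;
}
```

The point of the split is that the interrupt handler stays short: it does no protocol work, only queues the packet and flags that deferred work is waiting.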

Support for High-speed Interface - NAPI

  1. Only the first packet will cause an interrupt
  2. Then Rx interrupt is disabled in the driver
  3. The adapter is placed on a poll list
  4. The kernel polls the device to process packets until none remain
  5. Re-enable Rx irq
  • Conditions
    • Adapters must have a DMA ring or another place to hold multiple packets
    • They must be able to disable the Rx interrupt
    • The driver must provide a poll() function
  • API: netif_napi_add()
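
The interrupt-then-poll cycle above can be modelled in a few lines. This is a toy sketch with hypothetical names (toy_nic, toy_nic_packet_arrives(), toy_napi_poll()), not a driver. The per-call budget is an assumption added here: the notes do not mention it, but a poll function is normally given a cap on how many packets it may process per call.

```c
#include <assert.h>

/* Toy adapter state: a packet counter stands in for the DMA ring. */
struct toy_nic {
    int ring_pending;    /* packets waiting in the ring */
    int rx_irq_enabled;  /* 1 while the Rx interrupt may fire */
    int on_poll_list;    /* 1 while the kernel should keep polling */
    int irq_count;       /* how many interrupts were actually raised */
};

/* Hardware side: only the first packet of a burst raises an interrupt. */
void toy_nic_packet_arrives(struct toy_nic *nic)
{
    nic->ring_pending++;
    if (nic->rx_irq_enabled) {
        nic->irq_count++;
        nic->rx_irq_enabled = 0;   /* handler disables further Rx irqs */
        nic->on_poll_list = 1;     /* and schedules the device for polling */
    }
}

/* Poll side: process up to budget packets; finish if the ring ran dry. */
int toy_napi_poll(struct toy_nic *nic, int budget)
{
    int done = 0;
    while (done < budget && nic->ring_pending > 0) {
        nic->ring_pending--;       /* stand-in for handling one packet */
        done++;
    }
    if (done < budget) {           /* no more packets: leave polling mode */
        nic->on_poll_list = 0;
        nic->rx_irq_enabled = 1;
    }
    return done;
}
```

With ten packets arriving in a burst and a budget of 4, this model raises one interrupt and needs three poll calls (4, 4, 2) before the Rx interrupt is re-enabled.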

Network Layer - IP

  • Entry point: netif_receive_skb() -> ip_rcv()
  • If the packet is for the local host -> ip_local_deliver() passes it on to the upper network layer.
    • Checks whether the packet is fragmented and, if so, queues and reassembles it -> ip_defrag()
  • If the packet is for another host -> ip_forward()
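
The decision above can be sketched as a small classifier. This is a simplification with hypothetical names (toy_iphdr, toy_ip_rcv(), the toy_verdict values): it compares against a single local address, whereas the kernel makes the local-versus-forward decision through a routing lookup, and it keeps header fields in host byte order. The fragment test itself follows IPv4: a packet is a fragment if the "more fragments" flag is set or the fragment offset is non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IP_MF     0x2000u   /* "more fragments" flag */
#define TOY_IP_OFFSET 0x1FFFu   /* mask for the 13-bit fragment offset */

/* Only the two header fields this sketch needs, in host byte order. */
struct toy_iphdr {
    uint32_t daddr;      /* destination address */
    uint16_t frag_off;   /* flags (top 3 bits) and fragment offset */
};

enum toy_verdict {
    TOY_LOCAL,           /* deliver locally as-is */
    TOY_LOCAL_DEFRAG,    /* local, but must be reassembled first */
    TOY_FORWARD          /* addressed to another host */
};

enum toy_verdict toy_ip_rcv(const struct toy_iphdr *ip, uint32_t local_addr)
{
    if (ip->daddr != local_addr)
        return TOY_FORWARD;
    if (ip->frag_off & (TOY_IP_MF | TOY_IP_OFFSET))
        return TOY_LOCAL_DEFRAG;
    return TOY_LOCAL;
}
```

Note that the "don't fragment" bit (0x4000) is deliberately outside the mask, so a packet carrying only that flag is treated as a whole packet.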

Routing

Routing accounts for a large part of the kernel's networking code.

  • Route to local host
    • pass to protocol layer
  • Route to a system that is directly connected to the local host
    • find the appropriate network card
  • Route through a gateway system
    • find the gateway system and the network card that is associated with it
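
The three cases above can be illustrated with a toy routing table and a longest-prefix-match lookup. This is a sketch under stated assumptions, not how the kernel stores routes: toy_route, toy_result, toy_route_lookup() and the sample table are all hypothetical, addresses are in host byte order, and a linear scan replaces the kernel's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

enum toy_route_type { TOY_RT_LOCAL, TOY_RT_DIRECT, TOY_RT_GATEWAY };

struct toy_route {
    uint32_t prefix;
    uint8_t prefix_len;          /* 0..32 */
    enum toy_route_type type;
    uint32_t gateway;            /* used only for TOY_RT_GATEWAY */
    int ifindex;                 /* which network card to use */
};

struct toy_result {
    enum toy_route_type type;
    uint32_t next_hop;           /* address to hand the packet to */
    int ifindex;
};

/* Hypothetical table: own address, the attached subnet, a default route. */
const struct toy_route toy_table[] = {
    { TOY_IP(192, 168, 1, 10), 32, TOY_RT_LOCAL,   0,                       1 },
    { TOY_IP(192, 168, 1, 0),  24, TOY_RT_DIRECT,  0,                       2 },
    { TOY_IP(0, 0, 0, 0),      0,  TOY_RT_GATEWAY, TOY_IP(192, 168, 1, 1),  2 },
};
const size_t toy_table_len = sizeof(toy_table) / sizeof(toy_table[0]);

uint32_t toy_mask(uint8_t len)
{
    return len ? (uint32_t)(~(uint32_t)0 << (32 - len)) : 0;
}

/* Longest-prefix match; returns 0 and fills *res, or -1 if nothing matches. */
int toy_route_lookup(const struct toy_route *tbl, size_t n, uint32_t dst,
                     struct toy_result *res)
{
    const struct toy_route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = toy_mask(tbl[i].prefix_len);
        if ((dst & m) == (tbl[i].prefix & m) &&
            (!best || tbl[i].prefix_len > best->prefix_len))
            best = &tbl[i];
    }
    if (!best)
        return -1;
    res->type = best->type;
    res->ifindex = best->ifindex;
    /* Via a gateway the next hop is the gateway; otherwise the target itself. */
    res->next_hop = (best->type == TOY_RT_GATEWAY) ? best->gateway : dst;
    return 0;
}
```

Looking up 192.168.1.77 in the sample table yields a direct route on interface 2 with the destination itself as next hop, while 8.8.8.8 falls through to the default route and gets the gateway as next hop.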

Routing API

  • Entry point -> ip_route_input()
    • calls fib_lookup() (FIB: forwarding information base) to obtain the necessary information, which is stored in a fib_result

Link
