BGP

Document outlining the software architecture for the integration of FRR into OVN for OCP and OSP. This is written in Markdown for eventual addition as an enhancement proposal.

BGP Architectural Components

As we discuss use cases, we can start to distill them down to the building blocks necessary to achieve them. Some of these may be shared across multiple products.

BGP Speaker

A component capable of publishing routes only; called out separately as this may be simpler than the routing daemon and route-programming components below.

BGP Routing Daemon

A component capable of both sending and receiving routes.

BGP Route Programming

(likely tied to the routing daemon) The piece that knows how to take routes and program them. So far we see needs for programming both Linux routing and OVN routing.
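To make the Linux half concrete, here is a minimal sketch of programming a kernel route from Python with pyroute2; the prefix and gateway are placeholder values, and the OVN half would likely instead be programmed as static routes on a logical router via the NB DB.

from pyroute2 import IPRoute

# Install a route into the kernel routing table, roughly what the
# route-programming component would do for the Linux case.
# Placeholder values: 10.0.0.0/24 reachable via 192.168.1.254.
with IPRoute() as ipr:
    ipr.route("add", dst="10.0.0.0/24", gateway="192.168.1.254")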

Project Specific Integrations

Project integrations -- product-specific integrations of the speaker, routing daemon, and route-programming components above.

Software Components

The following are initial proposals of how OCP/OSP/OVN and FRR components could interact. This assumes that our implementation will require the ability to publish routes to BGP peers and consume routes from BGP peers.

We will need to iterate over this to determine if it makes sense.


                               +----------------------+
                               |  +---------------+   |
                               |  |               |   |
                  +---------------+   OVN NB DB   |   |
                  |            |  |               |   |
                  |            |  +---------------+   |
                  |            |  +---------------+   |
                  |            |  |               |   |
                  |            |  |   ovn-northd  |   |
                  |            |  |               |   |
                  |            |  +---------------+   |
                  |            |  +---------------+   |
                  |            |  |               |   |
                  |            |  |   OVN SB DB   |   |
                  |            |  |               |   |
                  |            |  +---------------+   |
                  |            +----------------------+
                  |                       |    |
                  |                       |    |
            +-----+---+                   |    +-------------------+
            |         |                   |                        |
            | OSP/OCP |                   |                        |
            |         |        +---------------------------------------------+
            +---------+        |   +----------------+              |         |
                               |   |                |              |         |
                               |   | FRR Controller |    +---------v------+  |
                               |   |                |    |                |  |
                               |   +--------+-------+    | OVN Controller |  |
                               |            |            |                |  |
                               |            |            +----------------+  |
                               | +-------------------+   +----------------+  |
+------------+                 | |FRR +----------+   |   |                |  |
|            |                 | |    |          |   |   |  ovsdb-server  |  |
|  BGP Peer  <------------------------>   bgpd   |   |   |                |  |
|            |                 | |    |          |   |   +----------------+  |
+------------+                 | |    +----------+   |   +----------------+  |
                               | |    +----------+   |   |                |  |
                               | |    |          |   |   |  OVS vswitchd  |  |
                               | |    |   zebra  |   |   |                |  |
                               | |    |          |   |   +--------+-------+  |
                               | |    +--------+-+   |            |          |
                               | |             |     |            |          |
                               | |    south    |     |            |          |
                               | +-------------------+            |          |
                               +---------------------------------------------+
                                               |                  |
                               +---------------------------------------------+
                               |               |                  |          |
                               |  +------------v---+     +--------+-------+  |
                               |  |                |     |                |  |
                               |  | Kernel Routing |     | openvswitch.ko |  |
                               |  |                |     |                |  |
                               |  +----------------+     +----------------+  |
                               +---------------------------------------------+

The above proposal attempts to achieve:

  1. A single BGP stack, delivered as part of the platform and consumed by the layered products (OSP, OCP, OCP on OSP).
  2. Initial focus on FRR in the platform.
  3. FRR has southbound and northbound APIs; anything south of the northbound API will be shared between the products (including integration with OVN as needed).
  4. The consumption of the Northbound API will be product specific.
  5. If possible, we would like to have a shared integration layer (utility/project/other) on top of the FRR northbound API.

The basic behavior is as follows:

  • OSP/OCP configures the northbound FRR interface in order to advertise BGP routes to BGP peers
  • BGP peers publish new routes, which are consumed by FRR. Zebra then configures Linux networking and OVN appropriately via its southbound interface.

Components

zebra

FRR is implemented as a number of daemons that work together to build the routing table. These daemons talk to zebra, the daemon responsible for coordinating routing decisions and talking to the dataplane. zebra implements a plugin architecture that allows integration with different platform-dependent (southbound) forwarding planes.

zebra (platform-dependent components)

Zebra uses platform-dependent code to interface with the underlying (southbound) forwarding planes (e.g. Linux kernel networking).

On Linux, FRR installs routing decisions into the OS kernel, allowing the kernel networking stack to make the corresponding forwarding decisions.
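As a quick way to see this in action, routes that zebra installs carry the originating protocol name and can be inspected from a script. A sketch, assuming FRR's iproute2 protocol mappings are installed so that "bgp" is a recognised protocol name:

import subprocess

# List kernel routes installed by bgpd/zebra; FRR tags them with the
# originating protocol ("bgp"), assuming its iproute2 protocol
# mappings are installed.
out = subprocess.run(
    ["ip", "route", "show", "proto", "bgp"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)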

bgpd

'bgpd' is the routing daemon responsible for BGP. It works with 'zebra' to coordinate routing decisions with the other daemons before installing routes on the dataplane. It will be the main component to interface with BGP peers for capability negotiation and route exchange, and will contain the BGP protocol logic.

bgpd includes a mode that will automatically advertise kernel routes to BGP peers, subject to filters.
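A sketch of enabling that mode from a script by driving vtysh (ASN 7675 matches the example configurations below; in practice a route-map would be attached to filter what gets advertised):

import subprocess

# Tell bgpd to advertise kernel routes to its peers ("redistribute
# kernel" under "router bgp"); filtering would be added via route-maps.
subprocess.run(
    ["vtysh",
     "-c", "configure terminal",
     "-c", "router bgp 7675",
     "-c", "redistribute kernel"],
    check=True,
)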

bgpd.conf

Provides the configuration for an FRR BGP instance on a host. At a minimum, this will need to specify:

  • router bgp: ASN for local BGP instance
  • interface : Interfaces managed by the local BGP instance
  • neighbor : IP address and ASN for a remote BGP peer. There may be multiples of these.
  • network : IP network that can be announced to BGP peers

This can be configured through a configuration file, vtysh (a command-line utility provided with FRR), or through an experimental (northbound) gRPC interface. An example configuration for enabling BGP between two nodes is shown below. This sets up two BGP peers in the same AS, each announcing a "network" to the other.

host1:

hostname <hostname host1>
password zebra
router bgp 7675
 network <published network e.g. 192.168.1.0/24>
 neighbor <IP host 2> remote-as 7675

host2:

hostname <hostname host2>
password zebra
router bgp 7675
 network <published network e.g. 192.168.2.0/24>
 neighbor <IP host 1> remote-as 7675

After configuring FRR, it will be possible to add a loopback address (from a published network range) on one node and ping that address from the other node, as BGP will have automatically added the necessary routes.

For example,

host1:

ip addr add 192.168.1.1/32 dev lo

host2:

ping 192.168.1.1

On either node, it is possible to see status information about BGP by running the following commands:

$ sudo vtysh

Hello, this is FRRouting (version 7.6-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

wsfd-netdev91.ntdv.lab.eng.bos.redhat.com# show bgp summary # shows information about connected peers
wsfd-netdev91.ntdv.lab.eng.bos.redhat.com# show bgp detail # shows information about networks
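The same status information can be consumed programmatically: most FRR show commands accept a trailing json keyword. A sketch, with field names as observed in recent FRR releases:

import json
import subprocess

# Fetch BGP peer status as JSON via vtysh and print each peer's state.
raw = subprocess.run(
    ["vtysh", "-c", "show bgp summary json"],
    capture_output=True, text=True, check=True,
).stdout
summary = json.loads(raw)
for peer, info in summary.get("ipv4Unicast", {}).get("peers", {}).items():
    print(peer, info.get("state"))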

Note: the terminology here is a little confusing. "Loopback address" refers to an IP address added to the loopback device, not an address from the 127.0.0.0/8 range.

FRR gRPC

FRR provides an experimental YANG-based (northbound) gRPC interface that allows FRR to be configured through generated language-specific bindings.

This interface is experimental, which means:

  • The implementation on the current stable release (stable/7.4) does not work, failing with an assert() error. However, on master it is possible to successfully start the gRPC server for a daemon, which suggests the feature is in active development.
  • Configuration is not well documented. It also requires a recent version of libyang (not available in F32), which can be compiled and installed from source.
  • There is documentation for how to use the gRPC interface, but only for the Ruby programming language. It should be possible to generate bindings for most languages (e.g. Python, Golang), but not C (only C++).
  • I managed to generate Python bindings and hack together a PoC that worked to some extent, allowing the client to read BGP configuration. The steps are documented below.

From this, it appears some effort would be required to productise this interface. The other options for configuring FRR programmatically are to write a configuration file that gets reloaded on changes, or to write commands to the FRR CLI, vtysh; either should be sufficient for our needs.
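A sketch of the reload option, using FRR's frr-reload.py script, which diffs a candidate configuration against the running one and applies only the changes; the script's install path and the file path/values here are placeholders:

import subprocess

# Write a candidate FRR configuration and ask frr-reload.py to diff it
# against the running config and apply the changes. Paths and values
# are placeholders.
CANDIDATE = "/etc/frr/candidate.conf"

config = """\
router bgp 7675
 network 192.168.1.0/24
 neighbor 172.16.0.2 remote-as 7675
"""

with open(CANDIDATE, "w") as f:
    f.write(config)

subprocess.run(
    ["/usr/lib/frr/frr-reload.py", "--reload", CANDIDATE],
    check=True,
)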

The following instructions/notes enable the gRPC interface and develop Python bindings to the northbound FRR configuration interface:

sudo dnf install git autoconf automake libtool make \
 readline-devel texinfo net-snmp-devel groff pkgconfig json-c-devel \
 pam-devel python3-pytest bison flex c-ares-devel python3-devel \
 python3-sphinx perl-core patch systemd-devel libcap-devel

sudo dnf install cmake
git clone https://github.com/CESNET/libyang.git
pushd libyang
mkdir build; cd build
cmake -DENABLE_LYD_PRIV=ON -DCMAKE_INSTALL_PREFIX:PATH=/usr \
  -D CMAKE_BUILD_TYPE:String="Release" ..
make
sudo make install

popd
git clone https://github.com/FRRouting/frr
cd frr

sudo dnf install libyang-devel grpc-devel grpc grpc-plugins
./configure --enable-grpc
make
sudo make install

cd grpc
mkdir python
cd python
# Generate Python bindings (grpc_tools is provided by the grpcio-tools pip package)
python -m grpc_tools.protoc -I../ --python_out=. --grpc_python_out=. ../frr-northbound.proto

sudo pip install --upgrade protobuf
sudo dnf install python3-protobuf.noarch
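
# The gRPC server is provided as a daemon module: start a daemon with
# -M grpc to enable it (it listens on port 50051 by default; the daemon
# path varies by distribution)
sudo /usr/lib/frr/bgpd -M grpc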


$ cat frr_test.py 
import frr_northbound_pb2_grpc
import frr_northbound_pb2
import grpc

channel = grpc.insecure_channel('localhost:50051')
stub = frr_northbound_pb2_grpc.NorthboundStub(channel)

request = frr_northbound_pb2.GetCapabilitiesRequest()
print(stub.GetCapabilities(request))

request = frr_northbound_pb2.GetRequest()
request.type = 0      # 0 = ALL: both configuration and state data
request.encoding = 1  # 1 = XML (0 = JSON), per frr-northbound.proto
request.path.extend(['/frr-interface:lib'])

response = stub.Get(request)  # Get streams back the requested data tree

for item in response:
    print(item)


$ python frr_test.py 
frr_version: "7.6-dev"
supported_modules {
  name: "frr-bgp"
  organization: "FRRouting"
  revision: "2019-12-03"
}
supported_modules {
  name: "frr-filter"
  organization: "FRRouting"
  revision: "2019-07-04"
}
supported_modules {
  name: "frr-interface"
  organization: "FRRouting"
  revision: "2020-02-05"
}
supported_modules {
  name: "frr-route-map"
  organization: "FRRouting"
  revision: "2019-07-01"
}
supported_modules {
  name: "frr-routing"
  organization: "FRRouting"
  revision: "2019-08-15"
}
supported_modules {
  name: "frr-vrf"
  organization: "Free Range Routing"
  revision: "2019-12-06"
}
supported_encodings: JSON
supported_encodings: XML

timestamp: 1602663958
data {
  encoding: XML
  data: "<routing xmlns=\"http://frrouting.org/yang/routing\">\n  <control-plane-protocols>\n 
  ..

FRR Controller

This design requires a component on the host that monitors OVN (the OVN SB DB) for changes and then configures FRR in response to those changes.
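A minimal sketch of such a controller loop, assuming routes are pushed into FRR through vtysh (ASN 7675 from the earlier examples); desired_routes() is a placeholder for logic that derives prefixes from the OVN SB DB, and a real implementation would use OVSDB's monitor mechanism rather than polling:

import subprocess
import time

def desired_routes():
    """Placeholder: a real controller would derive this set from the
    OVN SB DB (ideally via OVSDB monitor notifications, not polling)."""
    return {"192.168.1.0/24"}

def vtysh(*commands):
    # Apply a sequence of configuration commands through vtysh.
    args = ["vtysh", "-c", "configure terminal"]
    for cmd in commands:
        args += ["-c", cmd]
    subprocess.run(args, check=True)

announced = set()
while True:
    desired = desired_routes()
    for prefix in desired - announced:
        vtysh("router bgp 7675", f"network {prefix}")
    for prefix in announced - desired:
        vtysh("router bgp 7675", f"no network {prefix}")
    announced = desired
    time.sleep(5)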

Use Cases

For the primary use cases, we can explore the above architecture to check its suitability. We will initially focus on "External Service Load Balancing" and "Exposing Pods or Services Directly", as these seem to have the biggest pull from customers and will require our BGP components to both publish and consume routes. They are also the least complex to implement.

Exposing Pods or Services Directly (Priority 1) [WIP]

                                                          +----------+        +----------+       +----------+
                                                          |          |        |          |       |          |
+---------------------------------------------------------> BGP Peer +--------> BGP Peer +-------> BGP Peer |
|                                                         |   (RR)   |        |          |       |          |
|   FRR sends BGP UPDATE message to peer                  +-----+----+        +-----+----+       +-----+----+
|   specifying:                                                 |                   |                  |
|   x.x.x.x/32 next hop is a.a.a.a/32                           |                   |                  |
|   y.y.y.y/32 next hop is b.b.b.b/32                           |                   |                  |
|   z.z.z.z/32 next hop is c.c.c.c/32                           |                   |                  |
|                                                               |                   |                  |
|  +-------------------------------------------+                |                   |                  |
|  |Host +---------+             +---------+   |          +-----v----+        +-----v----+       +-----v----+     +----------+
|  |     |         | Pod IP =    |         |   |          |          |        |          |       |          |     |          |
+--------+   FRR   |             | SERVICE +<-------------+  Router  +--------+  Router  +-------+  Router  <-----+  Client  |
|  |     |         | x.x.x.x/32  |         |   |          |          |        |          |       |          |     |          |
|  |     +---------+             +---------+   |          +----------+        +----------+       +----------+     +----------+
|  +-------------------------------------------+
|                       Loopback IP = a.a.a.a/32                                                                  
|
|  +-------------------------------------------+
|  |Host +---------+             +---------+   |          +----------+
|  |     |         | Pod IP =    |         |   |          |          |
+--------+   FRR   |             | SERVICE |   |          |  Router  |
|  |     |         | y.y.y.y/32  |         |   |          |          |
|  |     +---------+             +---------+   |          +----------+
|  +-------------------------------------------+
|                       Loopback IP = b.b.b.b/32
|
|  +-------------------------------------------+
|  |Host +---------+             +---------+   |          +----------+
|  |     |         | Pod IP =    |         |   |          |          |
+--------+   FRR   |             | SERVICE |   |          |  Router  |
   |     |         | z.z.z.z/32  |         |   |          |          |
   |     +---------+             +---------+   |          +----------+
   +-------------------------------------------+
                        Loopback IP = c.c.c.c/32           e.g. Leaf Router   e.g. Spine Router   e.g. DC Gateway




[Open] For exposing services, how will we do port translation? BGP won't allow that.

This would depend on whether we are using shared gateway mode or local gateway mode. Shared gateway seems a little easier, as we may not need to integrate with the Linux networking stack.

External Service Load Balancing (Priority 2)

Others

L3 Redundancy for Nodes

Pod Network Control Plane

Pod Network Traffic Routing and Avoiding Encapsulation

IP Anycast

Virtual Network Interconnect

L3 Fabric
