-
A libfabric component. The main component files are:
- The connect_libfabric file: It contains the implementations of all the libfabric calls used by the libfabric component. It is the largest file because it holds all the libfabric logic.
- The fflibfabric file: It contains the initialization and finalization APIs.
- The recv and send files: They post the send and receive operations through libfabric.
- The libfabric progresser file: It contains the methods the progresser uses to track operation completion. The ffop_libfabric_progresser_progress method is where the send and receive completion queues are polled.
-
A libfabric binding file: It binds the FFLIB APIs to the libfabric APIs so that FFLIB runs with the libfabric backend.
-
Three test files, each running on two nodes:
- send_recv_libfabric test: One node sends some arbitrary data and the other node receives it.
- pingpong_sched_libfabric test: Each of the two nodes posts an FFLIB schedule that contains a send and a receive, so each node both sends and receives data, ping-pong style.
- allreduce_libfabric test
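
The polling done by the progresser, as described for the libfabric progresser file above, can be illustrated with a minimal sketch. It does not call libfabric or FFLIB: `mock_cq`, `mock_cq_push`, `mock_cq_read`, and `mock_progress` are hypothetical names invented for this example, and the queue is a plain in-memory ring. A real progresser would presumably read the queues through libfabric (for example with `fi_cq_read`); the sketch only shows the shape of one non-blocking progress step over a send queue and a receive queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion queue: a small ring of
 * operation ids. This is NOT the FFLIB or libfabric data structure. */
#define MOCK_CQ_CAP 8

typedef struct {
    int    ids[MOCK_CQ_CAP];
    size_t head;   /* index of the next completion to read  */
    size_t tail;   /* index of the next free slot to write  */
} mock_cq;

/* Record a completion. Returns 0 on success, -1 if the queue is full. */
static int mock_cq_push(mock_cq *cq, int op_id)
{
    if (cq->tail - cq->head == MOCK_CQ_CAP)
        return -1;
    cq->ids[cq->tail % MOCK_CQ_CAP] = op_id;
    cq->tail++;
    return 0;
}

/* Non-blocking read. Returns 1 and stores the id if a completion is
 * available, 0 if the queue is empty. */
static int mock_cq_read(mock_cq *cq, int *op_id)
{
    if (cq->head == cq->tail)
        return 0;
    *op_id = cq->ids[cq->head % MOCK_CQ_CAP];
    cq->head++;
    return 1;
}

/* One progress step: drain the send queue, then the receive queue, and
 * mark each completed operation in `done` (indexed by operation id).
 * Returns the number of completions consumed in this step. */
static int mock_progress(mock_cq *send_cq, mock_cq *recv_cq,
                         int *done, size_t ndone)
{
    mock_cq *queues[2] = { send_cq, recv_cq };
    int count = 0;

    for (int q = 0; q < 2; q++) {
        int id;
        while (mock_cq_read(queues[q], &id)) {
            if (id >= 0 && (size_t)id < ndone)
                done[id] = 1;
            count++;
        }
    }
    return count;
}
```

A progresser thread would call a step like `mock_progress` in a loop until every operation it tracks is marked done. Returning immediately when both queues are empty keeps the step non-blocking.
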
The send_recv_libfabric test proceeds as follows:
- Initializing FFLIB and binding the FFLIB APIs to the libfabric backend.
- Creating a progresser thread that tracks operation completions.
- Initializing the libfabric connection between the two nodes. Each node allocates the resources it needs (fabric, domain, memory region, endpoints, etc.) and then establishes a connection with the other node through sockets.
- The sender writes the data into a buffer taken from the memory region that it allocated with the endpoint. The send operation is then posted to libfabric and scheduled in FFLIB.
- The receiver takes a buffer from the memory region allocated with the endpoint and posts a receive operation to reserve that buffer for incoming data. The receive operation is also scheduled in FFLIB.
- Each node then polls for the completion of its operation. The sender polls the send completion queue, while the receiver polls the receive completion queue.
- Once the send operation completes, the sender’s role is finished. The receiver waits for the receive completion and then checks that the content of the received message matches what the sender sent.
- Finally, the two nodes free the resources they allocated and exit.
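
The order of the steps above (reserve a receive buffer, post the send, observe both completions, verify the payload) can be sketched in a single process. This is an illustration under stated assumptions, not the test's actual code: it uses no libfabric or FFLIB calls, the two "nodes" are plain structs, and `demo_node`, `demo_post_recv`, `demo_post_send`, and `demo_send_recv` are hypothetical names. Delivery is simulated with `memcpy`, and completions are simple flags rather than completion-queue entries.

```c
#include <assert.h>
#include <string.h>

#define MSG_LEN 16

/* Hypothetical in-process stand-in for one node's endpoint state. */
typedef struct {
    char buf[MSG_LEN];  /* buffer taken from the node's "memory region" */
    int  recv_posted;   /* 1 while a posted receive has reserved buf    */
    int  send_done;     /* set when the send "completion" is generated  */
    int  recv_done;     /* set when the receive "completion" is generated */
} demo_node;

/* Receiver side: reserve the buffer for incoming data. */
static void demo_post_recv(demo_node *rx)
{
    memset(rx->buf, 0, MSG_LEN);
    rx->recv_posted = 1;
}

/* Sender side: deliver tx->buf into rx->buf if a receive is posted.
 * Returns 0 on success, -1 if the receiver has no buffer reserved. */
static int demo_post_send(demo_node *tx, demo_node *rx)
{
    if (!rx->recv_posted)
        return -1;
    memcpy(rx->buf, tx->buf, MSG_LEN);
    rx->recv_posted = 0;
    tx->send_done = 1;   /* simulated send completion    */
    rx->recv_done = 1;   /* simulated receive completion */
    return 0;
}

/* Runs the whole flow once. Returns 0 if the received message matches
 * the sent one, 1 on a mismatch, -1 if an operation did not complete. */
static int demo_send_recv(const char *msg)
{
    demo_node tx, rx;
    memset(&tx, 0, sizeof tx);
    memset(&rx, 0, sizeof rx);

    /* Sender fills its buffer; the zeroed buffer keeps it terminated. */
    strncpy(tx.buf, msg, MSG_LEN - 1);

    demo_post_recv(&rx);                 /* receive is posted first */
    if (demo_post_send(&tx, &rx) != 0)
        return -1;

    /* Stand-in for polling the two completion queues. */
    if (!tx.send_done || !rx.recv_done)
        return -1;

    /* Receiver checks that the payload is what the sender sent. */
    return memcmp(tx.buf, rx.buf, MSG_LEN) == 0 ? 0 : 1;
}
```

Calling `demo_send_recv("hello libfabric")` walks through the same sequence as the test on a 15-character message. In this sketch, a send with no receive posted is rejected by `demo_post_send`, which is why the receive buffer is reserved before the send.
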