cyx1231st/seastar_socket_reading.md Secret

## seastar_socket_reading.md

    Efficient reading from network socket in Seastar

Our requirements for socket reads

a) The buffer can be a scatter-gather list internally.
b) The buffer must be contiguous.
c) The buffer must be contiguous, with extra alignment and/or length requirements.
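
To make requirement c) concrete, here is a minimal sketch of what such a buffer could look like in plain C++17. The names `free_deleter`, `aligned_buf`, `make_aligned_buf` and `is_aligned` are hypothetical and are not part of Seastar; the parameters simply mirror the `read_len`, `alignment` and `extra_len` arguments discussed later in this note.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical helper, not a Seastar type: releases memory from std::aligned_alloc.
struct free_deleter {
    void operator()(char* p) const noexcept { std::free(p); }
};
using aligned_buf = std::unique_ptr<char[], free_deleter>;

// Allocates one contiguous buffer that starts on an `alignment` boundary and
// has room for at least `read_len + extra_len` bytes.
inline aligned_buf make_aligned_buf(std::size_t read_len,
                                    std::size_t alignment = sizeof(void*),
                                    std::size_t extra_len = 0) {
    // The alignment must be a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t want = read_len + extra_len;
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t size = (want + alignment - 1) / alignment * alignment;
    if (size == 0) {
        size = alignment;
    }
    char* p = static_cast<char*>(std::aligned_alloc(alignment, size));
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    return aligned_buf(p);
}

// True when `p` sits on an `alignment` boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For example, `make_aligned_buf(100, 4096, 8)` would return a page-aligned buffer with room for 108 bytes.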

My evaluations

Evaluation (https://docs.google.com/spreadsheets/d/1WygP0QgxASzdIupQlEBX8q5m1LWxiv6IcyggLELTT1g/edit#gid=0) comparing the version optimized for fewer system calls (prefetch) with the version optimized for fewer copies (exact, implemented in https://github.com/cyx1231st/seastar/commit/d00a866bbfcd78bf0d99e0c2f14930f48205ebaa):
Case 1) When the block size is small (below roughly 64K): prefetch beats the exact version because it issues the minimum number of system calls, even though it does some extra user-space-to-user-space copying.
Case 2) When the block size is large (above roughly 64K): the number of system calls is at a similar level for both versions, so the exact version beats prefetch because it does no user-space-to-user-space copying.
This implies:

A system call is much more expensive than a memory copy.
When the block size is large, likely larger than the prefetch size, we can minimize user-space-to-user-space copying.
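
The trade-off between the two cases can be illustrated with a toy counting model. It is not a reproduction of the spreadsheet measurements. It assumes that a single read call returns at most `per_call_cap` bytes, that the prefetch version always requests `prefetch` bytes and copies every payload byte once, and that the exact version reads straight into the destination buffer. All names here (`read_cost`, `prefetch_cost`, `exact_cost`, `per_call_cap`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct read_cost {
    std::size_t syscalls;      // number of read calls issued
    std::size_t copied_bytes;  // bytes copied user-space to user-space
};

inline std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Prefetch: data is pulled in chunks of min(prefetch, per_call_cap) bytes,
// regardless of block boundaries, and then copied into the caller's buffer.
inline read_cost prefetch_cost(std::size_t block, std::size_t blocks,
                               std::size_t prefetch, std::size_t per_call_cap) {
    assert(prefetch > 0 && per_call_cap > 0);
    std::size_t total = block * blocks;
    return {ceil_div(total, std::min(prefetch, per_call_cap)), total};
}

// Exact: every block is read directly into its destination, so nothing is
// copied, but each block needs at least one read call of its own.
inline read_cost exact_cost(std::size_t block, std::size_t blocks,
                            std::size_t per_call_cap) {
    assert(per_call_cap > 0);
    return {blocks * ceil_div(block, per_call_cap), 0};
}
```

With an assumed 64K prefetch size and 64K per-call cap, reading 4M as 4K blocks costs 64 calls (prefetch) against 1024 calls (exact), while reading it as 256K blocks costs 64 calls in both versions, and only the prefetch version pays for copying 4M.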

Proposed change with minimal impact


The current implementation of input_stream<CharType>::consume() with our bufferlist_consumer already meets requirement a), so no change is needed.
We can introduce a new interface, read_exactly2(size_t read_len, __le16 alignment=sizeof(void *), size_t extra_len=0), to implement requirements b) and c) with optimizations.

For b), data_source::get() can be reused with prefetch. If it cannot return the entire buffer we need, we copy the content into our own buffer and use input_stream<CharType>::read_exactly_part2() to fill the rest.
For c), we can still use data_source::get() with prefetch and copy the content into our own buffer. If that is not the entire buffer we need, we use input_stream<CharType>::read_exactly_part2() to fill the rest.
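
The flow for c) can be sketched with a synchronous mock, independent of Seastar. `mock_source` and `mock_stream` are hypothetical stand-ins: `get()` imitates a prefetching read into a source-owned buffer, and `get2()` imitates the proposed read into a caller-owned buffer. Alignment and `extra_len` are left out to keep the sketch short, and the real interfaces are asynchronous and return futures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data source backed by an in-memory byte string.
class mock_source {
    std::string _data;
    std::size_t _pos = 0;
    std::size_t _prefetch;
public:
    std::size_t get_calls = 0;
    std::size_t get2_calls = 0;

    mock_source(std::string data, std::size_t prefetch)
        : _data(std::move(data)), _prefetch(prefetch) {
        assert(prefetch > 0);
    }
    // Prefetching read: returns up to `_prefetch` bytes in a source-owned buffer.
    std::string get() {
        ++get_calls;
        std::size_t n = std::min(_prefetch, _data.size() - _pos);
        std::string out = _data.substr(_pos, n);
        _pos += n;
        return out;
    }
    // Direct read: writes up to `size` bytes into the caller's buffer.
    std::size_t get2(char* buf, std::size_t size) {
        ++get2_calls;
        std::size_t n = std::min(size, _data.size() - _pos);
        std::copy_n(_data.data() + _pos, n, buf);
        _pos += n;
        return n;
    }
};

class mock_stream {
    mock_source& _src;
    std::string _buf;  // prefetched bytes not yet consumed
public:
    explicit mock_stream(mock_source& src) : _src(src) {}

    // Returns `read_len` contiguous bytes, or fewer if the source hits EOF.
    std::vector<char> read_exactly2(std::size_t read_len) {
        std::vector<char> out(read_len);
        if (_buf.empty()) {
            _buf = _src.get();  // one prefetching read
        }
        // Copy whatever the prefetch delivered into our own buffer.
        std::size_t filled = std::min(read_len, _buf.size());
        std::copy_n(_buf.data(), filled, out.data());
        _buf.erase(0, filled);
        // Fill the rest directly into our buffer, with no intermediate copy.
        while (filled < read_len) {
            std::size_t got = _src.get2(out.data() + filled, read_len - filled);
            if (got == 0) {  // EOF: return the bytes we have
                out.resize(filled);
                break;
            }
            filled += got;
        }
        return out;
    }
};
```

For b), the only difference would be the fast path: when the prefetched buffer already holds `read_len` bytes, it could be handed out as-is instead of being copied.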


As for input_stream<CharType>::read_exactly_part2(), there is an implementation in https://github.com/cyx1231st/seastar/commit/d00a866bbfcd78bf0d99e0c2f14930f48205ebaa. It requires a new interface, data_source_impl::get2(char* buf, size_t size), through which the caller passes its own buffer to the data_source_impl. In DPDK, I think filling that specially allocated buffer requires user-space-to-user-space copying.
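
To illustrate the idea behind get2(), here is a blocking POSIX analogue. It is only a sketch and is not the code in the linked commit: `fd_source` and `fill_exactly` are hypothetical names, and the real Seastar interfaces are asynchronous. The point is that the kernel writes directly into the buffer the caller supplies, so no second copy is needed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical blocking analogue of a data source with a get2()-style call.
struct fd_source {
    int fd;

    // Reads up to `size` bytes straight into the caller-owned `buf`.
    // Returns the number of bytes read, 0 on EOF, or -1 on error.
    ssize_t get2(char* buf, std::size_t size) {
        assert(buf != nullptr || size == 0);
        for (;;) {
            ssize_t n = ::read(fd, buf, size);
            if (n < 0 && errno == EINTR) {
                continue;  // interrupted by a signal, retry
            }
            return n;
        }
    }
};

// Keeps calling get2() until `size` bytes are in `buf` or the source reports
// EOF or an error. Returns the number of bytes actually placed in `buf`.
inline std::size_t fill_exactly(fd_source& src, char* buf, std::size_t size) {
    std::size_t filled = 0;
    while (filled < size) {
        ssize_t n = src.get2(buf + filled, size - filled);
        if (n <= 0) {
            break;
        }
        filled += static_cast<std::size_t>(n);
    }
    return filled;
}
```

Because the destination is chosen by the caller, it can be the aligned buffer required by c), and the only copy is the one from kernel space into that buffer.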