learn-decentralized-systems/LEGO_spans.md

## LEGO_spans.md

## C++ spans: {*data,size} or {*b,*e} ?

Once Go has shown us the power of slices, there is no doubt that
the lack of std::span was a very serious omission in the original C++.
Frankly, good old C codebases used the pointer+length metaphor so
often that it is difficult to explain why C++ got the construct this
late. Probably, its designers considered non-owning pointers too
rough or maybe too dangerous? A similar but narrower concept,
std::string_view, arrived only in C++17, and std::span followed in
C++20. By now, std::span is very widely recommended.

Here, my point is rather blunt: std::span may also be a miss.
A span as a pair of pointers is a much more powerful idiom than
a pointer+length span or an abstract span. Why, you may ask?

The two have the same size, because ptrdiff_t is as wide as a
pointer on mainstream platforms. Also, pointer+length seems much
safer to use, because fewer pointers means less trouble, right?
Well, the answer might be a bit unexpected: a pair of pointers has
a certain symmetry, and because of that you can recombine them like
LEGO pieces. Pointer+length is not flexible in that sense; it is a
single-use construct.
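
To make the "LEGO" claim a bit more concrete, here is a minimal sketch. The names `PtrLen`, `PtrPtr`, `Split`, `Adjacent` and `Join` are invented for this illustration and come from no library. It shows that the two layouts have the same footprint on a typical 64-bit platform, and that pointer pairs split and join by sharing a boundary pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types, for illustration only.
struct PtrLen { const uint8_t* data; size_t size; };     // {*data,size}
struct PtrPtr { const uint8_t* b; const uint8_t* e; };   // {*b,*e}

// True on mainstream platforms, where size_t is as wide as a pointer.
constexpr bool kSameFootprint = sizeof(PtrLen) == sizeof(PtrPtr);

inline size_t Length(PtrPtr s) { return static_cast<size_t>(s.e - s.b); }

// Split one span at `cut`: the left part ends exactly where the right begins.
inline void Split(PtrPtr whole, const uint8_t* cut, PtrPtr& left, PtrPtr& right) {
    left = {whole.b, cut};
    right = {cut, whole.e};
}

// Two spans are adjacent when they share the boundary pointer.
inline bool Adjacent(PtrPtr a, PtrPtr b) { return a.e == b.b; }

// Joining adjacent spans needs no arithmetic at all: keep the outer pointers.
inline PtrPtr Join(PtrPtr a, PtrPtr b) { return {a.b, b.e}; }
```

With pointer+length, the same split needs a subtraction for each new length, and a join needs an addition plus an adjacency check that itself has to compute an end pointer.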

### The Feed/Drain metaphor

When implementing network or storage access, parsing or
serialization, or other kinds of buffer shoveling, we often use
circular buffers. The reason is that we stream data. We hold a
buffer, append some new data at the end, process it, and discard
some old data at the beginning. Let's call the former operation
feed and the latter one drain.

What does a circular buffer look like? It may look like this:
  +---consumed-space--+----filled-space---+--idle-space---+
  ^                   ^                   ^               ^
  p0                  p1                  p2              p3

It takes four pointers to describe a circular buffer. We may
also see it as three spans. Now, how do we feed the data into
the buffer? We take the idle span, put the data at its
beginning, and move the p2 pointer forward. How do we drain
processed data? We take the filled span, consume the data, and move
the p1 pointer forward. See the pattern? It boils down to
pointer-pointer spans, and we may see a circular buffer as a
combination of three such spans!
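
The feed and drain steps can be sketched as follows. This is an illustration under assumptions, not the implementation used later in this text: `FourPtrBuffer` and its methods are made-up names, and `Compact()` is one possible way to reclaim the consumed space (sliding the filled bytes back to p0), which the text above does not prescribe.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical four-pointer buffer: p0 <= p1 <= p2 <= p3, as in the diagram.
struct FourPtrBuffer {
    uint8_t* p[4];

    FourPtrBuffer(uint8_t* mem, size_t cap) : p{mem, mem, mem, mem + cap} {}

    size_t Consumed() const { return static_cast<size_t>(p[1] - p[0]); }
    size_t Filled() const { return static_cast<size_t>(p[2] - p[1]); }
    size_t Idle() const { return static_cast<size_t>(p[3] - p[2]); }

    // Feed: put data at the start of the idle span, then move p2 forward.
    size_t Feed(const uint8_t* src, size_t n) {
        n = std::min(n, Idle());
        std::memcpy(p[2], src, n);
        p[2] += n;
        return n;
    }

    // Drain: take data from the start of the filled span, then move p1 forward.
    size_t Drain(uint8_t* dst, size_t n) {
        n = std::min(n, Filled());
        std::memcpy(dst, p[1], n);
        p[1] += n;
        return n;
    }

    // Assumption: reclaim consumed space by sliding the filled bytes to p0.
    void Compact() {
        size_t filled = Filled();
        std::memmove(p[0], p[1], filled);
        p[1] = p[0];
        p[2] = p[0] + filled;
    }
};
```

Note that neither `Feed` nor `Drain` stores a length anywhere: the three lengths are always derived from the four pointers.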

Now consider strings, vectors, queues... same story. A string or
a vector has an allocated range and a filled range. Can we
see it as two or three concatenated spans? Sure, no problem!

So, suppose some IO routine gets some data from some IO device.
If we provide it with pointer+length, it will fill the range
with data and return us the length value so we can handle the
rest. That is pretty standard.
  ssize_t read(int fd, void *buf, size_t count); 
  // 3 params, 1 return value, also touches errno
What if it supports spans?
  Status Read(int fd, Span8& into); 
  // 2 params, returns Status
Now we may provide this method with an idle span of a circular
buffer, or string, or vector, or queue, or give it an entire
on-stack array. It will handle everything nicely. It will move
the pointer forward, contracting the idle area and expanding the
filled area... without knowing much about the actual data
structure it writes the data to!
  Read(myfd, mystr.Idle());  // append to a string (feed)
  Write(otherfd, mystr.Filled()); // drain
Importantly, we halved the number of variables to handle (4 → 2).
Psychology says we can comfortably hold about 7 items in mind at once.
Having fewer exposed variables means spending less of that 7!
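
For the curious, a span-taking `Read` could be layered over POSIX `read(2)` roughly like this. It is a sketch under assumptions: the `Status` values `ENDOFFILE` and `IOFAIL` are invented here, the retry on `EINTR` is my own choice, and `Span8` is reduced to the bare minimum so that the example stands alone.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <unistd.h>  // POSIX read(2)

// Hypothetical status codes; only OK and NOSPACEYET appear in this text.
enum Status { OK = 0, NOSPACEYET, ENDOFFILE, IOFAIL };

// Minimal pointer-pointer span of bytes.
struct Span8 {
    uint8_t* p[2];
    size_t Length() const { return static_cast<size_t>(p[1] - p[0]); }
};

// Reads once into the start of `into` and advances its start pointer.
// errno never leaks out: the outcome is folded into the Status.
inline Status Read(int fd, Span8& into) {
    if (into.Length() == 0) return NOSPACEYET;
    ssize_t n;
    do {
        n = ::read(fd, into.p[0], into.Length());
    } while (n < 0 && errno == EINTR);  // retry if interrupted by a signal
    if (n < 0) return IOFAIL;
    if (n == 0) return ENDOFFILE;
    into.p[0] += n;  // the idle span contracts by exactly what was read
    return OK;
}
```

If `into` is a reference to the idle span of a buffer, advancing `into.p[0]` is the same thing as advancing p2, so the filled span grows with no further bookkeeping by the caller.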

My only remaining question is: why was this not a part of C++98?
Everything that can be reduced to memory ranges can be reduced
to pointer-pointer spans!

### Implementation

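  #include <algorithm>  // std::min
  #include <cstddef>    // size_t
  #include <cstdint>    // uint8_t
  #include <cstring>    // memmove

  // Status and OK are assumed to be defined elsewhere in the codebase.

  template<typename R>
  struct Span {
      R *p[2];  // p[0] is the start, p[1] is one past the end
      
      size_t Length() const { return p[1] - p[0]; }
      
      // Copy as much of b as fits into this span; both spans advance.
      Status Feed(Span& b) {
          auto l = std::min(Length(), b.Length());
          memmove(p[0], b.p[0], sizeof(R)*l);
          p[0] += l;
          b.p[0] += l;
          return OK;
      }
  };

  using Span8 = Span<uint8_t>;
  using Span8c = Span<const uint8_t>;

  class Buffer {
      uint8_t *p[4];  // p0..p3, as in the diagram above

  public:
      // Each accessor reinterprets two adjacent pointers as a span, so
      // moving a span boundary moves the buffer boundary as well. This
      // relies on Span having the same layout as two adjacent pointers.
      Span8c& Filled() { 
          return *(Span8c*)(p+1); 
      }
      Span8& Idle() { 
          return *(Span8*)(p+2); 
      }
      const Span8c& Consumed() const { 
          return *(Span8c*)(p+0);
      }
  };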
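
A short usage sketch of `Feed` may help here. It repeats a trimmed copy of the `Span` template so that it compiles on its own, and it assumes a trivial `Status` enum, since the text does not define one. The point to observe is that both spans advance, so whatever did not fit is still described by the source span.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum Status { OK = 0 };  // assumption: the real Status has more values

template <typename R>
struct Span {
    R* p[2];

    size_t Length() const { return static_cast<size_t>(p[1] - p[0]); }

    // Copy as much of b as fits into this span; both spans advance.
    Status Feed(Span& b) {
        size_t l = std::min(Length(), b.Length());
        std::memmove(p[0], b.p[0], sizeof(R) * l);
        p[0] += l;
        b.p[0] += l;
        return OK;
    }
};

using Span8 = Span<uint8_t>;

// Feeds a 10-byte source into a 4-byte destination and reports how many
// source bytes are still waiting afterwards.
inline size_t LeftoverAfterFeed() {
    uint8_t from[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    uint8_t to[4] = {};
    Span8 src{{from, from + 10}};
    Span8 dst{{to, to + 4}};
    dst.Feed(src);        // copies 4 bytes; dst is now empty (fully used)
    return src.Length();  // the remaining 6 bytes are still in src
}
```

No byte counts were returned or stored along the way; the next `Feed` call can simply reuse `src` as it is.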

### String concatenation

In higher-level languages, string concatenation is often done
by a built-in operator +. Idiomatic C++ uses string streams
and operator <<. Idiomatic Java has the verbose StringBuilder.
Good old C has printf, which is loved by so many and ported
to so many languages.
Let's see how we can implement that in the pointer-pointer span paradigm.
  Status FeedText(Span8& idle) {return OK;}
  template<typename... Args>
  Status FeedText(Span8& idle, const char* cstr, Args... args) {
    auto l = strlen(cstr);
    if (idle.Length()<l) return NOSPACEYET;  // not enough idle space
    memcpy(idle.p[0], cstr, l);
    idle.p[0] += l;
    return FeedText(idle, args...);
  }
  template<typename... Args>
  Status FeedText(Span8& idle, int i, Args... args) {
    // ... write the int
    return FeedText(idle, args...);
  }
  
  // ...
  
  String con{};
  con.Reserve(64);
  FeedText(con.Idle(), "concatenate ", 3, " arguments"); // bingo
  
  // ...a printf equivalent would be...
  
  char str[64];
  snprintf(str, 64, "concatenate %i arguments", 3);

Note that FeedText may write to span-based buffers, strings, vectors and so on.
It is as flexible as snprintf and it does not leave the wiring exposed,
which is pretty good.
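
The `int` overload is left as a comment above. One way to fill it in is sketched below, as a guess rather than the original code: the number is formatted into a small temporary with `snprintf` and then copied into the idle span. `Span8` and `Status` are reduced to minimal stand-ins so that the block is self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

enum Status { OK = 0, NOSPACEYET };  // minimal stand-in

struct Span8 {  // minimal stand-in for Span<uint8_t>
    uint8_t* p[2];
    size_t Length() const { return static_cast<size_t>(p[1] - p[0]); }
};

inline Status FeedText(Span8&) { return OK; }

// Declared up front so that each overload can recurse into the other.
template <typename... Args>
Status FeedText(Span8& idle, const char* cstr, Args... args);
template <typename... Args>
Status FeedText(Span8& idle, int i, Args... args);

template <typename... Args>
Status FeedText(Span8& idle, const char* cstr, Args... args) {
    size_t l = std::strlen(cstr);
    if (idle.Length() < l) return NOSPACEYET;
    std::memcpy(idle.p[0], cstr, l);
    idle.p[0] += l;
    return FeedText(idle, args...);
}

template <typename... Args>
Status FeedText(Span8& idle, int i, Args... args) {
    char tmp[16];  // enough for a 32-bit int with sign and terminator
    int n = std::snprintf(tmp, sizeof tmp, "%d", i);
    if (n < 0 || idle.Length() < static_cast<size_t>(n)) return NOSPACEYET;
    std::memcpy(idle.p[0], tmp, static_cast<size_t>(n));  // no terminator copied
    idle.p[0] += n;
    return FeedText(idle, args...);
}
```

A design note: when an argument does not fit, this sketch returns `NOSPACEYET` but keeps whatever earlier arguments were already written, so a caller that wants all-or-nothing behavior would have to remember the original `idle.p[0]` and restore it.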