Skip to content

Instantly share code, notes, and snippets.

@learn-decentralized-systems
Last active July 16, 2022 06:28
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save learn-decentralized-systems/279fbb5552c216c5f8dba8b3d9c16190 to your computer and use it in GitHub Desktop.
Save learn-decentralized-systems/279fbb5552c216c5f8dba8b3d9c16190 to your computer and use it in GitHub Desktop.

C++ spans: {*data,size} or {*b,*e} ?

Once Go has shown us the power of slices, there is no doubt that std::span was a very serious omittance in the original C++. Frankly, good old C codebases used the pointer+length metafor so often, it is difficult to explain why C++ got the construct this late. Probably, its designers considered non-owning pointers too rough or maybe too dangerous? An attempt to introduce a similar concept of std::string_view was clearly a failure and it was deprecated. At last, std::span is very widely recommended.

Here, my point is rather blunt: std::span may also be a miss. A span as a pair of pointers is a much more powerful idiom than a pointer+length span or an abstract span. Why, may you ask. These two have the same size because ptrdiff is same size as ptr, inevitably. Also, pointer+length seems much safer to use, because less pointers is less trouble, right? Well, the answer might be a bit unexpected: a pair of pointers has certain symmetry and because of that you can recombine them like LEGO pieces. Pointer+length is not flexible in that sense; it is a single-use construct.

The Feed/Drain metaphor

When implementing network or storage accesses, parsing or serialization, or other kinds of buffer shoveling, we often use circular buffers. The reason is that we stream data. We hold a buffer, append some new data in the end, process, discard some old data in the beginning. Let's call the former operation feed and the latter one drain.

How does a circular buffer look like? It may look like this:

  +---consumed-space--+----filled-space---+--idle-space---+
  ^                   ^                   ^               ^
  p0                  p1                  p2              p3

It takes four pointers to describe a circular buffer. We may also see it as three spans. Now, how do we feed the data into the buffer? We take the idle span, put the data in the beginning, move the p2 pointer forward. How do we drain processed data? We take the pending span, consume the data, move the p1 pointer forward. See the pattern? It boils down to pointer-pointer spans and we may see a circular buffer as a combination of three such spans!

Now consider strings, vectors, queues... same story. A string or a vector has the allocated range and the filled range. Can we see it as two or three concatenated spans? Sure, no problem!

So, suppose some IO routine gets some data from some IO device. If we provide it with pointer+length, it will fill the range with data and return us the length value so we can handle the rest. That is pretty standard.

  ssize_t read(int fd, void *buf, size_t count); 
  // 3 params, 1 return value, also touches errno

What if it supports spans?

  Status Read(int fd, Span8& into); 
  // 2 params, returns Status

Now we may provide this method with an idle span of a circular buffer, or string, or vector, or queue or give it an entire on-stack array. It will handle everything nicely. It will move the pointer forward, contracting the idle area and expanding the pending area... without knowing much about the actual data structure it writes the data to!

  Read(myfd, mystr.Idle());  // append to a string (feed)
  Write(otherfd, mystr.Filled()); // drain

Importantly, we halved the number of variables to handle (4>2). Science says, we can comfortably handle about 7 items at once. Having less exposed variables means spending less of that 7!

My only remaining question is: why was not this a part of C++98? Everything that can be reduced to memory ranges, can be reduced to pointer-pointer Spans!

Implementation

  template<typename R>
  struct Span<R> {
      R *p[2];
      
      size_t Length() const { return p[1] - p[0]; }
      
      Status Feed(Span& b) {
          auto l = std::min(Length(), b.Length());
          memmove(p[0], b.p[0], sizeof(R)*l);
          p[0] += l;
          b.p[0] += l;
          return OK;
      }
  };

  using Span8 = Span<uint8_t>;
  using Span8c = Span<const uint8_t>;

  class Buffer {
      uint8_t *p[4];

      Span8c& Filled() { 
          return *(Span8c*)(p+1); 
      }
      Span8& Idle() { 
          return *(Span8*)(p+2); 
      }
      const Span8c& Consumed() const { 
          return *(Span8c*)(p+0);
      }
  };

String concatenation

In higher-level languages, string concatenation is often done by a built-in operator +. Idiomatic C++ uses string streams and operator <<. Idiomatic Java had verbose StringBuilder. Good old C has printf which is loved by so many and ported to so many languages. Let's see how we can implement that in the p-p span paradigm.

  Status FeedText(Span8& idle) {return OK;}
  template<typename... Args>
  Status FeedText(Span8& idle, const char* cstr, Args... args) {
    auto l = strlen(cstr);
    if (idle.Size()<l) return NOSPACEYET;
    memcpy((void*)idle.p[0], (void*)cstr, l);
    idle.p[0] += l;
    return FeedText(idle, args...);
  }
  template<typename... Args>
  Status FeedText(Span8& idle, int i, Args... args) {
    // ... write the int
    return FeedText(idle, args...);
  }
  
  // ...
  
  String con{};
  con.Reserve(64);
  FeedText(con.Idle(), "concatenate ", 3, " arguments"); // bingo
  
  // ...a printf equivalent would be...
  
  char str[64];
  snprintf(str, 64, "concatenate %i arguments", 3);

Note that FeedText may write to span-based buffers, strings, vectors and so on. It is as flexible as snprintf and it does not leave the wiring exposed. Which is pretty good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment