zeux/codec-versioning.md

## codec-versioning.md

      
Problem

meshoptimizer currently ships index and vertex codecs with an API along these lines:
size_t encode(destination, destination_capacity, source, source_size)
void decode(destination, destination_size, source, source_size)

The next version of meshoptimizer needs to make backwards-incompatible changes to the encoding format to allow for better compression.
The format has a version byte, so the decode signature doesn't need to change. However, how would encode know which version to produce?
It's tempting to always encode the new format. However, this can lead to issues with version upgrades: the meshoptimizer upgrade
has to happen first for the code that runs decoding (so that it supports the new format), and only then for the code that runs
encoding (so that it uses the new format). When the two have different release cadences, it becomes easy to accidentally encode the
new format before it can be decoded by all clients.
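
The failure mode can be reproduced with a toy format. This is only a sketch: `toyEncode`, `toyDecode` and the "version byte followed by raw payload" layout are hypothetical stand-ins, not meshoptimizer's actual functions or bitstream.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy format: byte 0 is the format version, the rest is the payload.
std::vector<std::uint8_t> toyEncode(const std::vector<std::uint8_t>& payload, std::uint8_t version)
{
    std::vector<std::uint8_t> out;
    out.push_back(version);
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// A decoder only understands versions up to the one it was built with.
// Returns false if the stream is empty or uses a newer format.
bool toyDecode(const std::vector<std::uint8_t>& stream, std::uint8_t maxSupportedVersion,
               std::vector<std::uint8_t>& payload)
{
    if (stream.empty() || stream[0] > maxSupportedVersion)
        return false;
    payload.assign(stream.begin() + 1, stream.end());
    return true;
}

void exampleUsage()
{
    const std::vector<std::uint8_t> mesh = {10, 20, 30};
    std::vector<std::uint8_t> decoded;

    // Old encoder, old decoder: works.
    assert(toyDecode(toyEncode(mesh, 0), 0, decoded) && decoded == mesh);

    // Encoder upgraded before the decoder: the client cannot read the data.
    assert(!toyDecode(toyEncode(mesh, 1), 0, decoded));

    // Decoder upgraded first: both old and new streams are readable.
    assert(toyDecode(toyEncode(mesh, 0), 1, decoded) && decoded == mesh);
    assert(toyDecode(toyEncode(mesh, 1), 1, decoded) && decoded == mesh);
}
```

The second assertion is the case every option below tries to make harder to hit by accident.
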

Solution 1: explicit encoding version

One solution is to change the encode signature to force the user to pick the version:
size_t encode(destination, destination_capacity, source, source_size, int version)

This sidesteps the deployment issues since the calling code is now in control. However, it makes the API less intuitive: even though
the version changes very rarely, every user has to pass an extra parameter.
What should this parameter be set to? We would need some sort of MESHOPTIMIZER_INDEX_CODEC_LATEST_VERSION macro if we want
to avoid users having to know which version is the latest. However, this again means that during source upgrades the code
silently adopts the new format by default, and the cost of this is having each user deal with an extra argument.
This could be slightly prettier with an enum argument, although the names of the enum variants would probably be "IndexCodec2" et al.
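
A minimal sketch of the enum variant, to make the trade-off concrete. The names `ToyCodecVersion`, `ToyCodec_Latest` and `toyEncode` are invented for illustration, and the "encoder" just writes a version byte followed by a verbatim copy of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical enum; the real names are not decided in this note.
enum ToyCodecVersion
{
    ToyCodec_V0 = 0,
    ToyCodec_V1 = 1,
    ToyCodec_Latest = ToyCodec_V1, // would be bumped whenever a new format ships
};

// Toy encoder: writes the version byte, then the source bytes unchanged.
// Returns the number of bytes written, or 0 if the destination is too small.
std::size_t toyEncode(unsigned char* destination, std::size_t destination_capacity,
                      const unsigned char* source, std::size_t source_size,
                      ToyCodecVersion version)
{
    if (destination_capacity < source_size + 1)
        return 0;
    destination[0] = static_cast<unsigned char>(version);
    std::memcpy(destination + 1, source, source_size);
    return source_size + 1;
}

void exampleUsage()
{
    const unsigned char source[3] = {1, 2, 3};
    unsigned char buffer[16];

    // Pinned: the output format stays at version 0 across library upgrades.
    std::size_t pinned = toyEncode(buffer, sizeof(buffer), source, sizeof(source), ToyCodec_V0);
    assert(pinned == 4 && buffer[0] == 0);

    // Floating: the output format changes as soon as ToyCodec_Latest is bumped.
    std::size_t latest = toyEncode(buffer, sizeof(buffer), source, sizeof(source), ToyCodec_Latest);
    assert(latest == 4 && buffer[0] == 1);
}
```

Callers that pass the "latest" constant are back to the original problem, and callers that pin a version still have to type the extra argument on every call.
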

Solution 2: separate function for version-aware encoding

We can keep the use of the encoding functions simple if we introduce two variants:
size_t encode(destination, destination_capacity, source, source_size)
size_t encodeVersion(destination, destination_capacity, source, source_size, int version)

Here encode uses the latest version, and encodeVersion uses the specified version. That way users who care about stability
of the encoded format can use encodeVersion, and users who don't automatically gain the benefits of the new format when it ships.
However, if an application already uses encode, there's no way for it to know that it might start producing a format that isn't
supported by old decoders...
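
As a sketch, the two entry points could relate like this. `toyEncode`, `toyEncodeVersion` and `kToyLatestVersion` are hypothetical names, and the encoding itself is a placeholder (version byte plus raw copy).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical: the newest format this build of the library can write.
const int kToyLatestVersion = 1;

// Version-aware variant: the caller picks the format.
// Returns bytes written, or 0 on an unknown version or a too-small destination.
std::size_t toyEncodeVersion(unsigned char* destination, std::size_t destination_capacity,
                             const unsigned char* source, std::size_t source_size, int version)
{
    if (version < 0 || version > kToyLatestVersion || destination_capacity < source_size + 1)
        return 0;
    destination[0] = static_cast<unsigned char>(version);
    std::memcpy(destination + 1, source, source_size);
    return source_size + 1;
}

// Simple variant: always forwards to whatever the latest format is.
std::size_t toyEncode(unsigned char* destination, std::size_t destination_capacity,
                      const unsigned char* source, std::size_t source_size)
{
    return toyEncodeVersion(destination, destination_capacity, source, source_size,
                            kToyLatestVersion);
}

void exampleUsage()
{
    const unsigned char source[3] = {1, 2, 3};
    unsigned char buffer[16];

    // Existing call sites keep compiling, but their output follows kToyLatestVersion.
    assert(toyEncode(buffer, sizeof(buffer), source, sizeof(source)) == 4);
    assert(buffer[0] == kToyLatestVersion);

    // Only call sites that opted in are protected from a format bump.
    assert(toyEncodeVersion(buffer, sizeof(buffer), source, sizeof(source), 0) == 4);
    assert(buffer[0] == 0);
}
```
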

Solution 3: all format upgrades require new functions

If we want a solution that is maximally robust we can have separate encoding functions for format upgrades. That way, when the
application is written, it can use the latest function available; library changes don't change the application behavior:
size_t encode(destination, destination_capacity, source, source_size)
size_t encode2(destination, destination_capacity, source, source_size)
size_t encode3(destination, destination_capacity, source, source_size)
... etc

This is somewhat similar to solution 1, but it remains backwards compatible even though the original version of the API didn't include
a version parameter, and it allows adding extra arguments to tune encoding for later format versions.
It also means that FFI interfaces don't need to use enums/#defines out of meshoptimizer.h, although this is probably not a big deal
in practice?
However, meshoptimizer right now is at version 0; since we need to bump the format, and the new format is superior to the old format, this will lead to a situation where encode should not be used in practice. Since it's the most obvious name to reach for, why should future users pay for past legacy?
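
A rough sketch of what this looks like, with each function frozen to one format. The names are hypothetical, the `level` argument is an invented tuning knob that only illustrates how a later function could grow parameters, and the payload handling is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Frozen: always writes format 0 (version byte + raw copy), no matter how new the library is.
std::size_t toyEncode(unsigned char* destination, std::size_t destination_capacity,
                      const unsigned char* source, std::size_t source_size)
{
    if (destination_capacity < source_size + 1)
        return 0;
    destination[0] = 0;
    std::memcpy(destination + 1, source, source_size);
    return source_size + 1;
}

// Added together with format 1. The 'level' parameter is an assumption for this sketch;
// it is stored in a second header byte and otherwise unused.
std::size_t toyEncode2(unsigned char* destination, std::size_t destination_capacity,
                       const unsigned char* source, std::size_t source_size, int level)
{
    if (level < 0 || level > 255 || destination_capacity < source_size + 2)
        return 0;
    destination[0] = 1;
    destination[1] = static_cast<unsigned char>(level);
    std::memcpy(destination + 2, source, source_size);
    return source_size + 2;
}

void exampleUsage()
{
    const unsigned char source[3] = {1, 2, 3};
    unsigned char buffer[16];

    // Code written against the old API keeps producing format 0 after a library upgrade.
    assert(toyEncode(buffer, sizeof(buffer), source, sizeof(source)) == 4);
    assert(buffer[0] == 0);

    // Adopting the new format is an explicit source change at the call site.
    assert(toyEncode2(buffer, sizeof(buffer), source, sizeof(source), 9) == 5);
    assert(buffer[0] == 1 && buffer[1] == 9);
}
```
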

Solution 4: global encoding version

One way to decouple the interface from the version specification is to provide a separate, global way to set the encoding version.
The default version can slowly increase following updates to the formats, but applications that need a specific version can set it
once at startup.
size_t encode(destination, destination_capacity, source, source_size)
void setEncodeVersion(int version)

This means that users who care about version stability can call the version setter; by default the encoder can simply use
the latest codec.
This doesn't let us easily add extra encoding parameters, but it keeps the simple use cases separate from the more complex ones at
the API level.
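
A minimal sketch of the global setter, assuming it is called once at startup before any encoding happens (a plain global like this is not safe to change while other threads are encoding). `toySetEncodeVersion`, `toyEncode` and the default of 1 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical library-wide setting; defaults to the latest format this build can write.
static int gToyEncodeVersion = 1;

// Intended to be called once at startup by applications that need a stable format.
void toySetEncodeVersion(int version)
{
    assert(version >= 0 && version <= 1);
    gToyEncodeVersion = version;
}

// The encode signature stays unchanged; the format comes from the global setting.
// Toy payload: version byte followed by a verbatim copy of the source.
std::size_t toyEncode(unsigned char* destination, std::size_t destination_capacity,
                      const unsigned char* source, std::size_t source_size)
{
    if (destination_capacity < source_size + 1)
        return 0;
    destination[0] = static_cast<unsigned char>(gToyEncodeVersion);
    std::memcpy(destination + 1, source, source_size);
    return source_size + 1;
}

void exampleUsage()
{
    const unsigned char source[3] = {1, 2, 3};
    unsigned char buffer[16];

    // An application with old decoders in the field pins the format at startup...
    toySetEncodeVersion(0);

    // ...and every later encode call, including ones in third-party code, follows it.
    assert(toyEncode(buffer, sizeof(buffer), source, sizeof(source)) == 4);
    assert(buffer[0] == 0);
}
```
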

???

I'm really unsure which option is best here.
On one hand, I'd like to keep the library interface really simple. Version parameters don't seem like something users should care about by default. Global version setting is a way out but feels gross / like a hack.
On the other hand, any application that uses the default version will eventually face a situation where a library update bumps the
version up. Is it reasonable to protect users against this at the cost of an extra API parameter?
I'm currently leaning towards option 4 because it seems like the "new encoder, old decoder" situation is comparatively rare.