dstillman/schema-updates.md Secret

## schema-updates.md

      
Proposal:


The schema version becomes an abridged semver: [major].[minor]. A major version change means something in code needs to be changed to support the new data (to the extent that we think it matters — we could decide that something wasn't a big deal if slightly misformatted). Minor version changes are safe to apply — they might add a field to something, but not in a way that's expected to cause any problems.
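
As a minimal sketch of the abridged semver, the version can be parsed into a pair of integers and compared numerically rather than as a string. The names `SchemaVersion`, `is_safe_update`, and `needs_code_change` are hypothetical and only illustrate the major/minor distinction described above.

```python
from typing import NamedTuple


class SchemaVersion(NamedTuple):
    """Abridged semver: [major].[minor]. Tuples compare major first, then minor."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVersion":
        major, minor = text.strip().split(".")
        return cls(int(major), int(minor))

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def is_safe_update(current: SchemaVersion, offered: SchemaVersion) -> bool:
    """A newer minor version within the same major version is safe to apply."""
    return offered.major == current.major and offered.minor > current.minor


def needs_code_change(current: SchemaVersion, offered: SchemaVersion) -> bool:
    """A newer major version implies the client code has to change first."""
    return offered.major > current.major
```

Comparing integers matters once a minor version reaches two digits: as strings, "4.10" would sort before "4.2".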


A given client version comes bundled with a given schema version, and it can fetch minor schema updates for its major version, on daily checks or when it sees a Zotero-Schema-Version header when starting a sync. This would be client-based, not server-based, so for the daily check we'd make schema versions available as /schema/4 or something (which might serve version 4.2). If a sync returned Zotero-Schema-Version and it was a higher major version, the client would ignore it.
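
The client-side decision could look something like the sketch below. The function names are hypothetical, and ignoring a lower major version is an assumption; the proposal only says that a higher major version is ignored.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Split "major.minor" into a pair of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def daily_check_path(current: str) -> str:
    """Path the client polls for minor updates to its own major version."""
    major, _ = parse_version(current)
    return f"/schema/{major}"


def should_fetch_update(current: str, advertised: str) -> bool:
    """Decide whether a Zotero-Schema-Version value warrants a schema fetch.

    Only a newer minor version of the client's own major version qualifies.
    A higher major version is ignored, as in the proposal; a lower major
    version is also ignored here, which is an assumption.
    """
    cur_major, cur_minor = parse_version(current)
    adv_major, adv_minor = parse_version(advertised)
    if adv_major != cur_major:
        return False
    return adv_minor > cur_minor
```

For example, a client on 4.1 that sees 4.2 would fetch `/schema/4`, while the same client seeing 5.0 would do nothing.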


When uploading data to a library, the client would include its current schema version in a Zotero-Schema-Version request header, and the server would store the version with each library if it was a known schema version (to avoid mischief) and greater than the currently stored version for that library.
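
On the server side, the rule above amounts to a guarded maximum. This is only a sketch: `KNOWN_SCHEMA_VERSIONS` and `version_to_store` are hypothetical names, and the set of known versions is made up for illustration.

```python
from typing import Optional

# Hypothetical list of schema versions the server has published.
KNOWN_SCHEMA_VERSIONS = {(4, 0), (4, 1), (4, 2), (5, 0)}


def version_to_store(
    stored: Optional[tuple[int, int]],
    header_value: Optional[str],
    known: set = KNOWN_SCHEMA_VERSIONS,
) -> Optional[tuple[int, int]]:
    """Return the schema version to keep for a library after an upload.

    The uploaded version replaces the stored one only if it parses, is a
    known schema version (to avoid mischief), and is greater than what is
    already stored. Otherwise the stored value is returned unchanged.
    """
    if not header_value:
        return stored
    try:
        major, minor = (int(part) for part in header_value.strip().split("."))
    except ValueError:
        return stored  # malformed header: ignore it
    candidate = (major, minor)
    if candidate not in known:
        return stored
    if stored is not None and candidate <= stored:
        return stored
    return candidate
```

Because the stored value only ever increases, it stays an upper bound on the schema version of any data in the library.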


When starting a sync for a library, the client would check the Zotero-Library-Schema-Version header, which would be the stored version for that library, meaning it's the highest possible version for data in that library. If it was higher than the client's current schema version, the client would stop syncing that library and say that syncing that library required a newer version of the app. (If it was only a higher minor version, it might mean that the minor schema update triggered by Zotero-Schema-Version at the start of the main sync process had failed for some reason, and the client could just show a temporary error rather than saying it needs to be updated.)
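
The three outcomes for a library can be sketched as a small decision function. The enum and function names are hypothetical; the logic follows the paragraph above.

```python
from enum import Enum


class LibrarySyncDecision(Enum):
    PROCEED = "proceed"
    TEMPORARY_ERROR = "temporary_error"    # minor update probably failed; retry later
    UPGRADE_REQUIRED = "upgrade_required"  # library needs a newer app version


def parse_version(text: str) -> tuple[int, int]:
    """Split "major.minor" into a pair of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def check_library(client_version: str, library_version: str) -> LibrarySyncDecision:
    """Compare the client's schema with Zotero-Library-Schema-Version."""
    client = parse_version(client_version)
    library = parse_version(library_version)
    if library <= client:
        return LibrarySyncDecision.PROCEED
    if library[0] > client[0]:
        return LibrarySyncDecision.UPGRADE_REQUIRED
    # Same major version, higher minor version: the client should have been
    # able to fetch this schema, so treat it as a transient failure.
    return LibrarySyncDecision.TEMPORARY_ERROR
```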


Unknown properties shouldn't ever happen under this scheme, so they would cause the object download to fail. The items would be retried on a backoff schedule (in case there was a server-side problem) or after an upgrade (in case there was a client bug), as they already are now (at least in the desktop app).
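
A rough sketch of that behavior is below. The field list, the exception name, and the backoff constants are all assumptions for illustration; the actual retry schedule in the desktop app is not described here.

```python
# Hypothetical set of fields defined by the client's current schema.
KNOWN_FIELDS = {"key", "version", "itemType", "title"}


class UnknownPropertyError(Exception):
    """Raised when downloaded data contains a property the schema lacks."""


def validate_object(data: dict, known_fields: set = KNOWN_FIELDS) -> None:
    """Fail the object download if any property is not in the schema."""
    unknown = sorted(set(data) - set(known_fields))
    if unknown:
        raise UnknownPropertyError(f"Unknown properties: {', '.join(unknown)}")


def retry_delay(attempt: int, base: int = 60, cap: int = 86400) -> int:
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with a cap; the constants are made up.
    """
    return min(cap, base * 2 ** attempt)
```

A failed object would be queued for retry with `retry_delay`, and the queue would also be retried after a client upgrade.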


Issues:


Just because a client with a given schema version writes to a library, it doesn't necessarily mean the data is incompatible, but we have no good way of knowing that, so it requires a sync cut-off. (I think it would be crazy for the API to start comparing the data to all past schema versions, for example.)


This cut-off would happen even when we added new object types (e.g., annotations) that an older client wouldn't try to download anyway because it didn't know about them. (I think this would, in fact, mean that there was no reason to track library versions separately for different object types, as Michal said he was doing, because the client wouldn't even try to sync the library if it didn't support the new object type.)


This only partly solves the beta problem. It means that we can make a new major version available on the server and also bundle it with a beta, which is necessary for testing new sync-dependent features (good), but if the beta writes to a library, no non-beta clients will be able to sync with that library (bad).


We'd still want as much as possible in the schema, to minimize major versions. So as Michal says, item type image URLs (of various sizes) should be in there, and we'd want to think about other things that might help avoid major versions.


Bonus Proposal:


The best way to keep the cut-off from affecting too many people would be to roll out app versions that could support a new major schema version but that didn't expose the associated functionality in the UI until they were offered the new major schema version from the API. That would let us remotely turn on features after most users had upgraded to a compatible version. Unfortunately, the semver approach on its own prevents that, because it means the client, rather than the server, decides which clients to send a new major version to. (Doing it server-side also wouldn't be very nice to unofficial clients.)


A hybrid approach could be to do semver but also set a maximum major version in the client that it could upgrade to if available, and hide features until the major version was offered. So if the client had schema 2.4 but it had a maxSchemaVersion of 3, it would check /schema/3 before checking /schema/2, and only use /2 if /3 was a 404. Similarly, Zotero-Schema-Version from the API would offer a comma-separated list of the latest available version for each major version, and if the client with 2.4 and maxSchemaVersion of 3 saw that a 3.2 was available, it could upgrade to that and expose the hidden functionality.
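
The selection logic for the hybrid approach might look like the following sketch. The function names are hypothetical, `max_major` stands in for maxSchemaVersion, and skipping versions below the client's current major version is an assumption.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Split "major.minor" into a pair of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def schema_paths_to_try(current: str, max_major: int) -> list[str]:
    """Paths to check in order, highest supported major version first.

    The client moves on to the next path only if the previous one is a 404.
    """
    cur_major, _ = parse_version(current)
    top = max(max_major, cur_major)
    return [f"/schema/{major}" for major in range(top, cur_major - 1, -1)]


def pick_schema_version(current: str, max_major: int, header_value: str) -> str:
    """Choose a version from a comma-separated Zotero-Schema-Version list.

    Returns the highest offered version whose major version is between the
    client's current major version and max_major, or the current version if
    nothing newer qualifies.
    """
    cur = parse_version(current)
    best = cur
    for part in header_value.split(","):
        if not part.strip():
            continue
        candidate = parse_version(part)
        if candidate[0] < cur[0] or candidate[0] > max_major:
            continue
        if candidate > best:
            best = candidate
    return f"{best[0]}.{best[1]}"
```

With a current schema of 2.4 and a `max_major` of 3, a header of `2.5, 3.2` selects 3.2, at which point the client could expose the hidden functionality. If only `2.5` is offered, the client stays on major version 2 and the features stay hidden.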


We would test this by dropping in a 3.2 schema file locally and/or by adding 3.2 to apidev responses.


It's a little weird to turn on functionality remotely, and it does increase the chances of a bug suddenly appearing for someone who hadn't upgraded (perhaps purposely), but I think it'd be the best way to minimize sync cut-offs.


I'm not sure if Apple has some app store rule against enabling new functionality like this.


The whole idea of ever getting a message that says you need to upgrade to sync is sort of unpleasant, and a major departure from our historical practice (where we didn't cut off anything for many years and then cut off 4.0 only after 5.0 had been out for about a year), but I don't see a better option, and this last part would at least keep most regular upgraders from seeing such a message, at least when we went to the trouble of adding forward-compatibility.