2019-09-15
RE: httpwg/http-extensions#913 (and 790, and 629, ...)
I've been thinking about data types and models again. My premise is that a type is a combination of a range/domain of values, and a set of operations that can be performed on those values.
A 'token' has one operation: identity comparison. But we're not going to do just that with our tokens, because implementation specs are going to say things like "case-insensitive" (so we have textual operations like toupper/tolower/casecmp), or they're going to say things like "if it starts with 'text/' default to utf-8" (so we have substring operations like split/match). So I think what we currently call a 'token' is going to be treated as a string-without-quotes.
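To make the premise concrete, here is a minimal sketch of a token as a type whose only operation is identity comparison. The names `Token` and `unwrap` are hypothetical and appear in no draft; the point is only that the moment a spec asks for case-insensitive comparison or prefix matching, you have to reach through the token to the string underneath.

```python
class Token:
    """Hypothetical opaque token: the only supported operation is identity comparison."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __eq__(self, other: object) -> bool:
        # Identity comparison: exact, case-sensitive match against another Token only.
        if not isinstance(other, Token):
            return NotImplemented
        return self._value == other._value

    def __hash__(self) -> int:
        return hash(self._value)

    def __repr__(self) -> str:
        return f"Token({self._value!r})"


def unwrap(token: Token) -> str:
    """Hypothetical escape hatch: hands back the underlying text.

    Any caller that needs this is treating the token as a string-without-quotes.
    """
    return token._value


a = Token("Text/HTML")
b = Token("text/html")

# The one operation a token is supposed to have.
identical = a == b

# What specs tend to ask for instead: textual and substring operations.
case_insensitive_match = unwrap(a).lower() == unwrap(b).lower()
has_text_prefix = unwrap(b).startswith("text/")
```

Here `identical` is false while the other two results are true, which is the gap between what a token nominally is and how it ends up being used.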
Then there's the recurring "in this location you could find a string or a token", implying that there's no semantic difference between the two. (Aside: I'm still keen to see one of these, BTW. I can't imagine how it could exist and not be resolved by using sh-item instead.)
Further, the argument that reintroduced tokens (née identifiers) in httpwg/http-extensions#629 seems to have been more about aesthetics than types. (Aside: Are domain names in origins compared case-insensitively? Because if so, that counts against using tokens to carry origins.)
I don't think underspecifying an immature concept is going to help us in any future revisions or extensions. We should either make strings and tokens serialisations of the same underlying data type*, or take a hard stance on what a token is and where/how it should be used**.
* I think that means: rename sh-token so it's not exposed as an sh- construct and alternate its ABNF in with sh-string's, get rid of 3.7 Tokens, and get rid of 4.1.7. Serializing a Token, plus some editorial stuff.
** I think that means: explaining why token and string aren't the same thing, and why you have to expose specific API hooks to convert your language's native string type to one vs the other, and why you shouldn't have said those things I mentioned above about case or substrings or whatever.
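For comparison, here is a rough sketch of what the second option implies for an API: the native string type maps to a string, and getting a token requires an explicit, separate type. The `Token` wrapper and `serialize` function are hypothetical, and the serialisation is deliberately simplified (no validation of permitted characters); it assumes only that strings are emitted double-quoted with backslash escaping and tokens are emitted bare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical wrapper marking a value as a token rather than a string."""

    value: str


def serialize(item: object) -> str:
    """Simplified, assumed serialisation: tokens bare, strings quoted and escaped."""
    if isinstance(item, Token):
        # No character validation here; a real serialiser would reject invalid tokens.
        return item.value
    if isinstance(item, str):
        # Escape backslashes first so the escapes added for quotes are not doubled.
        escaped = item.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"cannot serialise {type(item).__name__}")


as_token = serialize(Token("gzip"))
as_string = serialize("gzip")
```

With this shape, `as_token` is the bare text gzip and `as_string` is the same text wrapped in double quotes. Under the first option the `Token` wrapper would disappear entirely, and the serialiser would choose quoting based on the content alone.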