sipa/bip-witaddr.mediawiki Secret

Last active March 17, 2017 09:06


bip-witaddr.mediawiki

Moved here.

petertodd commented Feb 15, 2017

I'm really puzzled as to why "b" is now left out - it's very distinct in handwriting, while "2"/"z" definitely isn't.

Also, note that the NIST site says: "The algorithm, consisting of the distance measure, the scoring formula, and the character similarities are mostly just my estimations." In other words, there's no science behind the character similarity scores.

Author

sipa commented Feb 15, 2017

@petertodd From the data, there are potential confusions between 'b' and ['6', 'd', 'h'], and between 'B' and ['8', 'E', 'P', 'R']. I'm aware that there is nothing scientific about it, but it's the most extensive similarity data found anywhere.

gmaxwell commented Feb 16, 2017 (edited)

I don't think handwriting is at all our primary target: the primary lossy path is to read from a screen, say it out loud, hear it, and type it in. I think handwriting is certainly secondary, especially since any handwritten representation will start with a screen or printed step and end with typing.

The data we really want doesn't exist. I'd be happy to fund Mechanical Turking it (and I'm sure that many in our community would be willing to participate to create ground truth data...) but I know nothing about the relevant APIs and loathe web programming.

The experiment would be something like showing random base36 strings (each string all upper or all lower case) and asking users to type them in, encouraging them to be fast so the error rate isn't zero. Handwriting could be covered by having them write the strings and take pictures for someone else to transcribe, but I think we should consider handwriting more out of scope. (Also, 2/Z is completely unambiguous handwritten if you stroke the Zs.)
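A minimal sketch of the stimulus generation and scoring for such an experiment, in Python. The function names `random_base36` and `confusions` are hypothetical, and the scoring assumes the typed string has the same length as the shown one (insertions and deletions would need an alignment step that is not shown here).

```python
import random
import string
from collections import Counter

# The 36 symbols: digits followed by lowercase letters.
BASE36 = string.digits + string.ascii_lowercase

def random_base36(length, rng, upper=None):
    """Return a random base36 string that is all upper or all lower case.

    If `upper` is None, the case is chosen at random per string.
    """
    if upper is None:
        upper = rng.random() < 0.5
    s = "".join(rng.choice(BASE36) for _ in range(length))
    return s.upper() if upper else s

def confusions(shown, typed):
    """Count (shown_char, typed_char) substitutions, ignoring case.

    Returns None when the lengths differ, since a plain position-by-position
    comparison is meaningless in that case.
    """
    if len(shown) != len(typed):
        return None
    pairs = zip(shown.lower(), typed.lower())
    return Counter((a, b) for a, b in pairs if a != b)

if __name__ == "__main__":
    rng = random.Random()
    print(random_base36(16, rng))
    print(confusions("2Z0O", "ZZ00"))
```

Summing the counters over many participants would give an empirical confusion matrix for the typed-from-screen case.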

petertodd commented Feb 16, 2017

Hmm, I think you both made good points, so I'll accept it the way it is.

Anyway, come to think of it, the real case where handwritten addresses come up is private keys, not public keys, and for that we already use a safer and much more verbose encoding; I probably had that use case (incorrectly) in the back of my head.

petertodd commented Feb 16, 2017 (edited)

@gmaxwell Can I quote you on that? Specifically, the exact phrase "if you stroke the Zs"? :P
FWIW I asked around my design friends for research on this stuff, and all I got back was some papers on OCR! They did know of similar visual similarity problems, but it sounds like in other fields it's more focused on issues like shapes of switches and knobs and the light (e.g. aviation). I wonder if part of the thinking is that if you're transcribing text, all hope is lost already by their standards...

baryluk commented Feb 17, 2017 (edited)

What about the use case of entering addresses by hand (without QR scanning) on ATM-style machines for buying bitcoin? It might be hard to get it right on the first go, especially on low-quality touchscreens. I wouldn't say that speaking (i.e. over the phone) or handwriting are the only good use cases. Also, having just 32 characters to choose from on a screen, instead of lower and upper case and a full alphanumeric keyboard, would allow for much quicker input and bigger on-screen keys, reducing the risk of clicking or touching the wrong one. In fact, this can apply to all touchscreen-based systems where we do not copy the address automatically (i.e. QR scan, copy-paste, NFC, etc.) but get it from somewhere else (be it somebody speaking to us, or us reading it from paper, even a printed one with clear labels, etc.).

I am still not sure what the exact layout of the 32 characters on a screen would be. Or maybe it should be a full QWERTY keyboard (most likely with upper-case characters) + digits (+ backspace), with the full ambiguity conversion done transparently in software.

/me friend of sipa.

gmaxwell commented Feb 24, 2017 (edited)

You need a passing vector that begins with at least 8 zero bits in the witness hash. (maybe an all zeros one would be good)

Author

sipa commented Feb 26, 2017

@gmaxwell Added one with 3 zeroes.

gmaxwell commented Mar 3, 2017 (edited)

Immunity to easily confused non-address sequences.

Transaction IDs or other similar encoded data may be easily confused for addresses by users. This specification provides for improved resistance to this common class of confusions beyond what is provided by the checksum.

The minimal padding requirement in this specification means that no input with length under 11, over 74, or congruent to 0, 3, or 5 mod 8 can ever be mistaken as a valid Bitcoin address. This means that no common hex sequence length (8, 16, 32, 40, or 64 characters) would be accepted by this specification.

Similarly, no string with the common base64 maximum line length of 76 characters can be interpreted as an address.

A short base64-encoded string (with length 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, or 72) could potentially be misinterpreted as an address; however, the probability of this happening for any uniformly selected random base64 string is never greater than 1 : 2^60 due to the improbability of matching the prefix and the partial overlap of the character sets.
.....

or something like that.
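The length rule in the draft text above can be sketched as a cheap pre-filter, in Python. This takes the wording literally and assumes "length" means the character count of the whole candidate string; the name `length_may_be_valid` is hypothetical and this is not the checksum or the full validation.

```python
def length_may_be_valid(n):
    """Return True if a string of n characters is not ruled out by length alone.

    Rule as worded in the draft: reject lengths under 11, over 74,
    or congruent to 0, 3, or 5 modulo 8.
    """
    return 11 <= n <= 74 and n % 8 not in (0, 3, 5)

if __name__ == "__main__":
    for n in (8, 16, 32, 40, 64, 76):
        # Common hex lengths and the base64 line length are all rejected.
        print(n, length_may_be_valid(n))
```

Under this reading, of the base64 lengths listed above only those congruent to 4 mod 8 (12, 20, 28, and so on) pass the filter.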

gmaxwell commented Mar 7, 2017

I just supported someone today who was getting an invalid-address error while trying to send funds. It looks like his chat software was adding invisible characters, which the RPC then rejected, but which were just ignored when sent to bc.i. We might want to include a vector with such a character and have advice that UIs should do something useful in those cases.
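One way a UI could "do something useful" here is to report where the invisible characters are instead of returning a bare "invalid address". A minimal sketch in Python using the standard `unicodedata` module; the choice of Unicode categories and the name `find_invisible` are assumptions for illustration, not part of the proposal.

```python
import unicodedata

# Unicode general categories treated as suspicious inside an address:
# Cf = format (zero-width space, BOM, ...), Cc = control,
# Zs / Zl / Zp = space, line, and paragraph separators.
SUSPECT_CATEGORIES = {"Cf", "Cc", "Zs", "Zl", "Zp"}

def find_invisible(text):
    """Return a list of (index, "U+XXXX") for every suspicious character."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) in SUSPECT_CATEGORIES
    ]

if __name__ == "__main__":
    # Placeholder string, not a real address.
    pasted = "bc1\u200bexample"
    for index, codepoint in find_invisible(pasted):
        print("unexpected character %s at position %d" % (codepoint, index))
```

Whether the UI should then strip such characters or refuse the input is a separate design decision; the sketch only detects them.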
