Last active
December 19, 2015 02:59
;; require'd [digest] and [base64-clj.core :as base64]
; This excessively large string is an SSH RSA key.
; You're meant to base64 decode then hash the result to get a key fingerprint.
(def k "AAAAB3NzaC1yc2EAAAADAQABAAABAQDt6wSbWhx/lQ1kqzqy4ET7ogSZqZngcDzaYiS8/ZWKgamkt4o9+2RebcysJT2DX/8t0Mif3jovSsUjW+6dCLY8rO0+fGMctwWL4HqAlHFgWY6xA2M4/ZLYvlm53WUKt02ygFeO9M4Fj9w9MoTRhQjS52Z6PA5OE0ppjjupvLWUp3wu23usUUWQucye50mTPBE4tZAbnh+H3w7FXTHOROsNTSuNbDYQ8pPqHy66hbJ5t2Dz5/3yTpL6mzc4rFHJPt5O8Wxlur4kzNSYlhPYsbrDoZF4lpxjdrkrE4qGxJUk48Hufr/VJ+3cafLumk7DsdVNnDeAO8lgpgXh2Hvr9ZYl")
(digest/md5 (base64/decode k)) ; => 8befc4d6d025ea89bd5603ea2bf7ecdd
; My question is this: in any other language, the above gives an output of:
; 287893e77d820df340d92b833e942128
; I've tested (independently and together) both the md5 and base64 functions, to verify that they're not cocking up the output and spewing out random crap, but they're giving valid info
(digest/md5 "hello") ; => 5d41402abc4b2a76b9719d911017c592 (just like in Python)
(base64/decode "aGVsbG8=") ; => "hello" (just like in Python)
; Why is it that in all cases but the one I actually need valid output, they don't reach the right hash?
; The string I'm trying to decode and hash is exactly the same as it is in Python/other places, but in Clojure it refuses to hash into the md5 I want it to. ;_;
; Halp me dobry-den-kanobi, you're my only hope.
Change this:
to this:
I know you would only ask me this as a last resort, so here is a crash course in debugging infuriating byte/string/encoding-level problems, which Clojure actually makes pretty simple.
Anyways, to let you know how I debugged it, the first thing I did was take a look at what base64/decode actually returns. REPL output when debugging this kind of stuff is misleading since it, of course, can't display bytes that can't be displayed. But (map int x) makes it trivial to see the actual values. The 65533 values instantly revealed that shit was cray.
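To see the same symptom in isolation, here is a minimal sketch using only JDK interop. The names raw and s are just for illustration, and the assumption is that the decode ends in a UTF-8 String, as described in this comment. The byte 0xFF can never appear in valid UTF-8, so the decoder swaps it for U+FFFD:

```clojure
;; Three raw bytes: 0xFF (never valid in UTF-8), then "h" and "i".
(def raw (byte-array [(unchecked-byte 0xFF) (byte 104) (byte 105)]))

;; Decode the bytes as UTF-8 text, the kind of bytes->String coercion
;; this comment is describing (assumed, not taken from the library source).
(def ^String s (String. ^bytes raw "UTF-8"))

;; The REPL would print s with an unreadable glyph; the code points
;; show what is really in there.
(map int s)                    ; => (65533 104 105)

;; Going back to bytes does not restore the original data:
;; U+FFFD encodes as three bytes, so 3 bytes have become 5.
(count (.getBytes s "UTF-8"))  ; => 5
```

Once a byte has been replaced like this the original value is gone, so anything hashed afterwards is hashing different data.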
Bytes of course are represented as 8 bits (00000000), so there are 256 possible combinations.
Java has signed bytes which means that the left-most bit is used to indicate +/-.
So integer values this high can't be bytes at all. They indicate that we're looking at characters, i.e. the data has been through a UTF-8 decode.
Which means that there's a coercion ->String somewhere.
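The signed-byte point is easy to check at the REPL. This is a small sketch; unsigned and sample are hypothetical names, not part of any library:

```clojure
;; Hypothetical helper: view a signed JVM byte as its unsigned 0-255 value.
(defn unsigned [b]
  (bit-and b 0xFF))

;; The bit pattern 11111111 is -1 as a JVM byte, 255 when read unsigned.
(unchecked-byte 0xFF)             ; => -1
(unsigned (unchecked-byte 0xFF))  ; => 255

(def sample
  (byte-array [(byte 0) (byte 127) (unchecked-byte 0x80) (unchecked-byte 0xFF)]))

(map int sample)       ; => (0 127 -128 -1)
(map unsigned sample)  ; => (0 127 128 255)
```

Either way, a real byte never leaves the range -128 to 127 (or 0 to 255 unsigned), which is why a 65533 in the output is a giveaway.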
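And since the purpose of Base64 is to encode bytes->ASCII and decode ASCII->bytes, it doesn't make sense to cast a Base64 decode to a String. Arbitrary bytes aren't guaranteed to be valid text: ASCII only covers 0-127, and the bytes from 128-255 (the ones that show up as negative in Java's signed representation) are only valid in UTF-8 as part of well-formed multi-byte sequences.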
Sure enough, that's what is going on here.
But the new String() constructor tries to decode bytes that are represented as negative ints, and wherever they don't form a valid character it substitutes U+FFFD, the "replacement character", which, thanks to you, I just learned about (http://stackoverflow.com/questions/3526965/unicode-issue-with-an-html-title-question-mark-65533). It's represented as 65533 in base 10.
I doubt you needed the history of computing but maybe you are like how I was 5 months ago (before my attempt at implementing Bitcoin -_-) and are a bit rusty.
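One way to keep the String out of the pipeline entirely is to decode to a byte array and hash those bytes directly. The sketch below is not the exact change suggested earlier in this comment. It assumes Java 8+ (for java.util.Base64) and uses only JDK classes, and md5-hex and fingerprint are hypothetical names:

```clojure
(import '(java.util Base64)
        '(java.security MessageDigest))

(defn md5-hex
  "Hex-encoded MD5 of a byte array."
  [^bytes bs]
  (let [digest (.digest (MessageDigest/getInstance "MD5") bs)]
    ;; Mask each signed byte to 0-255 before formatting as two hex digits.
    (apply str (map #(format "%02x" (bit-and % 0xFF)) digest))))

(defn fingerprint
  "Base64-decode to raw bytes and hash them, never building a String."
  [^String b64]
  (md5-hex (.decode (Base64/getDecoder) b64)))

;; Sanity check against the known-good case from the question:
(fingerprint "aGVsbG8=") ; => "5d41402abc4b2a76b9719d911017c592"
```

If the diagnosis above is right, (fingerprint k) should line up with what the other languages produce, since the bytes being hashed are now the same ones they see.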
Finally, this is actually an area where Java libraries are nice: You know exactly what you're getting and they are perf-optimized for you.
For my Bitcoin project, I used the Codec lib from Apache Commons (http://commons.apache.org/). Basically, Apache Commons solves the problem of Java stdlib hell by providing nice wrappers. Definitely look there when you want some included batteries.
For instance: