Skip to content

Instantly share code, notes, and snippets.

@Chubek
Last active February 15, 2024 13:09
Show Gist options
  • Save Chubek/3ba77870e3a81ac46a5988e47be083e6 to your computer and use it in GitHub Desktop.
Save Chubek/3ba77870e3a81ac46a5988e47be083e6 to your computer and use it in GitHub Desktop.
DJB2 Hash Function in Aarch64/ARM64 & x86-64 Assembly

Intro

DJB2 is a very simple, yet powerful, hash function. I am currently implementing DEFLATE + Zlib in Rust. DEFLATE needs a hash function to keep track of duplicate strings. The fastest way we can do this without SIMD is impelementing it in Assembly.

DJB2 hash function, as defined in http://www.cse.yorku.ca/~oz/hash.html, is:

    unsigned long
    hash(unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        while (c = *str++)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }

Operation

DJB2 relies on a magic number, 5381. It simply multiplies prime number 33 with this magic number and adds every single byte of the message to it.

The message size for my use is ony 3 bytes. However you can expand upon it based on your needs. There are two files in this Gist:

1- Aarch64/ARM64 DJB2 for 3 Bytes (djb2.aarch64.asm) 2- x86-64/AMD64 DJB2 for 3 bytes (djb2.amd64.asm)

In both these implementations, we will hash the message "djb" encoded in ASCII.

If we check with od, we will see that:

$ echo -e -n 'djb' | od -t x1
0000000 64 6a 62

...the ASCII bytes for "djb" in decimal are 100, 106 and 98 respectively and a quick trip to the serpant will tell us what the final value should be:

$ python3 -c "h=5381;[exec(f'h=h*33+{c}', globals()) for c in b'djb'];print(h)"

193489493

Disclaimer

These are very simple implementations, nothing to write home about. USE THEM AT YOUR OWN RISK. I am not responsible for future-proofness, performance, or functionality of this code. I am using it not for a piece of software I am paid for, rather, a project I am making for educational and recreational reasons. You've been warned.

;; Fair License
;; 2023 Chubak Bidpaa
;; Usage of the works is permitted provided that this instrument is retained with the works, so that any entity that uses the works is notified of this instrument.
;; DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
;; NOTICE: Some assemblers may not recognize comments of this style
;; We store all our values on volatile registers because they are cleared out by Rust's calling convention
;; We just need a double as the maximum the digest can be is 193663362 --- 28bits, so let's use w registers
data:
mov w8, #0x64 ;; letter 'd' in ASCII codec
mov w9, #0x6a ;; letter 'j' in ASCII codec
mov w10, #0x62 ;; letter 'b' in ASCII codec
mov w0, #0x1505 ;; djb2 magic number, carries result of the hash operation also
djb2:
lsl w1, w0, #5 ;; hash << 5 stored in w1
add w0, w1, w0 ;; (hash << 5) + hash stored in w0
add w0, w0, w8 ;; value of w8 (d in this case) added, we have ((hash << 5) + hash) + c
;; Repeat the same operation for the other two bytes with w0 serving as the result
lsl w1, w0, #5
add w0, w1, w0
add w0, w0, w9
lsl w1, w0, #5
add w0, w1, w0
add w0, w0, w10 ;; w0 now holds our digest (193489493)
@mkxto
Copy link

mkxto commented Feb 15, 2024

isn't it hash * 32 + c instead of 33?

@mkxto
Copy link

mkxto commented Feb 15, 2024

isn't it hash * 32 + c instead of 33?

mb, there is a + hash!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment