Skip to content

Instantly share code, notes, and snippets.

@Chubek
Created April 5, 2023 05:33
Show Gist options
  • Save Chubek/22fe39609b13832626af153083f71043 to your computer and use it in GitHub Desktop.
Save Chubek/22fe39609b13832626af153083f71043 to your computer and use it in GitHub Desktop.
DJB2 Hash in Aarch64 & x86-64 Assembly

The following two files contain implementation of DJB2 hash algorithm in both Aarch64 and x86-64 Assembly languages.

What is DJB2?

DJB2 is a simple hash function. It is defined as:

unsigned long hash(char *str) {
        unsigned long hash = 5381;
        int c;
        while ((c = *str++))
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
            return hash;
}

It's a non-crypto hash by Daniel Bernstein. DJB2 is similar to a Linear congruential generator and uses prime numbers 5381 and 33 to generate a psuedo-random unsigned quad given a sequence of signed bytes. In most implementations of DJB2, for the sake of speed, instead of multiplication by 33, shifting and adding is used. In the following implementations I did the same. You may say that's an extra instruction, but remember that instruction != cycle. A mul instruction takes more cycles than a shift, plus and an add instruction.

Calling Convention

Both the assembly files have an entry point called jdb2_ccc. ccc, of course, stands for C Calling Convention. But not any calling convention as C has several calling conventions across platforms. Plus I did not adhere to the convention fully. I did save all the callee-save registers, and the function yields no results, rather, saves the result in the address carried by the second argument. The following table shows all the clobbered, saved, argument and result registers.

Arch Register(s) Status Usage
x86-64 R12 Callee-Saved A the result of shift
Aarch64 X5 & X6 Callee-Saved As holder of loaded byte and address to byte string
x86-64 R11 Clobbered As holder of final hash
Aarch64 X0 Clobbered As holder of final hash
x86-64 RSI Modified Arg 2, Dereferenced and final hash put into it
Aarch64 X1 Modified Arg 2, Dereferenced and final hash put into it
x86-64 RDI Preserved Arg 1, Pointer to byte string
Aarch64 X0 Clobbered Arg 1, Pointer to byte string

Judging by this table, you can decide if it's safe to use the implementations.

Why Pass a Pointer to Result?

I believe it's much safer to pass a pointer to the location which the user needs result to be saved at. The concept of 'variable' is a very volatile thing! I was experiencing issues because C, in main function, was doing bit extension and ruining the result that I was returning in X0/RAX. This made sure the result is preserved as it should. In general it's much more solid and grounded anyways. If you have any idea why the bit extension was happening, please tell me.

Methods of Usage

In C

Just make a C file like this and call it djb2.c:

#include <stdio.h>

int main()
{
    unsigned long hash = 0;
    djb2_ccc("my_null_terminated_str", &hash);
    printf("%lu\n", hash);
}

And then pass the assembly file for the arch you want AFTER the C file name to GCC or Clang:

gcc djb2.c djb2.<arch>.s -o djb2 && ./djb2

And it will compile and run the file.

As Shared Library in Python

First, create a C file like this:

unsigned long djb2_hash(char *stream)
{
    unsigned long hash = 0;
    djb2_ccc(stream, &hash);
    return hash;
}

First create a shared object like so:

gcc -c -shared -fPIC djb2.c djb2.<arch>.s -o djb2.so

And then you can use ctypes to import and use it:

from ctypes import *

djb2 = CDLL("/abs/path/to/djb2.so")

string = create_string_buffer(b"my_null_terminated_str\0")
hash = djb2.djb2_hash(byref(string))
print(hash)

By Linkage in Rust

Do the previous steps to generate the shared library. Then:

use std::ffi::{c_char, c_ulong};

extern "C" {
    fn djb2_hash(message: *const c_char) -> c_ulong;
}

fn djb2_hash() -> u64 {
    unsafe {
        const MSG_PTR: *const c_char = {
            const BYTES: &[u8] = b"my_null_terminated_message\0";
            BYTES.as_ptr().cast()
        };
        let hash = djb2_hash(MSG_PTR);

        hash as u64
    }
}

Notice that Python and Rust codes are untested! You may need to make changes to it to work.

Running Aarch64 on Non-ARM Machines

You need to use QEMU's Userspace emulators to run the Aarch64 code in non-ARM machines. There are several guides out there for this you can consult.

Disclaimer

These implementations are provided with NO WARRANTY that they will work, funcion as intended, work properly, do no clobber your registers, that the convention won't harm your flow, ZILCH. If they break, do not hold me responsible.

Comments

This is my first working assembly code. After this one I am going to implemented fixed-point integers in 6502 assembly. I am seeking comments to improve my skills. Please contact me on Discord at Chubak#7400.

Have a good day.

// Copyright 2023 Chubak Bidpaa
// Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
// The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
// THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
// The following in an implementation of DJB2 hash in x86-64 Assembly. usa GNU Assembler to compile.
// Syntax used is AT&T/GAS.
.arch generic64
.data
.equ DJB2_MAGIC, 5381
.global djb2_ccc
.text
_djb2:
movb (%rdi, %rcx, 1), %al
inc %rcx
cmpb $0, %al
je fin
movq %r11, %r12
shlq $5, %r12
addq %r12, %r11
addq %rax, %r11
jne _djb2
fin:
ret
djb2_ccc:
push %r12
movq $DJB2_MAGIC, %r11
movq $0, %rcx
call _djb2
movq %r11, (%rsi)
pop %r12
ret
// Copyright 2023 Chubak Bidpaa
// Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
// The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
// THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
// The following is an implementation of DJB2 hash in Aarch64 Assembly. Use GNU Assembly to compile.
// You can use QEMU to run it in non-ARM64 machines.
.arch armv8-a
.data
.equ DJB2_MAGIC, 5381
.global djb2_ccc
.text
_djb2:
ldrb w5, [x6], #1
cmp w5, #0
b.eq fin
lsl x1, x0, #5
add x0, x0, x1
add x0, x0, x5
b.ne _djb2
fin:
ret
djb2_ccc:
stp x5, x6, [sp, #-16]!
stp lr, x0, [sp, #-16]!
mov x0, DJB2_MAGIC
ldp x3, x6, [sp], #16
bl _djb2
mov lr, x3
mov x0, [x1]
ldp x5, x6, [sp], #16
ret
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment