@Chubek
Last active April 28, 2023 15:25
PRNG subroutine for x86-64 & Hash subroutine for Aarch64

The following gist contains two Assembly files: one for x86-64 in AT&T syntax and one for Aarch64. It also has a C file showing how the two subroutines may be used, since the subroutines in both files are callable from C. Keep in mind that both .s files are annotated with comments at the very top of the file.

Intro and Disclaimer

These subroutines were mainly authored for the sake of education. My cousin passed away yesterday and I'm upset so I wanted to keep busy. Please do not use these two subroutines in any serious manner unless you have scrutinized them well and you are sure they will endow you with the necessary functionality.

Note on the Assembler Macros and Directives

Note that both these files contain GNU Assembler macros, and as we will see, they are meant to be compiled using GCC or Clang along with a C program declaring them as external symbols.

Files

  • qandum.x64.s -> A subroutine called qandum that is basically a PRNG using the TSC system register. You must pass it rotate and shift lookup tables as seed. The values in these tables may not exceed 63.
  • bytestew.a64.s -> A subroutine called bytestew that takes an unsigned character string and its length, and returns a hash in the form of an unsigned double word.
  • vonjerriman.c -> C interface for the two subroutines, including example with stdout print.

Methodology

The qandum PRNG

This subroutine uses x86-64's TSC register as its seed, and also uses the lookup tables given by the user to do a barrel hash of the seed. RDTSC loads the low 32 bits of the counter into EAX and the high 32 bits into EDX; qandum combines them into a single 64-bit integer, and then proceeds to do the barrel hash using Rotate Right, Shift-Right-and-Add and Rotate Left instructions. At the end the bits of the seed are thoroughly scrambled. You may take the result modulo the size of your array to select a random index, and so on. This subroutine only uses integer and accumulator operations. No multi-cycle operations are used.
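The mixing described above can be sketched in portable C. This is only an illustration under stated assumptions, not a transcription of the assembly: the names rotr64, rotl64, join_halves, mix and pick_index are hypothetical, the round order (rotate right, shift right and add, rotate left) follows the description in this section, and the seed is passed in as a plain argument instead of being read with RDTSC so that the result is reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate helpers. The count is masked to 0..63, like the AND with 63 in the
 * assembly, and a count of 0 is special-cased to avoid a 64-bit shift. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* RDTSC returns the counter split as EDX (high half) and EAX (low half). */
static uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical mixer: one round per lookup-table entry. */
static uint64_t mix(uint64_t state,
                    const uint8_t *ror_lut,
                    const uint8_t *shr_lut,
                    const uint8_t *rol_lut,
                    size_t len)
{
    assert(ror_lut && shr_lut && rol_lut);
    for (size_t i = 0; i < len; i++) {
        uint64_t t = rotr64(state, ror_lut[i]); /* rotate right      */
        t >>= (shr_lut[i] & 63);                /* shift right ...   */
        state += t;                             /* ... and add       */
        state = rotl64(state, rol_lut[i]);      /* rotate left       */
    }
    return state;
}

/* Reduce a 64-bit value to an array index. Plain modulo has a slight bias
 * unless array_len divides 2^64 evenly. */
static size_t pick_index(uint64_t value, size_t array_len)
{
    assert(array_len > 0);
    return (size_t)(value % array_len);
}
```

As a worked round: with a rotate-right count of 0, a shift count of 2 and a rotate-left count of 1, a state of 16 becomes 16 + (16 >> 2) = 20, which rotated left by one bit is 40.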

The bytestew Hash

This subroutine will take the given string of unsigned bytes, read them one-by-one, and use Aarch64's bit manipulation instructions, Bit-Field Insert and Bit-Field Extract, to shuffle every two bits of the byte around. It will then combine the shuffled byte with the final digest. The digest is 64 bits wide, and it goes through the same Insert-and-Extract pipeline itself on every iteration of the loop. One thing to keep in mind: Aarch64's bit manipulators, according to the ARM Cortex optimization guide, are indeed multi-cycle. Therefore this function will not be super-performant.
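To make the bit-field shuffling concrete, here is a small C sketch. It is an illustration under assumptions, not the gist's exact algorithm: bfi64 emulates the A64 BFI instruction, stew_byte mirrors the four insert-then-OR-with-shift steps of the byte-level macro in the assembly file, and elf_step is the textbook ELF (PJW) hash step, which the file header names as the model for the combine step but which the assembly does not reproduce instruction for instruction. All three names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates A64 BFI: copy the low `width` bits of src into dst starting at
 * bit `lsb`, leaving every other bit of dst unchanged. */
static uint64_t bfi64(uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 64);
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
}

/* Mirrors the byte-level macro: insert the low 2-bit pair of src at bit
 * positions 6, 2, 4 and 0 in turn, OR-ing src with itself shifted right by
 * two bits after each insert. */
static uint32_t stew_byte(uint32_t src)
{
    static const unsigned slots[4] = {6, 2, 4, 0};
    uint32_t dst = 0;
    for (int i = 0; i < 4; i++) {
        dst = (uint32_t)bfi64(dst, src, slots[i], 2);
        src |= src >> 2;
    }
    return dst;
}

/* Textbook ELF (PJW) hash step, shown for reference only. */
static uint64_t elf_step(uint64_t h, uint8_t byte)
{
    h = (h << 4) + byte;
    uint64_t high = h & 0xF0000000ULL;
    if (high)
        h ^= high >> 24;   /* fold the top nibble back into the low bits */
    h &= ~high;            /* then clear it */
    return h;
}
```

With these definitions, stew_byte(0x01) yields 0x55: the low pair stays 01 through every OR-with-shift, so each of the four slots receives 01.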

Compile and X-Compile

To compile the x86-64 code, just run gcc -D __qandum__ vonjerriman.c qandum.x64.s. If you wish to X-Compile the Aarch64 code on a usual PC, you will need aarch64-linux-gnu-gcc. After you have acquired the executable, or have built it from source, just run aarch64-linux-gnu-gcc -D __bytestew__ vonjerriman.c bytestew.a64.s. Then run the resulting executable with qemu-aarch64 -L /usr/aarch64-linux-gnu ./a.out.

Other Projects

Please visit my Github profile to view my other projects. I have many projects brewing at the same time, paid and unpaid, and due to the aforementioned bereavement I may have to code to keep my mind busy as the death is not the only issue, my aunt's gone crazy and threatened to shank my uncle, thus making my mom very upset. You can expect more projects from me. Whilst you are here, please visit my project PoxHash and another simple Assembly gist, DJB2 Hash. The latter is an early Assembly project of mine and I have progressed much since then. It's good for contrast. These two subroutines themselves are very simple.

Thanks, and take care.

/*
* Aarch64
* VonJerrimanUtils: bytestew --- A hash using A64's bit manipulation instructions, following ELF hash algorithm
* bytestew takes a byte string and a length, and returns a hash of that string
* Calling Convention:
** x0 -> arg0
** x1 -> arg1
** x0 -> retv
* Callee-Saved:
** ---
* Clobbered:
* x9
* x10
* x11
* x12
* C Signature:
extern unsigned long bytestew(unsigned char *message, unsigned long messagelen);
* Example Call:
```
unsigned char message[] = "AAAC";
unsigned long messagelen = 4;
unsigned long hash = bytestew(message, messagelen);
```
* Notes:
** The hash is one-way and reproducible
** Since bit manipulation instructions, according to Cortex-A57 Optimization guide, have a latency of 2, this subroutine is not considered micro-optimized
*/
.data
.global bytestew
.bss
.macro zroutreg reg
eor \reg, \reg, \reg
.endm
.macro bytestew dst, src
bfi \dst, \src, #6, #2
orr \src, \src, \src, lsr #2
bfi \dst, \src, #2, #2
orr \src, \src, \src, lsr #2
bfi \dst, \src, #4, #2
orr \src, \src, \src, lsr #2
bfi \dst, \src, #0, #2
orr \src, \src, \src, lsr #2
.endm
.macro dwordstew dst, src
bfi \dst, \src, #48, #16
orr \src, \src, \src, lsr #16
bfi \dst, \src, #32, #16
orr \src, \src, \src, lsr #16
bfi \dst, \src, #0, #16
orr \src, \src, \src, lsr #16
bfi \dst, \src, #6, #16
orr \src, \src, \src, lsr #16
.endm
.macro elf hash, byte, high
orr \hash, \hash, \hash, lsl #4
add \hash, \hash, \byte
ands \high, \hash, #0xF0000000
// skip the fold when no bit of the high nibble is set
b.eq L2
// fold the high nibble back into the low bits of the hash
eor \hash, \hash, \high, lsr #24
L2:
bic \hash, \hash, \high
.endm
.text
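bytestew:
// save lr; a 16-byte slot keeps sp aligned
str lr, [sp, #-16]!
ZROUTREG x9
ZROUTREG x10
ZROUTREG x12
// x11 = message pointer, x0 = running digest
mov x11, x0
ZROUTREG x0
// empty message: return 0
cbz x1, 2f
1:
// x1 = index of the next byte, walking from len-1 down to 0
sub x1, x1, #1
ZROUTREG x10
ZROUTREG x12
ldrb w10, [x11, x1]
// shuffle the 2-bit pairs of the byte into w12
BYTESTEW w12, w10
// fold the shuffled byte into the digest
ELF x0, x12, x10
ZROUTREG x10
// shuffle the 16-bit fields of the digest into x10
DWORDSTEW x10, x0
mov x0, x10
// loop until the byte at index 0 has been processed
cbnz x1, 1b
2:
ldr lr, [sp], #16
ret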
/*
* x86-64
* vonjerrimanutils: qandum --- a pseudo-random number generation tool
* qandum subroutine takes three luts as seed, reads the time stamp counter, and uses it to generate a random integer
* calling convention:
** rdi -> arg0
** rsi -> arg1
** rdx -> arg2
** rcx -> arg3
** rax -> retv
* callee-saved:
** r12
* clobbered:
* r11
* rcx
* c signature:
extern unsigned long qandum(unsigned char *ror_lut, unsigned char *shr_lut, unsigned char *rol_lut, unsigned long len_luts);
* example call:
```
unsigned char ror_lut[4] = {1, 41, 13, 11}; // rotate right lut
unsigned char shr_lut[4] = {12, 3, 11, 1}; // shift right lut
unsigned char rol_lut[4] = {12, 3, 11, 1}; // rotate left lut
unsigned long len_luts = 4; // luts' length
unsigned long rand = qandum(ror_lut, shr_lut, rol_lut, len_luts);
```
* notes:
** retv will always be a 64-bit integer
** unsigned bytes in all 3 luts must not exceed 63. the bytes are and'd with 63 to make sure the rotate and shift counts never exceed 63 bits
** rdtsc is an instruction that reads the time stamp counter, which is a model-specific register; it may act differently across processor models
*/
.data
.global qandum
.bss
.macro zroutreg reg
xor \reg, \reg
.endm
.text
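qandum:
push %rbp
push %r12
mov %rsp, %rbp
# rdtsc overwrites rdx (arg2), so park it on the stack
push %rdx
# edx:eax = time stamp counter
rdtsc
shl $32, %rdx
# rax = full 64-bit counter value
or %rdx, %rax
pop %rdx
# r12 = number of lut entries still to process
mov %rcx, %r12
test %r12, %r12
jz 2f
1:
# index runs from len-1 down to 0
dec %r12
ZROUTREG %r11
ZROUTREG %rcx
mov (%rdi, %r12), %cl
and $63, %cl
mov %rax, %r11
ror %cl, %r11
mov (%rsi, %r12), %cl
and $63, %cl
shr %cl, %r11
# fold the rotated and shifted copy back into the state
add %r11, %rax
mov (%rdx, %r12), %cl
and $63, %cl
rol %cl, %rax
test %r12, %r12
jnz 1b
2:
pop %r12
pop %rbp
ret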
#include <stdio.h>
#ifdef __qandum__
extern unsigned long qandum(unsigned char *ror_lut, unsigned char *shr_lut, unsigned char *rol_lut, unsigned long len_luts);
int main() {
unsigned char ror_lut[4] = {1, 41, 13, 11};
unsigned char shr_lut[4] = {12, 3, 11, 4};
unsigned char rol_lut[4] = {222, 42, 12, 1};
unsigned long len_luts = 4;
unsigned long rand = qandum(ror_lut, shr_lut, rol_lut, len_luts);
printf("%lu\n", rand);
}
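#elif defined(__bytestew__)
extern unsigned long bytestew(unsigned char *message, unsigned long messagelen);
int main() {
/* unsigned char matches the parameter type of bytestew */
unsigned char message[] = "DeathShallBringRespite";
/* length without the terminating NUL (22 bytes) */
unsigned long messagelen = sizeof(message) - 1;
unsigned long hash = bytestew(message, messagelen);
printf("%lu\n", hash);
}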
#endif