Masaki Ota magurosan

  • Nagoya, Aichi, Japan
@magurosan
magurosan / xoshiro256_x8_avx512.c
Created May 26, 2022 14:40
x8 xoshiro256 (AVX-512)
#include <immintrin.h>
#include <stdint.h>
union PARALLEL_XOSHIRO_AVX512_STATE {
__m512i state512[4];
uint64_t state64[32];
uint32_t state32[64];
};
typedef union PARALLEL_XOSHIRO_AVX512_STATE xoshiro256_x8_avx512_state_t;
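The union above suggests that each of the four `__m512i` registers holds one state word for eight independent generators. As a rough, non-authoritative sketch of that idea, the plain C++ model below steps eight lanes with the xoshiro256++ update from Blackman and Vigna's reference code. The names `xoshiro256_x8_model`, `rotl64`, `model_next_lane`, and `model_next_x8` are hypothetical, and both the lane layout (`state64[w * 8 + lane]`) and the choice of the `++` variant are assumptions rather than something the gist preview confirms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not the gist's code). Assumed layout:
// word w of generator `lane` lives at state64[w * 8 + lane].
struct xoshiro256_x8_model {
    std::uint64_t state64[32];
};

// Rotate left; k must be in 1..63 (true for the constants used below).
static inline std::uint64_t rotl64(std::uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

// One xoshiro256++ step for a single lane.
static inline std::uint64_t model_next_lane(xoshiro256_x8_model *st, std::size_t lane) {
    std::uint64_t &s0 = st->state64[0 * 8 + lane];
    std::uint64_t &s1 = st->state64[1 * 8 + lane];
    std::uint64_t &s2 = st->state64[2 * 8 + lane];
    std::uint64_t &s3 = st->state64[3 * 8 + lane];

    const std::uint64_t result = rotl64(s0 + s3, 23) + s0;
    const std::uint64_t t = s1 << 17;
    s2 ^= s0;
    s3 ^= s1;
    s1 ^= s2;
    s0 ^= s3;
    s2 ^= t;
    s3 = rotl64(s3, 45);
    return result;
}

// Advance all eight lanes once; a vectorized version would do this
// with one instruction sequence over 512-bit registers.
static inline void model_next_x8(xoshiro256_x8_model *st, std::uint64_t out[8]) {
    for (std::size_t lane = 0; lane < 8; ++lane) {
        out[lane] = model_next_lane(st, lane);
    }
}
```

A model like this can serve as a reference when checking a vectorized implementation lane by lane, provided the real code uses the same layout and variant.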
class  inst         l/t         cpi           ipc
reg64  add          latency     3.633499e-01  2.752168e+00
reg64  add          throughput  7.145596e-02  1.399463e+01
reg64  lea          latency     1.119077e-02  8.935934e+01
reg64  lea          throughput  1.119077e-02  8.935934e+01
reg64  xor dst,dst  latency     4.510581e-02  2.217009e+01
reg64  xor dst,dst  throughput  4.510581e-02  2.217009e+01
reg64  xor          latency     4.510093e-02  2.217249e+01
reg64  xor          throughput  4.511082e-02  2.216763e+01
reg64  load         latency     9.582788e-01  1.043538e+00
magurosan / xoshiro256.c
Created December 6, 2020 06:46
Multiple-instance-capable xoshiro256 implementation in C
/*
xoshiro256+/++/** implementation for multiple instances
(C) 2020 Masaki Ota. Some rights reserved.
Original code written in 2019 by David Blackman and Sebastiano Vigna (vigna@acm.org)
*/
#include "xoshiro256.h"
static void jump_common(xoshiro256_t *state, const uint64_t jump_array[4]) {
magurosan / strlen_avx512bw.cpp
Last active November 19, 2020 00:14
strlen: AVX2/AVX512BW intrinsics implementation
/*
require http://homepage1.nifty.com/herumi/soft/xbyak_e.html
g++ -O3 -fomit-frame-pointer -march=core2 -msse4 -fno-operator-names strlen_sse42.cpp && ./a.out
Xeon X5650 2.67GHz + Linux 2.6.32 + gcc 4.6.0
ave            2.00    5.04    7.03    9.95   12.03   16.35   20.04   33.08   66.62  132.98  261.78  518.13 1063.83
strlenLIBC    11.74    5.00    3.99    3.21    2.84    2.30    2.05    1.42    0.85    0.55    0.38    0.29    0.24
strlenC       13.94    8.18    6.69    5.43    4.87    4.16    3.76    3.08    2.58    2.29    2.17    2.05    2.02
strlenSSE2    12.72    6.30    4.97    3.81    3.27    2.55    2.20    1.53    0.94    0.56    0.36    0.25    0.19
strlenSSE42    8.84    3.73    3.01    2.59    2.44    2.16    1.99    1.46    0.84    0.54    0.35    0.27    0.21
magurosan / png-masker.cpp
Last active January 7, 2020 15:20
PNG (8-bit RGBA) texture file masking tool for VRoid Studio
#include <iostream>
#include <png++/png.hpp>
int main(int argc, char *argv[])
{
if (argc != 4) {
std::cerr << "usage: " << argv[0] << " [source.png] [mask.png] [dest.png]\n";
return 1;
}
try {
magurosan / jtr_result.txt
Last active January 12, 2018 17:51
John the Ripper benchmark result / Nano Pi Fire3 / Ubuntu 16.04 LTS aarch64
pi@NanoPi-Fire3:~/work/JohnTheRipper/run$ ./john --test
Will run 8 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 NEON]... (8xOMP) DONE
Warning: "Many salts" test limited: 57/256
Many salts: 3662K c/s real, 462893 c/s virtual
Only one salt: 2417K c/s real, 342083 c/s virtual
Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 NEON]... (8xOMP) DONE
Speed for cost 1 (iteration count) of 725
Warning: "Many salts" test limited: 28/256
magurosan / ArduboyLTTimerPronama.ino
Last active December 3, 2017 06:05
LT timer and LT keyboard (Pronama edition)
/*
The latest complete source has been moved here:
https://github.com/magurosan/Arduboy-LT-timer-Pronama
*/
#include <Arduboy2.h>
#include "Keyboard.h"
#include <stdint.h>
#include <avr/pgmspace.h>
magurosan / ESPr-DFPlayer.ino
Last active September 24, 2017 14:59
ESPr Developer + DFPlayer mini sketch sample (for IoTLT nagoya vol7)
/*
MP3 webserver sample, using SSCI ESPr(R) Developer with DFRobots DFPlayer mini
Copyright (c) 2017 Masaki Ota / MagurosanTeam
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
magurosan / static_tzcnt.hpp
Last active August 15, 2017 20:47
static tzcnt
#include <stddef.h> // size_t
struct static_tzcnt {
constexpr size_t operator ()(size_t x) const {
return (x == 0) ? (sizeof(size_t) * 8)
: ((x & 1)? 0 : (1 + operator()(x >> 1)));
}
};
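As a small usage sketch, the functor can be evaluated inside `static_assert` and `constexpr` initializers, so the trailing-zero count costs nothing at run time. The block repeats the struct so that it compiles on its own; the names `kAlignment` and `kAlignShift` are hypothetical and only illustrate one possible use (deriving a shift amount from a power-of-two alignment).

```cpp
#include <cstddef>

// Copy of the gist's functor so this sketch is self-contained.
struct static_tzcnt {
    constexpr std::size_t operator()(std::size_t x) const {
        return (x == 0) ? (sizeof(std::size_t) * 8)
                        : ((x & 1) ? 0 : (1 + operator()(x >> 1)));
    }
};

// Evaluated entirely at compile time.
static_assert(static_tzcnt()(1) == 0, "bit 0 set: no trailing zeros");
static_assert(static_tzcnt()(8) == 3, "8 == 0b1000: three trailing zeros");
static_assert(static_tzcnt()(0) == sizeof(std::size_t) * 8,
              "zero input yields the bit width of size_t");

// Hypothetical use: turn a power-of-two alignment into a shift count.
constexpr std::size_t kAlignment = 64;
constexpr std::size_t kAlignShift = static_tzcnt()(kAlignment);
static_assert(kAlignShift == 6, "64 == 1 << 6");
```

Because the recursion depth grows with the position of the lowest set bit, it stays well within typical compiler limits for `constexpr` evaluation on 64-bit inputs.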
magurosan / strlen_generator_avx512bw.cpp
Last active August 21, 2017 01:09
C++/Xbyak strlen generator for AVX512BW
#include <xbyak/xbyak.h>
#include <xbyak/xbyak_util.h>
#include <stdint.h>
class StrlenGenerator : Xbyak::CodeGenerator {
public:
//
// e.g.
// StrlenGenerator gen(sizeof(char), true) => strlen_s
//