WRITE of size 8 at 0x0065720656c0 thread T0
#0 /usr/local/google/home/benoitjacob/iree/iree/hal/dylib/dylib_executable.cc:93:21 iree::hal::dylib::DyLibExecutable::Initialize(iree::hal::ExecutableSpec)
#1 /usr/local/google/home/benoitjacob/iree/iree/hal/dylib/dylib_executable.cc:28:3 iree::hal::dylib::DyLibExecutable::Load(iree::hal::ExecutableSpec)
#2 /usr/local/google/home/benoitjacob/iree/iree/hal/dylib/dylib_executable_cache.cc:46:10 iree::hal::dylib::DyLibExecutableCache::PrepareExecutable(iree::hal::ExecutableLayout*, iree::hal::ExecutableCachingMode, iree::hal::ExecutableSpec const&)
#3 /usr/local/google/home/benoitjacob/iree/iree/hal/api.cc:1970:3 iree_hal_executable_cache_prepare_executable
#4 /usr/local/google/home/benoitjacob/iree/iree/modules/hal/hal_module.cc:667:5 iree::hal::(anonymous namespace)::HALModuleState::ExecutableCachePrepare(iree::vm::ref<iree_hal_executable_cache> const&, iree::vm::ref<iree_hal_executable_layout> const&, unsigned int, iree::vm::ref<iree_vm_ro_byte_buffer_t> const&)
#5
@bjacob
bjacob / mmap_and_read.cc
Last active November 17, 2020 17:03
test program to mmap a file and read it and measure page faults.
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdint>
#include <cstdlib>
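The preview above stops at the include list, so the actual program is not shown here. As a minimal sketch of how such a measurement could look (not the author's code), the block below maps a file read-only, touches every byte, and reports the change in the process's page-fault counters from `getrusage`. The names `FaultCounts`, `current_faults`, and `read_mapped_file` are hypothetical, introduced only for illustration.

```cpp
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: minor and major page-fault counts.
struct FaultCounts {
  long minor;
  long major;
};

// Snapshot of the calling process's page-fault counters so far.
static FaultCounts current_faults() {
  struct rusage usage;
  getrusage(RUSAGE_SELF, &usage);
  return {usage.ru_minflt, usage.ru_majflt};
}

// Maps `path` read-only, sums every byte so that each page is touched, and
// stores the number of page faults taken during the read in `delta`.
// Returns false if the file cannot be opened, is empty, or cannot be mapped.
bool read_mapped_file(const char* path, uint64_t* checksum, FaultCounts* delta) {
  assert(checksum != nullptr && delta != nullptr);
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return false;
  }
  const size_t size = static_cast<size_t>(st.st_size);
  void* mapping = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // The mapping stays valid after the descriptor is closed.
  if (mapping == MAP_FAILED) return false;

  const uint8_t* bytes = static_cast<const uint8_t*>(mapping);
  const FaultCounts before = current_faults();
  uint64_t sum = 0;
  for (size_t i = 0; i < size; ++i) sum += bytes[i];
  const FaultCounts after = current_faults();

  munmap(mapping, size);
  *checksum = sum;
  delta->minor = after.minor - before.minor;
  delta->major = after.major - before.major;
  return true;
}
```

The checksum exists mainly to keep the compiler from optimizing the reads away. Whether the faults show up as minor or major depends on whether the file is already in the page cache.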
@bjacob
bjacob / README.md
Last active November 26, 2020 16:54

A shot at the cheapest possible PRNG on ARM NEON

This is just an 8-bit PRNG, and its randomness is about as bad as it could possibly get: the scalar implementation has a period of only 256. The implementation is widened 64x for SIMD purposes, which has the unintended side benefit of multiplying the period by the same factor.
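The README excerpt does not show the generator itself, so the following is only a sketch under stated assumptions. It uses an 8-bit linear congruential step, x -> 5x + 1 (mod 256), chosen here because it is known to have the full period of 256; the author's actual recurrence may differ. It then models the 64x widening as 64 independent byte lanes in plain C++ (no NEON intrinsics, so it runs anywhere) and reads the "multiplied period" claim as a statement about the output byte stream, which repeats only every 256 * 64 = 16384 bytes. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed generator, not taken from the gist: an 8-bit LCG with full period.
inline uint8_t next8(uint8_t x) { return static_cast<uint8_t>(5u * x + 1u); }

// Number of steps until the scalar generator returns to its seed.
int scalar_period(uint8_t seed) {
  uint8_t x = seed;
  int steps = 0;
  do {
    x = next8(x);
    ++steps;
  } while (x != seed);
  return steps;
}

constexpr int kLanes = 64;  // Models a 64-byte-wide SIMD state.

// Runs 64 lanes side by side, each seeded with its lane index, and emits
// the 64 lane values of every step back to back as one byte stream.
std::vector<uint8_t> widened_stream(int steps) {
  assert(steps >= 0);
  std::array<uint8_t, kLanes> state;
  for (int lane = 0; lane < kLanes; ++lane) state[lane] = static_cast<uint8_t>(lane);
  std::vector<uint8_t> out;
  out.reserve(static_cast<size_t>(steps) * kLanes);
  for (int t = 0; t < steps; ++t) {
    for (int lane = 0; lane < kLanes; ++lane) {
      state[lane] = next8(state[lane]);
      out.push_back(state[lane]);
    }
  }
  return out;
}

// True if stream[i] == stream[i + period] for every valid index i.
bool repeats_with_period(const std::vector<uint8_t>& stream, size_t period) {
  for (size_t i = 0; i + period < stream.size(); ++i) {
    if (stream[i] != stream[i + period]) return false;
  }
  return true;
}
```

With these definitions, `scalar_period(0)` is 256, and a stream from `widened_stream(512)` repeats with period 16384 but not with period 8192. Note that each individual lane still cycles every 256 steps; only the interleaved stream is longer.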

Sample output on Pixel4 (pinned to the biggest CPU core with taskset 80):

@bjacob
bjacob / README.md
Last active November 30, 2020 15:48
Tool to scribble ARM binaries (the only ARM-specific assumption is that instructions are aligned 32-bit words)

To scribble LDR qN, [xN, offset] into MOVI vN.16b, 0, do:

hack_arm_binary some_aarch64_binary_file neon-load-to-movi-zero
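The tool's source is not part of this excerpt. As a rough sketch of what such a rewrite could look like, the block below walks a buffer of aligned 32-bit words and replaces each AArch64 `LDR Qt, [Xn, #imm]` (unsigned-offset form) with `MOVI Vt.16B, #0`, keeping the destination register. The function names are hypothetical, and handling only the unsigned-offset encoding is an assumption; the real `neon-load-to-movi-zero` mode may match other addressing forms.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LDR Qt, [Xn, #imm] (unsigned offset): the top 10 bits are fixed, followed
// by imm12 in bits 21:10, Rn in bits 9:5 and Rt in bits 4:0.
constexpr uint32_t kLdrQMask = 0xFFC00000u;
constexpr uint32_t kLdrQBits = 0x3DC00000u;
// MOVI Vd.16B, #0 with Rd = 0; the destination register lives in bits 4:0.
constexpr uint32_t kMoviZero16b = 0x4F00E400u;

// Returns the replacement for one instruction word, or the word unchanged
// if it is not a 128-bit unsigned-offset load.
inline uint32_t ldr_q_to_movi_zero(uint32_t insn) {
  if ((insn & kLdrQMask) != kLdrQBits) return insn;
  return kMoviZero16b | (insn & 0x1Fu);
}

// Rewrites every matching word in place and returns how many were changed.
// Words are assumed to be already decoded from the file's little-endian bytes.
size_t scribble(std::vector<uint32_t>& words) {
  size_t changed = 0;
  for (uint32_t& word : words) {
    const uint32_t replacement = ldr_q_to_movi_zero(word);
    if (replacement != word) {
      word = replacement;
      ++changed;
    }
  }
  return changed;
}
```

For example, `ldr q1, [x2, #16]` encodes as `0x3DC00441` and becomes `0x4F00E401` (`movi v1.16b, #0`), while a `nop` (`0xD503201F`) passes through untouched. Because the match is purely bitwise, data words that happen to look like a load would be rewritten too.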
@bjacob
bjacob / tracy_connect.jpeg
Created December 7, 2020 20:06
images for IREE Tracy docs
@bjacob
bjacob / README
Last active December 7, 2020 20:19
images for IREE Tracy docs
Some images for IREE Tracy docs
0000000000003980 T ruy::Kernel8bitNeonDotprod(ruy::KernelParams8bit<8, 8> const&)
0000000000003756 T ruy::Pack8bitRowMajorForNeon(unsigned char const*, int, int, int, int, int, int, signed char*, int, int, int*, int, int)
0000000000003544 T ruy::Kernel8bitNeonDotprodA55ish(ruy::KernelParams8bit<8, 8> const&)
0000000000003068 T ruy::Kernel8bitNeonA55ish(ruy::KernelParams8bit<4, 4> const&)
0000000000002708 T ruy::Kernel8bitNeon(ruy::KernelParams8bit<4, 4> const&)
0000000000001704 T ruy::Pack8bitColMajorForNeonDotprod(void const*, void const*, void const*, void const*, int, int, int, int, int, int, signed char*, int*, int)
0000000000001316 T ruy::Kernel8bitNeon1Col(ruy::KernelParams8bit<4, 4> const&)
0000000000001260 T ruy::Kernel8bitNeonDotprod1Col(ruy::KernelParams8bit<8, 8> const&)
0000000000001208 T ruy::MakeBlockMap(int, int, int, int, int, int, int, int, ruy::CpuCacheParams const&, ruy::BlockMap*)
0000000000001108 t ruy::(anonymous namespace)::TrMulTask::Run()
@bjacob
bjacob / ez.sh
Created December 15, 2020 21:30
#!/bin/bash
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
* 86.55% FullyConnected
  * 86.53% cpu_backend_gemm::Gemm
    * 86.53% Mul
      * 41.12% matmul shape: 128x512x384
        * 41.12% TrMul (Path=0x20, max_num_threads=1, is_prepacked=(0,0))
          * 41.12% TrMulImpl, general case
            * 36.59% Kernel (kNeon)
            * 4.47% Pack (kNeon)
            * 0.04% [other]
            * 0.02% MakeBlockMap
@bjacob
bjacob / MobileNet-v3-large-visualization.md
Last active January 17, 2021 17:44
Matmul shapes in MobileNet-v3-large, EfficientNet-Lite2 and EfficientNet-B4, 8bit quantized, by decreasing CPU time % on Pixel4

Visualization of the MobileNet-v3-large shapes (ordered similarly by decreasing time percentage, so the most important shapes come first).

[Image: mobilenet-v3-large-matmuls-ordered]