Dmitry Mikushin dmikushin

dmikushin / cupti.cu
Created January 7, 2015 18:47
Querying GPU environment profiles with CUPTI
// $ cat makefile
// CUPTI = /opt/cuda/extras/CUPTI
//
// all: cupti
//
// cupti: cupti.cu
// 	nvcc -I$(CUPTI)/include -arch=sm_30 $< -o $@ -L$(CUPTI)/lib64 -lcupti -Xlinker -rpath=$(CUPTI)/lib64
//
// clean:
// 	rm -rf cupti
dmikushin / test.cpp
Created January 10, 2015 13:20
C++11 compilation case
template<class PriorityEvaluation, class TProcInfo>
class ProcInfoWithPriority : public TProcInfo
{
};
template<class TProcInfo, class TCustom = void>
class ProcInfoVisitor
{
};
dmikushin / cmp_check.c
Last active March 22, 2016 15:17
_mm256_cmp_ps vs _mm256_cmp_pd
// This sketch checks whether all elements of an AVX vector are greater than zero.
// An infinite loop verifies that the two implementations produce the same result.
//
// gcc -mavx cmp_check.c -o cmp_check -O3 -ffast-math
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <x86intrin.h>
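
The preview above is cut off before the comparison itself. As a rough illustration of the idea, the following is a minimal sketch of an "all elements greater than zero" check for the single-precision case. It is not the gist's actual code: the function names `all_positive_scalar` and `all_positive_avx` are hypothetical, and the AVX path is only compiled when the compiler defines `__AVX__` (for example with `gcc -mavx`); otherwise it falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Scalar reference: returns 1 if all 8 floats are strictly greater than zero. */
static int all_positive_scalar(const float v[8])
{
    for (size_t i = 0; i < 8; i++)
        if (!(v[i] > 0.0f))
            return 0;
    return 1;
}

/* Hypothetical AVX variant; uses the scalar version when AVX is not enabled. */
static int all_positive_avx(const float v[8])
{
#if defined(__AVX__)
    __m256 x = _mm256_loadu_ps(v);
    /* _CMP_GT_OQ: ordered, non-signaling "greater than"; NaN compares false. */
    __m256 gt = _mm256_cmp_ps(x, _mm256_setzero_ps(), _CMP_GT_OQ);
    /* movemask gathers the top bit of each lane; 0xFF means all 8 lanes passed. */
    return _mm256_movemask_ps(gt) == 0xFF;
#else
    return all_positive_scalar(v);
#endif
}
```

Comparing the two functions on the same input, as the gist's description suggests, is a cheap way to catch a wrong predicate or mask. A double-precision variant would presumably use `_mm256_cmp_pd` on 4 lanes and compare the mask against `0xF`.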
dmikushin / reduce_mul.c
Created March 27, 2016 12:26
AVX-512 horizontal multiply for k1om (Intel Xeon Phi Knights Corner) using Intel compiler
// AVX-512 horizontal multiply for k1om (Intel Xeon Phi Knights Corner)
//
// (c) 2016 Dmitry Mikushin dmitry@parallel-computing.pro
//
// $ icc -mmic -std=c99 -O3 reduce_mul.c -o reduce_mul
// $ micnativeloadex ./reduce_mul
// -0.004276 vs -0.004276
#include <immintrin.h>
#include <stdio.h>
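
The body of `reduce_mul.c` is not shown in this preview. To illustrate what a horizontal multiply computes, here is a plain C sketch, assuming 16 single-precision lanes (one 512-bit register). It does not use Knights Corner intrinsics and is not the author's implementation; `reduce_mul16` is a hypothetical name.

```c
#include <stddef.h>

/* Multiplies all 16 lanes together by halving the active width each step
 * (16 -> 8 -> 4 -> 2 -> 1), the pattern a SIMD horizontal reduction
 * typically follows. The input array is left unmodified. */
static float reduce_mul16(const float v[16])
{
    float t[16];
    for (size_t i = 0; i < 16; i++)
        t[i] = v[i];
    /* Each pass multiplies the lower half by the upper half of the active lanes. */
    for (size_t width = 8; width >= 1; width /= 2)
        for (size_t i = 0; i < width; i++)
            t[i] *= t[i + width];
    return t[0];
}
```

The tree order takes 4 passes instead of 15 dependent multiplications. Because floating-point multiplication is not associative, the result can differ in the last bits from a sequential left-to-right product.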
dmikushin / test_swizzle.c
Last active March 28, 2016 22:15
AVX-512 swizzle in native assembly for k1om (Intel Xeon Phi Knights Corner)
// AVX-512 swizzle in native assembly for k1om (Intel Xeon Phi Knights Corner)
//
// (c) 2016 Dmitry Mikushin dmitry@parallel-compute.org
//
// $ icc -no-gcc -mmic -O3 -std=c99 test_swizzle.c -o test_swizzle
// $ micnativeloadex ./test_swizzle
// 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000
// 8.000000 8.000000 8.000000 8.000000 8.000000 8.000000 8.000000 8.000000
#include <immintrin.h>

Build LLVM-based C/C++ compiler with OpenMP4 for CUDA GPUs

Create working directory:

mkdir -p ~/forge/openmp4
cd ~/forge/openmp4

Clone LLVM & Compiler Runtime & Clang sources:

JNIEXPORT jobject JNICALL Java_..._getValue(
JNIEnv *env, jobject ...)
{
...
jobject resultObj = result.toJava().getObject();
return resultObj; // Not NULL!
}
// Log shows:
dmikushin / 0001-Make-Alpine-2.21-to-bounce-all-selected-messages.patch
Created September 13, 2017 00:29
Make Alpine 2.21 to bounce all selected messages
From b4a79a41f995b27038011f92f0da3c6852a8596c Mon Sep 17 00:00:00 2001
From: Dmitry Mikushin <dmitry@kernelgen.org>
Date: Wed, 13 Sep 2017 03:28:26 +0300
Subject: [PATCH] Make Alpine 2.21 to bounce all selected messages
---
alpine/mailcmd.c | 5 -----
alpine/reply.c | 12 ++++++------
2 files changed, 6 insertions(+), 11 deletions(-)
dmikushin / timing.h
Created October 12, 2019 18:51
Old-style high-resolution timing function that avoids problems of <chrono> https://stackoverflow.com/questions/37426832/what-are-the-uses-of-stdchronohigh-resolution-clock
#ifndef TIMING_H
#define TIMING_H
#if defined(_WIN32)
#define CLOCK_REALTIME_WIN32 0
#include <windows.h>
struct timespec_win32
{
dmikushin / gist:5d762745573361d0c923d6aacc904ecb
Created January 7, 2020 07:37
NanoPI initrd boot failure
Welcome to Ubuntu 19.10!
[ 7.111555] systemd[1]: Failed to bump fs.file-max, ignoring: Invalid argument
[ 7.149252] systemd[1]: File /lib/systemd/system/systemd-journald.service:12 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup.
[ 7.150764] systemd[1]: Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
[ 7.165346] systemd[1]: /lib/systemd/system/dbus.socket:4: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /r.
[ 7.194813] random: systemd: uninitialized urandom read (16 bytes read)
[ 7.195455] systemd[1]: Reached target Remote File Systems.
[ OK ] Reached target Remote File Systems.
[ 7.210891] random: systemd: uninitialized urandom read (16 bytes read)