Richard Startin richardstartin

import java.util.concurrent.ThreadLocalRandom;

public class BLSI {

    public static void main(String[] args) {
        long blackhole = -1L;
        while (blackhole != 0L) {
            blackhole = blsi(ThreadLocalRandom.current().nextLong());
        }
        System.out.println(blackhole);
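
The preview above is cut off before `blsi` is defined. As a minimal sketch, assuming `blsi` is meant to mirror the x86 BMI1 `BLSI` instruction (isolate the lowest set bit), it can be written as `x & -x`. The class name `BlsiSketch` is hypothetical and is not taken from the gist.

```java
class BlsiSketch {

    // Assumed definition: keeps only the lowest set bit of x, and returns 0 when x == 0.
    static long blsi(long x) {
        return x & -x;
    }

    public static void main(String[] args) {
        // 12 is 0b1100, so its lowest set bit is 0b0100.
        System.out.println(blsi(12L));
        // The JDK exposes the same operation as Long.lowestOneBit.
        System.out.println(blsi(12L) == Long.lowestOneBit(12L));
    }
}
```

Under this assumption the loop in the original only exits when `nextLong()` happens to return 0, which keeps the call hot long enough for the JIT-compiled code to be inspected.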
Java HotSpot(TM) 64-Bit Server VM warning: printing of assembly code is enabled; turning on DebugNonSafepoints to gain additional output
CompilerOracle: print *JVector8.dot
Compiled method (c1) 807 864 % 3 ch.ethz.acl.ngen.precison.JVector8::dot @ 35 (134 bytes)
total in heap [0x000000010f6dd010,0x000000010f6ddbb0] = 2976
relocation [0x000000010f6dd138,0x000000010f6dd1b0] = 120
main code [0x000000010f6dd1c0,0x000000010f6dd640] = 1152
stub code [0x000000010f6dd640,0x000000010f6dd6d0] = 144
oops [0x000000010f6dd6d0,0x000000010f6dd6d8] = 8
metadata [0x000000010f6dd6d8,0x000000010f6dd6e0] = 8
scopes data [0x000000010f6dd6e0,0x000000010f6dd830] = 336
richardstartin / SAXPY
Created December 23, 2017 21:59 — forked from astojanov/SAXPY
// 1. package ch.ethz.acl.ngen.saxpy;
// 2.
// 3. public class JSaxpy {
// 4. public void apply(float[] a, float[] b, float s, int n){
// 5. for (int i = 0; i < n; i += 1) {
// 6. a[i] += b[i] * s;
// 7. }
// 8. }
// 9. }
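
The commented source above can be exercised with a small harness. This is only a sketch: the class name `SaxpySketch` and the sample values are made up for illustration, and the method is static for brevity, whereas `JSaxpy.apply` in the listing is an instance method.

```java
class SaxpySketch {

    // Same loop body as the commented JSaxpy.apply: a[i] += b[i] * s for the first n elements.
    static void apply(float[] a, float[] b, float s, int n) {
        for (int i = 0; i < n; i += 1) {
            a[i] += b[i] * s;
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        apply(a, b, 2f, a.length);
        // a now holds {1 + 4*2, 2 + 5*2, 3 + 6*2}
        System.out.println(java.util.Arrays.toString(a));
    }
}
```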
// Code being analyzed
//
// 1. package ch.ethz.acl.ngen.saxpy;
// 2.
// 3. public class JSaxpy {
// 4. public void apply(int[] a, int[] b, int s, int n){
// 5. for (int i = 0; i < n; i += 1) {
// 6. a[i] += b[i] * s;
// 7. }
// 8. }
Benchmarks:
com.openkappa.simd.saxpy.DAXPYAlignment.daxpy
# JMH version: 1.19
# VM version: JDK 9.0.1, VM 9.0.1+11
# VM invoker: C:\Program Files\Java\jdk-9.0.1\bin\java.exe
# VM options: -server -XX:-TieredCompilation -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2017.2.5\lib\idea_rt.jar=58772:C:\Program Files\JetBrains\IntelliJ IDEA 2017.2.5\bin -Dfile.encoding=UTF-8
# Warmup: 10 iterations, 1 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
stepping : 3
cpu MHz : 2592.000
cache size : 256 KB
physical id : 0

Keybase proof

I hereby claim:

  • I am richardstartin on github.
  • I am richardstartin (https://keybase.io/richardstartin) on keybase.
  • I have a public key ASC9Niv9ZiIZxfkha8E-H9OpnqL-vkkzLmnYqTFeJ_Chnwo

To claim this, I am signing this object:

Benchmark                                                  (mode)     (size)  (unroll)   Mode  Cnt            Score  Error  Units
Shuffle.shuffle                               THREAD_LOCAL_RANDOM  100000000         8  thrpt   10            0.217  0.013  ops/s
Shuffle.shuffle:l1d_pend_miss.pending         THREAD_LOCAL_RANDOM  100000000         8  thrpt       18246771955.200         #/op
Shuffle.shuffle:l1d_pend_miss.pending_cycles  THREAD_LOCAL_RANDOM  100000000         8  thrpt        7280468758.133         #/op
Shuffle.shuffle                               THREAD_LOCAL_RANDOM  100000000        16  thrpt   10            0.233  0.001  ops/s
Shuffle.shuffle:l1d_pend_miss.pending         THREAD_LOCAL_RANDOM  100000000        16  thrpt       17801360193.233         #/op
Shuffle.shuffle:l1d_pend_miss.pending_cycles  THREAD_LOCAL_RANDOM  100000000        16  thrpt        7093396781.133         #/op
Shuffle.shuffle                               THREAD_LOCAL_RANDOM  100000000        32  thrpt   10            0.231  0.012  ops/s
Shuffle.shuffle:l1d_pend_miss.pending         THREAD_LOCAL_RANDOM  100000000        32  thrpt       17736302365.233         #/op
Shuffle.shuffle:l1d_pend_miss.pending_cycles  THREAD_LOCAL_RANDOM  100000000        32  thrpt        7086435577.567         #/op
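
One way to read the two counters together, offered as an interpretation and not something the gist states: on Intel cores `l1d_pend_miss.pending` accumulates the number of outstanding L1D misses each cycle, while `l1d_pend_miss.pending_cycles` counts cycles with at least one miss outstanding, so their ratio approximates the average number of misses in flight. The sketch below applies that ratio to the figures in the table; the class and method names are hypothetical.

```java
class MlpSketch {

    // Assumed metric: average L1D misses in flight during cycles with at least one miss outstanding.
    static double averageOutstandingMisses(double pending, double pendingCycles) {
        return pending / pendingCycles;
    }

    public static void main(String[] args) {
        // Per-op counter values copied from the table above, one entry per unroll factor.
        int[] unroll = {8, 16, 32};
        double[] pending = {18246771955.200, 17801360193.233, 17736302365.233};
        double[] pendingCycles = {7280468758.133, 7093396781.133, 7086435577.567};
        for (int i = 0; i < unroll.length; i++) {
            System.out.printf("unroll=%d avg outstanding misses=%.3f%n",
                    unroll[i], averageOutstandingMisses(pending[i], pendingCycles[i]));
        }
    }
}
```

With these inputs the ratio comes out at roughly 2.5 for all three unroll factors, which would be consistent with the nearly flat throughput scores.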
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: size"
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixed","thrpt",1,20,1660.364170,39.398630,"ops/ms",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixed:CPI","thrpt",1,1,0.286152,NaN,"#/op",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixed:cycles","thrpt",1,1,1999.275956,NaN,"#/op",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixed:instructions","thrpt",1,1,6986.757494,NaN,"#/op",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixed:l1d_pend_miss.pending","thrpt",1,1,6.772748,NaN,"#/op",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixed:l1d_pend_miss.pending_cycles","thrpt",1,1,5.197146,NaN,"#/op",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixedStaged","thrpt",1,20,2035.710950,8.082510,"ops/ms",1024
"com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.mixedStaged:
# JMH version: 1.20
# VM version: JDK 12-internal, VM 12-internal+0-adhoc.root.dev
# VM invoker: /home/richard/workspace/dev/build/linux-x86_64-normal-server-release/images/jdk/bin/java
# VM options: --add-modules=jdk.incubator.vector -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.openkappa.panama.vectorbenchmarks.IntersectionCardinality.popcnt