Ryan Newton rrnewton

@rrnewton
rrnewton / Two threads think they won the winner check
Last active August 29, 2015 13:55
Example trace pertaining to issue #70. Note the two occurrences of "past global barrier".
=================================================
*** Iteration 35, executing command: <./AddRemoveSetTests.exe -t i1 +RTS -N4>
=================================================
[*] Default test harness...
[*] Using Just 1 worker threads for testing.
AddRemoveSetTests:
[!] LVish responding to env Var: DEBUG=9
|3| wrkr0 [dbg-lvish] Main worker thread starting.
|3| wrkr3 [dbg-lvish] Auxillary worker #3 starting.
rrnewton / gist:8726176
Created January 31, 2014 03:28
worker 0 disappears resulting in deadlock
[!] LVish responding to env Var: DEBUG=9
|3| wrkr0 [dbg-lvish] Auxillary worker #0 starting.
|7| !cpu 0 stealing
|7| !cpu 0 going idle...
|3| wrkr2 [dbg-lvish] Auxillary worker #2 starting.
|7| !cpu 2 stealing
|7| !cpu 2 going idle...
|3| wrkr3 [dbg-lvish] Auxillary worker #3 starting.
rrnewton / test output
Created February 17, 2014 19:42
Having problems using the same test harness for CUDA backend
$ ./dist/dist-sandbox-463d1a75/build/test-accelerate-cuda-backend/test-accelerate-cuda-backend --threads=1
[Note: passing through options to test-framework]: --threads=1
[!] Testing backend: <CUDA-Backend: DeviceProperties {deviceName = "GeForce GT 430", computeCapability = 2.1, totalGlobalMem = 1072889856, totalConstMem = 65536, sharedMemPerBlock = 49152, regsPerBlock = 32768, warpSize = 32, maxThreadsPerBlock = 1024, maxThreadsPerMultiProcessor = 1536, maxBlockSize = (1024,1024,64), maxGridSize = (65535,65535,65535), maxTextureDim1D = 65536, maxTextureDim2D = (65536,65535), maxTextureDim3D = (2048,2048,2048), clockRate = 1400000, multiProcessorCount = 2, memPitch = 2147483647, memBusWidth = 128, memClockRate = 800000, textureAlignment = 512, computeMode = Default, deviceOverlap = True, concurrentKernels = True, eccEnabled = False, asyncEngineCount = 1, cacheMemL2 = 131072, tccDriverEnabled = False, pciInfo = PCI {busID = 1, deviceID = 0, domainID = 0}, kernelExecTimeoutEnabled = True, integrated = False
rrnewton / 2 way
Last active August 29, 2015 13:56
Fission 2 way vs 4 way, sequential C
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <math.h>
#define max(a,b) ({ __typeof__ (a) _a = (a); __typeof__ (b) _b = (b); _a > _b ? _a : _b; })
#define min(a,b) ({ __typeof__ (a) _a = (a); __typeof__ (b) _b = (b); _a < _b ? _a : _b; })
void build_evt144(int64_t inSize, int64_t inStride,
double* tmp_0_1266, double* tmp_0_1267, double* tmp_0_1268,
double v0111, double v0112, double v0113, double* aLt2_057,
rrnewton / gist:9301953
Created March 2, 2014 04:31
Full log of fission 2 way on nbody example
[rrnewton@RN-rMBP ~/accelerate/array-dsl-benchmarks/accelerate/nbody/fission1] (master)$ DEBUG=4 ACC_FISSION_FACTOR=2 time ./bench-nbody-fission1.exe 10
NBODY size on command line: N="10"
NBODY: Reading requested prefix of input file... Just 10
Read 100000 lines from file...
Done reading (took 0.015566s), converting to Acc array..
Input prefix(4) [(0.204377359711,0.58752346877,0.466465813108),(-0.124071846716,-0.430925352499,0.818155869842),(9.23475595191e-2,0.471875966527,-0.390485706739)]
Input in CPU memory and did GC (took 0.00179s), starting benchmark...
! Responding to env Var: DEBUG=4
beforeConversion, output was:
Result size of Tidy Core = {terms: 49, types: 76, coercions: 69}
main2 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, WorkFree=False, Expandable=False,
Guidance=IF_ARGS [] 180 0}]
main2 =
unpackCString#
"Pattern match failure in do expression at Issue28.hs:10:3-11"
rrnewton / Int_Version.hs
Last active August 29, 2015 13:59
Snippets of Int vs String IORef and basic CAS test. Spot the difference. These are both from .dump-simpl files.
case newMutVar# @ Int @ RealWorld main4 eta_Xg
of _ [Occ=Dead] { (# ipv_a7jX, ipv1_a7jY #) ->
case ({__pkg_ccall_GC atomic-primops-0.6.0.4 stg_readMutVar2zh MutVar#
RealWorld ()
-> State# RealWorld
-> (# State# RealWorld, Any () #)}_a7mV
(ipv1_a7jY
`cast` ((MutVar# <RealWorld>_N (UnivCo representational Int ()))_R
:: MutVar# RealWorld Int ~# MutVar# RealWorld ()))
ipv_a7jX)
rrnewton / report_16.html
Last active August 29, 2015 14:14
Results for bintree with 2^16 leaves - incremental vs non-incremental compaction
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>criterion report</title>
<script language="javascript" type="text/javascript">
/*! jQuery v2.1.1 | (c) 2005, 2014 jQuery Foundation, Inc. | jquery.org/license */
!function(a,b){"object"==typeof module&&"object"==typeof module.exports?module.exports=a.document?b(a,!0):function(a){if(!a.document)throw new Error("jQuery requires a window with a document");return b(a)}:b(a)}("undefined"!=typeof window?window:this,function(a,b){var c=[],d=c.slice,e=c.concat,f=c.push,g=c.indexOf,h={},i=h.toString,j=h.hasOwnProperty,k={},l=a.document,m="2.1.1",n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){ret
rrnewton / Main.dump-stg
Created February 7, 2015 21:08
Segfaulting goIncrOpt2
$wa2 [InlPrag=[0]]
:: Word32
-> Int#
-> State# RealWorld
-> (# State# RealWorld, Compact BinTree #)
[GblId,
Arity=3,
Caf=NoCafRefs,
Str=DmdType <L,1*U(U)><L,U><L,U>,
Unf=OtherCon []] =
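(* Staged power function, written here in BER MetaOCaml bracket/escape syntax
   (assumed; the preview appears to have lost the dots around <...> and ~). *)
let rec spower : int -> int code -> int code =
  fun n x ->
    if n = 0 then .<1>.
    else .< .~x * .~(spower (n-1) x) >.
(* Generate code for a function that raises its argument to the n-th power. *)
let spowern n = .<fun x -> .~(spower n .<x>.)>.
(* Run the generated code; the original used the older run operator "!". *)
let powExp exp = Runcode.run (spowern exp)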