
@jasigal
Last active January 2, 2018 21:57
Accelerate LLVM FP contract bug
accelerate-llvm-bug.cabal

name:                accelerate-llvm-bug
version:             0.1.0.0
build-type:          Simple
cabal-version:       >= 1.10

executable accelerate-llvm-bug
  hs-source-dirs:      .
  main-is:             Main.hs
  default-language:    Haskell2010
  build-depends:       base >= 4.7 && < 5
                     , accelerate >= 1.1 && < 1.2
                     , accelerate-llvm >= 1.1 && < 1.2
                     , accelerate-llvm-native >= 1.1 && < 1.2

autogen.ll

; Taken from running:
; stack exec -- accelerate-llvm-bug +ACC -fforce-recomp -ddump-cc -ddebug-cc -ddump-ld -ddump-asm -ddump-exec -ddump-sched -ddump-phases -ddump-gc -dverbose -ACC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; ModuleID = 'map_1bf1a791bb25369d'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
; Function Attrs: nounwind
define void @map_1bf1a791bb25369d(i64 %ix.start, i64 %ix.end, double* noalias nocapture %out.ad0, double* noalias nocapture readonly %fv0.ad0) local_unnamed_addr #0 {
entry:
%0 = icmp slt i64 %ix.start, %ix.end
br i1 %0, label %while1.top.preheader, label %while1.exit
while1.top.preheader: ; preds = %entry
br label %while1.top
while1.top: ; preds = %while1.top.preheader, %while1.top
%1 = phi i64 [ %10, %while1.top ], [ %ix.start, %while1.top.preheader ]
%2 = getelementptr double, double* %fv0.ad0, i64 %1
%3 = load double, double* %2, align 8
%4 = fdiv fast double 1.000000e+00, %3
%5 = fmul fast double %4, %4
%6 = fsub fast double -0.000000e+00, %5
%7 = tail call double @tanh(double %6) #1
%8 = fadd fast double %4, %7
%9 = getelementptr double, double* %out.ad0, i64 %1
store double %8, double* %9, align 8
%10 = add nsw i64 %1, 1
%exitcond = icmp eq i64 %10, %ix.end
br i1 %exitcond, label %while1.exit, label %while1.top
while1.exit: ; preds = %while1.top, %entry
ret void
}
; Function Attrs: nounwind readonly
declare double @tanh(double) local_unnamed_addr #1
attributes #0 = { nounwind }
attributes #1 = { nounwind readonly }

autogen.s

# Taken from running same command as above
.text
.file ""
.section .rodata.cst8,"aM",@progbits,8
.p2align 3
.LCPI0_0:
.quad 4607182418800017408
.LCPI0_1:
.quad 0
.text
.globl map_1bf1a791bb25369d
.p2align 4, 0x90
.type map_1bf1a791bb25369d,@function
map_1bf1a791bb25369d:
pushq %r15
pushq %r14
pushq %rbx
subq $16, %rsp
movq %rsi, %r14
cmpq %r14, %rdi
jge .LBB0_3
subq %rdi, %r14
leaq (%rdx,%rdi,8), %r15
leaq (%rcx,%rdi,8), %rbx
.p2align 4, 0x90
.LBB0_2:
vmovsd .LCPI0_0(%rip), %xmm0
vdivsd (%rbx), %xmm0, %xmm0
vmovsd %xmm0, 8(%rsp)
vfnmsub213sd .LCPI0_1, %xmm0, %xmm0
callq tanh
vaddsd 8(%rsp), %xmm0, %xmm0
vmovsd %xmm0, (%r15)
addq $8, %r15
addq $8, %rbx
addq $-1, %r14
jne .LBB0_2
.LBB0_3:
addq $16, %rsp
popq %rbx
popq %r14
popq %r15
retq
.Lfunc_end0:
.size map_1bf1a791bb25369d, .Lfunc_end0-map_1bf1a791bb25369d
.section ".note.GNU-stack","",@progbits

autogen_from_ll_fp_contract.s

# Created by running "llc-5.0 autogen.ll -mattr=+fma -fp-contract=on -o autogen_from_ll_fp_contract.s"
.text
.file "autogen.ll"
.section .rodata.cst8,"aM",@progbits,8
.p2align 3 # -- Begin function map_1bf1a791bb25369d
.LCPI0_0:
.quad 4607182418800017408 # double 1
.LCPI0_1:
.quad 0 # double 0
.text
.globl map_1bf1a791bb25369d
.p2align 4, 0x90
.type map_1bf1a791bb25369d,@function
map_1bf1a791bb25369d: # @map_1bf1a791bb25369d
# BB#0: # %entry
pushq %r15
pushq %r14
pushq %rbx
subq $16, %rsp
movq %rsi, %r14
cmpq %r14, %rdi
jge .LBB0_3
# BB#1: # %while1.top.preheader
subq %rdi, %r14
leaq (%rdx,%rdi,8), %r15
leaq (%rcx,%rdi,8), %rbx
.p2align 4, 0x90
.LBB0_2: # %while1.top
# =>This Inner Loop Header: Depth=1
vmovsd .LCPI0_0(%rip), %xmm0 # xmm0 = mem[0],zero
vdivsd (%rbx), %xmm0, %xmm0
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
vfnmsub213sd .LCPI0_1, %xmm0, %xmm0
callq tanh
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, (%r15)
addq $8, %r15
addq $8, %rbx
decq %r14
jne .LBB0_2
.LBB0_3: # %while1.exit
addq $16, %rsp
popq %rbx
popq %r14
popq %r15
retq
.Lfunc_end0:
.size map_1bf1a791bb25369d, .Lfunc_end0-map_1bf1a791bb25369d
# -- End function
.section ".note.GNU-stack","",@progbits
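
Main.hs

module Main where

import Data.Array.Accelerate as A
import Data.Array.Accelerate.LLVM.Native

-- Compile the Accelerate function with runN for the native (CPU) backend.
g :: Scalar Double -> Scalar Double
g = runN (A.map f :: Acc (Scalar Double) -> Acc (Scalar Double))
  where
    -- f x = 1/x + tanh (-(1/x)^2); in the generated code, (-y) * y
    -- ends up as a single vfnmsub213sd instruction.
    f x = let y = recip x
              b = (-y) * y
          in y + tanh b

main :: IO ()
main = do
  -- For x = 1 this should print 1 + tanh (-1) = 0.23840584404423515
  let r = g (fromList Z [1])
  print (indexArray r Z)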

stack.yaml

resolver: lts-10.2
packages:
- .
flags:
  accelerate:
    debug: true
    unsafe-checks: true
    internal-checks: true
  llvm-hs:
    shared-llvm: true
jasigal commented Jan 2, 2018

The above demonstrates a bug in accelerate-llvm-native. To reproduce it, download Main.hs, accelerate-llvm-bug.cabal, and stack.yaml into one directory, run stack build, and then run:

$ stack exec -- accelerate-llvm-bug +ACC -fforce-recomp -ddump-cc -ddebug-cc -ddump-ld -ddump-asm -ddump-exec -ddump-sched -ddump-phases -ddump-gc -dverbose -ACC

The debug output should contain the code shown above as autogen.ll and autogen.s, and the program should then crash with “stack exec -- accelerate-llvm-b…” terminated by signal SIGSEGV (Address boundary error) or a similar message. By running the generated executable under lldb-5.0:

$ lldb-5.0 -- .stack-work/install/[x86_64-linux-nopie|replace with your arch]/lts-10.2/8.2.2/bin/accelerate-llvm-bug
(lldb) target create ".stack-work/install/x86_64-linux-nopie/lts-10.2/8.2.2/bin/accelerate-llvm-bug"
Current executable set to '.stack-work/install/x86_64-linux-nopie/lts-10.2/8.2.2/bin/accelerate-llvm-bug' (x86_64).
(lldb) run
Process 8931 launched: '~/Repositories/accelerate-llvm-bug/.stack-work/install/x86_64-linux-nopie/lts-10.2/8.2.2/bin/accelerate-llvm-bug' (x86_64)
Process 8931 stopped
* thread #1, name = 'accelerate-llvm', stop reason = signal SIGSEGV: invalid address (fault address: 0x1de0b0)
    frame #0: 0x00000042001de032
->  0x42001de032: vfnmsub213sd 0x1de0b0, %xmm0, %xmm0
    0x42001de03c: callq  0x42001de0c8
    0x42001de041: vaddsd 0x8(%rsp), %xmm0, %xmm0
    0x42001de047: vmovsd %xmm0, (%r15)
(lldb) 

we see that the faulting instruction is line 30 of autogen.s (and line 33 of autogen_from_ll_fp_contract.s). The fault address, 0x1de0b0, is exactly the memory operand of that instruction, whereas the instruction itself sits at 0x42001de032. Note that autogen.s and autogen_from_ll_fp_contract.s are virtually identical because -fp-contract=on is passed to llc; without that option, llc generates very different assembly.

The vfnmsub213sd instruction is a "fused negative multiply-subtract of scalar double-precision floating-point values", and it appears to originate from lines 22 and 23 of autogen.ll, where b = (-y) * y is computed as a multiply followed by a subtraction from -0.0. The issue appears to be that the instruction's constant operand uses absolute addressing:

vfnmsub213sd	.LCPI0_1, %xmm0, %xmm0

vs.

vfnmsub213sd	.LCPI0_1(%rip), %xmm0, %xmm0
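
A minimal Haskell sketch of the two x86-64 addressing forms may make the failure concrete. It assumes, hypothetically, that .LCPI0_1 was loaded at 0x42001de0b0, a little after the faulting instruction at 0x42001de032; the real load address of the constant does not appear in the lldb session. An absolute operand encodes only a 32-bit displacement, which the CPU sign-extends, whereas a RIP-relative operand encodes an offset that is added to the address of the next instruction (0x42001de03c in the listing). The names constAddr and nextInsnAddr are illustrative only.

```haskell
import Data.Int (Int32)
import Data.Word (Word32, Word64)

-- Keep the low 32 bits of a value and sign-extend them back to 64 bits,
-- which is how the CPU widens a disp32 field in a memory operand.
signExtend32 :: Word64 -> Word64
signExtend32 w = fromIntegral (fromIntegral (fromIntegral w :: Word32) :: Int32)

-- Absolute form, e.g. `.LCPI0_1`: only a 32-bit displacement is encoded,
-- so the effective address is that displacement, sign-extended.
absoluteEA :: Word64 -> Word64
absoluteEA target = signExtend32 target

-- RIP-relative form, e.g. `.LCPI0_1(%rip)`: the encoded displacement is
-- target - nextPc, and the CPU adds it back to the next instruction's address.
ripRelativeEA :: Word64 -> Word64 -> Word64
ripRelativeEA nextPc target = nextPc + signExtend32 (target - nextPc)

-- Hypothetical load address of .LCPI0_1 (an assumption, not from the session).
constAddr :: Word64
constAddr = 0x42001de0b0

-- Address of the instruction following vfnmsub213sd, from the lldb listing.
nextInsnAddr :: Word64
nextInsnAddr = 0x42001de03c
```

Under that assumption, absoluteEA constAddr evaluates to 0x1de0b0, the fault address reported by lldb, while ripRelativeEA nextInsnAddr constAddr recovers the full 64-bit address. This is consistent with the (%rip) form fixing the crash, although the placement of the constant pool is only assumed here.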

Making this change by hand in autogen.s, compiling it with clang-5.0 autogen.s -c -o autogen.o, inserting the object file into Accelerate's cache (~/.accelerate/accelerate-llvm-1.1.0.0/accelerate-llvm-native-1.1.0.1/x86_64-pc-linux-gnu/broadwell/rel here; the path differs between architectures), and running the executable:

$ stack exec accelerate-llvm-bug
0.23840584404423515

produces the correct result of 0.23840584404423515 = 1 + tanh(-1).
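
As a cross-check independent of Accelerate, the sketch below evaluates the kernel from Main.hs with ordinary Prelude Doubles and also replays the loop body of map_1bf1a791bb25369d instruction by instruction. The constant-pool entries are decoded from their .quad bit patterns with castWord64ToDouble from GHC.Float (available since base-4.10). The names fRef, fAsm, lcpi0_0, lcpi0_1 and fnmsub213 are hypothetical; fnmsub213 models vfnmsub213sd as -(a * b) - c but, unlike the hardware, rounds twice, which makes no difference for the inputs tried here because y * y is exact for them.

```haskell
import GHC.Float (castWord64ToDouble)

-- The kernel from Main.hs, evaluated with plain Prelude Doubles (no Accelerate).
fRef :: Double -> Double
fRef x = let y = recip x
             b = (-y) * y
         in y + tanh b

-- Hypothetical model of vfnmsub213sd: dest = -(a * b) - c.
-- The hardware rounds once (fused); this model rounds twice, which only
-- matters when a * b is inexact.
fnmsub213 :: Double -> Double -> Double -> Double
fnmsub213 a b c = negate (a * b) - c

-- The two constant-pool entries, decoded from their .quad bit patterns.
lcpi0_0, lcpi0_1 :: Double
lcpi0_0 = castWord64ToDouble 4607182418800017408  -- 0x3FF0000000000000 = 1.0
lcpi0_1 = castWord64ToDouble 0                    -- 0.0

-- The loop body of map_1bf1a791bb25369d, replayed instruction by instruction.
fAsm :: Double -> Double
fAsm x = let y = lcpi0_0 / x              -- vmovsd + vdivsd
             b = fnmsub213 y y lcpi0_1   -- vfnmsub213sd
         in tanh b + y                   -- callq tanh; vaddsd
```

Loaded in GHCi, fRef 1 and fAsm 1 should agree with the value printed above, which suggests the contracted arithmetic itself is fine and only the addressing of .LCPI0_1 is at fault.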
