@mmcloughlin
Last active May 20, 2018 05:19
package asmdoc

//go:noescape
func addAsm(a, b float64) float64

// Add returns the sum of a and b.
var Add = addGeneric

func useAsm() bool {
	// check cpuid bits here
	return true
}

func init() {
	if useAsm() {
		Add = addAsm
	}
}

// +build !amd64

package asmdoc

// Add returns the sum of a and b.
func Add(a, b float64) float64 {
	return addGeneric(a, b)
}

package asmdoc

import "testing"

func TestAdd(t *testing.T) {
	if Add(3, 4) != 7 {
		t.Error("add doesn't work")
	}
}

#include "textflag.h"

// func addAsm(a, b float64) float64
TEXT ·addAsm(SB),NOSPLIT,$0-24
	MOVSD a+0(FP), X0
	MOVSD b+8(FP), X1
	ADDSD X0, X1
	MOVSD X1, ret+16(FP)
	RET

package asmdoc

// addGeneric provides a pure Go implementation of add.
func addGeneric(a, b float64) float64 {
	return a + b
}

I have a function that I would like to provide an assembly implementation for on the amd64 architecture. For the sake of discussion, let's suppose it's an Add function, although the real one is more complicated. I have the assembly version working, but my question concerns getting the godoc to display correctly. I have a feeling this is currently impossible, but I wanted to seek advice.

Some more details:

  • The assembly implementation of this function contains only a few instructions. In particular, the cost of the function call itself is a significant part of the total cost.
  • It uses special instructions (BMI2), so it can only be used after a CPUID capability check.

The implementation is structured like this gist. At a high level:

  • In the generic (non-amd64) case, Add is a function that delegates to addGeneric.
  • In the amd64 case, Add is actually a variable, initially set to addGeneric but replaced by addAsm in the init function if a CPUID check passes.

This approach works. However, the godoc output is poor because in the amd64 case Add is documented as a variable rather than a function. Note that godoc appears to pick up the build tags of the machine it's running on. I'm not sure what godoc.org would do.
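To make the documentation problem concrete, here is a minimal sketch that feeds simplified versions of the two declarations through the standard library's go/doc package and reports where Add ends up. The source strings, the `classify` helper, and the import path `example.com/asmdoc` are illustrative assumptions, and go/doc is only an approximation of what a given godoc version renders.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
)

// amd64Src is a simplified stand-in for the amd64 file: Add is a variable.
const amd64Src = `package asmdoc

// Add returns the sum of a and b.
var Add = addGeneric

func addGeneric(a, b float64) float64 { return a + b }
`

// genericSrc is a simplified stand-in for the non-amd64 file: Add is a function.
const genericSrc = `package asmdoc

// Add returns the sum of a and b.
func Add(a, b float64) float64 { return addGeneric(a, b) }

func addGeneric(a, b float64) float64 { return a + b }
`

// classify (a hypothetical helper) reports whether go/doc lists Add among
// the package-level functions ("func") or variables ("var").
func classify(src string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "add.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	pkg, err := doc.NewFromFiles(fset, []*ast.File{f}, "example.com/asmdoc")
	if err != nil {
		return "", err
	}
	for _, fn := range pkg.Funcs {
		if fn.Name == "Add" {
			return "func", nil
		}
	}
	for _, v := range pkg.Vars {
		for _, name := range v.Names {
			if name == "Add" {
				return "var", nil
			}
		}
	}
	return "missing", nil
}

func main() {
	for _, c := range []struct{ label, src string }{
		{"amd64", amd64Src},
		{"generic", genericSrc},
	} {
		kind, err := classify(c.src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s build: Add is documented as a %s\n", c.label, kind)
	}
}
```

The doc comment is preserved in both cases. What changes is the section Add is listed under, and the function signature is lost in the variable form.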

Alternatives considered:

  • The Add function delegates to addImpl, and we pull a similar trick to replace addImpl in the amd64 case. The problem (in my experiments) is that Go doesn't seem to be able to inline the call, so the assembly is now wrapped in two function calls. Since the assembly is so small already, this has a noticeable impact on performance.
  • In the amd64 case we define a plain function Add that has the useAsm check inside it and calls either addGeneric or addAsm depending on the result. This would have an even worse impact on performance.
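For reference, the two alternatives above might look roughly like the following pure Go sketch. The names `AddIndirect`, `AddBranch`, `addFast`, and `hasFastPath` are hypothetical: `addFast` stands in for the assembly routine addAsm so the example runs on any platform, and `hasFastPath` stands in for the result of the CPUID check.

```go
package main

import "fmt"

// addGeneric is the pure Go implementation.
func addGeneric(a, b float64) float64 { return a + b }

// addFast is a hypothetical stand-in for addAsm so that this sketch
// builds without an assembly file.
func addFast(a, b float64) float64 { return a + b }

// hasFastPath is a hypothetical stand-in for the CPUID capability check.
var hasFastPath = true

// addImpl is swapped in init, as in the first alternative.
var addImpl = addGeneric

func init() {
	if hasFastPath {
		addImpl = addFast
	}
}

// AddIndirect sketches the first alternative: a real function (so it is
// documented as one) that calls through a function variable.
func AddIndirect(a, b float64) float64 {
	return addImpl(a, b)
}

// AddBranch sketches the second alternative: a real function that checks
// the capability flag on every call.
func AddBranch(a, b float64) float64 {
	if hasFastPath {
		return addFast(a, b)
	}
	return addGeneric(a, b)
}

func main() {
	fmt.Println(AddIndirect(3, 4), AddBranch(3, 4))
}
```

Both variants are documented as ordinary functions, but each adds a layer (an indirect call or a branch) in front of the assembly routine, which is the overhead described above.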

So I guess the questions are:

  1. Is there a better way to structure the code that achieves the performance I want and still appears properly in the documentation?
  2. If there is no alternative, is there some other way to "trick" godoc?