I demonstrate what I mean with this using HomotopyContinuation.jl.
using HomotopyContinuation
@var x
f = System([4*x^4+3*x^3+2*x^2+x-5])
Given a symbolic system of polynomials f
, in HC.jl there are two ways
to optimize them for numerical evaluation.
First the expression is optimized with similar techniques as you do (common subexpression elimination and then a simple multivariate horner-schema).
Then the optimized expression is preprared for repeated numerical evaluation. This can be done using an InterpretedSystem
which is very similar to what you do (record all operations in arrays and "play" them for evaluation). The other is CompiledSystem
where
an actual straight line program is compiled:
julia> F = CompiledSystem(f)
Compiled: System of length 1
1 variables: x
-5 + x*(1 + x*(2 + x*(3 + 4*x)))
# This code is automatically generated for the evaluation of `F`
julia> HomotopyContinuation.ModelKit._evaluate!_impl(typeof(F))
quote
begin
@boundscheck checkbounds(u, 1:1)
@boundscheck checkbounds(x, 1:1)
@boundscheck p === nothing || checkbounds(p, 1:0)
end
@inbounds begin
_ι1 = 4 * x[1]
_ι2 = 3 + _ι1
_ι3 = x[1] * _ι2
_ι4 = 2 + _ι3
_ι5 = x[1] * _ι4
_ι6 = 1 + _ι5
_ι7 = x[1] * _ι6
_ι8 = -5 + _ι7
u[1] = _ι8
end
u
end
# We can see that this is turned into not many assembly instructions
julia> @code_native debuginfo=:none evaluate!([0.0], F, [2.0])
.section __TEXT,__text,regular,pure_instructions
subq $40, %rsp
movq %rdi, %rax
movabsq $5250033792, %rcx ## imm = 0x138ED2880
vmovaps (%rcx), %xmm0
vmovups %xmm0, 24(%rsp)
cmpq $0, 24(%rdi)
jle L148
vmovups %xmm0, 8(%rsp)
cmpq $0, 24(%rdx)
jle L170
movq (%rdx), %rcx
vmovsd (%rcx), %xmm0 ## xmm0 = mem[0],zero
movabsq $5250033752, %rcx ## imm = 0x138ED2858
vmulsd (%rcx), %xmm0, %xmm1
movabsq $5250033760, %rcx ## imm = 0x138ED2860
vaddsd (%rcx), %xmm1, %xmm1
vmulsd %xmm1, %xmm0, %xmm1
movabsq $5250033768, %rcx ## imm = 0x138ED2868
vaddsd (%rcx), %xmm1, %xmm1
vmulsd %xmm1, %xmm0, %xmm1
movabsq $5250033776, %rcx ## imm = 0x138ED2870
vaddsd (%rcx), %xmm1, %xmm1
vmulsd %xmm1, %xmm0, %xmm0
movabsq $5250033784, %rcx ## imm = 0x138ED2878
vaddsd (%rcx), %xmm0, %xmm0
movq (%rax), %rcx
vmovsd %xmm0, (%rcx)
addq $40, %rsp
retq
L148:
movabsq $throw_boundserror, %rcx
leaq 24(%rsp), %rsi
movq %rax, %rdi
callq *%rcx
ud2
L170:
movabsq $throw_boundserror, %rax
leaq 8(%rsp), %rsi
movq %rdx, %rdi
callq *%rax
ud2
Comparing the performance of InterpretedSystem
and CompiledSystem
we see that the latter is around 10x faster.
julia> u = zeros(1)
x1-element Array{Float64,1}:
0.0
julia> x = [rand()]
1-element Array{Float64,1}:
0.3889664871743841
julia> using BenchmarkTools
julia> @btime evaluate!($u, $F, $x)
3.750 ns (0 allocations: 0 bytes)
1-element Array{Float64,1}:
-4.04033706522339
julia> @btime evaluate!($u, $G, $x)
35.033 ns (0 allocations: 0 bytes)
1-element Array{Float64,1}:
-4.04033706522339
However, if the expressions are large than the compilation step for CompiledSystem
can become prohibitively slow.