@Houstonwp
Last active January 22, 2021 03:39

library(data.table)

q <- c(0.001,
       0.002,
       0.003,
       0.003,
       0.004,
       0.004,
       0.005,
       0.007,
       0.009,
       0.011)

w <- c(0.05,
       0.07,
       0.08,
       0.10,
       0.14,
       0.20,
       0.20,
       0.20,
       0.10,
       0.04)

P <- 100
S <- 25000
r <- 0.02

dt <- as.data.table(cbind(q,w))

npv <- function(cf, r, S, P) {
  cf[, inforce := shift(cumprod(1 - q - w), fill = 1)  # in force at start of each year
  ][, lapses := inforce * w
  ][, deaths := inforce * q
  ][, claims := deaths * S
  ][, premiums := inforce * P
  ][, ncf := premiums - claims        # net cashflow
  ][, d := (1/(1+r))^(.I)             # discount factor for year t = .I
  ][, sum(ncf*d)]                     # net present value
}

npv(dt,r,S,P)
#> [1] 50.32483

microbenchmark::microbenchmark(npv(dt,r,S,P))
#> Unit: milliseconds
#>              expr    min      lq     mean  median      uq    max neval
#>  npv(dt, r, S, P) 2.5791 2.71035 2.964293 2.85625 3.10385 6.0357   100

Created on 2021-01-15 by the reprex package (v0.3.0)

@Houstonwp (Author)

Base R implementation that mimics the Python function.

q <- c(0.001,0.002,0.003,0.003,0.004,0.004,0.005,0.007,0.009,0.011)
w <- c(0.05,0.07,0.08,0.10,0.14,0.20,0.20,0.20,0.10,0.04)
P <- 100
S <- 25000
r <- 0.02

base_r_npv <- function(q,w,P,S,r) {
  inforce <- c(1,head(cumprod(1-q-w), -1))
  ncf <- inforce * P - inforce * q * S
  d <- (1/(1+r)) ^ seq_along(ncf)
  sum(ncf * d)
}

base_r_npv(q,w,P,S,r)
#> [1] 50.32483
microbenchmark::microbenchmark(base_r_npv(q,w,P,S,r))
#> Unit: microseconds
#>                       expr min  lq    mean median  uq     max neval
#>  base_r_npv(q, w, P, S, r) 7.9 8.4 202.219    8.8 9.2 19314.8   100

Created on 2021-01-16 by the reprex package (v0.3.0)

@alecloudenback

Nice and concise! @lewisfogden, have you seen this?

@alecloudenback

For comparison, the Julia translation:

q = [0.001,0.002,0.003,0.003,0.004,0.004,0.005,0.007,0.009,0.011]
w = [0.05,0.07,0.08,0.10,0.14,0.20,0.20,0.20,0.10,0.04]
P = 100
S = 25000
r = 0.02

function npv(q,w,P,S,r)
    inforce = [1.; cumprod(1 .- q .- w)[1:end-1]]
    ncf = inforce .* P .- inforce .* q .* S
    d = (1 ./ (1 + r)) .^ (1:length(ncf))
    return sum(ncf .* d)
end
using BenchmarkTools

@benchmark npv(q,w,P,S,r)

results in:

BenchmarkTools.Trial: 
  memory estimate:  2.36 KiB
  allocs estimate:  40
  --------------
  minimum time:     3.130 μs (0.00% GC)
  median time:      3.255 μs (0.00% GC)
  mean time:        3.491 μs (4.11% GC)
  maximum time:     376.062 μs (97.25% GC)
  --------------
  samples:          10000
  evals/sample:     8

@lewisfogden

Thanks - good to see. On the benchmarks, just making sure I understand them?
R using data.table is 2.8ms ~= 2800μs
R analytically is 9μs (median)
Julia analytically is 3μs (median) after the initial JIT compilation

So the 'pure'/base language solution trumps using data.tables, and Julia is 3x the speed of R on equivalent code.

My gripe with this (and not a reflection on your impressive coding skills!) is that this is very bespoke code for the problem, and likely to cause anyone reading it a fair bit of head-scratching to work out what it does. I think the balance between speed and readability is an important discussion so will share that on the commons.

@Houstonwp (Author)

@alecloudenback, thoughts on reducing allocations? StaticArrays? These small examples are fine at demonstrating basic implementations, but do they generalize and perform at scale? A million policies and 100 years? That is where I see something open source and high performance filling a valuable role. I will say I’m a big fan of Julia, and it checks all of the boxes when it comes to requirements. I just happen to work with R at actual work.
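
To make the scale question concrete, the shape of such a test in data.table might be something like this (illustrative only; the names and sizes are made up, and nothing here is benchmarked):

library(data.table)

n_pol <- 1e6  # hypothetical block of a million identical policies, ~1e7 rows
pols <- data.table(id = rep(1:n_pol, each = length(q)),
                   t  = rep(seq_along(q), n_pol),
                   q  = rep(q, n_pol),
                   w  = rep(w, n_pol))

pols[, inforce := shift(cumprod(1 - q - w), fill = 1), by = id
][, ncf := inforce * P - inforce * q * S
][, sum(ncf * (1 / (1 + r))^t)]  # total NPV across the block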

@Houstonwp (Author)

> Thanks - good to see. On the benchmarks, just making sure I understand them?
> R using data.table is 2.8ms ~= 2800μs
> R analytically is 9μs (median)
> Julia analytically is 3μs (median) after the initial JIT compilation
>
> So the 'pure'/base language solution trumps using data.tables, and Julia is 3x the speed of R on equivalent code.

Yes and no. You could technically byte-compile the base R code to eke out a little more performance. data.table has a decent amount of overhead for something with such a small amount of data; my hunch is that the data.table version scales better once you get to millions and millions of rows.
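
A minimal sketch of the byte-compilation, using the base compiler package (note that recent versions of R byte-compile functions on first use by default, so any gain may be small):

cmp_npv <- compiler::cmpfun(base_r_npv)
cmp_npv(q, w, P, S, r)  # same result as the uncompiled version
microbenchmark::microbenchmark(cmp_npv(q, w, P, S, r))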

> My gripe with this (and not a reflection on your impressive coding skills!) is that this is very bespoke code for the problem, and likely to cause anyone reading it a fair bit of head-scratching to work out what it does. I think the balance between speed and readability is an important discussion so will share that on the commons.

Bespoke to this problem and not readable, agreed; it could be cleaned up with proper variable naming, etc. There is something to be said for generalizing and splitting the code up: I would separate the calculation of the contingent cashflows and the discount curve from the NPV itself.
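
A minimal sketch of that split (function names are illustrative):

contingent_cashflows <- function(q, w, P, S) {
  inforce <- c(1, head(cumprod(1 - q - w), -1))  # in force at start of each year
  inforce * P - inforce * q * S                  # premiums less death claims
}

discount_factors <- function(r, n) {
  (1 / (1 + r))^seq_len(n)
}

present_value <- function(cf, d) sum(cf * d)

present_value(contingent_cashflows(q, w, P, S), discount_factors(r, length(q)))
#> [1] 50.32483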

My intention was to show a vectorized approach to the “life problem”.

@lewisfogden

I had a look around today at speeding up Python - one approach I saw (in a PyGotham video) was to use Nim in place of C (Nim compiles to C). I picked the language up quite quickly given the similarities in syntax with Python.

Code is here:

# memoized actuarial model

import memo  # provides {.memoized.} decorator / https://github.com/andreaferretti/memo
import hashes
import math
import times
import stats

type
    Policy = object
        term: int
        premium: float
        sum_assured: float
        init_pols: float

# For memoization we require that each field can be hashed (to be stored in hashtable)
# we can create a simple hash by hashing individual terms and joining
proc hash(pol: Policy): Hash =
  var h: Hash = 0
  h = h !& hash(pol.term)
  h = h !& hash(pol.premium)
  h = h !& hash(pol.sum_assured)
  h = h !& hash(pol.init_pols)
  result = !$h

var
  w: seq[float]
  q: seq[float]
w = @[0.05, 0.07, 0.08, 0.10, 0.14, 0.20, 0.20, 0.20, 0.10, 0.04]
q = @[0.001, 0.002, 0.003, 0.003, 0.004, 0.004, 0.005, 0.007, 0.009, 0.011]


# forward declarations required
proc num_deaths(t: int, pol: Policy): float
proc num_lapses(t: int, pol: Policy): float
proc num_in_force(t: int, pol: Policy): float # optional

proc num_in_force(t: int, pol: Policy): float {.memoized.} =
    if t == 0:
        return pol.init_pols
    elif t >= pol.term:
        return 0
    else:
        return num_in_force(t-1, pol) - num_deaths(t-1, pol) - num_lapses(t-1, pol)

proc num_deaths(t: int, pol: Policy): float {.memoized.} = 
    if t < pol.term:
        result = num_in_force(t, pol) * q[t]      # can use return or result = <float> - if allocating to result then need to take care not to overwrite.
    else:
        result = 0

proc num_lapses(t: int, pol: Policy): float {.memoized.} =
    if t < pol.term:
        result = num_in_force(t, pol) * w[t]
    else:
        result = 0

proc claims(t: int, pol: Policy): float {.memoized.} =
    return num_deaths(t, pol) * pol.sum_assured

proc premiums(t: int, pol: Policy): float {.memoized.} =
    return num_in_force(t, pol) * pol.premium

proc net_cashflow(t: int, pol: Policy): float {.memoized.} =
    return premiums(t, pol) - claims(t, pol)


proc npv(pol: Policy, int_rate: float): float =
    var v: float
    result = 0
    v = 1 / ( 1 + int_rate)
    for t in countup(0, pol.term):
        result += net_cashflow(t, pol) * (v ^ t)

var num_tests: int
num_tests = 10
var results: seq[float]

for i in countup(0, num_tests):
    # need to clear the caches between each test run
    resetCachenet_cashflow
    resetCachenum_deaths
    resetCachenum_in_force
    resetCachenum_lapses
    resetCachepremiums
    resetCacheclaims
    var npv_val: float
    var pol1 = Policy(term: 10, premium: 100, sum_assured: 25000, init_pols: 1)
    let t0 = getTime()
    npv_val = npv(pol1, 0.02)
    let elapsed = getTime() - t0
    results.add(elapsed.inMicroseconds.float)
    echo i, ": ", elapsed

var rs: RunningStat
rs.push(results)
echo "mean: ", rs.mean()

I had to write the benchmark code myself as I couldn't get the profiler to work. Compile settings and results:

lewis@DESKTOP~/nimnim$ nim c -r -d:release --opt:speed act2.nim
Hint: used config file '/home/lewis/.choosenim/toolchains/nim-1.4.2/config/nim.cfg' [Conf]
Hint: used config file '/home/lewis/.choosenim/toolchains/nim-1.4.2/config/config.nims' [Conf]
...................
Hint:  [Link]
Hint: 43870 lines; 0.558s; 60.922MiB peakmem; Release build; proj: /home/lewis/nimnim/act2.nim; out: /home/lewis/nimnim/act2 [SuccessX]
Hint: /home/lewis/nimnim/act2  [Exec]
0: 6 microseconds and 400 nanoseconds
1: 5 microseconds and 300 nanoseconds
2: 4 microseconds and 600 nanoseconds
3: 4 microseconds and 300 nanoseconds
4: 4 microseconds and 400 nanoseconds
5: 17 microseconds
6: 4 microseconds and 400 nanoseconds
7: 4 microseconds and 400 nanoseconds
8: 4 microseconds and 400 nanoseconds
9: 4 microseconds and 800 nanoseconds
10: 4 microseconds and 300 nanoseconds
mean: 5.454545454545454

Also - just realised we can't compare results as we're not all using the same hardware!

@alecloudenback

> @alecloudenback, thoughts on reducing allocations? StaticArrays?

StaticArrays help when the size of the array is less than about 15 items, IIRC. The sample here is unrealistically short (e.g. q is often much longer), so StaticArrays would probably help here but would be misleading for more "real world" problem sizes.
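
For what it's worth, a rough sketch (untested, and npv_static is my name) of the vectorized version on fixed-size arrays, assuming StaticArrays' cumprod keeps the result static:

using StaticArrays, BenchmarkTools

# fixed-size copies of the inputs; the length 10 becomes part of the type
qs = SVector{10}(q)
ws = SVector{10}(w)

function npv_static(q::SVector{N}, w::SVector{N}, P, S, r) where {N}
    decrements = cumprod(1 .- q .- w)  # stays an SVector, no heap allocation
    inforce = SVector(ntuple(i -> i == 1 ? 1.0 : decrements[i - 1], Val(N)))
    ncf = inforce .* P .- inforce .* q .* S
    d = SVector(ntuple(t -> (1 / (1 + r))^t, Val(N)))
    return sum(ncf .* d)
end

@benchmark npv_static($qs, $ws, $P, $S, $r)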

Here's a small tweak to the code, which avoids the more expensive concatenation ([arr1;arr2]) by append!-ing the sliced cumprod result onto a one-element array instead:

function npv2(q,w,P,S,r)
    inforce = append!([1.0], cumprod(1 .- q .- w)[1:end-1])
    ncf = inforce .* P .- inforce .* q .* S
    d = (1 ./ (1 + r)) .^ (1:length(ncf))
    return sum(ncf .* d)
end

which is close to ~8x speedup:

BenchmarkTools.Trial: 
  memory estimate:  1.12 KiB
  allocs estimate:  8
  --------------
  minimum time:     413.945 ns (0.00% GC)
  median time:      427.553 ns (0.00% GC)
  mean time:        485.135 ns (9.34% GC)
  maximum time:     15.881 μs (95.88% GC)
  --------------
  samples:          10000
  evals/sample:     199

And inserting an @view where the cumprod array is sliced makes that slice non-allocating and ekes out a touch more performance:

function npv2(q,w,P,S,r)
    inforce = append!([1.0], @view cumprod(1 .- q .- w)[1:end-1])
    ncf = inforce .* P .- inforce .* q .* S
    d = (1 ./ (1 + r)) .^ (1:length(ncf))
    return sum(ncf .* d)
end
BenchmarkTools.Trial: 
  memory estimate:  992 bytes
  allocs estimate:  7
  --------------
  minimum time:     390.470 ns (0.00% GC)
  median time:      398.723 ns (0.00% GC)
  mean time:        448.883 ns (8.84% GC)
  maximum time:     13.341 μs (95.96% GC)
  --------------
  samples:          10000
  evals/sample:     202

@alecloudenback

> Also - just realised we can't compare results as we're not all using the same hardware!

@lewisfogden - I'd say that's only partly true. In the cases here, we are comparing orders-of-magnitude differences; the relative performance of modern processors is noise on this scale:

| code | relative | absolute timing (mean) |
| --- | --- | --- |
| Julia | 1x | 500 nanoseconds |
| Python numpy | 42x | 21 microseconds |
| R vectorized | 140x | 70 microseconds |
| R data.table | 6000x | 3 milliseconds |

Notes: using mean times, since that's more relevant for overall time on larger problems (you'd want to know how long it would take to complete the combined problem, not just the typical (median) sub-problem). Also not comparing to the memoized version posted above since that's a different technique.

I'm on an M1 MacBook Air (where Julia has to run on Rosetta 2 because there's no native version available). For calibration's sake, I ran the "R vectorized" version posted above on my machine (also running on Rosetta 2):

Unit: microseconds
                      expr   min   lq     mean median     uq      max neval
 base_r_npv(q, w, P, S, r) 7.042 7.25 68.62107  7.418 9.6045 5989.668   100

The median is pretty close to @Houstonwp's above. I don't understand why the mean is so much higher in our benchmarks than the median. I still think that if one is trying to figure out the best solution for crunching a problem at scale, the mean is a better representation... but what's going on here?

I didn't try running the Python version since Python binary and environment management is so notoriously tricky... I'm waiting for an ARM-native Homebrew, which seems to be the best way of managing Python on a Mac.

@lewisfogden

> I don't understand why the mean is so much higher in our benchmarks than the median

I've been running them in an Ubuntu VM on Windows 10 (using WSL) - the 10 manual runs I did for Nim indicated that there is a spike every so often (e.g. some background operation running). The mean is skewed by that high 5989 μs maximum - it alone contributes ~5989/100 ≈ 60 to the mean, which would otherwise be around 8 microseconds.

If you are doing homogeneous operations then the median is probably the more useful indicator purely for benchmarking - if it were at scale and you wanted to test a variety of policies, then I agree the mean (or a truncated mean) would better indicate your production run time.
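
A quick illustration of that skew, with stand-in numbers shaped like the benchmark above:

x <- c(rep(8, 99), 5989)  # 99 fast runs plus one spike, in microseconds
mean(x)                   #> 67.81 -- dominated by the single spike
median(x)                 #> 8
mean(x, trim = 0.05)      #> 8 -- a truncated mean discards the spike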

I had a look at the Julia Benchmarks - they roughly tie up with your table. LuaJIT (which I'd never heard of) gives Julia a good run for its money on the JIT front. The downside with these is that they are probably harder to interface with Python if you are building a production pipeline.

@paddyhoran

paddyhoran commented Jan 20, 2021

Here is the Rust implementation for reference:

pub fn npv(mortality_rates: &Vec<f64>, lapse_rates: &Vec<f64>, interest_rate: f64, sum_assured: f64, premium: f64, init_pols: f64, term: Option<usize>) -> f64 {

    let term = term.unwrap_or_else(|| mortality_rates.len());
    let mut result = 0.0;
    let mut inforce = init_pols;
    let v: f64 = 1.0 / (1.0 + interest_rate);

    for (t, (q, w)) in mortality_rates.iter().zip(lapse_rates).enumerate() {
        let no_deaths = if t < term {inforce * q} else {0.0};
        let no_lapses = if t < term {inforce * w} else {0.0};
        let premiums = inforce * premium;
        let claims = no_deaths * sum_assured;
        let net_cashflow = premiums - claims;
        result += net_cashflow * v.powi(t as i32);
        inforce = inforce - no_deaths - no_lapses;
    }

    result
}

And the benchmark:

npv                     time:   [362.23 ns 364.30 ns 366.35 ns]
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

The reason there are three measurements above is that it's showing a confidence interval: the lower bound, the estimate, and the upper bound. The Rust benchmarking system is a statistical one.
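
The harness itself wasn't posted; the output format looks like Criterion's, so the setup was presumably something along these lines (a guess, assuming the npv function above is in scope):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_npv(c: &mut Criterion) {
    let q = vec![0.001, 0.002, 0.003, 0.003, 0.004, 0.004, 0.005, 0.007, 0.009, 0.011];
    let w = vec![0.05, 0.07, 0.08, 0.10, 0.14, 0.20, 0.20, 0.20, 0.10, 0.04];
    c.bench_function("npv", |b| {
        // black_box keeps the optimizer from constant-folding the inputs
        b.iter(|| npv(black_box(&q), black_box(&w), 0.02, 25000.0, 100.0, 1.0, None))
    });
}

criterion_group!(benches, bench_npv);
criterion_main!(benches);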

I wrote this fast so I may have made a mistake. I also did not try to optimize it so there's some performance gain to be had for sure. Even though it looks serial I'm sure the compiler is auto-vectorizing it.

@alecloudenback

alecloudenback commented Jan 21, 2021

Nice, @paddyhoran. I have not used Rust, so I played with it here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=255a33e84c70e3bcd43258ddbaa66f27 Looks like the answer matches the others above.

Also looks like your algorithm for the problem is more efficient overall. It's over 2x faster than the fastest prior Julia version above. Code and benchmark:

function npv3(q,w,P,S,r,term=nothing)
    # term is given in 0-based time steps; shift by one for Julia's 1-based enumerate
    term = isnothing(term) ? length(q) + 1 : term + 1
    inforce = 1.0
    result = 0.0

    for (t, (q, w)) in enumerate(zip(q, w))
        deaths = t < term ? inforce * q : 0.0
        lapses = t < term ? inforce * w : 0.0
        premiums = inforce * P
        claims = deaths * S
        ncf = premiums - claims
        result += ncf * (1 / (1 + r))^(t - 1)
        inforce = inforce - deaths - lapses
    end

    return result
end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     135.023 ns (0.00% GC)
  median time:      135.262 ns (0.00% GC)
  mean time:        137.513 ns (0.00% GC)
  maximum time:     1.327 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     873

@alecloudenback

alecloudenback commented Jan 22, 2021

Updated Rankings for the "Life Modeling Problem":

| code | algorithm | relative | absolute timing (mean) |
| --- | --- | --- | --- |
| Julia | accumulating for loop | 1x | 135 nanoseconds |
| Rust | accumulating for loop | 3x | 360 nanoseconds |
| Julia | vector | 4x | 500 nanoseconds |
| Python numpy | vector | 155x | 21 microseconds |
| R vectorized | vector | 520x | 70 microseconds |
| R data.table | data.table vector | 22000x | 3 milliseconds |
