Rsvr

This document is an attempt to define the minimum viable product for a data structure on which reserving methodologies can be built. The final section has some ideas about the API for rsvr.

tri_df (triangle dataframe)

A tabular data structure (a subclass of tibble) used to hold historical claim data that is an input to a reserving analysis. It has the following columns (a small example follows the list):

  • cohort: (integer or string) [required] A variable that groups claims or policies together, whose development is tracked over time. For example, this might be accident year or underwriting quarter.

  • index: (numeric or date) [required] A variable that tracks the development of each cohort over time. For example, this could be an integer giving the number of months since the beginning of the accident year, or it could be the snapshot date.

  • measurements: (numeric) [required] One or more variables that represent the values we want to track in a triangle (e.g., paid_loss, incurred_loss, open_claim_count, or paid_medical_loss). We allow for multiple measurements because it is sometimes useful to model functions of basic measurements, for example average_severity = incurred_loss / (reported_claim_count - cwop_claim_count).
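To make the column layout concrete, here is a small hypothetical example of what a tri_df might contain, built directly as a tibble. The column names follow the description above; the values are made up for illustration only.

library(tibble)

# A toy tri_df: one cohort column, one index column (months of development),
# and two measurement columns.
example_triangle <- tribble(
	~cohort, ~index, ~paid_loss, ~reported_claim_count,
	2016L,       12,       1000,                    40,
	2016L,       24,       1800,                    45,
	2016L,       36,       2100,                    46,
	2017L,       12,       1100,                    42,
	2017L,       24,       2000,                    47,
	2018L,       12,        950,                    38
)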

There are some metadata fields related to reserving data that might best be stored in attributes of tri_df, but I think these should be ignored for MVP purposes. The best place for this information is likely to depend on how the API evolves.

rsvr object

This is analogous to the lm object from linear regression. Every reserving method is required to return either an rsvr object or a subclass.

A few methods I would expect these objects to have (a minimal sketch follows the list):

  • predict method
  • equivalent to broom::tidy(lmfit)
  • equivalent to broom::augment(lmfit)
  • equivalent to broom::glance(lmfit)
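As a rough illustration only, here is one way such an object might look as an S3 class. The constructor, field names, and method bodies below are assumptions, not a settled design; predict() dispatches off the base stats::predict generic, and tidy()/augment()/glance() methods would register against the generics package (which broom re-exports).

# Hypothetical constructor for an rsvr fit object (S3).
new_rsvr <- function(estimates, model_info = list()) {
	structure(list(estimates = estimates, model_info = model_info),
		class = "rsvr")
}

# A real predict method would apply the fitted development pattern to the
# latest evaluation; this sketch just returns the stored estimates.
predict.rsvr <- function(object, .data, ...) {
	object$estimates
}

# A glance()-style one-row summary of the fit.
glance.rsvr <- function(x, ...) {
	tibble::tibble(n_cohorts = nrow(x$estimates))
}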

Initial Thoughts on API

basic workflow

# 1. Convert raw data into tri_df
triangle_data <- as_triangle_tibble(raw_data, 
	cohort = year(acc_date),
	index = months_elapsed(start_of_year(acc_date), eval_date),
	paid_loss, incurred_loss, payroll)
	
# 2. Set up models of claim development	
paid_cl_estimates <- chain_ladder(triangle_data, paid_loss) %>%
	set_num_years(5) %>% 
	fit_tail_invpower() %>%
	estimate()

inc_cl_estimates <- chain_ladder(triangle_data, incurred_loss) %>%
	set_num_years(5) %>% 
	fit_tail_invpower() %>%
	estimate()
 
mack_estimate <- mack(triangle_data, paid_loss) %>%
	set_alpha(2) %>%
	estimate()
	
cape_cod_estimate <- cape_cod(triangle_data, incurred_loss) %>%
	set_development_factor(inc_cl_estimates) %>%
	set_exposure(payroll) %>%
	estimate()

all_models <- list(paid_cl = paid_cl_estimates, 
	inc_cl = inc_cl_estimates, 
	cape_cod = cape_cod_estimate, 
	mack = mack_estimate)	
	
# 3. Run Diagnostics on models 
# TBD
# Basically the idea would be to have a model object which provides residuals 
# and other diagnostic output in tidy format that could be used to evaluate 
# the value of the model. This could be used to decide how much weight to 
# give each model, as well as to determine whether any of the models as set 
# up needs to be revisited by the actuary. 
# I'm thinking a hierarchy of S3 objects, so that certain types of diagnostic 
# information are guaranteed to be there for all models, but specific types of 
# diagnostic info can be added for specific types of models.
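# As an illustration only (the generic, class, and field names below are
# assumptions, not a settled API): a diagnostics() generic whose base method
# guarantees a tidy residual table for every model, with model-specific
# methods layering on extra columns.
diagnostics <- function(model, ...) UseMethod("diagnostics")

diagnostics.rsvr <- function(model, ...) {
	# Assumes every fitted model stores a tidy tibble of residuals.
	model$residuals
}

diagnostics.chain_ladder <- function(model, ...) {
	# Start from the shared diagnostics, then add method-specific columns.
	dplyr::mutate(NextMethod(), dev_factor = model$dev_factors)
}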

# 4. Create overall estimate of ultimate losses by cohort. I'm assuming that a 
# table called method_weights has been created from a combination of a priori 
# assumptions and the results of the diagnostics step.
     
apply_method_weight <- function(cohort, paid_cl, inc_cl, cape_cod, mack, ...){
	current_cohort <- cohort
	weights <- dplyr::filter(method_weights, cohort == current_cohort) %>%
		select(-cohort) %>% 
		gather() %>%
		pull(value)
	
	tibble(selected = sum(weights * c(paid_cl, inc_cl, cape_cod, mack)))
}

latest_evaluation <- triangle_data %>% 
	extract_latest_evaluation() 

all_model_estimates <- purrr::map(all_models, predict, .data = latest_evaluation) %>% 
	purrr::map_dfc(~ .$estimate) %>%
	{bind_cols(latest_evaluation, .)}

all_estimates <- purrr::pmap_dfr(all_model_estimates, apply_method_weight) %>%
	{bind_cols(all_model_estimates, .)}

	


    
