@lukasvermeer
Created June 22, 2017 09:36
Attempt to uncover a causal link between calorie intake and weight using the CCM method as implemented in the CauseMap Julia module.
using CauseMap
# specify parameter ranges to test over
E_vals = 2:10      # embedding dimensions (system dimensionality) to test
tau_s_vals = 1:1   # lag lengths for manifold reconstruction
tau_p_vals = 0:15  # time lags of the causal effect to test
# weight and calorie intake data
weight = [69.0,68.7,68.1,68.3,67.6,68.0,68.2,66.7,67.7,67.1,67.1,67.3,67.0,67.5,66.0,66.6,67.0,66.7,66.7,66.4,66.4,66.2,65.8,66.2,65.8,65.6,65.5,65.4,66.4,65.0,65.4,65.2,64.9,64.4,64.6,65.0,64.6,64.4,64.5,63.9,63.6,63.7,64.0,64.0,63.7,63.8,63.4,63.7,64.0,63.7,64.2,64.1,63.8,63.8]
cals = [2207.0,1275.0,1069.0,1760.0,1498.0,2544.0,1116.0,1495.0,1224.0,1584.0,1381.0,1084.0,1723.0,1276.0,1356.0,1337.0,1674.0,1516.0,1499.0,1759.0,1221.0,1476.0,1212.0,2023.0,1528.0,1279.0,2130.0,1519.0,1954.0,1590.0,1769.0,1411.0,1337.0,1798.0,1756.0,899.0,1243.0,1411.0,1463.0,1284.0,2199.0,1611.0,1855.0,1498.0,1727.0,1517.0,1474.0,1876.0,1929.0,1518.0,1265.0,1641.0,1175.0,1387.0]
# run analysis
makeoptimizationplots(weight, cals,
                      E_vals, tau_s_vals, tau_p_vals,
                      "Weight", "Calories";
                      nreps=10, left_E=2, left_tau_p=0,       # optional
                      right_E=7, right_tau_p=12, lagunit=.5,  # optional
                      unit="days", show_tau_s=false           # optional
                      )
@lukasvermeer
Author

Hey I just met you, and this is crazy, but causal inference in time-series data without randomised controlled trials, maybe?

I've been looking into new approaches to causal inference that do not require randomised controlled trials. I found an implementation of one such method, convergent cross mapping (CCM), available as a Julia module (CauseMap).

The results presented seemed too good to be true, so I patched the module for Julia 0.5.1 and applied it to some of my own data. The code and data are included in this gist.

Data

For almost two months, I counted my own calorie intake and measured my weight every morning. (I was trying to lose weight by making myself more conscious of my diet. As you can see in the data, that totally worked.)

I did not process, scale or normalise the data. I just created two vectors weight and cals of the raw values.

Results

[Figure: cals-vs-weight]

As you can see from the left graph, the method has correctly identified a (weak) causal relationship between calorie intake and weight. The effect seems to flatten out at around 0.5 at some point. (The horizontal axis here is the amount of data used; we would expect the strength of the evidence to keep increasing as data volume grows.) There is no evidence for a causal effect in the other direction.

The other two graphs add a little depth to this result. It seems the effect of calorie intake on weight is most apparent in the first few days after the measurement. After that, the effect fades out.

Both of these results are in line with the (presumed) true causal structure here: calorie intake affects weight over the next few days, but not the other way around.
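For readers who want to see what kind of quantity these graphs plot, here is a minimal sketch of a cross-map skill calculation. It assumes the textbook simplex-projection form of CCM and is not CauseMap's actual code. All function names (`delay_embed`, `pearson`, `crossmap_skill`, `toy_system`) are hypothetical, and the way `tau_p` shifts the target is my assumption about how a causal lag would typically be handled. The toy system is an illustrative pair of logistic maps in which `x` drives `y` but not the reverse.

```julia
# Delay vectors (x[t], x[t-tau], ..., x[t-(E-1)*tau]) for t = first_t:length(x).
function delay_embed(x, E, tau, first_t)
    return [[x[t - k * tau] for k in 0:(E - 1)] for t in first_t:length(x)]
end

# Pearson correlation, written out so the sketch needs no packages.
function pearson(a, b)
    da = a .- sum(a) / length(a)
    db = b .- sum(b) / length(b)
    return sum(da .* db) / sqrt(sum(da .^ 2) * sum(db .^ 2))
end

# Estimate x[t - tau_p] from the delay embedding of y, then score the estimate.
# Under CCM's logic, high skill is read as evidence that x influences y.
function crossmap_skill(x, y, E, tau, tau_p=0)
    first_t = 1 + max((E - 1) * tau, tau_p)  # earliest t with a full delay vector and a valid target
    pts = delay_embed(y, E, tau, first_t)
    n = length(pts)
    x_true = x[(first_t - tau_p):(length(x) - tau_p)]
    x_hat = zeros(n)
    for i in 1:n
        # distance from point i to every other point (self excluded)
        d = [j == i ? Inf : sqrt(sum((pts[i] .- pts[j]) .^ 2)) for j in 1:n]
        nn = sortperm(d)[1:(E + 1)]      # E+1 nearest neighbours
        dmin = max(d[nn[1]], 1e-12)      # guard against a zero distance
        w = [exp(-d[j] / dmin) for j in nn]
        x_hat[i] = sum(w .* x_true[nn]) / sum(w)
    end
    return pearson(x_hat, x_true)
end

# Toy system (assumption, for illustration only): x drives y, y does not affect x.
function toy_system(n)
    x = zeros(n)
    y = zeros(n)
    x[1] = 0.4
    y[1] = 0.2
    for t in 1:(n - 1)
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    end
    return x, y
end

x, y = toy_system(300)
for L in (50, 100, 200, 300)
    x_from_y = crossmap_skill(x[1:L], y[1:L], 2, 1)  # evidence for x -> y
    y_from_x = crossmap_skill(y[1:L], x[1:L], 2, 1)  # evidence for y -> x
    println("L=", L, "  x->y skill: ", x_from_y, "  y->x skill: ", y_from_x)
end
```

The loop over `L` mimics the horizontal axis of the left graph: CCM looks for skill that is both high and still rising as more data is used. The sketch uses one contiguous library per length, whereas implementations typically average over several library samples.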

Discussion

I don't even. Wat. How?! This is all new to me. Until yesterday, I assumed it was obvious that what I just did was impossible.

Assuming it is actually possible, what applications would this have at Booking that we could try? Could we use this to investigate the causal link between availability and commission for a given city, for instance?

@lukasvermeer
Author

Looks like CCM finds spurious causal links in random data.

using CauseMap

# specify parameter ranges to test over
E_vals     = 2:10  # embedding dimensions (system dimensionality) to test
tau_s_vals = 1:1   # lag lengths for manifold reconstruction
tau_p_vals = 0:15  # time lags of the causal effect to test

random1 = cumsum(randn(100))
random2 = cumsum(randn(100))

# run analysis
makeoptimizationplots(random1, random2,
                      E_vals, tau_s_vals, tau_p_vals,
                      "random1", "random2";
                      nreps=10, left_E=2, left_tau_p=0,       # optional
                      right_E=7, right_tau_p=12, lagunit=.5,  # optional
                      unit="days", show_tau_s=false           # optional
                      )

Both series, random1 and random2, are independent random walks (cumulative sums of Gaussian noise), but the output suggests there are causal effects both ways.

[Figure: random]

I think this is not an implementation fault, but rather a shortcoming of the CCM method itself. Also posting it here for reference.
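One way to put this in context is to ask how often two independent random walks look related under a much simpler statistic. The sketch below does not use CauseMap and does not test CCM itself. It only estimates how often plain Pearson correlation exceeds a threshold for independent pairs, comparing random walks built like `random1` and `random2` against white noise. The helper names (`pearson`, `random_walk`, `fraction_strong`) and the 0.5 threshold are assumptions chosen for illustration.

```julia
# Pearson correlation, written out so the sketch needs no packages.
function pearson(a, b)
    da = a .- sum(a) / length(a)
    db = b .- sum(b) / length(b)
    return sum(da .* db) / sqrt(sum(da .^ 2) * sum(db .^ 2))
end

# Same construction as random1 and random2 above.
random_walk(n) = cumsum(randn(n))

# Fraction of independent pairs whose |correlation| exceeds `threshold`.
function fraction_strong(make_series, n, trials, threshold)
    hits = 0
    for trial in 1:trials
        r = pearson(make_series(n), make_series(n))
        if abs(r) > threshold
            hits += 1
        end
    end
    return hits / trials
end

println("random walks: ", fraction_strong(random_walk, 100, 1000, 0.5))
println("white noise:  ", fraction_strong(randn, 100, 1000, 0.5))
```

If the random-walk fraction comes out much larger than the white-noise one, that suggests trending, autocorrelated series can look related by chance. The same kind of baseline could be built for CCM by running it on many independent pairs and comparing real results against that spread.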
