erikcs/example.md

## example.md

      
Stepping through core grf with lldb and RStudio (macOS)

First step (optional):


Compile the R extension without compiler optimizations to avoid unexpected behavior while debugging (debug symbols, -g, are on by default).
Add CXX11FLAGS = -g -O0 to your ~/.R/Makevars (note: not grf/src/Makevars) and reinstall/recompile the package as usual (RStudio -> Build). You can check the system compile options with $ R CMD config CXX11FLAGS
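
The Makevars change above can be scripted. The following is only a sketch, assuming a POSIX shell and that no CXX11FLAGS line exists yet in ~/.R/Makevars; the variable name makevars is just for illustration.

```shell
# Sketch: add debug-friendly flags to the user-level Makevars (not grf/src/Makevars).
makevars="${HOME}/.R/Makevars"
mkdir -p "$(dirname "$makevars")"

# Append only if no CXX11FLAGS line is present, to avoid duplicate entries.
if grep -q '^CXX11FLAGS' "$makevars" 2>/dev/null; then
  echo "CXX11FLAGS already set in $makevars; edit it by hand if needed"
else
  echo 'CXX11FLAGS = -g -O0' >> "$makevars"
fi

# Show the flags R will use, if R is on the PATH.
if command -v R >/dev/null 2>&1; then
  R CMD config CXX11FLAGS
fi
```

Remember to remove or comment out the line again once you are done debugging, since -O0 builds run noticeably slower.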


Example script

Say we have the following example.R file open and executed in RStudio (if there are problems attaching to the RStudio process, you may have to run base R from a terminal window instead):
library(grf)

n = 2000; p = 20
X = matrix(rnorm(n * p), n, p)
TAU = 1 / (1 + exp(-X[, 3]))
W = rbinom(n ,1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y = pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest <- causal_forest(X, Y, W, num.threads = 1)

1. Get the process ID from R:
> Sys.getpid()
[1] 23248

2. Start lldb from the terminal and attach to the R session:
$ lldb
(lldb) process attach --pid 23248

(lldb autocompletes most commands with tab)

3. Set a breakpoint on a C++ function (all entry points are listed in R/RcppExports.R). For the program above, the relevant one for stepping through the training process is causal_train:
(lldb) breakpoint set --name causal_train

(again, lldb should autocomplete from causal_*)
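
The attach and breakpoint steps can also be combined into a single terminal command. The helper below is a hypothetical convenience function (attach_grf is not part of grf or lldb). It is a sketch that relies on the lldb driver options --attach-pid (short form -p) and --one-line (short form -o).

```shell
# Hypothetical helper: attach lldb to a running R session and set a
# breakpoint on a C++ entry point in one step.
# Usage: attach_grf <pid> [function]
attach_grf() {
  pid="$1"
  fn="${2:-causal_train}"   # default is the entry point used in this walkthrough
  if [ -z "$pid" ]; then
    echo "usage: attach_grf <pid> [function]" >&2
    return 1
  fi
  # --attach-pid attaches to the process; each --one-line argument is an
  # lldb command executed at startup.
  lldb --attach-pid "$pid" --one-line "breakpoint set --name $fn"
}
```

With the process id from Sys.getpid(), calling attach_grf 23248 should leave you at the (lldb) prompt with the breakpoint already set.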

4. Run the last line in RStudio:
forest <- causal_forest(X, Y, W, num.threads = 1)

5. Resume the process from lldb:
(lldb) continue

This drops you into the interactive debugger at the breakpoint:
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x000000010921d933 grf.so`causal_train(train_matrix=Rcpp::NumericMatrix @ 0x00007ffeecc59ab0, sparse_train_matrix=SparseMatrix<double, 0, int> @ 0x00007ffeecc59a68, outcome_index=11, treatment_index=12, sample_weight_index=13, use_sample_weights=false, mtry=10, num_trees=2000, min_node_size=5, sample_fraction=0.5, honesty=true, honesty_fraction=0.5, ci_group_size=2, reduced_form_weight=0, alpha=0.050000000000000003, imbalance_penalty=0, stabilize_splits=true, clusters=size=0, samples_per_cluster=0, compute_oob_predictions=true, num_threads=1, seed=112137146) at CausalForestBindings.cpp:52:64
   49  	                        bool compute_oob_predictions,
   50  	                        unsigned int num_threads,
   51  	                        unsigned int seed) {
-> 52  	  ForestTrainer trainer = ForestTrainers::instrumental_trainer(reduced_form_weight, stabilize_splits);
   53
   54  	  Data* data = RcppUtilities::convert_data(train_matrix, sparse_train_matrix);
   55  	  data->set_outcome_index(outcome_index - 1);
Target 0: (rsession) stopped

Step into the current line:
(lldb) s
Process 26531 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x000000010f0f15c2 grf.so`ForestTrainers::instrumental_trainer(reduced_form_weight=0, stabilize_splits=true) at ForestTrainers.cpp:33:59
   30  	ForestTrainer ForestTrainers::instrumental_trainer(double reduced_form_weight,
   31  	                                                   bool stabilize_splits) {
   32
-> 33  	  std::shared_ptr<RelabelingStrategy> relabeling_strategy(new InstrumentalRelabelingStrategy(reduced_form_weight));
   34  	  std::shared_ptr<SplittingRuleFactory> splitting_rule_factory = stabilize_splits
   35  	          ? std::shared_ptr<SplittingRuleFactory>(new InstrumentalSplittingRuleFactory())
   36  	          : std::shared_ptr<SplittingRuleFactory>(new RegressionSplittingRuleFactory());
Target 0: (rsession) stopped.

Print variables:
(lldb) p stabilize_splits
(bool) $0 = true
(lldb) p reduced_form_weight
(double) $1 = 0

List all variables in the current frame with var.
Show the current frame with f.

Resources

lldb Tutorial
Stack Overflow - Running R with -d