bcow/EnsemblePlot.md

## EnsemblePlot.md

      
PEcAn has the (optional) ability to initiate large ensembles of runs to perform different types of analyses:

- Sensitivity Analysis (Global Sensitivity Analysis)
- Uncertainty Analysis
- State Data Assimilation
- Parameter Data Assimilation

For now we will not focus on data assimilation, but we should keep in mind ways to generalize and expand the app to eventually support visualizations for these analyses as well.


PEcAn runs these ensembles as part of the PEcAn workflow and then saves all the run outputs to `.Rdata` files at the `get.results` step.

The following steps in the workflow (`run.ensemble.analysis` and `run.sensitivity.analysis`) load these `.Rdata` files, perform analyses, and save their outputs as `.Rdata` files.
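
When exploring these files by hand, it can help to load them into their own environment instead of the global workspace. The sketch below uses only base R (`save()`, `load()`, `new.env()`, `mget()`); the file name and the object name `ensemble.output` are made up for illustration, and the real names depend on the workflow.

```r
# Stand-in for an output file written by the workflow.
# NOTE: the file name and object name are hypothetical.
rdata_file <- file.path(tempdir(), "example.output.Rdata")
ensemble.output <- list(run1 = 1.2, run2 = 3.4)
save(ensemble.output, file = rdata_file)

# Load an .Rdata file into a fresh environment so that nothing in the
# current workspace is overwritten, and return its contents as a named list.
load_rdata <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names it restored
  mget(loaded, envir = env)
}

contents <- load_rdata(rdata_file)
names(contents)                # which objects were stored in the file
str(contents$ensemble.output)  # inspect one of them
```

Returning a list rather than attaching objects should also make this easier to call from inside a Shiny reactive later on.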


### Goal: Ensembles Plot App


There are diagnostic plots associated with the different analyses that are currently saved as PDFs. The PDFs can be viewed in the basic PEcAn app.

The goal is to port these figures to Shiny, where they can be interactive.
The plotting functions that we want to use already exist in the uncertainty package and are used throughout the sensitivity and uncertainty analysis workflows.
We can also explore additional visualizations once the basic ones are working.


Global sensitivity analysis already has a very basic app that Alexey and I made years ago for a project. We may ultimately want to deprecate the old app and include global sensitivity as a tab in the new Ensemble Plots app.


### Examples and Demos

#### PEcAn tutorial

Basic instructions for setting up ensemble runs through PEcAn can be found in the tutorial.
The tutorial also goes through the output files in more detail than I have explained here.

#### App Demo

I put together a mock up of what I think the app could look like.
http://test-pecan.bu.edu/shiny/ecowdery/DemoEnsemblePlot/?workflow_id=1000009294

Right now it only loads the pre-existing PDF files in essentially the same way that the original PEcAn app does.
You may not want to use any of the code that I used for the example app. I made it just to have a visual example.

The code for the demo app is on my shiny_DemoEnsemblePlot branch of pecan: https://github.com/bcow/pecan/tree/shiny_DemoEnsemblePlot/shiny/DemoEnsemblePlot
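
The demo URL passes the workflow to the app through a `workflow_id` query parameter. Here is a minimal sketch of how a new Shiny app could read that parameter, using `shiny::parseQueryString()` and `session$clientData$url_search`. The helper `get_workflow_id()`, the UI layout, the output names, and the placeholder plot are all assumptions for illustration and are not taken from the demo app.

```r
library(shiny)

# Extract workflow_id from a URL query string such as "?workflow_id=1000009294".
# Returns NA if the parameter is missing. The id is kept as a character string.
get_workflow_id <- function(url_search) {
  query <- parseQueryString(url_search)
  if (is.null(query$workflow_id)) NA_character_ else query$workflow_id
}

ui <- fluidPage(
  titlePanel("Ensemble Plots (sketch)"),
  textOutput("workflow"),
  plotOutput("ensemble_plot")
)

server <- function(input, output, session) {
  # Re-evaluated whenever the query string of the page changes.
  workflow_id <- reactive(get_workflow_id(session$clientData$url_search))

  output$workflow <- renderText(paste("Workflow:", workflow_id()))

  # Placeholder only: a real app would call the existing plotting functions
  # on the loaded ensemble output instead of plotting random numbers.
  output$ensemble_plot <- renderPlot({
    hist(rnorm(25), main = "Placeholder ensemble histogram", xlab = "value")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch interactively
```

Keeping the query parsing in a plain function means it can be checked without starting the app.
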

### Where to start


- Work through the tutorial and see what the PDFs look like in the old PEcAn app.
- Look at the demo app to get an idea of what the new app could look like.
- Read through and try running:
  - `/pecan/modules/uncertainty/R/run.ensemble.analysis.R`
  - `/pecan/modules/uncertainty/R/run.sensitivity.analysis.R`

  Both of these functions use plotting functions in `/pecan/modules/uncertainty/R/plots.R`.
- The main argument to these functions is `settings`, an R list object that is generated by running `read.settings(pecan.xml)`.

If you have a workflow id and want to find its respective `pecan.xml`, run:

    bety <- betyConnect()
    folder <- tbl(bety, "workflows") %>% filter(id == 1000009290) %>% pull(folder)
    file.path(folder, "pecan.xml")
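
The lookup above could be wrapped in a small helper so the app can reuse it for any workflow id. This is only a sketch: `find_pecan_xml()` is a hypothetical name, and the in-memory `workflows` table with made-up folder paths stands in for `tbl(bety, "workflows")` so the example runs without a database connection.

```r
library(dplyr)

# Look up the folder of a workflow and build the path to its pecan.xml.
# `workflows` can be a lazy database table (tbl(bety, "workflows")) or,
# as here, an in-memory stand-in with the same `id` and `folder` columns.
find_pecan_xml <- function(workflows, workflow_id) {
  wf_folder <- workflows %>%
    filter(id == !!workflow_id) %>%
    pull(folder)
  if (length(wf_folder) != 1) {
    stop("Expected exactly one workflow with id ", workflow_id,
         ", found ", length(wf_folder))
  }
  file.path(wf_folder, "pecan.xml")
}

# Hypothetical rows for illustration only; the folder paths are made up.
workflows <- tibble(
  id     = c(1000009290, 1000009294),
  folder = c("/tmp/workflows/PEcAn_1000009290",
             "/tmp/workflows/PEcAn_1000009294")
)

xml_path <- find_pecan_xml(workflows, 1000009290)
xml_path

# With PEcAn installed, the settings list could then be read with:
# settings <- PEcAn.settings::read.settings(xml_path)
```

Failing loudly when the id matches zero or several rows avoids silently passing an empty path on to `read.settings()`.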

### Useful tips for getting info about runs from the database

One workflow can include multiple types of ensembles.
In the database the ensembles table keeps track of which types of runs have been performed.
For example, I recently did a series of ensembles as part of workflow 1000009294.
By looking at the ensembles table and filtering by workflow id 1000009294 in the database I can see the run types:

    > tbl(bety, "ensembles") %>% filter(workflow_id == 1000009294) %>% collect
    # A tibble: 2 x 6
              id notes created_at          updated_at          runtype              workflow_id
    *      <dbl> <chr> <dttm>              <dttm>              <chr>                      <dbl>
    1 1000018566 ""    2018-06-11 21:00:42 2018-06-11 21:00:42 ensemble              1000009290
    2 1000018565 ""    2018-06-11 20:59:52 2018-06-11 20:59:52 sensitivity analysis  1000009290

I ran an ensemble and a sensitivity analysis.
By looking at the runs table and filtering by ensemble id 1000018566 in the database I can see that the ensemble contained 25 runs:

    > tbl(bety, "runs") %>% filter(ensemble_id == 1000018566) %>% collect()
    # A tibble: 25 x 14
            id model_id site_id start_time          finish_time         outdir outprefix setting parameter_list created_at
     *   <dbl>    <dbl>   <dbl> <dttm>              <dttm>              <chr>  <chr>     <chr>   <chr>          <dttm>
     1  1.00e9   1.00e9     756 2004-01-01 00:00:00 2004-12-31 00:00:00 ""     ""        ""      ensemble=1     2018-06-11 21:00:42
     2  1.00e9   1.00e9     756 2004-01-01 00:00:00 2004-12-31 00:00:00 ""     ""        ""      ensemble=19    2018-06-11 21:00:43
    # ... with more rows, and 4 more variables: updated_at <dttm>, started_at <dttm>, finished_at <dttm>, ensemble_id <dbl>

By looking at the runs table and filtering by ensemble id 1000018565 in the database I can see that the sensitivity analysis contained 77 runs:

    > tbl(bety, "runs") %>% filter(ensemble_id == 1000018565) %>% collect
    # A tibble: 77 x 14
            id model_id site_id start_time          finish_time         outdir outprefix setting parameter_list created_at
     *   <dbl>    <dbl>   <dbl> <dttm>              <dttm>              <chr>  <chr>     <chr>   <chr>          <dttm>
     1  1.00e9   1.00e9     756 2004-01-01 00:00:00 2004-12-31 00:00:00 ""     ""        ""      quantile=15.8… 2018-06-11 17:00:14
     2  1.00e9   1.00e9     756 2004-01-01 00:00:00 2004-12-31 00:00:00 ""     ""        ""      quantile=2.27… 2018-06-11 16:59:52
    # ... with more rows, and 4 more variables: updated_at <dttm>, started_at <dttm>, finished_at <dttm>, ensemble_id <dbl>
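
The two lookups above can be combined into a single query that reports how many runs each ensemble type contains, which is roughly what the app would need to decide which tabs to show. This is a sketch under assumptions: `count_runs_by_type()` is a hypothetical helper, and the in-memory tables below are stand-ins that mirror only the relevant columns of the output printed above (the run ids are made up). With a live connection, `tbl(bety, "ensembles")` and `tbl(bety, "runs")` could be passed in instead.

```r
library(dplyr)

# In-memory stand-ins for the ensembles and runs tables.
ensembles <- tibble(
  id          = c(1000018566, 1000018565),
  runtype     = c("ensemble", "sensitivity analysis"),
  workflow_id = c(1000009290, 1000009290)
)
runs <- tibble(
  id          = seq_len(102),  # made-up run ids
  ensemble_id = c(rep(1000018566, 25), rep(1000018565, 77))
)

# Count the runs belonging to each run type of one workflow.
count_runs_by_type <- function(ensembles, runs, wf_id) {
  ensembles %>%
    filter(workflow_id == !!wf_id) %>%
    select(ensemble_id = id, runtype) %>%
    inner_join(runs, by = "ensemble_id") %>%
    count(runtype, name = "n_runs") %>%
    collect()  # no-op for local data, fetches results for database tables
}

run_counts <- count_runs_by_type(ensembles, runs, 1000009290)
run_counts
```

Doing the join and count before `collect()` keeps the work in the database when lazy tables are used, so only the small summary is transferred.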