Last active February 20, 2023 15:40
Form to document new analytic data on FASSE

Step 1: Check analytic data

Is the data you need already on FASSE? Check out the catalog here:

If it is not, see step 2.

Step 2: Fill in the form below and post it in the comments here.

The format of the form goes like this:

* - key_name 
  - value 

Below is the form for analytic data documentation with key_names. Fill in the value fields or choose between the options.

One dataset should correspond to one form. If your dataset is split into multiple files of the same format (e.g., admissions_2011.fst, admissions_2012.fst, etc.), it is fine to complete a single form.

* - dataset_name
  - a meaningful name (not filename)
* - dataset_author
  - Name Surname
* - date_created
  - Jun 15 2022
* - data_source
  - MedPar (admissions), MBSF (denominator), Medicaid MAX, other (specify)
* - spatial_coverage
  - US
* - spatial_resolution
  - zipcode, city, county, state
* - temporal_coverage
  - 1999-2016
* - temporal_resolution
  - daily, monthly, annually
* - description
  - Write in free text what (if any) processing was done to the data sources. Were there any selections (cuts), data quality checks, or aggregations?
* - rce_location
  - `~/shared_space/TEXT`
* - fasse_location
  - `/n/dominici_nsaph_l3/projects/analytic/TEXT`

Optional fields (choose as applicable):

* - publication (if this data was used in publication)
  - URL
* - GitHub repository/directory on how the data was processed
  - URL
* - exposures
  - What were the air pollution/exposure data sources used to create this data file? 
* - confounders
  - What were the confounder data sources used to create this dataset?
* - meteorological
  - What were the meteorological data sources used to create this data file?
* - other
  - What other data sources were used to create this data?
* - size
  - 1.2 GB
* - files
   ├── dataset_2011.fst
   ├── ...
   └── dataset_2016.fst
* - header (see in R with str(dat))
QID  : Factor 
year : num  
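
To fill in the `size` and `files` fields, a directory listing is enough. Below is a minimal Python sketch; the dataset path is a hypothetical placeholder, so substitute your own fasse_location:

```python
import glob
import os

# Hypothetical dataset directory; substitute your fasse_location.
data_dir = "/n/dominici_nsaph_l3/projects/analytic/example_dataset"

# List the yearly .fst files and total their size for the form.
files = sorted(glob.glob(os.path.join(data_dir, "*.fst")))
total_bytes = sum(os.path.getsize(f) for f in files)

for f in files:
    print(os.path.basename(f))
print(f"size: {total_bytes / 1e9:.1f} GB")
```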

Embed the form here to get the JupyterBook (NSAPH handbook) entry:

`````{dropdown} 1. Meaningful dataset name
:header-rows: 0
`````



and in your git its actually "data/final_backup.csv"
now im even more worried...


seulkeeheo commented Nov 1, 2022

> and in your git its actually "data/final_backup.csv"
> now im even more worried...

Sorry, I am also worried. I found several files that look like the dataset Whanhee used in his latest analysis. So far, I have only found 'final.csv' and 'final_JUL10.csv' in Whanhee's RCE folder.


Hi @seulkeeheo and @daniellebraun, there is nothing to be concerned about. If needed, I could clean up the RCE folders.
I renamed the file "final_JUL10.csv" into "final.csv" before storing it in /analytic and documenting it in the catalog.


In order to reproduce the pipeline, the file names should match the ones in your Python code, which use final_backup.csv, not final.csv or final_JUL10.csv. Renaming files by hand is terrible practice and will create a lot of problems down the line. Why would you rename the file? It also creates issues if Whanhee's code relies on final_JUL10.csv. This is still VERY concerning. And it seems like the RCE folder contains both final.csv and final_JUL10.csv, so did you move final_JUL10.csv and then rename it?


  • dataset_name
    • Predicted daily smoke PM2.5 over the Contiguous US, 2006 - 2020
  • dataset_author
    • Marissa Childs
  • date_created
    • October 24, 2020
  • data_source
    • other (exposure predictions)
  • spatial_coverage
    • Contiguous US
  • spatial_resolution
    • originally 10 km; aggregated to ZCTA, census tract, and county by area- and population-weighted averages
  • temporal_coverage
    • 2006 - 2020
  • temporal_resolution
    • daily
  • processing_description
    • none
  • rce_location
    • ??
  • fasse_location
    • ??
  • publication (if this data was used in publication)
  • GitHub repository/directory on how the data was processed
  • exposures
    • PM2.5 from smoke
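
The county-level aggregation described above (population-weighted averages of grid-cell predictions) can be sketched in Python with pandas; the column names and toy values below are illustrative only, not the actual dataset:

```python
import pandas as pd

# Toy grid-cell predictions: each 10 km cell overlaps a county,
# weighted by the population in the cell-county intersection.
cells = pd.DataFrame({
    "county": ["25001", "25001", "25003"],
    "smoke_pm25": [4.0, 6.0, 2.5],
    "population": [1000, 3000, 500],
})

# Population-weighted average per county: sum(value * pop) / sum(pop).
cells["weighted"] = cells["smoke_pm25"] * cells["population"]
by_county = cells.groupby("county")[["weighted", "population"]].sum()
county_pm = by_county["weighted"] / by_county["population"]
print(county_pm)  # 25001 -> 5.5, 25003 -> 2.5
```

An area-weighted average works the same way, with intersection areas in place of populations.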


@lhenneman and @macork your data has now been transferred and documented at



danielmork commented Jan 11, 2023

  • dataset_name
    • Space weather data
  • dataset_author
    • Carolina L Zilli Vieira
  • date_created
    • Oct 17 2022
  • data_source
    • NASA (solar and geomagnetic activity parameters), DAAC NASA (solar radiation), BARTOL Neutron Station (neutrons)
    • Is this all from the same source? In either case, URLs are needed.
  • spatial_coverage
    • Global UTC converted to local time
  • spatial_resolution
    • zipcode, city, county, state
    • Did you download the data at all of these resolutions, or did you aggregate it?
  • temporal_coverage
    • 1996-2022
  • temporal_resolution
    • daily, monthly, annually
    • Same as spatial resolution? Is this the original format or derived?
  • processing_description
    • raw data converted to local time
    • Carolina's email suggests the data is not processed?
  • fasse_location
    • /n/dominici_nsaph_l3/exposures/solar_activity
  • size
    • TBD


Data URLs:
Solar activity data: []
Neutron data:
Solar radiation (

Yes, we processed the data from UTC into US time zones. From this source, it is not possible to obtain spatial data directly. To get it, we converted the UTC global data to US local time, and then mapped these local-time data to county-level data. The values change slightly by location depending on the time zone.

We provided daily data, which can be aggregated to monthly and annual data.
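
The UTC-to-local-time conversion described above can be sketched with Python's standard zoneinfo module; the county-to-time-zone mapping below is a hypothetical example, not the actual lookup table:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical mapping from county FIPS code to IANA time zone.
county_tz = {"25001": "America/New_York", "06037": "America/Los_Angeles"}

# A UTC observation timestamp from the global solar-activity series.
utc_obs = datetime(2022, 10, 17, 14, 0, tzinfo=timezone.utc)

# Convert to each county's local time; the local calendar day (used for
# daily aggregation) can differ from the UTC day near midnight.
for fips, tz in county_tz.items():
    local = utc_obs.astimezone(ZoneInfo(tz))
    print(fips, local.isoformat())
```

Daily values then come from grouping observations by the local calendar day, which is why the numbers differ slightly across time zones.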

Please let me know if anything is still unclear.
