Is the data you need already on FASSE? Check out the catalog here: https://nsaph.info/analytic.html#analytic-data
If it is not, see step 2.
The format of the form goes like this:
* - key_name
- value
Below is the form for analytic data documentation with `key_name`s. Fill in the `value` fields or choose between the options.
One dataset should correspond to one form. If your dataset is split into multiple files of the same format (e.g., `admissions_2011.fst`, `admissions_2012.fst`, etc.), it is fine to complete one form.
* - dataset_name
- a meaningful name (not filename)
* - dataset_author
- Name Surname
* - date_created
- Jun 15 2022
* - data_source
- MedPar (admissions), MBSF (denominator), Medicaid MAX, other (specify)
* - spatial_coverage
- US
* - spatial_resolution
- zipcode, city, county, state
* - temporal_coverage
- 1999-2016
* - temporal_resolution
- daily, monthly, annually
* - description
- Write in free text what (if any) processing was done to the data sources. Were there any selections (cuts), data quality checks and aggregations?
* - rce_location
- `~/shared_space/TEXT`
* - fasse_location
- `/n/dominici_nsaph_l3/projects/analytic/TEXT`
Optional fields (choose as applicable):
* - publication (if this data was used in a publication)
- URL
* - GitHub repository/directory on how the data was processed
- URL
* - exposures
- What were the air pollution/exposure data sources used to create this data file?
* - confounders
- What were the confounder data sources used to create this dataset?
* - meteorological
- What were the meteorological data sources used to create this data file?
* - other
- What other data sources were used to create this data?
* - size
- 1.2 GB
* - files
```
├── dataset_2011.fst
├── ...
└── dataset_2016.fst
```
* - header (see in R with str(dat))
```
QID : Factor
ADATE: Date
year : num
```
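The `size` and `files` values can be produced with a short script instead of by hand. The following is a minimal Python sketch, assuming all files of the dataset sit in one directory and share the `.fst` extension; the function name `summarize_dataset` and its parameters are hypothetical and not part of any NSAPH tooling.

```python
from pathlib import Path


def summarize_dataset(directory, pattern="*.fst"):
    """Return (size_string, tree_listing) for the files matching `pattern`.

    Assumption: the dataset is a flat directory of same-format files.
    """
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())

    # Total size on disk, reported in decimal gigabytes (1 GB = 1e9 bytes).
    total_bytes = sum(p.stat().st_size for p in files)
    size = f"{total_bytes / 1e9:.1f} GB"

    # Tree-style listing in the same shape as the `files` field of the form.
    lines = []
    for i, path in enumerate(files):
        branch = "└── " if i == len(files) - 1 else "├── "
        lines.append(branch + path.name)
    return size, "\n".join(lines)
```

Calling `summarize_dataset("path/to/dataset")` returns the two strings, which can then be pasted into the `size` and `files` fields. Long listings may still need to be shortened by hand.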
Embed the form here to get the JupyterBook (NSAPH handbook) entry for nsaph.info/analytic.html:
`````{dropdown} 1. Meaningful dataset name
````{list-table}
:header-rows: 0
COPY AND PASTE THE FORM HERE
````
`````
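If you prefer to generate the entry rather than copy and paste it, the template above can be filled in programmatically. The sketch below is one possible way to do that in Python. The function name `render_entry` is hypothetical, the two-space indent on value lines follows the usual `list-table` layout (an assumption, since the form above shows no indentation), and the inner fence uses four backticks to match the template's closing fence, so that three-backtick blocks inside the form do not close it early.

```python
def render_entry(number, fields):
    """Render a dropdown entry from an ordered mapping of key_name -> value.

    `fields` must contain "dataset_name", which is used as the dropdown title.
    """
    outer = "`" * 5  # fence for the dropdown directive
    inner = "`" * 4  # fence for the nested list-table directive

    lines = [
        f"{outer}{{dropdown}} {number}. {fields['dataset_name']}",
        f"{inner}{{list-table}}",
        ":header-rows: 0",
        "",
    ]
    # One two-line row per field: the key_name, then its value.
    for key, value in fields.items():
        lines.append(f"* - {key}")
        lines.append(f"  - {value}")
    lines += [inner, outer]
    return "\n".join(lines)
```

For example, `render_entry(1, {"dataset_name": "Meaningful dataset name", "size": "1.2 GB"})` returns the text of one entry. Multi-line values such as `files` and `header` are not handled here and would need their own indentation.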
Yes, we converted the data from UTC to US time zones. It is not possible to get spatial data directly from this source. To obtain it, we converted the global UTC data to US local time, and then used the local-time data to produce county-level data. The numbers change slightly by location, depending on the time zone.
We provided daily data, which can be aggregated to monthly and annual data.
Please let me know if anything is still unclear.
Carolina
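The time-zone handling and aggregation described in the note above can be illustrated with a small Python sketch. It is not the actual processing code, which the note does not show. It assumes hourly, timezone-aware UTC records and a fixed UTC offset per location (so daylight saving time is ignored), and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import timedelta, timezone


def daily_means_local(hourly_utc, utc_offset_hours):
    """Average (utc_datetime, value) pairs by local calendar date.

    Assumption: a fixed offset stands in for the local time zone.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    by_day = defaultdict(list)
    for stamp, value in hourly_utc:
        # Shift to local time first, so each record lands on its local date.
        by_day[stamp.astimezone(local_tz).date()].append(value)
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}


def monthly_means(daily):
    """Aggregate a {date: value} mapping to {(year, month): mean value}."""
    by_month = defaultdict(list)
    for day, value in daily.items():
        by_month[(day.year, day.month)].append(value)
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}
```

Because the date boundary moves with the offset, the same UTC records can give slightly different daily values in different time zones, which is consistent with the small location-dependent differences mentioned in the note. Annual values follow the same pattern with `day.year` as the grouping key.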