@gschivley
Last active March 26, 2020 11:54
FERC714_exploration.ipynb

Moved to github.com/gschivley/FERC_714

There has been more interest in this project than I first anticipated, so I've moved this notebook to a full repository. The notebook in this gist will not be updated; all future changes will take place on the repo. You can fork the repo, submit pull requests, or open issues.

name: ferc-data
channels:
- conda-forge
dependencies:
- python=3.7
- numpy
- pandas=0.25.*
- pip
# - matplotlib=3.*
- joblib
- xlrd
# GIS dependencies from conda-forge in case ppl start using shapefiles
- conda-forge::fiona
- conda-forge::geopandas
- conda-forge::shapely
- conda-forge::descartes
@truggles

This is great work, Greg! I think the summary stats from the cleaning look reasonable and in line with what we saw from EIA-930. You find that 97.3% of the demand values appear good (looking at the output from summary_df.describe()). We find that 2.2% of values are missing in the EIA-930 database and 0.5% are anomalous, which leaves 97.3% good values.
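
For anyone who wants to run the same kind of tally on their own series, here is a minimal sketch of how the missing / anomalous / good shares could be computed. The function name `quality_shares` and the simple lower/upper bound test for anomalies are assumptions made for illustration; they are not the cleaning rules used in the notebook or in the EIA-930 work.

```python
import numpy as np
import pandas as pd


def quality_shares(demand, lower, upper):
    """Return the percent of values that are missing, anomalous, and good.

    `lower` and `upper` are hypothetical plausibility bounds; a real
    cleaning routine would likely use more careful anomaly rules.
    """
    n = len(demand)
    missing = demand.isna()
    # A value is anomalous only if it is present and outside the bounds.
    anomalous = ~missing & ((demand < lower) | (demand > upper))
    good = ~missing & ~anomalous
    return {
        "missing": float(100 * missing.sum() / n),
        "anomalous": float(100 * anomalous.sum() / n),
        "good": float(100 * good.sum() / n),
    }


# Made-up hourly demand values: one gap and one implausible spike.
demand = pd.Series([100, np.nan, 120, 5000, 110, 105, 98, 102, 101, 99])
shares = quality_shares(demand, lower=0, upper=1000)
print(shares)
```

The three shares always sum to 100%, which is the same bookkeeping as 100% - 2.2% - 0.5% = 97.3% above.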

In your summary stats, you have 0.0% missing, which is impressive.

If you are using the values for creating average profiles, I think this should be fine. We imputed in our work because we need continuous time series for use in models.
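
To illustrate the difference, here is a rough sketch with made-up data. An average hourly profile can simply skip the gaps, while a model that needs a continuous series has to fill them. The helper names are hypothetical, and linear interpolation in time is only a stand-in for a real imputation method; it is not the approach we used.

```python
import numpy as np
import pandas as pd


def average_hourly_profile(demand):
    """Mean demand for each hour of the day; NaN values are skipped."""
    return demand.groupby(demand.index.hour).mean()


def fill_gaps(demand):
    """Fill missing values by linear interpolation in time (a stand-in
    for a proper imputation method)."""
    return demand.interpolate(method="time")


# Two days of made-up hourly demand with one missing value.
idx = pd.Timestamp("2020-01-01") + pd.to_timedelta(np.arange(48), unit="h")
demand = pd.Series(np.tile(np.arange(24, dtype=float), 2) + 100, index=idx)
demand.iloc[5] = np.nan  # hour 5 of the first day is missing

profile = average_hourly_profile(demand)  # 24 values, gap ignored
filled = fill_gaps(demand)                # 48 values, gap filled
print(profile.head())
print(filled.isna().sum())
```

The profile for hour 5 is built from the second day only, so a few gaps barely matter there, whereas the filled series is what a time-series model would need as input.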
