Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
PUDL Boiler Generator Associations Explained
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## EIA Boiler Generator Associations Exploration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our path towards compiling the marginal cost of electricity of all of the publicly reported generating units across the country, we wanted to compile the heat rate for each generating unit. The heat rate is important because it helps us attribute the fuel costs (which are reported at the plant level) to the actual electric generation (reported at the generator level). The heat rate is what links the fuel consumption (at the boiler) to the electricity generated (at the generator). This illuminated a hole in EIA’s 860 reported boiler generator associations, that we were able to compensate for through a series of matching cartwheels and network analysis.\n",
"\n",
"We were previously doing all of this after the original data was ingested into the database, but we’ve recently reconfigured our database ingest process to move the new and improved boiler generator associations into its own table in PUDL."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boiler, schmoiler?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two essential elements of generating units - namely boilers and generators. You can think of them as the place where the fuel is burned (the boiler) and the place where the electricity is generated (the generator). Yes, this is a ridiculous simplification of a complex system but it will do for our purposes. In order to understand anything interesting about a generating unit, we need to connect the stuff being burned to the electricity created.\n",
"\n",
"Combined-cycle gas plants. You can think of combined-cycle gas plants as having two.. well cycles. The first cycle is a gas turbine - gas goes in turns a turbine and hot air comes out. The second cycle takes that hot air, plus water to make steam to turn a steam turbine. In the first cycle, the gas turbine is the boiler as well as the generator.\n",
"State of the EIA Union\n",
"\n",
"I can’t tell you how excited I was to find the 860 environmental associations table with boiler generator associations. And subsequently I can’t tell you how confused I got upon realizing that the associations were complete for some plants but not others. So we reached out the EIA to get a handle on what was happening.\n",
"\n",
"“As I am sure you are aware, we account for combined cycle facilities using three prime mover codes:\n",
"- CA – combined cycle steam part\n",
"- CS – combined cycle single shaft\n",
"- CT – combined cycle turbine part\n",
"For our EIA-860 and EIA-923 reports, plants are only supposed to submit data on their boilers and associated equipment. Therefore, since boilers are associated with steam generators, only the generators that have a prime mover of a “CA” or “CS” (when referring to combined cycle facilities) are listed in the boiler/generator association table. The gas turbine parts of combined cycle facilities (prime mover “CT”) are not included in this association table.”\n",
"\n",
"Well that’s explainable enough, although not particularly useful. This means that all of the combustion turbines (remember - cycle 1 - that’s where are the natural gas is being burned) in all of the combined cycle plants across the country were not attached to the steam turbines. This was resulting in some very strange heat rates because all of the assigned generation was attached to the boilers with little to no fuel. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imports & Set-up"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"import sqlalchemy as sa\n",
"import postgres_copy\n",
"import copy\n",
"import re\n",
"import networkx as nx\n",
"\n",
"from pudl import init\n",
"from pudl import constants as pc\n",
"import pudl.extract.eia923\n",
"import pudl.extract.eia860\n",
"import pudl.transform.eia\n",
"import pudl.output.pudltabl\n",
"import pudl.helpers\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"eia923_tables=pc.eia923_pudl_tables\n",
"eia923_years=pc.working_years['eia923']\n",
"eia860_tables=pc.eia860_pudl_tables\n",
"eia860_years=pc.working_years['eia860']\n",
"verbose=False\n",
"debug=False"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"eia923_raw_dfs = pudl.extract.eia923.extract(eia923_years=eia923_years,\n",
" verbose=verbose)\n",
"\n",
"eia860_raw_dfs = pudl.extract.eia860.extract(eia860_years=eia860_years,\n",
" verbose=verbose)\n",
"eia923_transformed_dfs = \\\n",
" pudl.transform.eia923.transform(eia923_raw_dfs,\n",
" eia923_tables=eia923_tables,\n",
" verbose=verbose)\n",
"\n",
"eia860_transformed_dfs = \\\n",
" pudl.transform.eia860.transform(eia860_raw_dfs,\n",
" eia860_tables=eia860_tables,\n",
" verbose=verbose)\n",
"\n",
"eia_transformed_dfs = copy.deepcopy(eia860_transformed_dfs)\n",
"eia_transformed_dfs.update(copy.deepcopy(eia923_transformed_dfs))"
]
},
{
"cell_type": "markdown",
"metadata": {
"scrolled": false
},
"source": [
"### Building Blocks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We’ve been mainly working with two separate EIA forms: 923 and 860. In order to compile the more complete boiler generator associations, we compiled all of the table from 860 and 923 which have boilers or generators to try to connect as many of them as possible. Here are the tables which are different tabs in 923’s reporting and different files in 860 (with the original EIA table name and the PUDL table name):\n",
"\n",
"923 Tables:\n",
"- Generator / ‘generation_eia923’: monthly net generation by generator\n",
"- Boiler Fuel / ‘boiler_fuel_eia923’: monthly fuel consumption (in number of physical units and MMBTU per unit) by boiler\n",
" \n",
"860 Tables:\n",
"- Enviro_Assoc / ‘boiler_generator_assn_eia860’ - the (limited) boiler generator associations\n",
"- Generator / ‘generators_eia860’: generator info, including capacity and (limited) unit codes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# compile and scrub all the parts\n",
"bga_eia860 = eia_transformed_dfs['boiler_generator_assn_eia860'].copy()\n",
"bga_eia860 = pudl.transform.eia._restrict_years(bga_eia860, eia923_years, eia860_years)\n",
"bga_eia860['generator_id'] = bga_eia860.generator_id.astype(str)\n",
"bga_eia860['boiler_id'] = bga_eia860.boiler_id.astype(str)\n",
"bga_eia860 = bga_eia860.drop(['operator_id'], axis=1)\n",
"\n",
"gen_eia923 = eia_transformed_dfs['generation_eia923'].copy()\n",
"gen_eia923 = pudl.transform.eia._restrict_years(gen_eia923, eia923_years, eia860_years)\n",
"gen_eia923['generator_id'] = gen_eia923.generator_id.astype(str)\n",
"gen_eia923 = gen_eia923.set_index(pd.DatetimeIndex(gen_eia923.report_date))\n",
"\n",
"gen_eia923_gb = gen_eia923.groupby(\n",
" [pd.Grouper(freq='AS'), 'plant_id_eia', 'generator_id'])\n",
"gen_eia923 = gen_eia923_gb['net_generation_mwh'].sum().reset_index()\n",
"gen_eia923['missing_from_923'] = False\n",
"\n",
"# The generator records that are missing from 860 but appear in 923\n",
"# I created issue no. 128 to deal with this at a later date\n",
"merged = pd.merge(eia_transformed_dfs['generators_eia860'].copy(),\n",
" gen_eia923,\n",
" on=['plant_id_eia', 'report_date', 'generator_id'],\n",
" indicator=True, how='outer')\n",
"missing = merged[merged['_merge'] == 'right_only']\n",
"\n",
"# compile all of the generators\n",
"gens_eia860 = eia_transformed_dfs['generators_eia860'].copy()\n",
"gens_eia860 = pudl.transform.eia._restrict_years(gens_eia860, eia923_years, eia860_years)\n",
"gens_eia860['generator_id'] = gens_eia860.generator_id.astype(str)\n",
"gens = pd.merge(gen_eia923, gens_eia860,\n",
" on=['plant_id_eia', 'report_date', 'generator_id'],\n",
" how='outer')\n",
"\n",
"gens = gens[['plant_id_eia',\n",
" 'report_date',\n",
" 'generator_id',\n",
" 'unit_code',\n",
" 'net_generation_mwh',\n",
" 'missing_from_923']].drop_duplicates()\n",
"\n",
"gens['generator_id'] = gens['generator_id'].astype(str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We’re going to use Barry (yes, that is the plant that is basically always at the top of every table) because it’s a relatively good example."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>plant_id_eia</th>\n",
" <th>boiler_id</th>\n",
" <th>generator_id</th>\n",
" <th>report_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>A1ST</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>A1ST</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>A2ST</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>A2ST</td>\n",
" <td>2015-01-01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" plant_id_eia boiler_id generator_id report_date\n",
"0 3 1 1 2015-01-01\n",
"1 3 2 2 2015-01-01\n",
"2 3 3 3 2015-01-01\n",
"3 3 4 4 2015-01-01\n",
"4 3 5 5 2015-01-01\n",
"5 3 6A A1ST 2015-01-01\n",
"6 3 6B A1ST 2015-01-01\n",
"7 3 7B A2ST 2015-01-01\n",
"8 3 7A A2ST 2015-01-01"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stage = bga_eia860\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>plant_id_eia</th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>unit_code</th>\n",
" <th>net_generation_mwh</th>\n",
" <th>missing_from_923</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>13914</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>-1413.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13915</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>418.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13916</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" <td>92406.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13917</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n",
" <td>NaN</td>\n",
" <td>1577172.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13918</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>2935726.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13919</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13920</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13921</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13922</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13923</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13924</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" plant_id_eia report_date generator_id unit_code net_generation_mwh \\\n",
"13914 3 2015-01-01 1 NaN -1413.0 \n",
"13915 3 2015-01-01 2 NaN 418.0 \n",
"13916 3 2015-01-01 3 NaN 92406.0 \n",
"13917 3 2015-01-01 4 NaN 1577172.0 \n",
"13918 3 2015-01-01 5 NaN 2935726.0 \n",
"13919 3 2015-01-01 A1CT G521 1096926.0 \n",
"13920 3 2015-01-01 A1CT2 G521 1129228.0 \n",
"13921 3 2015-01-01 A1ST G521 1269176.0 \n",
"13922 3 2015-01-01 A2C1 G522 1025123.0 \n",
"13923 3 2015-01-01 A2C2 G522 1067440.0 \n",
"13924 3 2015-01-01 A2ST G522 1195360.0 \n",
"\n",
" missing_from_923 \n",
"13914 False \n",
"13915 False \n",
"13916 False \n",
"13917 False \n",
"13918 False \n",
"13919 False \n",
"13920 False \n",
"13921 False \n",
"13922 False \n",
"13923 False \n",
"13924 False "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stage = gens\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use what yo mamma gave you.."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we compiled a complete list of generators from 923 and 860. Then we merged the 860 boiler generator associations onto that list. The generators with associated boilers were all labeled with an “eia860_org” tag indicating it came from the original EIA 860 reporting."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# create the beginning of a bga compilation w/ the generators as the\n",
"# background\n",
"bga_compiled_1 = pd.merge(gens, bga_eia860,\n",
" on=['plant_id_eia', 'generator_id',\n",
" 'report_date'],\n",
" how='outer')\n",
"\n",
"# Create a set of bga's that are linked, directly from bga8\n",
"bga_assn = bga_compiled_1[bga_compiled_1['boiler_id'].notnull()].copy()\n",
"bga_assn['bga_source'] = 'eia860_org'\n",
"\n",
"# Create a set of bga's that were not linked directly through bga8\n",
"bga_unassn = bga_compiled_1[bga_compiled_1['boiler_id'].isnull()].copy()\n",
"bga_unassn = bga_unassn.drop(['boiler_id'], axis=1)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>plant_id_eia</th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>unit_code</th>\n",
" <th>net_generation_mwh</th>\n",
" <th>missing_from_923</th>\n",
" <th>boiler_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25912</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>-1413.0</td>\n",
" <td>False</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25913</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>418.0</td>\n",
" <td>False</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25914</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" <td>92406.0</td>\n",
" <td>False</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25915</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n",
" <td>NaN</td>\n",
" <td>1577172.0</td>\n",
" <td>False</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25916</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>2935726.0</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25917</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25918</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25919</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" <td>6A</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25920</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" <td>6B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25921</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25922</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25923</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" <td>7B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25924</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" <td>7A</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" plant_id_eia report_date generator_id unit_code net_generation_mwh \\\n",
"25912 3 2015-01-01 1 NaN -1413.0 \n",
"25913 3 2015-01-01 2 NaN 418.0 \n",
"25914 3 2015-01-01 3 NaN 92406.0 \n",
"25915 3 2015-01-01 4 NaN 1577172.0 \n",
"25916 3 2015-01-01 5 NaN 2935726.0 \n",
"25917 3 2015-01-01 A1CT G521 1096926.0 \n",
"25918 3 2015-01-01 A1CT2 G521 1129228.0 \n",
"25919 3 2015-01-01 A1ST G521 1269176.0 \n",
"25920 3 2015-01-01 A1ST G521 1269176.0 \n",
"25921 3 2015-01-01 A2C1 G522 1025123.0 \n",
"25922 3 2015-01-01 A2C2 G522 1067440.0 \n",
"25923 3 2015-01-01 A2ST G522 1195360.0 \n",
"25924 3 2015-01-01 A2ST G522 1195360.0 \n",
"\n",
" missing_from_923 boiler_id \n",
"25912 False 1 \n",
"25913 False 2 \n",
"25914 False 3 \n",
"25915 False 4 \n",
"25916 False 5 \n",
"25917 False NaN \n",
"25918 False NaN \n",
"25919 False 6A \n",
"25920 False 6B \n",
"25921 False NaN \n",
"25922 False NaN \n",
"25923 False 7B \n",
"25924 False 7A "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stage = bga_compiled_1\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That got us 57% of the annual boiler/generator mappings but then we had to compile the rest."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### String Matching"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We realized in our sleuthing that many of unmapped boilers and generators id’s actually are the same strings - which is weird because most of the id’s mapped in the 860 association table have wildly different id’s. But alas, we simply merged the remaining unmapped generators to the unmapped boilers with the report date, EIA plant id, and the respective boiler or generator id. This mapped another 3,621 records - another 2% of the mappings!"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"bf_eia923 = eia_transformed_dfs['boiler_fuel_eia923'].copy()\n",
"bf_eia923 = pudl.transform.eia._restrict_years(bf_eia923, eia923_years, eia860_years)\n",
"bf_eia923['boiler_id'] = bf_eia923.boiler_id.astype(str)\n",
"bf_eia923['total_heat_content_mmbtu'] = bf_eia923['fuel_qty_consumed'] * \\\n",
" bf_eia923['fuel_mmbtu_per_unit']\n",
"bf_eia923 = bf_eia923.set_index(pd.DatetimeIndex(bf_eia923.report_date))\n",
"bf_eia923_gb = bf_eia923.groupby(\n",
" [pd.Grouper(freq='AS'), 'plant_id_eia', 'boiler_id'])\n",
"bf_eia923 = bf_eia923_gb.agg({\n",
" 'total_heat_content_mmbtu': pudl.helpers.sum_na,\n",
"}).reset_index()\n",
"\n",
"bf_eia923.drop_duplicates(\n",
" subset=['plant_id_eia', 'report_date', 'boiler_id'], inplace=True)\n",
"\n",
"# Create a list of boilers that were not in bga8\n",
"bf9_bga = bf_eia923.merge(bga_compiled_1,\n",
" on=['plant_id_eia', 'boiler_id', 'report_date'],\n",
" how='outer',\n",
" indicator=True)\n",
"bf9_not_in_bga = bf9_bga[bf9_bga['_merge'] == 'left_only']\n",
"bf9_not_in_bga = bf9_not_in_bga.drop(['_merge'], axis=1)\n",
"\n",
"# Match the unassociated generators with unassociated boilers\n",
"# This method is assuming that some the strings of the generators and the\n",
"# boilers are the same\n",
"bga_unassn = bga_unassn.merge(bf9_not_in_bga[['plant_id_eia',\n",
" 'boiler_id',\n",
" 'report_date']],\n",
" how='left',\n",
" left_on=['report_date',\n",
" 'plant_id_eia',\n",
" 'generator_id'],\n",
" right_on=['report_date',\n",
" 'plant_id_eia',\n",
" 'boiler_id'])\n",
"bga_unassn.sort_values(['report_date', 'plant_id_eia'], inplace=True)\n",
"\n",
"bga_unassn['bga_source'] = None\n",
"bga_unassn.loc[bga_unassn.boiler_id.notnull(),\n",
" 'bga_source'] = 'string_assn'\n",
"\n",
"bga_compiled_2 = bga_assn.append(bga_unassn)\n",
"bga_compiled_2.sort_values(['plant_id_eia', 'report_date'], inplace=True)\n",
"bga_compiled_2['missing_from_923'].fillna(value=True, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>plant_id_eia</th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>unit_code</th>\n",
" <th>net_generation_mwh</th>\n",
" <th>missing_from_923</th>\n",
" <th>boiler_id</th>\n",
" <th>bga_source</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25912</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>-1413.0</td>\n",
" <td>False</td>\n",
" <td>1</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25913</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>418.0</td>\n",
" <td>False</td>\n",
" <td>2</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25914</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" <td>92406.0</td>\n",
" <td>False</td>\n",
" <td>3</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25915</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n",
" <td>NaN</td>\n",
" <td>1577172.0</td>\n",
" <td>False</td>\n",
" <td>4</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25916</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>2935726.0</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25919</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" <td>6A</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25920</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" <td>6B</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25923</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" <td>7B</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25924</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" <td>7A</td>\n",
" <td>eia860_org</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2137</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" <td>A1CT</td>\n",
" <td>string_assn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2138</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" <td>A1CT2</td>\n",
" <td>string_assn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2139</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" <td>A2C1</td>\n",
" <td>string_assn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2140</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" <td>A2C2</td>\n",
" <td>string_assn</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" plant_id_eia report_date generator_id unit_code net_generation_mwh \\\n",
"25912 3 2015-01-01 1 NaN -1413.0 \n",
"25913 3 2015-01-01 2 NaN 418.0 \n",
"25914 3 2015-01-01 3 NaN 92406.0 \n",
"25915 3 2015-01-01 4 NaN 1577172.0 \n",
"25916 3 2015-01-01 5 NaN 2935726.0 \n",
"25919 3 2015-01-01 A1ST G521 1269176.0 \n",
"25920 3 2015-01-01 A1ST G521 1269176.0 \n",
"25923 3 2015-01-01 A2ST G522 1195360.0 \n",
"25924 3 2015-01-01 A2ST G522 1195360.0 \n",
"2137 3 2015-01-01 A1CT G521 1096926.0 \n",
"2138 3 2015-01-01 A1CT2 G521 1129228.0 \n",
"2139 3 2015-01-01 A2C1 G522 1025123.0 \n",
"2140 3 2015-01-01 A2C2 G522 1067440.0 \n",
"\n",
" missing_from_923 boiler_id bga_source \n",
"25912 False 1 eia860_org \n",
"25913 False 2 eia860_org \n",
"25914 False 3 eia860_org \n",
"25915 False 4 eia860_org \n",
"25916 False 5 eia860_org \n",
"25919 False 6A eia860_org \n",
"25920 False 6B eia860_org \n",
"25923 False 7B eia860_org \n",
"25924 False 7A eia860_org \n",
"2137 False A1CT string_assn \n",
"2138 False A1CT2 string_assn \n",
"2139 False A2C1 string_assn \n",
"2140 False A2C2 string_assn "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stage = bga_compiled_2\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the Unit Codes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step was to use the unit code reported in the EIA 860 generators table. These unit codes are not reported for every single generator record. There are 9592 records from 2011-2016 with some unit code. Largely, this helped connect the remaining, unmapped steam turbine portion of the combined-cycle plants to the natural gas turbine portion. All of a sudden we had 80% of the generators mapped."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Connect the gens and boilers in units\n",
"bga_compiled_units = bga_compiled_2.loc[\n",
" bga_compiled_2['unit_code'].notnull()]\n",
"bga_gen_units = bga_compiled_units.drop(['boiler_id'], axis=1)\n",
"bga_boil_units = bga_compiled_units[['plant_id_eia',\n",
" 'report_date',\n",
" 'boiler_id',\n",
" 'unit_code']].copy()\n",
"bga_boil_units.dropna(subset=['boiler_id'], inplace=True)\n",
"\n",
"# merge the units with the boilers\n",
"bga_unit_compilation = bga_gen_units.merge(bga_boil_units,\n",
" how='outer',\n",
" on=['plant_id_eia',\n",
" 'report_date',\n",
" 'unit_code'],\n",
" indicator=True)\n",
"\n",
"# label the bga_source\n",
"bga_unit_compilation. \\\n",
" loc[bga_unit_compilation['bga_source'].isnull(),\n",
" 'bga_source'] = 'unit_connection'\n",
"\n",
"bga_unit_compilation.drop(['_merge'], axis=1, inplace=True)\n",
"bga_non_units = bga_compiled_2[bga_compiled_2['unit_code'].isnull()]\n",
"\n",
"# combine the unit compilation and the non units\n",
"bga_compiled_3 = bga_non_units.append(bga_unit_compilation)\n",
"\n",
"# resort the records and the columns\n",
"bga_compiled_3.sort_values(['plant_id_eia', 'report_date'], inplace=True)\n",
"bga_compiled_3 = bga_compiled_3[['plant_id_eia',\n",
" 'report_date',\n",
" 'generator_id',\n",
" 'boiler_id',\n",
" 'unit_code',\n",
" 'bga_source',\n",
" 'net_generation_mwh',\n",
" 'missing_from_923']]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>plant_id_eia</th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>boiler_id</th>\n",
" <th>unit_code</th>\n",
" <th>bga_source</th>\n",
" <th>net_generation_mwh</th>\n",
" <th>missing_from_923</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25912</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>eia860_org</td>\n",
" <td>-1413.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25913</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>eia860_org</td>\n",
" <td>418.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25914</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" <td>eia860_org</td>\n",
" <td>92406.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25915</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>NaN</td>\n",
" <td>eia860_org</td>\n",
" <td>1577172.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25916</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>eia860_org</td>\n",
" <td>2935726.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>80</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>6A</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>6B</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>6A</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>6B</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>eia860_org</td>\n",
" <td>1269176.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>6A</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>6B</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1096926.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>6A</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>6B</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>string_assn</td>\n",
" <td>1129228.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>7B</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>7A</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>7B</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>7A</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>eia860_org</td>\n",
" <td>1195360.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>7B</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>105</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>7A</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1025123.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>7B</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>7A</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>110</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>3</td>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>string_assn</td>\n",
" <td>1067440.0</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" plant_id_eia report_date generator_id boiler_id unit_code bga_source \\\n",
"25912 3 2015-01-01 1 1 NaN eia860_org \n",
"25913 3 2015-01-01 2 2 NaN eia860_org \n",
"25914 3 2015-01-01 3 3 NaN eia860_org \n",
"25915 3 2015-01-01 4 4 NaN eia860_org \n",
"25916 3 2015-01-01 5 5 NaN eia860_org \n",
"80 3 2015-01-01 A1ST 6A G521 eia860_org \n",
"81 3 2015-01-01 A1ST 6B G521 eia860_org \n",
"82 3 2015-01-01 A1ST A1CT G521 eia860_org \n",
"83 3 2015-01-01 A1ST A1CT2 G521 eia860_org \n",
"84 3 2015-01-01 A1ST 6A G521 eia860_org \n",
"85 3 2015-01-01 A1ST 6B G521 eia860_org \n",
"86 3 2015-01-01 A1ST A1CT G521 eia860_org \n",
"87 3 2015-01-01 A1ST A1CT2 G521 eia860_org \n",
"88 3 2015-01-01 A1CT 6A G521 string_assn \n",
"89 3 2015-01-01 A1CT 6B G521 string_assn \n",
"90 3 2015-01-01 A1CT A1CT G521 string_assn \n",
"91 3 2015-01-01 A1CT A1CT2 G521 string_assn \n",
"92 3 2015-01-01 A1CT2 6A G521 string_assn \n",
"93 3 2015-01-01 A1CT2 6B G521 string_assn \n",
"94 3 2015-01-01 A1CT2 A1CT G521 string_assn \n",
"95 3 2015-01-01 A1CT2 A1CT2 G521 string_assn \n",
"96 3 2015-01-01 A2ST 7B G522 eia860_org \n",
"97 3 2015-01-01 A2ST 7A G522 eia860_org \n",
"98 3 2015-01-01 A2ST A2C1 G522 eia860_org \n",
"99 3 2015-01-01 A2ST A2C2 G522 eia860_org \n",
"100 3 2015-01-01 A2ST 7B G522 eia860_org \n",
"101 3 2015-01-01 A2ST 7A G522 eia860_org \n",
"102 3 2015-01-01 A2ST A2C1 G522 eia860_org \n",
"103 3 2015-01-01 A2ST A2C2 G522 eia860_org \n",
"104 3 2015-01-01 A2C1 7B G522 string_assn \n",
"105 3 2015-01-01 A2C1 7A G522 string_assn \n",
"106 3 2015-01-01 A2C1 A2C1 G522 string_assn \n",
"107 3 2015-01-01 A2C1 A2C2 G522 string_assn \n",
"108 3 2015-01-01 A2C2 7B G522 string_assn \n",
"109 3 2015-01-01 A2C2 7A G522 string_assn \n",
"110 3 2015-01-01 A2C2 A2C1 G522 string_assn \n",
"111 3 2015-01-01 A2C2 A2C2 G522 string_assn \n",
"\n",
" net_generation_mwh missing_from_923 \n",
"25912 -1413.0 False \n",
"25913 418.0 False \n",
"25914 92406.0 False \n",
"25915 1577172.0 False \n",
"25916 2935726.0 False \n",
"80 1269176.0 False \n",
"81 1269176.0 False \n",
"82 1269176.0 False \n",
"83 1269176.0 False \n",
"84 1269176.0 False \n",
"85 1269176.0 False \n",
"86 1269176.0 False \n",
"87 1269176.0 False \n",
"88 1096926.0 False \n",
"89 1096926.0 False \n",
"90 1096926.0 False \n",
"91 1096926.0 False \n",
"92 1129228.0 False \n",
"93 1129228.0 False \n",
"94 1129228.0 False \n",
"95 1129228.0 False \n",
"96 1195360.0 False \n",
"97 1195360.0 False \n",
"98 1195360.0 False \n",
"99 1195360.0 False \n",
"100 1195360.0 False \n",
"101 1195360.0 False \n",
"102 1195360.0 False \n",
"103 1195360.0 False \n",
"104 1025123.0 False \n",
"105 1025123.0 False \n",
"106 1025123.0 False \n",
"107 1025123.0 False \n",
"108 1067440.0 False \n",
"109 1067440.0 False \n",
"110 1067440.0 False \n",
"111 1067440.0 False "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stage = bga_compiled_3\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This cell is largely for labeling plants and generators with helpful indicators."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# label plants that have 'bad' generator records (generators that have MWhs\n",
"# in gens9 but don't have connected boilers) create a df with just the bad\n",
"# plants by searching for the 'bad' generators\n",
"bad_plants = bga_compiled_3[(bga_compiled_3['boiler_id'].isnull()) &\n",
" (bga_compiled_3['net_generation_mwh'] > 0)].\\\n",
" drop_duplicates(subset=['plant_id_eia', 'report_date'])\n",
"bad_plants = bad_plants[['plant_id_eia', 'report_date']]\n",
"\n",
"# merge the 'bad' plants back into the larger frame\n",
"bga_compiled_3 = bga_compiled_3.merge(bad_plants,\n",
" how='outer',\n",
" on=['plant_id_eia', 'report_date'],\n",
" indicator=True)\n",
"\n",
"# use the indicator to create labels\n",
"bga_compiled_3['plant_w_bad_generator'] = \\\n",
" np.where(bga_compiled_3._merge == 'both', True, False)\n",
"# Note: At least one gen has reported MWh in 923, but could not be\n",
"# programmatically mapped to a boiler\n",
"\n",
"# we don't need this one anymore\n",
"bga_compiled_3 = bga_compiled_3.drop(['_merge'], axis=1)\n",
"\n",
"# create a label for generators that are unmapped but in 923\n",
"bga_compiled_3['unmapped_but_in_923'] = \\\n",
" np.where((bga_compiled_3.boiler_id.isnull()) &\n",
" ~bga_compiled_3.missing_from_923 &\n",
" (bga_compiled_3.net_generation_mwh == 0),\n",
" True,\n",
" False)\n",
"\n",
"# create a label for generators that are unmapped\n",
"bga_compiled_3['unmapped'] = np.where(bga_compiled_3.boiler_id.isnull(),\n",
" True,\n",
" False)\n",
"bga_out = bga_compiled_3.drop('net_generation_mwh', axis=1)\n",
"bga_out.loc[bga_out.unit_code.isnull(), 'unit_code'] = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Unit Mapping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The collection of boilers and generators that we really care about is the generating unit - i.e. every boiler and generator that is connected and interdependent. It doesn’t make sense to talk about a “heat rate” at any smaller unit, since the fuel consumption and electricity output are mingled at the unit level In order to connect the relatively disparate associated generators and boilers into units, we did some simple network analysis, using the NetworkX python package originally developed at the Los Alamos National Laboratory. For every plant, we created a graph of all of the connections between the boilers and generators, and every connected subgraph was assigned a unique unit id."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"bga_for_nx = bga_out[['plant_id_eia', 'report_date', 'generator_id',\n",
" 'boiler_id', 'unit_code']]\n",
"# If there's no boiler... there's no boiler-generator association\n",
"bga_for_nx = bga_for_nx.dropna(subset=['boiler_id']).drop_duplicates()\n",
"\n",
"# Need boiler & generator specific ID strings, or they look like\n",
"# the same node to NX\n",
"bga_for_nx['generators'] = 'p' + bga_for_nx.plant_id_eia.astype(str) + \\\n",
" '_g' + bga_for_nx.generator_id.astype(str)\n",
"bga_for_nx['boilers'] = 'p' + bga_for_nx.plant_id_eia.astype(str) + \\\n",
" '_b' + bga_for_nx.boiler_id.astype(str)\n",
"\n",
"# dataframe to accumulate the unit_ids in\n",
"bga_w_units = pd.DataFrame()\n",
"# We want to start our unit_id counter anew for each plant:\n",
"for pid in bga_for_nx.plant_id_eia.unique():\n",
" bga_byplant = bga_for_nx[bga_for_nx.plant_id_eia == pid].copy()\n",
"\n",
" # Create a graph from the dataframe of boilers and generators. It's a\n",
" # multi-graph, meaning the same nodes can be connected by more than one\n",
" # edge -- this allows us to preserve multiple years worth of boiler\n",
" # generator association information for later inspection if need be:\n",
" bga_graph = nx.from_pandas_edgelist(bga_byplant,\n",
" source='generators',\n",
" target='boilers',\n",
" edge_attr=True,\n",
" create_using=nx.MultiGraph())\n",
"\n",
" # Each connected sub-graph is a generation unit:\n",
" gen_units = list(nx.connected_component_subgraphs(bga_graph))\n",
"\n",
" # Assign a unit_id to each subgraph, and extract edges into a dataframe\n",
" for unit_id, unit in zip(range(len(gen_units)), gen_units):\n",
" # All the boiler-generator association graphs should be bi-partite,\n",
" # meaning generators only connect to boilers, and boilers only\n",
" # connect to generators.\n",
" assert nx.algorithms.bipartite.is_bipartite(unit), \\\n",
" \"\"\"Non-bipartite generation unit graph found.\n",
"plant_id_eia={}, unit_id_pudl={}.\"\"\".format(pid, unit_id)\n",
" nx.set_edge_attributes(\n",
" unit, name='unit_id_pudl', values=unit_id + 1)\n",
" new_unit_df = nx.to_pandas_edgelist(unit)\n",
" bga_w_units = bga_w_units.append(new_unit_df)\n",
" \n",
"bga_w_units = bga_w_units.sort_values(['plant_id_eia', 'unit_id_pudl',\n",
" 'generator_id', 'boiler_id'])\n",
"bga_w_units = bga_w_units.drop(['source', 'target'], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>unit_code</th>\n",
" <th>plant_id_eia</th>\n",
" <th>boiler_id</th>\n",
" <th>unit_id_pudl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>5</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT2</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT2</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT2</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C1</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C1</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C1</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" report_date generator_id unit_code plant_id_eia boiler_id unit_id_pudl\n",
"4 2015-01-01 1 None 3 1 1\n",
"4 2015-01-01 2 None 3 2 2\n",
"4 2015-01-01 3 None 3 3 3\n",
"4 2015-01-01 4 None 3 4 4\n",
"4 2015-01-01 5 None 3 5 5\n",
"40 2015-01-01 A1CT G521 3 6A 6\n",
"10 2015-01-01 A1CT G521 3 6B 6\n",
"43 2015-01-01 A1CT G521 3 A1CT 6\n",
"31 2015-01-01 A1CT G521 3 A1CT2 6\n",
"25 2015-01-01 A1CT2 G521 3 6A 6\n",
"4 2015-01-01 A1CT2 G521 3 6B 6\n",
"28 2015-01-01 A1CT2 G521 3 A1CT 6\n",
"19 2015-01-01 A1CT2 G521 3 A1CT2 6\n",
"49 2015-01-01 A1ST G521 3 6A 6\n",
"16 2015-01-01 A1ST G521 3 6B 6\n",
"52 2015-01-01 A1ST G521 3 A1CT 6\n",
"34 2015-01-01 A1ST G521 3 A1CT2 6\n",
"22 2015-01-01 A2C1 G522 3 7A 7\n",
"34 2015-01-01 A2C1 G522 3 7B 7\n",
"25 2015-01-01 A2C1 G522 3 A2C1 7\n",
"28 2015-01-01 A2C1 G522 3 A2C2 7\n",
"4 2015-01-01 A2C2 G522 3 7A 7\n",
"16 2015-01-01 A2C2 G522 3 7B 7\n",
"7 2015-01-01 A2C2 G522 3 A2C1 7\n",
"10 2015-01-01 A2C2 G522 3 A2C2 7\n",
"40 2015-01-01 A2ST G522 3 7A 7\n",
"52 2015-01-01 A2ST G522 3 7B 7\n",
"43 2015-01-01 A2ST G522 3 A2C1 7\n",
"46 2015-01-01 A2ST G522 3 A2C2 7"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stage = bga_w_units\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING: multiple EIA unit codes found in these PUDL units:\n",
"plant_id_eia unit_id_pudl\n",
"1004 3 [1, CT1]\n",
"1904 1 [HBR0, BDS0]\n",
"1927 1 [RIV0, HBR0]\n",
"4040 1 [PWG1, PWG2]\n",
"10725 1 [F801, F802]\n",
"50973 1 [BLK1, BLK2, BLK3]\n",
"55153 1 [STG1, STG2]\n",
"55309 1 [SMR1, SMR2]\n",
"55502 1 [CC1, G801, CC2]\n",
"56041 1 [NGS, MGS]\n",
"56350 1 [BLK1, 115]\n",
" 2 [BLK2, 116]\n",
"56998 1 [43, PB4]\n",
" 2 [53, PB5]\n",
"60786 1 [4343, 4141]\n",
"Name: unit_code, dtype: object\n"
]
}
],
"source": [
"# Check whether the PUDL unit_id values we've inferred conflict with\n",
"# the unit_code values that were reported to EIA. Are there any PUDL\n",
"# unit_id values that have more than 1 EIA unit_code within them?\n",
"bga_unit_code_counts = \\\n",
" bga_w_units.groupby(['plant_id_eia', 'unit_id_pudl'])['unit_code'].\\\n",
" nunique().to_frame().reset_index()\n",
"bga_unit_code_counts = bga_unit_code_counts.rename(\n",
" columns={'unit_code': 'unit_code_count'})\n",
"bga_unit_code_counts = pd.merge(bga_w_units, bga_unit_code_counts,\n",
" on=['plant_id_eia', 'unit_id_pudl'])\n",
"too_many_codes = \\\n",
" bga_unit_code_counts[bga_unit_code_counts.unit_code_count > 1]\n",
"too_many_codes = \\\n",
" too_many_codes[~too_many_codes.unit_code.isnull()].\\\n",
" groupby(['plant_id_eia', 'unit_id_pudl'])['unit_code'].unique()\n",
"print('WARNING: multiple EIA unit codes found in these PUDL units:')\n",
"print(too_many_codes)\n",
"bga_w_units = bga_w_units.drop('unit_code', axis=1)\n",
"\n",
"# These assertions test that all boilers and generators ended up in the\n",
"# same unit_id across all the years of reporting:\n",
"assert (bga_w_units.groupby(\n",
" ['plant_id_eia', 'generator_id'])['unit_id_pudl'].nunique() == 1).all()\n",
"assert (bga_w_units.groupby(\n",
" ['plant_id_eia', 'boiler_id'])['unit_id_pudl'].nunique() == 1).all()\n",
"bga_w_units = bga_w_units.drop('report_date', axis=1)\n",
"bga_w_units = bga_w_units[['plant_id_eia', 'unit_id_pudl',\n",
" 'generator_id', 'boiler_id']].drop_duplicates()\n",
"bga_out = pd.merge(bga_out, bga_w_units, how='left',\n",
" on=['plant_id_eia', 'generator_id', 'boiler_id'])\n",
"bga_out['unit_id_pudl'] = \\\n",
" bga_out['unit_id_pudl'].fillna(value=0).astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>unit_code</th>\n",
" <th>plant_id_eia</th>\n",
" <th>boiler_id</th>\n",
" <th>unit_id_pudl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>5</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT2</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1CT2</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT2</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6A</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>6B</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>2015-01-01</td>\n",
" <td>A1ST</td>\n",
" <td>G521</td>\n",
" <td>3</td>\n",
" <td>A1CT2</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C1</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C1</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C1</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2C2</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7A</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>7B</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C1</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>2015-01-01</td>\n",
" <td>A2ST</td>\n",
" <td>G522</td>\n",
" <td>3</td>\n",
" <td>A2C2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" report_date generator_id unit_code plant_id_eia boiler_id unit_id_pudl\n",
"4 2015-01-01 1 None 3 1 1\n",
"4 2015-01-01 2 None 3 2 2\n",
"4 2015-01-01 3 None 3 3 3\n",
"4 2015-01-01 4 None 3 4 4\n",
"4 2015-01-01 5 None 3 5 5\n",
"40 2015-01-01 A1CT G521 3 6A 6\n",
"10 2015-01-01 A1CT G521 3 6B 6\n",
"43 2015-01-01 A1CT G521 3 A1CT 6\n",
"31 2015-01-01 A1CT G521 3 A1CT2 6\n",
"25 2015-01-01 A1CT2 G521 3 6A 6\n",
"4 2015-01-01 A1CT2 G521 3 6B 6\n",
"28 2015-01-01 A1CT2 G521 3 A1CT 6\n",
"19 2015-01-01 A1CT2 G521 3 A1CT2 6\n",
"49 2015-01-01 A1ST G521 3 6A 6\n",
"16 2015-01-01 A1ST G521 3 6B 6\n",
"52 2015-01-01 A1ST G521 3 A1CT 6\n",
"34 2015-01-01 A1ST G521 3 A1CT2 6\n",
"22 2015-01-01 A2C1 G522 3 7A 7\n",
"34 2015-01-01 A2C1 G522 3 7B 7\n",
"25 2015-01-01 A2C1 G522 3 A2C1 7\n",
"28 2015-01-01 A2C1 G522 3 A2C2 7\n",
"4 2015-01-01 A2C2 G522 3 7A 7\n",
"16 2015-01-01 A2C2 G522 3 7B 7\n",
"7 2015-01-01 A2C2 G522 3 A2C1 7\n",
"10 2015-01-01 A2C2 G522 3 A2C2 7\n",
"40 2015-01-01 A2ST G522 3 7A 7\n",
"52 2015-01-01 A2ST G522 3 7B 7\n",
"43 2015-01-01 A2ST G522 3 A2C1 7\n",
"46 2015-01-01 A2ST G522 3 A2C2 7"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"statge = bga_out\n",
"stage[(stage['plant_id_eia'] == 3) & (stage['report_date'].dt.year == 2015)]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"if not debug:\n",
" bga_out = bga_out[~bga_out.missing_from_923 &\n",
" ~bga_out.plant_w_bad_generator &\n",
" ~bga_out.unmapped_but_in_923 &\n",
" ~bga_out.unmapped]\n",
"\n",
" bga_out = bga_out.drop(['missing_from_923',\n",
" 'plant_w_bad_generator',\n",
" 'unmapped_but_in_923',\n",
" 'unmapped'], axis=1)\n",
" bga_out = bga_out.drop_duplicates(subset=['report_date',\n",
" 'plant_id_eia',\n",
" 'boiler_id',\n",
" 'generator_id'])\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>report_date</th>\n",
" <th>generator_id</th>\n",
" <th>unit_code</th>\n",
" <th>plant_id_eia</th>\n",
" <th>boiler_id</th>\n",
" <th>unit_id_pudl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>2</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01</td>\n",
" <td>4</td>\n&