Skip to content

Instantly share code, notes, and snippets.

@simonw
Last active March 10, 2019 01:47
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save simonw/53fb1dcd88bd26080306370bebdddfb3 to your computer and use it in GitHub Desktop.
Save simonw/53fb1dcd88bd26080306370bebdddfb3 to your computer and use it in GitHub Desktop.
How I created dams.now.sh

How I created dams.now.sh

Try it out at https://dams.now.sh/ - see this Twitter thread for background.

I started by grabbing the URLs to every downloadable Excel spreadsheet.

I navigated to the "Downloads (Public)" link starting from https://nid-test.sec.usace.army.mil/ - then I ran this JavaScript in my browser's console to extract all of the URLs as a JSON blob.

console.log(JSON.stringify(
    Array.from(
        document.querySelectorAll(".t-Report-cell")
    ).map(c => c.querySelector("a").href)
))

I pasted this JSON blob into the Python script, then added code to load the data using pandas and save it to a SQLite database.

python fetch_and_save_to_sqlite.py

This created a file called dams.db.

I used sqlite-utils to add a full-text index on some text columns, plus indexes on some columns that I would like to be able to run fast queries against:

sqlite-utils enable-fts dams.db dams DAM_NAME OTHER_DAM_NAME
sqlite-utils create-index dams.db dams RIVER
sqlite-utils create-index dams.db dams STATE

Finally I published that file using the following command:

datasette publish now \
    -n dams-datasette \
    --alias dams \
    --title="National Inventory of Dams" \
    --source_url="https://nid-test.sec.usace.army.mil/" \
    --install=datasette-cluster-map \
    --install=datasette-vega \
    dams.db
import requests
import pandas as pd
import sqlite3
dam_urls = [
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=AK_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=AL_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=AR_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=AZ_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=CA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=CO_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=CT_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=DE_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=FL_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=GA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=GU_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=HI_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=IA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=ID_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=IL_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=IN_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=KS_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=KY_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=LA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MD_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=ME_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MI_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MN_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MO_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MS_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=MT_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NC_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=ND_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NE_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NH_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NID2018_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NJ_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NM_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NV_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=NY_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=OH_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=OK_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=OR_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=PA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=PR_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=RI_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=SC_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=SD_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=TN_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=TX_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=UT_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=VA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=VT_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=WA_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=WI_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=WV_U.xlsx",
"https://nid-test.sec.usace.army.mil/ords/NID_R.downloadFile?InFileName=WY_U.xlsx",
]
dataframes = []
for dam_url in dam_urls:
if "NID2018_U" in dam_url:
# This one causes dupes of every dam
continue
dataframes.append(pd.read_excel(dam_url))
combined = pd.concat(dataframes)
conn = sqlite3.connect("dams.db")
combined.to_sql("dams", conn)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment