Microsoft US Building Footprints are available at:
https://github.com/Microsoft/USBuildingFootprints
The state-based downloads are in .geojson format, which is a poor choice for datasets this large: .geojson has no spatial index, so the files are very slow to load and render. It is therefore useful to split the data and save a separate file (geopackage or shapefile) for each county. Some of the polygons also have invalid geometries, which we should fix along the way.
Below are instructions for splitting New York into counties using QGIS.
- Download buildings from https://usbuildingdata.blob.core.windows.net/usbuildings-v1-1/NewYork.zip
- Download US counties from https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip
Be sure to unzip both .zip files, then load both into QGIS (the buildings may take a couple minutes).
The MS Buildings polygons may have some invalid geometries that will cause problems later on, so let's fix them.
- Processing toolbox > Fix Geometries (just output to temp layer)
We can make the join process much faster by selecting just the New York counties.
- In the toolbar, click the yellowish "Select Features by Value" tool
- select where STATEFP = 36
- Processing toolbox > Join Attributes by Location
- Input layer = Fixed Geometries
- Join layer = tl_2019_us_county
- Check "Selected features only" for the counties
- Geometric predicate = "intersects"
- Fields to add -- click "..." and select "NAME" (which is the county name)
- Join type = "Create separate feature for each located feature (one-to-many)"
- Output to temp layer
- Click "Run" -- you'll see a warning about "no spatial index exists for input layer" but that's okay since it needs to loop through every feature anyway. (Plus, by the time you create the index, you won't save that much time.)
By choosing a one-to-many join, any buildings that cross a boundary will be copied, so that the building will be included in both county files in the end.
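To make the selection and the one-to-many join concrete, here is a minimal pure-Python sketch. It is not how QGIS implements the join: every geometry is approximated by a bounding box, and the `counties` and `buildings` records are made up for illustration. The only detail taken from the real data is that STATEFP is stored as text in the TIGER county file.

```python
# Illustrative only: geometries are simplified to bounding boxes
# (xmin, ymin, xmax, ymax); QGIS tests the real polygon shapes.

def boxes_intersect(a, b):
    """Return True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Made-up sample data, not real TIGER or Microsoft records.
counties = [
    {"STATEFP": "36", "NAME": "Albany", "bbox": (0, 0, 10, 10)},
    {"STATEFP": "36", "NAME": "Schenectady", "bbox": (10, 0, 20, 10)},
    {"STATEFP": "34", "NAME": "Bergen", "bbox": (0, -10, 10, -1)},
]
buildings = [
    {"id": 1, "bbox": (2, 2, 3, 3)},   # entirely inside one county
    {"id": 2, "bbox": (9, 5, 11, 6)},  # straddles the county line
]

# Equivalent of "Select Features by Value": keep New York counties only.
ny_counties = [c for c in counties if c["STATEFP"] == "36"]

# One-to-many join: emit one output row per (building, county) match.
joined = [
    {"id": b["id"], "NAME": c["NAME"]}
    for b in buildings
    for c in ny_counties
    if boxes_intersect(b["bbox"], c["bbox"])
]

for row in joined:
    print(row)
```

Building 2 appears twice in `joined`, once per county it overlaps, which is why it ends up in both county files after the split.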
It will be helpful to have an arbitrary id column, so that we'll still have at least one column left if we want to remove the county column after splitting. By saving to a geopackage file, an "fid" column will automatically be added.
- Right-click "Joined layer" > Make Permanent...
- Format = GeoPackage
- File name -- click the "..." to specify the output location and save as "joined.gpkg"
- leave the other settings as they are and click "OK"
- Processing toolbox > Split vector layer
- Input layer = Joined layer
- Unique ID field = "NAME"
- Click the "..." to specify the output directory (create and save to a directory called "split")
Before running, notice that there is no place to specify the output file type. It will be whatever the default vector extension is set to in your QGIS settings.
- Settings menu > Options
- Click the "processing" tab on the left
- Search (at top left) for "default"
- Set the "Default output vector layer extension" to whatever you want (probably gpkg or shp)
Finally, go back to your "Split vector layer" dialog and click "Run".
Watch out -- there might be an extra, empty file for a "NULL" county, which can be deleted.
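Conceptually, the split groups the joined rows by the value of the chosen field and writes one file per value. The sketch below shows only that grouping in plain Python. The `split_by_field` helper and the sample rows are hypothetical, and the `NAME_<value>.<extension>` naming simply mirrors the output names seen in this walkthrough.

```python
from collections import defaultdict

# Made-up joined rows standing in for the "Joined layer" attribute table.
joined = [
    {"fid": 1, "NAME": "Albany"},
    {"fid": 2, "NAME": "Albany"},
    {"fid": 3, "NAME": "Schenectady"},
]

def split_by_field(rows, field, extension="gpkg"):
    """Group rows by a field value, keyed by the output file name."""
    groups = defaultdict(list)
    for row in rows:
        groups[f"{field}_{row[field]}.{extension}"].append(row)
    return dict(groups)

outputs = split_by_field(joined, "NAME")
for filename, rows in outputs.items():
    print(filename, len(rows))
```

Passing `extension="shp"` plays the role of the "Default output vector layer extension" setting described above.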
Now that the buildings are split into multiple files, the "NAME" column that contains the county name is rather unnecessary. We can delete this column from all the files using a batch process.
- Processing toolbox > Drop field(s)
- Click "Run as Batch Process..." (in the bottom left)
- Under "Input layer", click "Autofill..." > "Add Files by Pattern..."
- File pattern = *.gpkg (or *.shp if using shapefiles)
- Look in = select your "split" directory
- Click "Find Files"
- OK
This adds a row for each file that will be processed.
- Under "Fields to drop", click the "..." in the first row
- Select the "NAME" column
- Click "Autofill..." > Fill Down
This copies that option to all the rows. Now we need to set the output filename.
- Under "Remaining fields", click "Autofill..." > Calculate by expression
- Use this expression:
regexp_replace( @INPUT , 'NAME_', 'msbuildings_ny_')
- Click OK, then Run
This should give us a nicely-named file for each county.
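For reference, the same substitution can be tried outside QGIS with Python's `re.sub`, which is handy for checking what the output names will look like. This is only a sketch: the `output_name` helper and the example path are hypothetical.

```python
import re

def output_name(input_path):
    """Mimic regexp_replace(@INPUT, 'NAME_', 'msbuildings_ny_')."""
    return re.sub("NAME_", "msbuildings_ny_", input_path)

print(output_name("split/NAME_Albany.gpkg"))
# prints: split/msbuildings_ny_Albany.gpkg
```

Note that the pattern matches anywhere in the path, so a parent directory whose name contains "NAME_" would be rewritten too.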
- Delete all the old files like "NAME_Albany.gpkg"
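If there are many counties, the cleanup can be scripted. The sketch below uses only the Python standard library. The `delete_old_files` helper is hypothetical, and it defaults to a dry run so you can review the matches before anything is removed. The demo runs against a temporary directory rather than a real "split" folder.

```python
import tempfile
from pathlib import Path

def delete_old_files(directory, pattern="NAME_*.gpkg", dry_run=True):
    """Return the matching files; delete them only if dry_run is False."""
    matches = sorted(Path(directory).glob(pattern))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches

# Demo in a throwaway directory with one old and one renamed file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("NAME_Albany.gpkg", "msbuildings_ny_Albany.gpkg"):
        (Path(tmp) / name).touch()
    removed = delete_old_files(tmp, dry_run=False)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print([p.name for p in removed])
print(remaining)
```

For shapefiles, a broader pattern such as `NAME_*` would also catch the sidecar files (.shx, .dbf, .prj) that accompany each .shp.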