Skip to content

Instantly share code, notes, and snippets.

@scottsbaldwin
Last active March 21, 2024 06:01
Show Gist options
  • Save scottsbaldwin/df061d49e2f61f2a55e5ebcc9dcf7ce1 to your computer and use it in GitHub Desktop.
Save scottsbaldwin/df061d49e2f61f2a55e5ebcc9dcf7ce1 to your computer and use it in GitHub Desktop.
Cleanup GeoJSON for Import into Elasticsearch

Santize GeoJSON for Elasticsearch

Santize States

JSON=/Users/scott/src/geojson/us-states-20m.json
TARGET=data/prod-es/states.json
cat $JSON | jq -c '.features[] | { _index: "service-areas", _type: "state", _id: .properties.STATE , _source: { id: .properties.STATE, geoId: .properties.GEO_ID, serviceAreaType: "State", name: .properties.NAME, stateId: .properties.STATE, geometry: .geometry }}' > $TARGET

Sample State Schema

{
    "_index": "service-areas",
    "_type": "state",
    "_id": "04",
    "_source": {
        "geoId": "0400000US04",
        "serviceAreaType": "State",
        "name": "Arizona",
        "stateId": "04",
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [
                        -112.538593,
                        37.000674
                    ],
                    [
                        -112.534545,
                        37.000684
                    ],
                    ...
                ]
            ]
        }
    }
}

Sanitize Counties

JSON=/Users/scott/src/geojson/us-counties-20m.json
TARGET=data/prod-es/counties.json
cat $JSON | jq -c '.features[] | { _index: "service-areas", _type: "county", _id: .properties.GEO_ID , _source: { id: .properties.GEO_ID, geoId: .properties.GEO_ID, serviceAreaType: "County", name: .properties.NAME, stateId: .properties.STATE, countyId: .properties.COUNTY, geometry: .geometry }}' > $TARGET

Inject State Names into Counties Schema

Use this ruby script to inject the state names into the counties file.

require 'json'

# ----------------------------------
# Load up the state names
# ----------------------------------
path = 'data/prod-es/states.json'
lines = File.readlines(path)
states = {}
lines.each do |l|
  json = JSON.parse(l)
  states[json["_id"]] = json["_source"]["name"]
end

# ----------------------------------
# Inject the state names
# ----------------------------------
path = 'data/prod-es/counties.json'
lines = File.readlines(path)
counties = []
lines.each do |l|
  json = JSON.parse(l)
  json["_source"]["stateNameForCounty"] = states[json["_source"]["stateId"]]
  counties << json.to_json
end

outpath = 'data/prod-es/counties-with-state-names.json'
File.open(outpath, 'w') do |f|
  f.puts(counties)
end

Sample County Schema

{
    "_index": "service-areas",
    "_type": "county",
    "_id": "0500000US01001",
    "_source": {
        "id": "0500000US01001",
        "geoId": "0500000US01001",
        "serviceAreaType": "County",
        "name": "Autauga",
        "stateId": "01",
        "countyId": "001",
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [
                        -86.496774,
                        32.344437
                    ],
                    [
                        -86.717897,
                        32.402814
                    ],
                    ...
                ]
            ]
        },
        "stateNameForCounty": "Alabama"
    }
}

Importing the files

EXPORT_PATH=/Users/scott/src/scripts/data/prod-es
mkdir -p ${EXPORT_PATH}
MOUNT_POINT=/data

TARGET=http://localhost:9200
docker run --net=host --rm -ti -v ${EXPORT_PATH}:${MOUNT_POINT} taskrabbit/elasticsearch-dump \
	--input=${MOUNT_POINT}/states.json \
	--output=${TARGET}/service-areas \
	--type=data

docker run --net=host --rm -ti -v ${EXPORT_PATH}:${MOUNT_POINT} taskrabbit/elasticsearch-dump \
	--input=${MOUNT_POINT}/counties-with-state-names.json \
	--output=${TARGET}/service-areas \
	--type=data

Queries

Get State Names

curl -s "localhost:9200/service-areas/state/_search?size=100&sort=name:asc"| jq -c '.hits.hits[] | { name: ._source.name, type: ._source.serviceAreaType }'

# OR

curl -s "localhost:9200/service-areas/_search?size=100&sort=name:asc&q=_type:state"| jq -c '.hits.hits[] | { name: ._source.name, type: ._source.serviceAreaType }'

# OR

curl -s "localhost:9200/service-areas/_search?size=100&sort=name:asc&q=serviceAreaType:State"| jq -c '.hits.hits[] | { name: ._source.name, type: ._source.serviceAreaType }'

Get Counties for a State

STATE=49
curl -s "localhost:9200/service-areas/county/_search?size=1000&sort=name:asc&q=stateId:${STATE}"| jq -c '.hits.hits[] | { id: ._source.id, name: ._source.name, type: ._source.serviceAreaType }'

# OR

curl -s "localhost:9200/service-areas/_search?size=1000&sort=name:asc&q=stateId:${STATE}%20AND%20_type:county"| jq -c '.hits.hits[] | { id: ._source.id, name: ._source.name, type: ._source.serviceAreaType }'

# OR

curl -s "localhost:9200/service-areas/_search?size=1000&sort=name:asc&q=stateId:${STATE}%20AND%20serviceAreaType:county"| jq -c '.hits.hits[] | { id: ._source.id, name: ._source.name, type: ._source.serviceAreaType }'

Get Geometry for a County

STATE=49
COUNTY=Davis
curl -s "localhost:9200/service-areas/_search?q=name:${COUNTY}%20AND%20stateId=${STATE}%20AND%20_type:county"| jq -c '.hits.hits[] | { id: ._source.id, name: ._source.name, geometry: ._source.geometry }'

Get Geometry for a State

NAME=Idaho
curl -s "localhost:9200/service-areas/_search?q=name:${NAME}%20AND%20_type:state"| jq -c '.hits.hits[] | { id: ._source.id, name: ._source.name, geometry: ._source.geometry }'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment