{
"cells": [
{
"cell_type": "markdown",
"id": "2bcf605e-f6d1-4e7b-908d-8c4e440a1a5d",
"metadata": {},
"source": [
"# Python Workflow for Converting Geotab Logs into Line Geometry"
]
},
{
"cell_type": "markdown",
"id": "0ba70ff5-7d14-4f97-8e5e-70d7ce576862",
"metadata": {},
"source": [
"## This workflow demonstrates:\n",
"- Simple and bulk downloading of vehicle, trip and GPS data from the Geotab Data Feed API\n",
"- Extracting and combining the GPS data into line geometry\n",
"- Collecting other useful information such as trip origin lat/lon, trip destination lat/lon and converting times to local time zone\n",
"- Exporting the results for use in a desktop GIS package like ArcGIS or QGIS\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "241b3033-8a56-446d-95ec-b0b1a9a8c6e9",
"metadata": {},
"source": [
"### Getting started"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcd3d05b-6c1b-4be7-acc9-63263ae0cf98",
"metadata": {},
"outputs": [],
"source": [
"from dateutil import rrule\n",
"from datetime import datetime\n",
"import time"
]
},
{
"cell_type": "markdown",
"id": "aa311e1a-12c6-4172-a479-e3287fb2c7dd",
"metadata": {},
"source": [
"Authenticate with the Geotab Data Feed API. You need to already have a Geotab account."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7588b1c-ae90-4a8b-bfa3-d2ccc3d7c001",
"metadata": {},
"outputs": [],
"source": [
"from mygeotab.ext import entitylist as gtel\n",
"client = gtel.API(username='your.email@address.com', password='your_password', database='your_database_name')\n",
"client.authenticate()"
]
},
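{
"cell_type": "markdown",
"id": "f3b7c1a2-0d4e-4a61-9a2b-5c6d7e8f9a01",
"metadata": {},
"source": [
"If you plan to share this notebook, a safer pattern is to keep credentials out of the code. The cell below is a minimal sketch that reads them from environment variables; the variable names (`GEOTAB_USERNAME`, etc.) are just examples, not anything required by the API."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
"metadata": {},
"outputs": [],
"source": [
"# Optional alternative (sketch): read credentials from environment variables\n",
"# so they never appear in the notebook. The variable names below are\n",
"# arbitrary examples -- set them before starting Jupyter.\n",
"import os\n",
"from mygeotab.ext import entitylist as gtel\n",
"\n",
"client = gtel.API(username=os.environ['GEOTAB_USERNAME'],\n",
"                  password=os.environ['GEOTAB_PASSWORD'],\n",
"                  database=os.environ['GEOTAB_DATABASE'])\n",
"client.authenticate()"
]
},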
{
"cell_type": "markdown",
"id": "626fd0fc-77af-45c1-964f-1290fb44cdd6",
"metadata": {},
"source": [
"---\n",
"### Examples of small API Queries"
]
},
{
"cell_type": "markdown",
"id": "701fc500-a5b5-4152-a9fc-65f041960ac2",
"metadata": {},
"source": [
"#### Vehicles"
]
},
{
"cell_type": "markdown",
"id": "420f9c5d-957a-497c-895b-3f49a76c55de",
"metadata": {},
"source": [
"Get details on each vehicle in your database and convert to a Pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cfb8a9eb-6da8-45f0-aac6-16a6eedae41f",
"metadata": {},
"outputs": [],
"source": [
"vehicles = client.get('Device').to_dataframe(normalize=True)"
]
},
{
"cell_type": "markdown",
"id": "b9aa51a8-105f-4fda-b2d5-4a90aa0b8398",
"metadata": {},
"source": [
"Write vehicle data to CSV."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22727303-3fe5-414e-bc08-7df39383fc8f",
"metadata": {},
"outputs": [],
"source": [
"vehicles.to_csv('vehicles.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "bd20bc93-b639-4dd5-8564-b30201e03f6c",
"metadata": {},
"source": [
"#### Trips"
]
},
{
"cell_type": "markdown",
"id": "690034ac-c26d-41ef-818e-4a913e4943e4",
"metadata": {},
"source": [
"Get trips over a date range and convert to Pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cce60ebf-6bd1-4d9f-9e9b-099b61a92e6b",
"metadata": {},
"outputs": [],
"source": [
"trips = client.get('Trip', FromDate='2023-01-01 00:00:00', ToDate='2023-02-01 00:00:00')\n",
"trips_df = trips.to_dataframe(normalize=True)"
]
},
{
"cell_type": "markdown",
"id": "656bd823-617e-4bbc-a49b-951faa330a4e",
"metadata": {},
"source": [
"Get trips from some start date, but constrain the total results you get back from the API. Convert to Pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2734bbc5-82dc-4f24-b2e5-b906d80982ed",
"metadata": {},
"outputs": [],
"source": [
"trips = client.get('Trip', FromDate='2018-01-01 00:00:00', resultsLimit=1000)\n",
"trips_df = trips.to_dataframe(normalize=True)"
]
},
{
"cell_type": "markdown",
"id": "2a249515-3ec2-4550-9d25-a175848c0041",
"metadata": {},
"source": [
"Write trips data to CSV."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5665e357-51d4-4a8b-885c-30188e7c60fc",
"metadata": {},
"outputs": [],
"source": [
"trips_df.to_csv('trips_test.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "0bc9de39-85f8-4e54-b54f-3b6663e07882",
"metadata": {},
"source": [
"#### Logs"
]
},
{
"cell_type": "markdown",
"id": "a44e9ee6-7292-4a28-97f5-d2b572dd9490",
"metadata": {},
"source": [
"Logs contain each GPS click recorded for a vehicle. There are not many attributes, but there are many rows since the GPS might be clicking every few seconds. This query gets logs over a date range and converts to a Pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e219c3cc-a28f-485d-b526-4bef41309853",
"metadata": {},
"outputs": [],
"source": [
"logs = client.get('LogRecord', FromDate='2023-01-01 00:00:00', ToDate='2023-02-01 00:00:00')\n",
"logs_df = logs.to_dataframe(normalize=True)"
]
},
{
"cell_type": "markdown",
"id": "9a2c50c7-90a2-4c89-8184-bc15b126edfa",
"metadata": {},
"source": [
"Write log data to CSV."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bad04bd-17e7-45c3-bf51-f71b8643b75b",
"metadata": {},
"outputs": [],
"source": [
"logs_df.to_csv('logs_test.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "f019b882-8ca4-46e3-8c7f-3ed796675b88",
"metadata": {},
"source": [
"---\n",
"### Example of bulk API query"
]
},
{
"cell_type": "markdown",
"id": "9b584e8b-d339-434d-9938-29c35d37df4b",
"metadata": {},
"source": [
"The API constrains the amount of data that can be downloaded in one query. The process below downloads trips and logs one month at a time from October 2020 to March 2023."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed28171f-ba8a-4b4d-824b-1679652a370c",
"metadata": {},
"outputs": [],
"source": [
"### Date range setup ###\n",
"start_date = datetime(2020, 10, 1)\n",
"#end_date = datetime(2021, 1, 1) # for testing\n",
"end_date = datetime(2023, 3, 1)\n",
"\n",
"months = rrule.rrule(rrule.MONTHLY, dtstart=start_date, until=end_date)\n",
"months = [i for i in months]\n",
"\n",
"### Loop over each month to download data ###\n",
"trips = 'initial value'\n",
"logs = 'initial value'\n",
"for start_idx, start in enumerate(months[:-1]):\n",
" stop = months[start_idx+1]\n",
" print(start, stop) \n",
" \n",
" # Get trips and write to CSV\n",
" del trips\n",
" counter = 1\n",
" trips = client.get('Trip', FromDate=start, ToDate=stop)\n",
" while 'trips' not in locals(): # wait for API to finish returning results\n",
" time.sleep(1)\n",
" if counter % 10 == 0:\n",
" print(counter, 'seconds trips') # feedback every 10 seconds if API download is slow\n",
" counter += 1\n",
" trips_df = trips.to_dataframe(normalize=True)\n",
" trips_df.to_csv('trips/trips_'+str(start.month)+'_'+str(start.year)+'.csv', index=False)\n",
" \n",
" # Get logs and write to CSV\n",
" del logs\n",
" counter = 1\n",
" logs = client.get('LogRecord', FromDate=start, ToDate=stop)\n",
" while 'logs' not in locals(): # wait for API to finish returning results\n",
" time.sleep(1)\n",
" if counter % 10 == 0:\n",
" print(counter, 'seconds logs') # feedback every 10 seconds if API download is slow\n",
" counter += 1 \n",
" logs_df = logs.to_dataframe(normalize=True)\n",
" logs_df.to_csv('logs/logs_'+str(start.month)+'_'+str(start.year)+'.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "fa6b5492-b6a7-4f27-a1fa-e3fdf42554bb",
"metadata": {},
"source": [
"---\n",
"### Processing spatial data"
]
},
{
"cell_type": "markdown",
"id": "dd25f04f-6b8c-4122-ad0d-99eb760a5e29",
"metadata": {},
"source": [
"#### Imports\n",
"\n",
"Requires spatial packages. We recommend creating a separate Conda environment for these packages and getting them from the `conda-forge` channel."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8cc9b0a-dd77-476c-82d2-009585e408f1",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ['USE_PYGEOS'] = '0' # chooses backend spatial data toolkit (optional)\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import geopandas as gpd\n",
"from shapely import LineString"
]
},
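{
"cell_type": "markdown",
"id": "3c9d2b10-7e4f-4b2a-9c1d-2e3f4a5b6c7d",
"metadata": {},
"source": [
"As an optional sanity check (a minimal sketch, nothing Geotab-specific), the next cell prints the versions of the spatial packages available in the current environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d6e7f80-1a2b-4c3d-8e9f-0a1b2c3d4e5f",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: confirm the spatial packages are installed\n",
"# in this environment and print their versions\n",
"import shapely\n",
"for pkg in (np, pd, gpd, shapely):\n",
"    print(pkg.__name__, pkg.__version__)"
]
},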
{
"cell_type": "markdown",
"id": "dfe0042a-97d7-42b4-949d-181ac49de1f8",
"metadata": {},
"source": [
"#### Read in one month of data\n",
"This assumes you are reading in the CSV files downloaded from the API in the earlier steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea7fe448-5d09-4a99-8ca4-36bcbd88f4eb",
"metadata": {},
"outputs": [],
"source": [
"month = 2\n",
"year = 2023\n",
"\n",
"trips = pd.read_csv('trips/trips_'+str(month)+'_'+str(year)+'.csv')\n",
"logs = pd.read_csv('logs/logs_'+str(month)+'_'+str(year)+'.csv')"
]
},
{
"cell_type": "markdown",
"id": "a15840bd-f71c-4b4b-90e7-20f1dd3ca4b7",
"metadata": {},
"source": [
"#### Read in many months of data"
]
},
{
"cell_type": "markdown",
"id": "382a23ae-7649-4320-b2ff-c84059093300",
"metadata": {},
"source": [
"This combines all the trips into one Pandas DataFrame and all the logs into another. This is especially important for the logs CSV files in case some trips span two months (e.g., started 11:45pm Jan 31 and ended 12:25am Feb 1). Combining the trips CSVs is less important."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ced10dc-0908-408c-939a-f6f6a68be378",
"metadata": {},
"outputs": [],
"source": [
"# List of lists with year, start month, end month\n",
"times = [[2020, 10, 12],\n",
" [2021, 1, 12],\n",
" [2022, 1, 12],\n",
" [2023, 1, 2]]\n",
"\n",
"# Create empty dataframes\n",
"trips = pd.DataFrame()\n",
"logs = pd.DataFrame()\n",
"\n",
"# Loop over each year\n",
"for year, start, stop in times:\n",
" # Loop over each month in the year\n",
" for month in range(start, stop+1):\n",
" # Read in one month's trips and append to running total\n",
" trips_temp = pd.read_csv('trips/trips_'+str(month)+'_'+str(year)+'.csv')\n",
" trips = pd.concat([trips, trips_temp])\n",
" # Read in one month's logs and append to running total\n",
" logs_temp = pd.read_csv('logs/logs_'+str(month)+'_'+str(year)+'.csv')\n",
" logs = pd.concat([logs, logs_temp])\n",
"\n",
"# Cleanup the dataframe indexes\n",
"trips = trips.reset_index(drop=True)\n",
"logs = logs.reset_index(drop=True)"
]
},
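{
"cell_type": "markdown",
"id": "7a8b9c0d-1e2f-4a3b-8c4d-5e6f7a8b9c0e",
"metadata": {},
"source": [
"Depending on how the API treats trips that are still active at a month boundary, the same record could appear in two consecutive monthly files. The cell below is a hedged sketch that drops such duplicates using the Geotab `id` column; skip it if your downloads contain no overlap."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b0c1d2e-3f4a-4b5c-8d6e-7f8a9b0c1d2f",
"metadata": {},
"outputs": [],
"source": [
"# Optional (sketch): drop records that appear in more than one monthly file,\n",
"# keyed on the Geotab entity id\n",
"trips = trips.drop_duplicates(subset='id').reset_index(drop=True)\n",
"logs = logs.drop_duplicates(subset='id').reset_index(drop=True)"
]
},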
{
"cell_type": "markdown",
"id": "0f14c99c-9fcc-4c63-8400-f0fe2469a047",
"metadata": {},
"source": [
"#### Append line geometry to each trip"
]
},
{
"cell_type": "markdown",
"id": "2d116489-d6d5-4871-9463-9ea44e232553",
"metadata": {},
"source": [
"The general logic is:\n",
"- For each trip, extract the vehicle ID, start time and stop time for the trip\n",
"- Use these three values to extract all the logs for that vehicle between the start time and stop time\n",
"- Each log has a lat/lon; string these lat/lons together in time order to form the line for the trip\n",
"- Append the line back to the trip\n",
"\n",
"Other details:\n",
"- Instead of using a for loop to do this, we use the `apply` method of a Pandas DataFrame\n",
"- We use a global counter to keep track of how many trips have been completed; and print out intermediate feedback"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e23192c-2f34-45a4-977d-2f1d12ecf849",
"metadata": {},
"outputs": [],
"source": [
"# Counter to keep track of the number of trips processed\n",
"counter = 0\n",
"\n",
"# Function to be applied to each trip\n",
"def get_line(trip):\n",
" # Get the logs for a particular vehicle between two time points\n",
" segs = logs.loc[(logs['device.id']==trip['device.id']) & \n",
" (logs['dateTime']>=trip['start']) &\n",
" (logs['dateTime']<=trip['stop']),]\n",
" # Create a GIS line using the Shapley package\n",
" if segs.shape[0] > 1: # a line needs at least two verticies\n",
" line = LineString(zip(segs['longitude'], segs['latitude']))\n",
" else: # for a non-line, use Numpy NaN value\n",
" line = np.NaN \n",
" # Give feedback on overall progress\n",
" global counter\n",
" counter += 1 # increment for each trip processed\n",
" if counter % 1000 == 0: # print feedback for every 1000 trips processed\n",
" print('completed', counter)\n",
" return line\n",
"\n",
"# Create a copy of the trips dataframe\n",
"#trips_spatial = trips.loc[0:10,:].copy() # for testing\n",
"trips_spatial = trips.copy()\n",
"\n",
"# Get the line geometry for each trip\n",
"geoms = trips_spatial.apply(get_line, axis=1)\n",
"\n",
"# Append the line geometry to each trip and\n",
"# convert from Pandas DataFrame to GeoPandas GeoDataFrame\n",
"trips_spatial = gpd.GeoDataFrame(trips_spatial, geometry=geoms, crs='epsg:4326')"
]
},
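{
"cell_type": "markdown",
"id": "1c2d3e4f-5a6b-4c7d-8e9f-0a1b2c3d4e60",
"metadata": {},
"source": [
"A quick, optional check (a sketch using only objects created above): see how many trips ended up without line geometry, i.e., trips with fewer than two GPS logs in the download window."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d3e4f5a-6b7c-4d8e-9f0a-1b2c3d4e5f61",
"metadata": {},
"outputs": [],
"source": [
"# Optional check: count trips that received no line geometry\n",
"# (these are the trips where get_line() returned NaN)\n",
"print('trips without geometry:', trips_spatial.geometry.isna().sum())\n",
"trips_spatial.head()"
]
},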
{
"cell_type": "markdown",
"id": "29844b94-c70f-49ce-be42-b30562031f0e",
"metadata": {},
"source": [
"#### Extract some other useful data\n",
"- Get start lat/lon\n",
"- Get stop lat/lon\n",
"- Convert from UTC time to local time; in this example the Arizona time zone (Arizona has a special time zone since it does not observe daylight saving time)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7a44444-6202-493d-9a11-1bb6ad24cc62",
"metadata": {},
"outputs": [],
"source": [
"# Get the start lat/lon\n",
"trips_spatial['start_lon'] = [line.coords[0][0] for line in trips_spatial.geometry.to_list()]\n",
"trips_spatial['start_lat'] = [line.coords[0][1] for line in trips_spatial.geometry.to_list()]\n",
"# Get the stop lat/lon\n",
"trips_spatial['stop_lon'] = [line.coords[-1][0] for line in trips_spatial.geometry.to_list()]\n",
"trips_spatial['stop_lat'] = [line.coords[-1][1] for line in trips_spatial.geometry.to_list()]\n",
"\n",
"# Convert the start time to local time\n",
"trips_spatial['start_az'] = pd.to_datetime(trips_spatial['start'])\n",
"trips_spatial['start_az'] = trips_spatial['start_az'].dt.tz_convert('US/Arizona')\n",
"# Convert the stop time to local time\n",
"trips_spatial['stop_az'] = pd.to_datetime(trips_spatial['stop'])\n",
"trips_spatial['stop_az'] = trips_spatial['stop_az'].dt.tz_convert('US/Arizona')"
]
},
{
"cell_type": "markdown",
"id": "82115b91-37b4-491e-87c1-ba01831e2779",
"metadata": {},
"source": [
"#### Write spatial data to hard drive\n",
"\n",
"A GeoPandas GeoDataFrame can be written to many spatial formats. The code below shows GeoJSON and shapefile."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce1e6c1a-02be-43cc-bd54-cf4d6029a408",
"metadata": {},
"outputs": [],
"source": [
"#trips_spatial.to_file('trips_with_geom.shp', index=False)\n",
"trips_spatial.to_file('trips_with_geom.geojson', index=False)"
]
},
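{
"cell_type": "markdown",
"id": "4e5f6a7b-8c9d-4e0f-8a1b-2c3d4e5f6a72",
"metadata": {},
"source": [
"GeoPackage is another commonly used format; the cell below is a sketch of the same export using the `GPKG` driver, which avoids the shapefile limits on column-name length and file size."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f7a8b9c-0d1e-4f2a-8b3c-4d5e6f7a8b83",
"metadata": {},
"outputs": [],
"source": [
"# Another export option (sketch): a single-file GeoPackage\n",
"trips_spatial.to_file('trips_with_geom.gpkg', driver='GPKG', index=False)"
]
},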
{
"cell_type": "markdown",
"id": "a9d7994a-78b0-4432-9a91-786eaca5889d",
"metadata": {},
"source": [
"---\n",
"## Additional Resources\n",
"\n",
"This workflow is based on information from the following sources.\n",
"\n",
"#### API details\n",
"- https://geotab.github.io/sdk/software/api/reference/\n",
"\n",
"#### Python package linking to API\n",
"- https://github.com/Geotab/mygeotab-python\n",
"- https://mygeotab-python.readthedocs.io/en/latest/\n",
"\n",
"#### Inspiration \n",
"- https://community.geotab.com/s/question/0D52J00008fcWCJSA2/how-to-export-polylines-of-triphistory-as-shpkml-or-gpx?language=en_US\n",
"- https://www.geotab.com/video/building-a-mygeotab-addin-from-scratch/\n",
"- https://github.com/geotab/addin-mapreplay\n",
"- https://community.geotab.com/s/article/Are-there-limits-for-API-usage?language=en_US"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}