Skip to content

Instantly share code, notes, and snippets.

@boxfrommars
Last active January 31, 2020 01:22
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save boxfrommars/81f18213aeaa9722f9b5bbc4c874bebf to your computer and use it in GitHub Desktop.
Save boxfrommars/81f18213aeaa9722f9b5bbc4c874bebf to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from math import radians, sin, cos, asin, acos, sqrt, pi, atan, atan2, fabs\n",
"from time import time\n",
"import geopy.distance\n",
"import pyproj\n",
"from geographiclib.geodesic import Geodesic, Constants\n",
"from pygeodesy.ellipsoidalVincenty import LatLon\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"EARTH_MEAN_RADIUS = geopy.distance.EARTH_RADIUS * 1000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Расстояние между двумя геоточками. Сравнение алгоритмов и их реализаций"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Есть два класса алгоритмов, рассчитывающих расстояние между двумя точками: **Great-circle**-функции (находят кратчайшее расстояние на сфере) и **Geodesic**-функции (на эллипсоиде)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Great Circle\n",
"\n",
"* https://en.wikipedia.org/wiki/Great-circle_distance\n",
"* https://en.wikipedia.org/wiki/Haversine_formula\n",
"\n",
"Все формулы для расчёта Great-circle расстояния -- точные, то есть теоретически дают одинаковый результат, но вычислительно разные. Так например `cosine`-функция [вычислительно нестабильна](https://dtcenter.org/met/users/docs/write_ups/gc_simple.pdf) при небольших расстояниях между точками. `haversine`-функции вычислительно оптимизированы и лишены этого недостатка. `geopy_great_circle` использует `atan`-версию (`haversine_2`) функции для расчёта расстояния"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def cosine(point1, point2):\n",
" lon1, lat1 = (radians(coord) for coord in point1)\n",
" lon2, lat2 = (radians(coord) for coord in point2)\n",
" \n",
" return acos(\n",
" sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon1 - lon2)\n",
" ) * EARTH_MEAN_RADIUS"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def haversine_1(point1, point2):\n",
" lon1, lat1 = (radians(coord) for coord in point1)\n",
" lon2, lat2 = (radians(coord) for coord in point2)\n",
" dlat = (lat2 - lat1)\n",
" dlon = (lon2 - lon1)\n",
" a = (sin(dlat * 0.5)**2 + cos(lat1) * cos(lat2) * sin(dlon * 0.5)**2)\n",
"\n",
" return 2 * EARTH_MEAN_RADIUS * asin(sqrt(a))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def haversine_2(point1, point2):\n",
" lon1, lat1 = (radians(coord) for coord in point1)\n",
" lon2, lat2 = (radians(coord) for coord in point2)\n",
"\n",
" dlon = fabs(lon1 - lon2)\n",
" dlat = fabs(lat1 - lat2)\n",
"\n",
" numerator = sqrt(\n",
" (cos(lat2)*sin(dlon))**2 +\n",
" ((cos(lat1)*sin(lat2)) - (sin(lat1)*cos(lat2)*cos(dlon)))**2)\n",
"\n",
" denominator = (\n",
" (sin(lat1)*sin(lat2)) +\n",
" (cos(lat1)*cos(lat2)*cos(dlon)))\n",
"\n",
" c = atan2(numerator, denominator)\n",
" return EARTH_MEAN_RADIUS * c"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def geopy_great_circle(point1, point2): \n",
" # https://geopy.readthedocs.io/en/stable/#geopy.distance.great_circle\n",
" return geopy.distance.great_circle(point1[::-1], point2[::-1]).meters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Geodesic\n",
"\n",
"Гедезические-функции находят кратчайшее расстояние на эллипсоиде. Распространены два итеративных алгоритма нахождения такого расстояния: **Vincenty** и **Karney**. Оба алгоритма сверхточны (погрешность в несколько микрон для Vincenty и несколько нанометров для Carney по сравнению с настоящим наикратчайшим расстоянием на эллипсоиде)\n",
"\n",
"**Vincenty** может не сходится для точек, расположенных в противолоположных областях эллипсоида: по тестам от *Geopy* -- для 20 пар точек из 1000, случайно взятых на сфере алгоритм завершается с ошибкой\n",
"\n",
"**Karney** лишён этого недостатка и поэтому сейчас используется почти везде. Но он медленнее\n",
"\n",
"И *Geopy Geodesic* и *Proj Geodesic* и собственно *GeographicLib Geodesic* использует *GeographicLib* реализацию. Но что важно *Geopy Geodesic* как и сам *GeographicLib Geodesic* используют пайтон-реализацию, а *Proj Geodesic* -- C-реализацию алгоритма. Что явно сказывается на скорости расчётов\n",
"\n",
"* https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid#Equations_for_a_geodesic\n",
"* https://en.wikipedia.org/wiki/Vincenty%27s_formulae (Vincenty)\n",
"* https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf (Karney)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Geopy Geodesic Karney"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def geopy_geodesic(point1, point2): \n",
" # https://geopy.readthedocs.io/en/stable/#geopy.distance.geodesic\n",
" # https://geographiclib.sourceforge.io/html/python/_modules/geographiclib/geodesic.html#Geodesic\n",
" # https://github.com/geopy/geopy/blob/master/geopy/distance.py#L409\n",
" return geopy.distance.geodesic(point1[::-1], point2[::-1]).meters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### GeographicLib Geodesic Karney"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def geographiclib_geodesic(point1, point2): \n",
" # https://geographiclib.sourceforge.io/html/python/code.html#module-geographiclib.geodesic\n",
" # https://geographiclib.sourceforge.io/html/python/_modules/geographiclib/geodesic.html#Geodesic\n",
" lon1, lat1 = (coord for coord in point1)\n",
" lon2, lat2 = (coord for coord in point2)\n",
" \n",
" return Geodesic.WGS84.Inverse(lat1, lon1, lat2, lon2)['s12']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Proj Geodesic Karney"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def proj_geodesic(point1, point2): \n",
" # https://pyproj4.github.io/pyproj/dev/examples.html#geodesic-calculations\n",
" # https://proj.org/geodesic.html#background\n",
" _az12, _az21, distance = pyproj.Geod(ellps='WGS84').inv(*list(point1 + point2))\n",
" \n",
" return distance\n",
"\n",
"geod = pyproj.Geod(ellps='WGS84')\n",
"\n",
"def proj_geodesic_preinit(point1, point2): \n",
" _az12, _az21, distance = geod.inv(*list(point1 + point2))\n",
"\n",
" return distance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Geopy Geodesic Vincenty"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def geopy_vincenty(point1, point2): \n",
" # Deprecated!\n",
" # https://geopy.readthedocs.io/en/stable/#geopy.distance.vincenty\n",
" return geopy.distance.vincenty(point1[::-1], point2[::-1]).meters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PyGeodesy Geodesic Vincenty"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def pygeodesy_vincenty(point1, point2):\n",
" # https://mrjean1.github.io/PyGeodesy/docs/pygeodesy.ellipsoidalVincenty-module.html\n",
" lon1, lat1 = (coord for coord in point1)\n",
" lon2, lat2 = (coord for coord in point2)\n",
" \n",
" return LatLon(lat1, lon1).distanceTo(LatLon(lat2, lon2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Non exact spherical"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def nortsouth_eastweast(point1, point2):\n",
" lon1, lat1 = (radians(coord) for coord in point1)\n",
" lon2, lat2 = (radians(coord) for coord in point2)\n",
" \n",
" NS = lat1 - lat2\n",
" EW = (lon1 - lon2) * cos(lat1)\n",
" return sqrt(NS**2 + EW**2) * EARTH_MEAN_RADIUS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tests"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def test_distance(method, points_pair, rep=10000):\n",
" t = time()\n",
" for i in range(rep):\n",
" distance = method(points_pair[0], points_pair[1])\n",
" \n",
" return ((time() - t) * 1e6) / rep\n",
"\n",
"# ~ 700km\n",
"p_minsk = (27.561831, 53.902257)\n",
"p_moscow = (37.620393, 55.75396)\n",
"\n",
"# ~ 4km\n",
"p_kremlin = (37.618044, 55.751959)\n",
"p_office = (37.651293, 55.723882)\n",
"\n",
"p_mgnt = (37.650601, 55.724158)\n",
"p_efshk = (37.650625, 55.724151)\n",
"\n",
"def test_wrap(method, method_name):\n",
" return {\n",
" 'name': method_name, \n",
" 'minsk_msk_d': method(p_minsk, p_moscow),\n",
" 'krmln_ofc_d': method(p_kremlin, p_office),\n",
" 'mgnt_efshk_d': method(p_mgnt, p_efshk),\n",
" 'time_ms': round((test_distance(method, [p_minsk, p_moscow]) + \n",
" test_distance(method, [p_kremlin, p_office]) +\n",
" test_distance(method, [p_mgnt, p_efshk])\n",
" ) / 3, 3)\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/fasten806/miniconda3/envs/geo/lib/python3.7/site-packages/ipykernel_launcher.py:4: DeprecationWarning: Vincenty is deprecated and is going to be removed in geopy 2.0. Use `geopy.distance.geodesic` (or the default `geopy.distance.distance`) instead, which is more accurate and always converges.\n",
" after removing the cwd from sys.path.\n"
]
}
],
"source": [
"metrics = []\n",
"\n",
"metrics.append(test_wrap(haversine_1, 'Great-сircle Haversine 1'))\n",
"metrics.append(test_wrap(haversine_2, 'Great-circle Haversine 2'))\n",
"# metrics.append(test_wrap(nortsouth_eastweast, 'NS-EW'))\n",
"metrics.append(test_wrap(geopy_great_circle, 'Geopy Great-circle Haversine'))\n",
"metrics.append(test_wrap(cosine, 'Great-circle Cosine'))\n",
"metrics.append(test_wrap(proj_geodesic_preinit, 'Proj Geodesic Karney preinit'))\n",
"metrics.append(test_wrap(proj_geodesic, 'Proj Geodesic Karney'))\n",
"metrics.append(test_wrap(geopy_geodesic, 'Geopy Geodesic Karney'))\n",
"metrics.append(test_wrap(geographiclib_geodesic, 'GeographicLib Geodesic Karney'))\n",
"metrics.append(test_wrap(geopy_vincenty, 'Geopy Geodesic Vincenty'))\n",
"metrics.append(test_wrap(pygeodesy_vincenty, 'PyGeodesy Geodesic Vincenty'))\n",
"\n",
"metrics = pd.DataFrame(metrics)\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>minsk_msk_d</th>\n",
" <th>krmln_ofc_d</th>\n",
" <th>mgnt_efshk_d</th>\n",
" <th>time_µs</th>\n",
" <th>minsk_msk_err</th>\n",
" <th>krmln_ofc_err</th>\n",
" <th>mgnt_efshk_err</th>\n",
" <th>op/s</th>\n",
" <th>performance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Great-сircle Haversine 1</td>\n",
" <td>675656.320692</td>\n",
" <td>3752.236736</td>\n",
" <td>1.692539</td>\n",
" <td>2.240</td>\n",
" <td>0.315%</td>\n",
" <td>0.193%</td>\n",
" <td>0.295%</td>\n",
" <td>446429</td>\n",
" <td>1.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Great-circle Cosine</td>\n",
" <td>675656.320692</td>\n",
" <td>3752.236736</td>\n",
" <td>1.692941</td>\n",
" <td>2.414</td>\n",
" <td>0.315%</td>\n",
" <td>0.193%</td>\n",
" <td>0.272%</td>\n",
" <td>414250</td>\n",
" <td>1.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Great-circle Haversine 2</td>\n",
" <td>675656.320692</td>\n",
" <td>3752.236736</td>\n",
" <td>1.692539</td>\n",
" <td>2.901</td>\n",
" <td>0.315%</td>\n",
" <td>0.193%</td>\n",
" <td>0.295%</td>\n",
" <td>344709</td>\n",
" <td>1.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Proj Geodesic Karney preinit</td>\n",
" <td>677789.531233</td>\n",
" <td>3759.498757</td>\n",
" <td>1.697553</td>\n",
" <td>9.547</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>104745</td>\n",
" <td>4.26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Proj Geodesic Karney</td>\n",
" <td>677789.531233</td>\n",
" <td>3759.498757</td>\n",
" <td>1.697553</td>\n",
" <td>15.693</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>63723</td>\n",
" <td>7.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Geopy Great-circle Haversine</td>\n",
" <td>675656.320692</td>\n",
" <td>3752.236736</td>\n",
" <td>1.692539</td>\n",
" <td>19.990</td>\n",
" <td>0.315%</td>\n",
" <td>0.193%</td>\n",
" <td>0.295%</td>\n",
" <td>50025</td>\n",
" <td>8.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Geopy Geodesic Vincenty</td>\n",
" <td>677789.531232</td>\n",
" <td>3759.498755</td>\n",
" <td>1.697552</td>\n",
" <td>31.222</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>32029</td>\n",
" <td>13.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>PyGeodesy Geodesic Vincenty</td>\n",
" <td>677789.531234</td>\n",
" <td>3759.498755</td>\n",
" <td>1.697552</td>\n",
" <td>73.785</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>13553</td>\n",
" <td>32.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>GeographicLib Geodesic Karney</td>\n",
" <td>677789.531233</td>\n",
" <td>3759.498757</td>\n",
" <td>1.697553</td>\n",
" <td>159.508</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>6269</td>\n",
" <td>71.21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Geopy Geodesic Karney</td>\n",
" <td>677789.531233</td>\n",
" <td>3759.498757</td>\n",
" <td>1.697553</td>\n",
" <td>255.147</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>0.0%</td>\n",
" <td>3919</td>\n",
" <td>113.90</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name minsk_msk_d krmln_ofc_d mgnt_efshk_d \\\n",
"0 Great-сircle Haversine 1 675656.320692 3752.236736 1.692539 \n",
"1 Great-circle Cosine 675656.320692 3752.236736 1.692941 \n",
"2 Great-circle Haversine 2 675656.320692 3752.236736 1.692539 \n",
"3 Proj Geodesic Karney preinit 677789.531233 3759.498757 1.697553 \n",
"4 Proj Geodesic Karney 677789.531233 3759.498757 1.697553 \n",
"5 Geopy Great-circle Haversine 675656.320692 3752.236736 1.692539 \n",
"6 Geopy Geodesic Vincenty 677789.531232 3759.498755 1.697552 \n",
"7 PyGeodesy Geodesic Vincenty 677789.531234 3759.498755 1.697552 \n",
"8 GeographicLib Geodesic Karney 677789.531233 3759.498757 1.697553 \n",
"9 Geopy Geodesic Karney 677789.531233 3759.498757 1.697553 \n",
"\n",
" time_µs minsk_msk_err krmln_ofc_err mgnt_efshk_err op/s performance \n",
"0 2.240 0.315% 0.193% 0.295% 446429 1.00 \n",
"1 2.414 0.315% 0.193% 0.272% 414250 1.08 \n",
"2 2.901 0.315% 0.193% 0.295% 344709 1.30 \n",
"3 9.547 0.0% 0.0% 0.0% 104745 4.26 \n",
"4 15.693 0.0% 0.0% 0.0% 63723 7.01 \n",
"5 19.990 0.315% 0.193% 0.295% 50025 8.92 \n",
"6 31.222 0.0% 0.0% 0.0% 32029 13.94 \n",
"7 73.785 0.0% 0.0% 0.0% 13553 32.94 \n",
"8 159.508 0.0% 0.0% 0.0% 6269 71.21 \n",
"9 255.147 0.0% 0.0% 0.0% 3919 113.90 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"minsk_msk_actual = 677789.5312325\n",
"\n",
"metrics['minsk_msk_err'] = (metrics['minsk_msk_d'] - minsk_msk_actual) / minsk_msk_actual\n",
"metrics['minsk_msk_err'] = abs(round(metrics['minsk_msk_err'] * 100, 3)).astype(str) + '%'\n",
"\n",
"krmln_ofc_actual = 3759.498757\n",
"\n",
"metrics['krmln_ofc_err'] = (metrics['krmln_ofc_d'] - krmln_ofc_actual) / krmln_ofc_actual\n",
"metrics['krmln_ofc_err'] = abs(round(metrics['krmln_ofc_err'] * 100, 3)).astype(str) + '%'\n",
"\n",
"mgnt_efshk_actual = 1.697553\n",
"\n",
"metrics['mgnt_efshk_err'] = (metrics['mgnt_efshk_d'] - mgnt_efshk_actual) / mgnt_efshk_actual\n",
"metrics['mgnt_efshk_err'] = abs(round(metrics['mgnt_efshk_err'] * 100, 3)).astype(str) + '%'\n",
"\n",
"metrics['op/s'] = round(1e6 / metrics['time_ms']).astype(int)\n",
"metrics['performance'] = round(metrics['time_ms'] / metrics.iloc[0]['time_ms'], 2)\n",
"\n",
"metrics.rename(columns={'time_ms': 'time_µs'}).sort_values(by='performance').reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Где `performance` - во сколько раз алгоритм медленнее чем `Great Circle Haversine 1`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Выводы\n",
"\n",
"Great-circle алгоритмы дают погрешность не превышающие 0.5% при этом они в ~2.5 раза быстрее чем самый быстрый geodesic-алгоритм.\n",
"\n",
"Используемые сейчас **Geopy** версии алгоритмов дико медленные:\n",
"1. Из-за избыточной усложнённости кода (для расчитывания расстояния создаётся несколько объектов): Например `Geopy Great-circle` и `Great-circle Haversine 2` используют одинаковые формулы, но втоорая работает в 6 раз/на 20 микросекунд быстрее. Или `Geopy Geodesic` использует `GeographicLib Geodesic` но работает в 1.5 раза/на 150 микросекунд медленнее \n",
"2. Из-за использвания python-версий алгоритмов: `Proj Geodesic` в 20 раз быстрее засчёт использования C-версии\n",
"\n",
"Предлагаю использовать `Proj Geodesic` -- самый быстрый алгоритм среди всех геодезических засчёт реализации на C. При этом его можно ещё ускорить для вычисления расстояний многих пар точек, если создать объект эллипсоида заранее и переиспользовать его для всех вычислений\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment