Spherical distance calculation based on latitude and longitude with Apache Spark
// Based on the following links:
// http://andrew.hedges.name/experiments/haversine/
// http://www.movable-type.co.uk/scripts/latlong.html
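// Requires: import org.apache.spark.sql.functions._ (plus spark.implicits._ for the $"..." column syntax)
df
// a = sin^2(dLat / 2) + cos(lat1) * cos(lat2) * sin^2(dLon / 2), with all angles converted to radians
.withColumn("a", pow(sin(toRadians($"destination_latitude" - $"origin_latitude") / 2), 2) + cos(toRadians($"origin_latitude")) * cos(toRadians($"destination_latitude")) * pow(sin(toRadians($"destination_longitude" - $"origin_longitude") / 2), 2))
// distance in kilometers: 2 * R * atan2(sqrt(a), sqrt(1 - a)), with R = 6371 km (mean Earth radius)
.withColumn("distance", atan2(sqrt($"a"), sqrt(-$"a" + 1)) * 2 * 6371)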
>>>
+--------------+-------------------+-------------+----------------+---------------+----------------+--------------------+---------------------+--------------------+------------------+
|origin_airport|destination_airport|  origin_city|destination_city|origin_latitude|origin_longitude|destination_latitude|destination_longitude|                   a|          distance|
+--------------+-------------------+-------------+----------------+---------------+----------------+--------------------+---------------------+--------------------+------------------+
|           HKG|                SYD|    Hong Kong|          Sydney|      22.308919|      113.914603|          -33.946111|           151.177222|  0.3005838068886348|7393.8837884771565|
|           YYZ|                HKG|      Toronto|       Hong Kong|      43.677223|      -79.630556|           22.308919|           113.914603|  0.6941733892671567|12548.533187172497|
+--------------+-------------------+-------------+----------------+---------------+----------------+--------------------+---------------------+--------------------+------------------+
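
The two rows above can be cross-checked outside Spark. The following is a minimal plain-Python sketch of the same haversine formula, not part of the original gist; the function name `haversine_km` and the constant name `EARTH_RADIUS_KM` are made up for illustration, and the radius is the same 6371 km used in the Spark expression.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same constant as in the Spark code


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)

    # Haversine term, the counterpart of column "a" in the DataFrame.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

    # Guard against 1 - a dipping below zero through rounding (near-antipodal points).
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1 - a)))


hkg_syd = haversine_km(22.308919, 113.914603, -33.946111, 151.177222)
yyz_hkg = haversine_km(43.677223, -79.630556, 22.308919, 113.914603)
```

Both values should agree with the `distance` column to within floating-point rounding, which makes this a cheap way to sanity-check a port of the expression to another API.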

nathanwalther commented Nov 5, 2017

Thanks for sharing, this was a huge help!

joekane3 commented Oct 4, 2018

+1

kennethlimjf commented Apr 10, 2019

Thanks @pavlov99, I still use this!

harpaj commented May 31, 2019

Thanks a lot for this! I ported it to PySpark; maybe it helps someone:

    import pyspark.sql.functions as F
    df = df.withColumn("a", (
        F.pow(F.sin(F.radians(F.col("destination_latitude") - F.col("origin_latitude")) / 2), 2) +
        F.cos(F.radians(F.col("origin_latitude"))) * F.cos(F.radians(F.col("destination_latitude"))) *
        F.pow(F.sin(F.radians(F.col("destination_longitude") - F.col("origin_longitude")) / 2), 2)
    )).withColumn("distance", F.atan2(F.sqrt(F.col("a")), F.sqrt(-F.col("a") + 1)) * 12742000)

RobinL commented Mar 30, 2020

Thanks @pavlov99 and @harpaj! Worth noting that harpaj's code gives the distance in meters.

And if you prefer SQL:

cast(atan2(sqrt(
(
pow(sin(radians(lat_r - lat_l)/2), 2) +
cos(radians(lat_l)) * cos(radians(lat_r)) *
pow(sin(radians(long_r - long_l)/2), 2)
)
), sqrt(-1*
(
pow(sin(radians(lat_r - lat_l)/2), 2) +
cos(radians(lat_l)) * cos(radians(lat_r)) *
pow(sin(radians(long_r - long_l)/2), 2)
)
 + 1)) * 12742 as float) as distance_km