@joekane3
Created October 4, 2018 11:52
Haversine distance using PySpark
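from pyspark.sql import functions as F

def haversine_spark(df, col_lat1, col_lon1, col_lat2, col_lon2, col_name="distance"):
    # Coordinate arguments must be Column objects in degrees, e.g. F.col("lat1");
    # plain column-name strings would fail on the subtractions below.
    # "a" is kept as an expression, so an existing column named "a" is not overwritten or dropped.
    a = (F.pow(F.sin(F.radians(col_lat2 - col_lat1) / 2), 2)
         + F.cos(F.radians(col_lat1)) * F.cos(F.radians(col_lat2))
         * F.pow(F.sin(F.radians(col_lon2 - col_lon1) / 2), 2))
    # 6371 is the mean Earth radius in km, so the new column holds kilometres.
    return df.withColumn(col_name, F.atan2(F.sqrt(a), F.sqrt(1 - a)) * 2 * 6371)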
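To sanity-check the values that haversine_spark produces, it can help to compare a few rows against a plain-Python version of the same formula. The following is a minimal sketch, not part of the original gist: the name haversine_km and the constant EARTH_RADIUS_KM are hypothetical, and it assumes inputs in degrees and the same 6371 km radius used above.

```python
import math

EARTH_RADIUS_KM = 6371  # assumed: same mean Earth radius as haversine_spark


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    # Same intermediate term "a" as in the Spark expression.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```

Collecting a handful of rows from the Spark result and comparing them with haversine_km on the same coordinates should give matching values up to floating-point rounding.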