Kirill Pavlov pavlov99

@pavlov99
pavlov99 / haversine.scala
Created December 19, 2016 07:52
Spherical distance calculation based on latitude and longitude with Apache Spark
// Based on the following links:
// http://andrew.hedges.name/experiments/haversine/
// http://www.movable-type.co.uk/scripts/latlong.html
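import org.apache.spark.sql.functions.{atan2, cos, pow, sin, sqrt, toRadians}
// `a` is the haversine term; `distance` is in km (6371 km is the mean Earth radius).
df
  .withColumn("a",
    pow(sin(toRadians($"destination_latitude" - $"origin_latitude") / 2), 2) +
    cos(toRadians($"origin_latitude")) * cos(toRadians($"destination_latitude")) *
    pow(sin(toRadians($"destination_longitude" - $"origin_longitude") / 2), 2))
  .withColumn("distance", atan2(sqrt($"a"), sqrt(-$"a" + 1)) * 2 * 6371)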
>>>
+--------------+-------------------+-------------+----------------+---------------+----------------+--------------------+---------------------+--------------------+------------------+
|origin_airport|destination_airport| origin_city|destination_city|origin_latitude|origin_longitude|destination_latitude|destination_longitude| a| distance|
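As a quick sanity check of the formula outside Spark, the same haversine computation can be sketched in plain Python. The function name `haversine_km` and its argument names are illustrative and not part of the gist; the radius of 6371 km is the constant used in the snippet above.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371  # same mean Earth radius as the Spark snippet


def haversine_km(origin_lat, origin_lon, dest_lat, dest_lon):
    """Great-circle distance in kilometres between two points given in degrees."""
    d_lat = radians(dest_lat - origin_lat)
    d_lon = radians(dest_lon - origin_lon)
    # Haversine term, equivalent to column "a" in the Spark version.
    a = (sin(d_lat / 2) ** 2
         + cos(radians(origin_lat)) * cos(radians(dest_lat)) * sin(d_lon / 2) ** 2)
    # Equivalent to column "distance" in the Spark version.
    return 2 * EARTH_RADIUS_KM * atan2(sqrt(a), sqrt(1 - a))
```

Two points on the equator that are 90 degrees of longitude apart should come out at a quarter of the great circle, roughly 10007.5 km with this radius.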
@pavlov99
pavlov99 / deque.awk
Last active July 1, 2023 00:16
AWK data structures
function deque_init(d) {d["+"] = d["-"] = 0}
function deque_is_empty(d) {return d["+"] == d["-"]}
function deque_push_back(d, val) {d[d["+"]++] = val}
function deque_push_front(d, val) {d[--d["-"]] = val}
function deque_back(d) {return d[d["+"] - 1]}
function deque_front(d) {return d[d["-"]]}
# NULL is an unset variable, so it evaluates to the empty string; i and x are declared as locals to avoid leaking globals.
function deque_pop_back(d,    i, x) {if(deque_is_empty(d)) {return NULL} else {i = --d["+"]; x = d[i]; delete d[i]; return x}}
function deque_pop_front(d,    i, x) {if(deque_is_empty(d)) {return NULL} else {i = d["-"]++; x = d[i]; delete d[i]; return x}}
function deque_print(d,    i, x){x="["; for (i=d["-"]; i<d["+"] - 1; i++) x = x d[i]", "; print x d[d["+"] - 1]"]; size: "d["+"] - d["-"] " [" d["-"] ", " d["+"] ")"}
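The two-index layout used by these AWK functions (a front index under `"-"`, a one-past-the-end index under `"+"`, and the values stored under integer keys in between) can be sketched in Python as follows. The class name `IndexDeque` and its attribute names are hypothetical and chosen only for illustration; in real Python code `collections.deque` would be the usual choice.

```python
class IndexDeque:
    """Deque stored in a dict, mirroring the AWK layout above."""

    def __init__(self):
        self.items = {}
        self.front = 0  # AWK d["-"]: index of the first element
        self.back = 0   # AWK d["+"]: one past the index of the last element

    def is_empty(self):
        return self.front == self.back

    def push_back(self, val):
        self.items[self.back] = val
        self.back += 1

    def push_front(self, val):
        self.front -= 1
        self.items[self.front] = val

    def pop_back(self):
        if self.is_empty():
            return None
        self.back -= 1
        return self.items.pop(self.back)

    def pop_front(self):
        if self.is_empty():
            return None
        val = self.items.pop(self.front)
        self.front += 1
        return val
```

Because the front index may go negative, pushes and pops at both ends are single dictionary operations and never shift the stored elements.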

Keybase proof

I hereby claim:

  • I am pavlov99 on github.
  • I am p99 (https://keybase.io/p99) on keybase.
  • I have a public key whose fingerprint is 247B E451 E71C 8AAD 6AF2 7842 5849 1A6C A92B 0F59

To claim this, I am signing this object:

@pavlov99
pavlov99 / ansible-galaxy-find-role-id.sh
Created March 27, 2017 07:11
Find your role's id in ansible-galaxy
$ ansible-galaxy info YourUser.RoleName | grep -E 'id: [0-9]' | awk '{print $2}'
@pavlov99
pavlov99 / enum.py
Created June 6, 2018 09:55
Python helpers
class Choices(object):
    """ Choices."""
    def __init__(self, *choices):
        self._choices = []
        self._choice_dict = {}
        for choice in choices:
            if isinstance(choice, (list, tuple)):
@pavlov99
pavlov99 / dump-xgboost-trees.py
Created November 2, 2016 03:20
Dump xgboost tree visualisations to the file system in PDF format.
import xgboost as xgb

model = xgb.Booster(model_file='your.model')
model.feature_names = xgtrain.feature_names  # Note: xgtrain is your training data (with features).
model.feature_types = xgtrain.feature_types
# Number of trees in the model
num_trees = len(model.get_dump())
# Dump all of the trees to the tree folder
for tree_index in range(num_trees):
    dot = xgb.to_graphviz(model, num_trees=tree_index)
// NOTE: add minimum and maximum values to thresholds
val thresholds: Array[Double] = Array(Double.MinValue, 0.0) ++ (((0.0 until 50.0 by 10).toArray ++ Array(Double.MaxValue)).map(_.toDouble))
// Convert DataFrame to RDD and calculate histogram values
val _tmpHist = df
  .select($"column" cast "double")
  .rdd.map(r => r.getDouble(0))
  .histogram(thresholds)
// Result DataFrame contains `from`, `to` range and the `value`.
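The shape of that result can be illustrated with a small pure-Python sketch. It assumes the bucket convention documented for Spark's `RDD.histogram(buckets)`: every bucket is closed on the left and open on the right, except the last one, which is closed on both ends, and values outside the thresholds are ignored. The function name `histogram_rows` is hypothetical and not taken from the snippet.

```python
from bisect import bisect_right


def histogram_rows(values, thresholds):
    """Count values per bucket and return (from, to, value) rows.

    `thresholds` must be sorted in ascending order.
    """
    counts = [0] * (len(thresholds) - 1)
    for v in values:
        if v < thresholds[0] or v > thresholds[-1]:
            continue  # outside all buckets
        # bisect_right finds the bucket whose lower bound is <= v;
        # min() folds the upper edge into the last (closed) bucket.
        idx = min(bisect_right(thresholds, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return list(zip(thresholds, thresholds[1:], counts))
```

Each returned tuple corresponds to one row with the `from` bound, the `to` bound, and the count for that range.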
@pavlov99
pavlov99 / crossjoin.py
Last active September 12, 2019 10:19
Pandas cross-join
from functools import reduce


def crossjoin(*dfs, **kwargs):
    """Calculate the Cartesian product of the given dataframes.

    Join the dataframes one after another using a temporary constant key, then remove the key.
    Also set a MultiIndex: the Cartesian product of the indices of the input dataframes.
    See: https://github.com/pydata/pandas/issues/5401

    Args:
@pavlov99
pavlov99 / mongo-dump-csv.sh
Last active September 7, 2018 07:23 — forked from mderazon/mongo-dump-csv.sh
Export all MongoDB collections as CSV without the need to specify fields
OIFS=$IFS;
IFS=",";
# fill in your details here
dbname=DBNAME
user=USERNAME
pass=PASSWORD
host=HOSTNAME:PORT
# first get all collections in the database
@pavlov99
pavlov99 / javascript-dependencies.sh
Created August 24, 2018 03:27
Visualize JavaScript dependencies
# Install madge (https://github.com/pahen/madge) and graphviz first
madge --dot --layout neato --include-npm src/ | dot -Tpng > dependencies.png