@Nikolay-Lysenko
Nikolay-Lysenko / non_equi_inner_join_in_pandas.py
Created November 11, 2017 09:33
How to join two DataFrames on conditions other than an exact match?
"""
Implementation of non-equi inner join in `pandas`.
Non-equi join is a join with conditions other than exact match.
For example, values from a column of a left `DataFrame` must be higher
than values from a column of a right `DataFrame` and lower than values
from another column of the right `DataFrame`.
@author: Nikolay Lysenko
"""
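The condition described in the docstring above can be illustrated with a short sketch. This is not the gist's own implementation (which is not shown here); it swaps in the simplest approach, a cross join followed by a filter. The function name `non_equi_inner_join` and the column names `value`, `low`, `high`, and `label` are hypothetical. Note that this approach materializes every pair of rows, so it suits only small inputs.

```python
import pandas as pd


def non_equi_inner_join(left, right, left_col, lower_col, upper_col):
    """Keep pairs of rows where lower_col < left_col < upper_col."""
    # Build all pairs of rows via a constant helper key (a cross join).
    pairs = (
        left.assign(_key=1)
        .merge(right.assign(_key=1), on="_key")
        .drop(columns="_key")
    )
    # Apply the non-equi condition to the paired rows.
    mask = (pairs[left_col] > pairs[lower_col]) & (pairs[left_col] < pairs[upper_col])
    return pairs[mask].reset_index(drop=True)


# Hypothetical toy data: each right row defines an open interval (low, high).
left = pd.DataFrame({"value": [1, 5, 9]})
right = pd.DataFrame({"low": [0, 4], "high": [6, 10], "label": ["a", "b"]})
joined = non_equi_inner_join(left, right, "value", "low", "high")
```

Here the value 5 falls into both intervals, so it appears twice in `joined`, while 1 and 9 each match one interval.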
Nikolay-Lysenko / how_to_run_jupyter_kernels_without_connection_to_them.md
Last active October 27, 2021 07:05
How to prevent a remote Jupyter kernel from being interrupted when the connection to it from a local machine is closed?

The Problem

Suppose that Jupyter notebooks are hosted on a remote server and you access them from your local machine via the web interface. By default, if the connection is closed (e.g., you close the browser tab with a remotely hosted notebook), the running kernels that correspond to this connection die. Hence, time-consuming processes cannot finish unless their connections stay open. Sometimes this is infeasible: for instance, you may have to shut down your local machine in the evening and be able to turn it on again only the next morning.

The Solution

Initial Once-Only Preparations

Open the local .bashrc file in edit mode:

Nikolay-Lysenko / wrong_train_test_split_for_time_series.py
Last active August 29, 2017 19:11
Why should no one shuffle data that has temporal structure?
"""
This script demonstrates why a dataset with
temporal structure must not be shuffled before
it is split into a train set and a test set.
With temporal structure, shuffling leaks some
information about the test set into the training set,
which results in overly optimistic scores.
The point of the example considered here is that
a particular flaw of tree-based ensembles
Nikolay-Lysenko / upsert_from_pandas_to_postgres.py
Last active January 5, 2022 06:08
Upsert (a hybrid of insert and update) from a pandas.DataFrame to a PostgreSQL database
from time import sleep
from io import StringIO
import psycopg2
def upsert_df_into_postgres(df, target_table, primary_keys, conn_string,
                            n_trials=5, quoting=None, null_repr=None):
    """
    Uploads data from `df` to `target_table`
Nikolay-Lysenko / simple_backpropagation_in_plain_numpy.py
Last active April 16, 2017 08:42
A minimal working example of how to implement backpropagation using only NumPy.
import numpy as np
class TwoLayerPerceptron(object):
    """
    This is a simple neural network
    with exactly one hidden layer
    and 0.5 * MSE (Mean Squared Error)
    as loss.
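The preview above is cut off before any code, so here is a minimal sketch of what backpropagation through one hidden layer with a 0.5 * MSE loss can look like. It is not the gist's `TwoLayerPerceptron` class. The function name `train_two_layer_perceptron`, the tanh activation, the initialization scale, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def train_two_layer_perceptron(X, y, n_hidden=8, lr=0.1, n_epochs=500, seed=0):
    """Fit one hidden tanh layer with plain gradient descent; return losses."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(n_epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(0.5 * np.mean(err ** 2))
        # Backward pass: gradient of 0.5 * mean(err ** 2) w.r.t. pred.
        d_pred = err / len(X)
        d_W2 = h.T @ d_pred
        d_b2 = d_pred.sum(axis=0)
        # Chain rule through the hidden layer; tanh'(z) = 1 - tanh(z) ** 2.
        d_z1 = (d_pred @ W2.T) * (1.0 - h ** 2)
        d_W1 = X.T @ d_z1
        d_b1 = d_z1.sum(axis=0)
        # Gradient descent step.
        W1 -= lr * d_W1
        b1 -= lr * d_b1
        W2 -= lr * d_W2
        b2 -= lr * d_b2
    return losses


# Hypothetical toy regression task: the target is the sum of two features.
data_rng = np.random.default_rng(42)
X = data_rng.uniform(-1.0, 1.0, size=(64, 2))
y = X[:, :1] + X[:, 1:]
losses = train_two_layer_perceptron(X, y)
```

Comparing `losses[0]` with `losses[-1]` gives a quick sanity check that the gradients point in a useful direction.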
Nikolay-Lysenko / xgb_quantile_loss.py
Last active October 25, 2023 13:26
Customized loss function for quantile regression with XGBoost
import numpy as np
def xgb_quantile_eval(preds, dmatrix, quantile=0.2):
    """
    Custom evaluation metric that equals
    the quantile regression loss (also known as
    pinball loss).
    Quantile regression is regression that