Ruhong seahrh

seahrh /
Created Jan 25, 2020 — forked from kingspp/
Python Comprehensive Logging using YAML Configuration
import os
import yaml
import logging.config
import logging
import coloredlogs
def setup_logging(default_path='logging.yaml', default_level=logging.INFO, env_key='LOG_CFG'):
| **@author:** Prathyush SP
| Logging Setup
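The preview above stops before the body of `setup_logging`. As a rough sketch of the usual pattern (not necessarily what the gist does), the parsed YAML is handed to `logging.config.dictConfig`, with `logging.basicConfig` as a fallback. To stay self-contained, the dictionary below stands in for what `yaml.safe_load` would return; `EXAMPLE_CONFIG` and `setup_logging_from_dict` are hypothetical names.

```python
import logging
import logging.config

# Hypothetical stand-in for the dict that yaml.safe_load() would return
# from a logging.yaml file; the keys follow the dictConfig schema.
EXAMPLE_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}


def setup_logging_from_dict(config=None, default_level=logging.INFO):
    """Apply a dictConfig-style mapping, falling back to basicConfig."""
    if config:
        try:
            logging.config.dictConfig(config)
            return True
        except (ValueError, TypeError, AttributeError, ImportError) as exc:
            # dictConfig raises one of these for a malformed configuration.
            print(f"Invalid logging configuration ({exc}); using defaults")
    logging.basicConfig(level=default_level)
    return False


configured = setup_logging_from_dict(EXAMPLE_CONFIG)
root_level = logging.getLogger().level
```

Keeping the fallback inside the function means a missing or broken configuration file still leaves the process with working log output.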
seahrh /
Created Oct 25, 2019 — forked from JettJones/
performance of various ways to get the callers name in python
import sys
import inspect
import time
import traceback
def deeper(func, depth):
    if depth > 0:
        return deeper(func, depth-1)
    return func()
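The preview shows only the recursion helper, not the lookups being timed. A minimal sketch of two common ways to get the caller's name, assuming CPython: `sys._getframe` and `inspect.stack`. The function names `caller_via_getframe`, `caller_via_inspect_stack`, and `time_it` are illustrative and not taken from the gist.

```python
import inspect
import sys
import timeit


def caller_via_getframe():
    # sys._getframe(1) is the frame of whoever called this function.
    # It is a CPython implementation detail, but it avoids walking the stack.
    return sys._getframe(1).f_code.co_name


def caller_via_inspect_stack():
    # inspect.stack() builds a FrameInfo record for every frame on the stack,
    # so its cost grows with call depth.
    return inspect.stack()[1].function


def deeper(func, depth):
    # Same helper as in the snippet above: call func under extra frames.
    if depth > 0:
        return deeper(func, depth - 1)
    return func()


def time_it(func, depth=20, number=200):
    # Total seconds for `number` calls made `depth` frames down.
    return timeit.timeit(lambda: deeper(func, depth), number=number)


name_fast = deeper(caller_via_getframe, 3)
name_slow = deeper(caller_via_inspect_stack, 3)
timings = {
    f.__name__: time_it(f)
    for f in (caller_via_getframe, caller_via_inspect_stack)
}
```

Both lookups report `deeper` as the caller here; the `timings` dictionary holds the measured cost of each approach on the current machine.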
seahrh /
Created Oct 16, 2019
Removing all the features with a high correlation. Keeping those which correlate with target value better.
to_drop = list()
# Iterate over rows starting from the second one, because position [0, 0] is the self-correlation, which is always 1
for i in range(1, len(corr_matrix)):
    # Iterate over the columns of the row, staying below the diagonal.
    for j in range(i):
        # Check whether the correlation between the two features reaches the chosen threshold
        if corr_matrix.iloc[i, j] >= 0.98:
            # Then keep whichever of the two correlates better with the target
            if abs(pd.concat([X[corr_matrix.index[i]], y], axis=1).corr().iloc[0, 1]) > abs(pd.concat([X[corr_matrix.columns[j]], y], axis=1).corr().iloc[0, 1]):
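The preview is cut off before the branch bodies. The sketch below shows one way the selection could be completed, written without pandas so it runs on its own. It is an assumption about the intent, not the gist's code: `pearson` and `features_to_drop` are hypothetical helpers, and a plain dict of lists stands in for the DataFrame `X`.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def features_to_drop(features, target, threshold=0.98):
    """Pick one feature to drop from each highly correlated pair.

    features: dict of column name -> list of values (stand-in for X).
    """
    names = list(features)
    # Correlation of each feature with the target, computed once up front.
    target_corr = {n: abs(pearson(features[n], target)) for n in names}
    to_drop = set()
    for i in range(1, len(names)):
        for j in range(i):  # pairs below the diagonal only
            a, b = names[i], names[j]
            # Signed comparison mirrors the snippet above; wrap in abs()
            # to also catch strong negative correlation.
            if pearson(features[a], features[b]) >= threshold:
                # Keep the feature that tracks the target better.
                to_drop.add(b if target_corr[a] > target_corr[b] else a)
    return sorted(to_drop)


example = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.1],  # almost a multiple of x1
    "x3": [5, 1, 4, 2, 3],
}
dropped = features_to_drop(example, target=[1, 2, 3, 4, 5])
```

Computing the target correlations once, rather than inside the inner loop, avoids repeating the same work for every pair.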
seahrh /
Created Oct 16, 2019
Permutation importance function
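def permutation_importance(X, y, model):
    perm = {}
    # Baseline AUC from the model's predicted positive-class probabilities
    y_pred = model.predict_proba(X)[:, 1]
    baseline = roc_auc_score(y, y_pred)
    for cols in X.columns:
        # Keep a copy of the original column so it can be restored afterwards
        value = X[cols].copy()
        X[cols] = np.random.permutation(X[cols].values)
        y_pred = model.predict_proba(X)[:, 1]
        # Change in AUC after shuffling; a negative value means the column mattered
        perm[cols] = roc_auc_score(y, y_pred) - baseline
        X[cols] = value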
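The function above shuffles each column in place and restores it afterwards. As a library-free sketch of the same idea that never mutates its input, the version below works on a list of dicts and takes the model and metric as plain callables. The parameters `rows`, `predict`, `score`, and `seed` are hypothetical stand-ins for `X`, `model.predict_proba`, and `roc_auc_score`.

```python
import random


def permutation_importance(rows, y, predict, score, seed=0):
    """Score change when each column is shuffled, leaving `rows` untouched.

    rows: list of dicts (column name -> value)
    predict: callable mapping rows to a list of predictions
    score: callable (y, predictions) -> float, higher is better
    """
    rng = random.Random(seed)
    baseline = score(y, predict(rows))
    importances = {}
    for col in rows[0]:
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        # Build permuted copies so the caller's data is never modified.
        permuted = [{**row, col: v} for row, v in zip(rows, shuffled)]
        # Same sign convention as above: negative means the column mattered.
        importances[col] = score(y, predict(permuted)) - baseline
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: the "model" reads only the `signal` column.
rows = [{"signal": i % 2, "noise": i} for i in range(8)]
labels = [row["signal"] for row in rows]
result = permutation_importance(
    rows, labels, predict=lambda rs: [r["signal"] for r in rs], score=accuracy
)
```

Because the toy model ignores `noise`, shuffling that column cannot change the score, while shuffling `signal` can only lower it.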
seahrh /
Created Sep 11, 2019
pandas: find top n unique values in each column
def df_column_unique_values(df, top_n=5):
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
    for col_name, values in df.items():
        col_value_counts = values.value_counts()
        print(f"{col_name} : {len(col_value_counts)}")
        col_value_count_list = [
            "'" + str(c) + "'" + ":" + str(n) for c, n in sorted(
                key=lambda kv: kv[1],
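The preview ends partway through the `sorted(...)` call. The counting step itself can be sketched with `collections.Counter`, using a plain dict of lists in place of the DataFrame. `column_top_values` and the sample data are illustrative and not from the gist.

```python
from collections import Counter


def column_top_values(columns, top_n=5):
    """Map each column name to (distinct count, top_n most common values)."""
    summary = {}
    for name, values in columns.items():
        counts = Counter(values)
        # most_common() returns (value, count) pairs, highest count first.
        summary[name] = (len(counts), counts.most_common(top_n))
    return summary


sample = {
    "color": ["red", "blue", "red", "green", "red", "blue"],
    "size": ["S", "M", "S", "S"],
}
top = column_top_values(sample, top_n=2)
```

Returning the summary instead of printing it leaves the formatting decision to the caller.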
seahrh / .gitconfig
Created Jul 19, 2019 — forked from Kovrinic/.gitconfig
git global url insteadOf setup
# one or the other, NOT both
[url "https://github"]
insteadOf = git://github
# or
[url ""]
insteadOf = git://github
View QueryReader.scala
import org.apache.spark.sql._
final case class QueryReader(
  spark: SparkSession,
  query: String,
  url: String,
  driver: String,
  user: String,
  password: String
) {
seahrh / README.mkd
Created Apr 25, 2019 — forked from vrillusions/README.mkd
Generate gpg key via batch file


This is how to create a gpg key without any user interaction or password. This can be used in cases where the primary goal is to secure the data in transit but the gpg key can/must be stored locally without a password. An example of this is the hiera-gpg plugin which doesn't support passwords.

The genkey-batch file below uses the GnuPG defaults, which are currently RSA/RSA with a 2048-bit key length. See the reference link to change these settings.
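The batch file itself is not included in this excerpt. As a hedged sketch, the Python helper below renders a parameter file in GnuPG's unattended key-generation format. Unlike the file described above, it spells out RSA and the key length instead of relying on the defaults, and it uses `%no-protection` (GnuPG 2.1 or later) to skip the passphrase. The function name and the identity are hypothetical.

```python
def render_genkey_batch(name, email, key_length=2048):
    """Return GnuPG unattended key-generation parameters as text."""
    lines = [
        "%no-protection",  # create the key without a passphrase
        "Key-Type: RSA",
        f"Key-Length: {key_length}",
        "Subkey-Type: RSA",
        f"Subkey-Length: {key_length}",
        f"Name-Real: {name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",  # 0 means the key never expires
        "%commit",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical identity, for illustration only.
batch_text = render_genkey_batch("Example User", "user@example.invalid")

# The text would be saved to a file and passed to GnuPG, for example:
#   gpg --batch --gen-key genkey-batch
```

Generating the file from code keeps the key parameters in one place when several hosts need their own keys.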


seahrh /
Created Feb 8, 2019
spark-submit airflow jinja template - optional params
#!/usr/bin/env bash
spark-submit --master yarn \
--deploy-mode cluster \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--class {{ params.class }} {{ params.jar_path }} \
--sink_db {{ params.sink_db }} \
--sink_table {{ params.sink_table }} \
--sink_partition_column_ds {{ ds_nodash }} \
{% if params.sink_partition_column_post_date is defined %}--sink_partition_column_post_date {{ params.sink_partition_column_post_date }} \{% else %}\{% endif %}
{% if is defined %}--foo {{ }} \{% else %}\{% endif %}
seahrh / 0.suffixtree.cs
Created Jan 18, 2019 — forked from axefrog/0.suffixtree.cs
C# Suffix tree implementation based on Ukkonen's algorithm. Full explanation here:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace SuffixTreeAlgorithm
public class SuffixTree