Ruhong seahrh

@seahrh
seahrh / logging.py
Created Jan 25, 2020 — forked from kingspp/logging.py
Python Comprehensive Logging using YAML Configuration
View logging.py
import os
import yaml
import logging.config
import logging
import coloredlogs
def setup_logging(default_path='logging.yaml', default_level=logging.INFO, env_key='LOG_CFG'):
    """
    | **@author:** Prathyush SP
    | Logging Setup
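The preview above is cut off before the function body. As a rough sketch of the general pattern (not the gist author's code), a YAML-driven setup could look like the following. The function name `setup_logging_sketch` and the fallback behaviour are assumptions made for illustration, and PyYAML is only imported when a configuration file is actually found.

```python
import logging
import logging.config
import os


def setup_logging_sketch(path="logging.yaml", level=logging.INFO, env_key="LOG_CFG"):
    """Configure logging from a YAML file, or fall back to basicConfig.

    Returns True if a YAML file was loaded, False if the fallback was used.
    Hypothetical helper for illustration only.
    """
    # When the environment variable is set, it overrides the default path.
    path = os.getenv(env_key) or path
    if os.path.exists(path):
        import yaml  # PyYAML; imported lazily so the fallback works without it

        with open(path, "rt") as f:
            config = yaml.safe_load(f)
        # The YAML document is expected to follow the dictConfig schema.
        logging.config.dictConfig(config)
        return True
    # No configuration file: use a plain root logger at the requested level.
    logging.basicConfig(level=level)
    return False
```

Calling `setup_logging_sketch()` once at program start is enough; later `logging.getLogger(__name__)` calls pick up whichever configuration was applied.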
@seahrh
seahrh / pystack.py
Created Oct 25, 2019 — forked from JettJones/pystack.py
performance of various ways to get the callers name in python
View pystack.py
import sys
import inspect
import time
import traceback
def deeper(func, depth):
    if depth > 0:
        return deeper(func, depth-1)
    else:
        return func()
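The gist preview stops before the lookup functions being timed. A minimal sketch of two common ways to read the caller's name, with a small timing loop, might look like this. The helper names are hypothetical, and `sys._getframe` is a CPython implementation detail rather than a portable API.

```python
import inspect
import sys
import timeit


def caller_via_getframe():
    # Frame 0 is this helper; frame 1 is whoever called it.
    return sys._getframe(1).f_code.co_name


def caller_via_inspect():
    # inspect.stack() builds a FrameInfo for every frame on the stack,
    # so it does more work than reading a single frame.
    return inspect.stack()[1].function


def example_caller(lookup):
    # The name both lookups should report is "example_caller".
    return lookup()


if __name__ == "__main__":
    for lookup in (caller_via_getframe, caller_via_inspect):
        seconds = timeit.timeit(lambda: example_caller(lookup), number=200)
        print(f"{lookup.__name__}: {seconds:.4f}s for 200 calls")
```

Timings depend on stack depth and interpreter version, so the numbers are only meaningful relative to each other on the same machine.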
@seahrh
seahrh / drop_corr_features.py
Created Oct 16, 2019
Removing all the features with a high correlation. Keeping those which correlate with target value better.
View drop_corr_features.py
to_drop = list()
# Iterate over rows starting from the second one, because position [0, 0] is a self-correlation, which is always 1
for i in range(1, len(corr_matrix)):
    # Iterate over the columns of the row, staying below the diagonal
    for j in range(i):
        # Check whether the correlation between the two features reaches the selected threshold
        if corr_matrix.iloc[i, j] >= 0.98:
            # Then keep whichever of those two correlates better with the target
            if abs(pd.concat([X[corr_matrix.index[i]], y], axis=1).corr().iloc[0, 1]) > abs(pd.concat([X[corr_matrix.columns[j]], y], axis=1).corr().iloc[0, 1]):
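The preview ends at the comparison, so the drop step itself is not shown. The sketch below illustrates the same idea in plain Python rather than pandas: for each highly correlated pair, drop the feature that correlates less with the target. The names `pearson` and `features_to_drop` are hypothetical, and two choices are assumptions that differ from the preview: the pairwise check uses the absolute correlation, and features already marked for dropping are skipped.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def features_to_drop(features, target, threshold=0.98):
    """Return sorted names of features to drop; `features` maps name -> values."""
    names = list(features)
    to_drop = set()
    for i in range(1, len(names)):
        for j in range(i):  # only pairs below the diagonal
            a, b = names[i], names[j]
            if a in to_drop or b in to_drop:
                continue  # assumption: ignore features already marked
            if abs(pearson(features[a], features[b])) >= threshold:
                # Keep whichever feature correlates more strongly with the target.
                a_score = abs(pearson(features[a], target))
                b_score = abs(pearson(features[b], target))
                to_drop.add(b if a_score > b_score else a)
    return sorted(to_drop)
```

With a pandas DataFrame, the same function can be fed `{col: X[col].tolist() for col in X.columns}` and `y.tolist()`.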
@seahrh
seahrh / permutation_importance.py
Created Oct 16, 2019
Permutation importance function
View permutation_importance.py
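import numpy as np
from sklearn.metrics import roc_auc_score

def permutation_importance(X, y, model):
    perm = {}
    # Baseline ROC AUC from the predicted probability of the positive class
    y_pred = model.predict_proba(X)[:, 1]
    baseline = roc_auc_score(y, y_pred)
    for cols in X.columns:
        # Save the original column so it can be restored after shuffling
        value = X[cols].copy()
        X[cols] = np.random.permutation(X[cols].values)
        y_pred = model.predict_proba(X)[:, 1]
        # Negative values mean the score fell when this column was shuffled
        perm[cols] = roc_auc_score(y, y_pred) - baseline
        X[cols] = value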
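The preview is truncated before any return statement. To show the technique end to end without scikit-learn, here is a dependency-free sketch that shuffles one column at a time and records the change in a metric. Everything in it is an assumption for illustration: rows are dicts, `predict` is any callable, and accuracy stands in for the ROC AUC used in the gist.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(rows, y, predict, columns, metric=accuracy, seed=0):
    """Return {column: metric change} after shuffling that column's values.

    rows: list of dicts; predict: callable taking a list of rows and
    returning one prediction per row. Hypothetical helper.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(rows))
    importance = {}
    for col in columns:
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        # Build permuted copies so the caller's rows are never modified.
        permuted = [{**row, col: value} for row, value in zip(rows, shuffled)]
        importance[col] = metric(y, predict(permuted)) - baseline
    return importance
```

A column the model never reads gets an importance of exactly zero, while a column the model depends on typically gets a negative value, matching the sign convention in the gist (permuted score minus baseline).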
@seahrh
seahrh / uniques.py
Created Sep 11, 2019
pandas: find top n unique values in each column
View uniques.py
def df_column_unique_values(df, top_n=5):
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
    for col_name, values in df.items():
        col_value_counts = values.value_counts()
        print(f"{col_name} : {len(col_value_counts)}")
        col_value_count_list = [
            "'" + str(c) + "'" + ":" + str(n) for c, n in sorted(
                col_value_counts.items(),
                key=lambda kv: kv[1],
                reverse=True
            )
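The preview stops inside the list comprehension, before `top_n` is applied. The counting step can be illustrated without pandas using `collections.Counter`, whose `most_common` method already returns values sorted by descending count. This is a swapped-in standard-library analogue, not the gist's pandas code, and `top_unique_values` is a hypothetical name.

```python
from collections import Counter


def top_unique_values(columns, top_n=5):
    """Map each column name to (distinct_count, top_n most common (value, count) pairs).

    `columns` maps a column name to an iterable of hashable values.
    """
    summary = {}
    for name, values in columns.items():
        counts = Counter(values)
        summary[name] = (len(counts), counts.most_common(top_n))
    return summary
```

For a DataFrame, passing `{name: series.tolist() for name, series in df.items()}` gives the same per-column view.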
@seahrh
seahrh / .gitconfig
Created Jul 19, 2019 — forked from Kovrinic/.gitconfig
git global url insteadOf setup
View .gitconfig
# one or the other, NOT both
[url "https://github"]
    insteadOf = git://github
# or
[url "git@github.com:"]
    insteadOf = git://github
View QueryReader.scala
import org.apache.spark.sql._

final case class QueryReader(
    spark: SparkSession,
    query: String,
    url: String,
    driver: String,
    user: String,
    password: String
) {
@seahrh
seahrh / README.mkd
Created Apr 25, 2019 — forked from vrillusions/README.mkd
Generate gpg key via batch file
View README.mkd

Introduction

This is how to create a gpg key without any user interaction or password. This can be used in cases where the primary goal is to secure the data in transit but the gpg key can/must be stored locally without a password. An example of this is the hiera-gpg plugin which doesn't support passwords.

The genkey-batch file below uses the defaults, which are currently RSA/RSA with a 2048-bit key length. See the reference link to change these settings.
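The batch file itself is not included in this preview. As a hedged sketch, the Python helper below generates the text of a GnuPG unattended key-generation parameter file with the RSA/2048 values written out explicitly. The function name and the example identity are hypothetical, and `%no-protection` assumes GnuPG 2.1 or later.

```python
def build_genkey_batch(name, email, key_type="RSA", key_length=2048):
    """Return the text of a GnuPG unattended key-generation parameter file.

    Hypothetical helper; the parameter names follow GnuPG's unattended
    key generation format.
    """
    lines = [
        "%echo Generating a key without passphrase protection",
        "%no-protection",  # assumption: GnuPG 2.1+, creates an unprotected key
        f"Key-Type: {key_type}",
        f"Key-Length: {key_length}",
        f"Subkey-Type: {key_type}",
        f"Subkey-Length: {key_length}",
        f"Name-Real: {name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",  # 0 means the key never expires
        "%commit",
        "%echo Done",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file named genkey-batch and running `gpg --batch --gen-key genkey-batch` should then create the key without prompting.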

References

@seahrh
seahrh / spark.sh
Created Feb 8, 2019
spark-submit airflow jinja template - optional params
View spark.sh
#!/usr/bin/env bash
spark-submit --master yarn \
--deploy-mode cluster \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--class {{ params.class }} {{ params.jar_path }} \
--sink_db {{ params.sink_db }} \
--sink_table {{ params.sink_table }} \
--sink_partition_column_ds {{ ds_nodash }} \
{% if params.sink_partition_column_post_date is defined %}--sink_partition_column_post_date {{ params.sink_partition_column_post_date }} \{% else %}\{% endif %}
{% if params.foo is defined %}--foo {{ params.foo }} \{% else %}\{% endif %}
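The template above appends optional flags only when the matching parameter is defined. The same rule can be sketched in Python by building the argument list directly, which avoids the trailing-backslash handling. This is an alternative illustration, not part of the Airflow template; `build_spark_submit` is a hypothetical name and the parameter keys mirror those in the template.

```python
import shlex


def build_spark_submit(params, ds_nodash):
    """Build a spark-submit argument list, adding optional flags only when set."""
    argv = [
        "spark-submit", "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", "spark.serializer=org.apache.spark.serializer.KryoSerializer",
        "--class", params["class"], params["jar_path"],
        "--sink_db", params["sink_db"],
        "--sink_table", params["sink_table"],
        "--sink_partition_column_ds", ds_nodash,
    ]
    # Optional parameters: emit the flag and its value only if the key exists.
    for name in ("sink_partition_column_post_date", "foo"):
        if name in params:
            argv += [f"--{name}", str(params[name])]
    return argv


def as_command_line(argv):
    # shlex.join quotes each argument so the result is safe to paste in a shell.
    return shlex.join(argv)
```

Passing the list to `subprocess.run(argv)` keeps every value as a single argument, so no shell quoting is needed at all.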
@seahrh
seahrh / 0.suffixtree.cs
Created Jan 18, 2019 — forked from axefrog/0.suffixtree.cs
C# Suffix tree implementation based on Ukkonen's algorithm. Full explanation here: http://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english
View 0.suffixtree.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace SuffixTreeAlgorithm
{
    public class SuffixTree
    {