SemanticBeeng
Shapley Value
"Interpretable Machine Learning with XGBoost" https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
"Interpreting complex models with SHAP values" https://medium.com/@gabrieltseng/interpreting-complex-models-with-shap-values-1c187db6ec83
"Interpreting your deep learning model by SHAP" https://towardsdatascience.com/interpreting-your-deep-learning-model-by-shap-e69be2b47893
"SHAP for explainable machine learning" https://meichenlu.com/2018-11-10-SHAP-explainable-machine-learning/
"Detecting Bias with SHAP - What do Developer Salaries Tell us about the Gender Pay Gap?" https://databricks.com/blog/2019/06/17/detecting-bias-with-shap.html
https://github.com/slundberg/shap
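The articles above all walk through the same slundberg/shap workflow. A minimal sketch of it (the Boston-housing dataset and XGBoost model mirror the first article; treat the exact calls as illustrative, not as this gist's own code):

import shap
import xgboost

# Train any tree ensemble; TreeExplainer computes exact SHAP values for trees
X, y = shap.datasets.boston()
model = xgboost.XGBRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one attribution per feature per row

# Global importance summary, then a single-prediction explanation
shap.summary_plot(shap_values, X)
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])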
DataFabric
# Cross language/framework/platform data fabric
## Requirements / Goals
1. #DataSchema abstract over data types, from simple tabular ("data frame") to multi-dimensional tensors/arrays, graphs, etc. (see HDF5)
2. #DataSchema specifiable through a functional / declarative language (like Kotlingrad + Petastorm/UniSchema; see the sketch after this list)
3. #DataSchema with bindings to languages (Scala, Python) and frameworks (Parquet, ApacheHudi, TensorFlow, ApacheSpark, PyTorch)
4. #DataSchema to define both the in-memory #DataFabric and the schema for data at rest (Parquet, ApacheHudi, PetaStorm, etc.)
5. Runtime derived from the "shared runtime" paradigm of #ApacheArrow (no conversions, zero-copy, JVM off-heap)
6. Runtime treats IO/persistence as a separate effect (abstracted away from algo/application logic)
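A minimal sketch of goal 2, assuming Petastorm's Unischema as the declarative layer (field names, shapes, and dtypes here are illustrative assumptions):

import numpy as np
from petastorm.unischema import Unischema, UnischemaField
from petastorm.codecs import ScalarCodec, NdarrayCodec
from pyspark.sql.types import IntegerType

# One declarative schema holding scalar and tensor fields side by side
FabricSchema = Unischema('FabricSchema', [
    UnischemaField('sample_id', np.int32, (), ScalarCodec(IntegerType()), False),
    UnischemaField('features', np.float32, (128,), NdarrayCodec(), False),
])

# The same definition can drive both the at-rest layout (Parquet via Spark)
# and the in-memory records handed to TensorFlow/PyTorch readers
spark_schema = FabricSchema.as_spark_schema()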
tql with spark
package io.yields.common.meta
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType
import scala.annotation._
import scala.meta._
/**
@SemanticBeeng
SemanticBeeng / arrow panda marshalling
Last active May 31, 2018
arrow panda marshalling
# https://arrow.apache.org/docs/python/memory.html
# https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html
# https://arrow.apache.org/docs/python/ipc.html
# https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_io.py
# https://github.com/apache/arrow/blob/master/python/pyarrow/serialization.py
# https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html
# https://stackoverflow.com/questions/46837472/converting-pandas-dataframe-to-structured-arrays
import pyarrow as pa
import pandas as pd
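Continuing from the imports above, a minimal sketch (assumed, following the linked memory/IPC docs) of the marshalling round trip:

# pandas -> Arrow RecordBatch: a columnar view of the frame
df = pd.DataFrame({'id': [1, 2, 3], 'score': [0.1, 0.2, 0.3]})
batch = pa.RecordBatch.from_pandas(df)

# Write the batch to an in-memory IPC stream...
sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, batch.schema)
writer.write_batch(batch)
writer.close()
buf = sink.getvalue()

# ...and read it back into pandas on the other side
reader = pa.ipc.open_stream(buf)
roundtripped = reader.read_pandas()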
@SemanticBeeng
SemanticBeeng / structured numpy arrays
Last active May 12, 2018
structured numpy arrays
# #resource
# https://docs.scipy.org/doc/numpy-1.14.0/user/basics.rec.html
# conda install -c conda-forge traits=4.6.0
#   -> traits: 4.6.0-py36_1 conda-forge
import numpy as np
from traits.api import Array, Tuple, List, String
from traitschema import Schema
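A minimal sketch of the structured-array idea from the numpy doc linked above, plus a hypothetical traitschema pairing (the PersonBatch class and its fields are illustrative assumptions, not from this gist):

# A structured array packs named, heterogeneous fields into one contiguous block
dtype = np.dtype([('name', 'U10'), ('age', np.int32), ('weight', np.float32)])
people = np.array([('Alice', 31, 55.5), ('Bob', 42, 81.0)], dtype=dtype)
people['name']    # field access by name -> array(['Alice', 'Bob'], dtype='<U10')
people[0]         # record access by index -> ('Alice', 31, 55.5)

# Hypothetical traitschema container declaring the same fields as traits
class PersonBatch(Schema):
    names = Array(dtype='U10')
    ages = Array(dtype=np.int32)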
bckp_root_dirs.sh
#! /bin/bash
# Root backup directories (sources, locals, destinations, and mount points) for backups executed on a/this machine
# Root of backups executed on this machine (local copies for $BCKP_DIRs of all the backups)
export BCKP_DIRS=/data/bckp_dirs
# Root of backup source directories for data from other machines (see $BCKP_SRC)
export BCKP_SRCS=/mnt/backups/bckp_srcs
# Root of backup remote destination directories (remote copies for $BCKP_DIRs of all the backups)
@SemanticBeeng
SemanticBeeng / 0_reuse_code.js
Created Feb 1, 2016
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console