Skip to content

Instantly share code, notes, and snippets.

View eliasah's full-sized avatar

Elie A. eliasah

View GitHub Profile
@jakevdp
jakevdp / convolution_matrix.py
Last active March 12, 2019 09:49
Convolution Matrix
# Author: Jake VanderPlas
# LICENSE: MIT
from __future__ import division
import numpy as np
def convolution_matrix(x, N=None, mode='full'):
"""Compute the Convolution Matrix
@gati
gati / guess_candidate_model.py
Created November 2, 2016 15:42
Code for training an LSTM model for text classification using the keras library (Theano backend). Was used for guesscandidate.com.
from sklearn.cross_validation import train_test_split
from keras.preprocessing import sequence, text
from keras.models import Sequential
from keras.layers import (Dense, Dropout, Activation, Embedding, LSTM,
Convolution1D, MaxPooling1D)
# Embedding
max_features = 20000
maxlen = 100
embedding_size = 32

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool to determine where CPU time is being spent. Using the Java Flight recorder, you can do this for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

@twiecki
twiecki / bayesian_neural_network.ipynb
Last active February 22, 2022 01:28
Bayesian Neural Network in PyMC3
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@glamp
glamp / customer-segmentation.py
Last active April 30, 2020 13:40
Analysis for customer segmentation blog post
import pandas as pd
# http://blog.yhathq.com/static/misc/data/WineKMC.xlsx
df_offers = pd.read_excel("./WineKMC.xlsx", sheetname=0)
df_offers.columns = ["offer_id", "campaign", "varietal", "min_qty", "discount", "origin", "past_peak"]
df_offers.head()
df_transactions = pd.read_excel("./WineKMC.xlsx", sheetname=1)
df_transactions.columns = ["customer_name", "offer_id"]
df_transactions['n'] = 1
df_transactions.head()
@andershammar
andershammar / matplotlib-zeppelin
Created July 1, 2015 07:42
Example showing how to use matplotlib from a Zeppelin notebook
%pyspark
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt
import StringIO
def show(p):
img = StringIO.StringIO()
p.savefig(img, format='svg')
@kastnerkyle
kastnerkyle / optimizers.py
Last active January 12, 2021 13:46
Theano optimizers
# Authors: Kyle Kastner
# License: BSD 3-clause
import theano.tensor as T
import numpy as np
import theano
class rmsprop(object):
"""
RMSProp with nesterov momentum and gradient rescaling
@karpathy
karpathy / gist:587454dc0146a6ae21fc
Last active June 7, 2024 05:09
An efficient, batched LSTM.
"""
This is a batched LSTM forward and backward pass
"""
import numpy as np
import code
class LSTM:
@staticmethod
def init(input_size, hidden_size, fancy_forget_bias_init = 3):
@Newmu
Newmu / conv_deconv_variational_autoencoder.py
Last active May 26, 2024 12:32
Prototype code of conv/deconv variational autoencoder, probably not runable, lots of inter-dependencies with local codebase =/
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from theano.tensor.signal.downsample import max_pool_2d
from theano.tensor.extra_ops import repeat
from theano.sandbox.cuda.dnn import dnn_conv
from time import time
import numpy as np
from matplotlib import pyplot as plt