Skip to content

Instantly share code, notes, and snippets.

View GuangsZuo's full-sized avatar
🎯
Focusing

Arashi GuangsZuo

🎯
Focusing
View GitHub Profile
@namakemono
namakemono / decomposable_attention.py
Created August 17, 2017 23:53
Decomposable Attention with Keras.
from keras.layers import *
from keras.activations import softmax
from keras.models import Model
"""
References
----------
[1]. Parikh, Ankur P., et al. "A decomposable attention model for natural language inference." arXiv preprint arXiv:1606.01933 (2016).
@inchoate
inchoate / including_external_package_in_dataflow.md
Last active February 2, 2024 11:40
Adding an extra package to a Python Dataflow project to run on GCP

The Problem

The documentation for how to deploy a pipeline with extra, non-PyPi, pure Python packages on GCP is missing some detail. This gist shows how to package and deploy an external pure-Python, non-PyPi dependency to a managed dataflow pipeline on GCP.

TL;DR: You external package needs to be a python (source/binary) distro properly packaged and shipped alongside your pipeline. It is not enough to only specify a tar file with a setup.py.

Preparing the External Package

Your external package must have a proper setup.py. What follow is an example setup.py for our ETL package. This is used to package version 1.1.1 of the etl library. The library requires 3 native PyPi packages to run. These are specified in the install_requires field. This package also ships with custom external JSON data, declared in the package_data section. Last, the setuptools.find_packages function searches for all available packages and returns that

@bistaumanga
bistaumanga / kMeans.py
Last active December 7, 2020 10:16
KMeans Clustering Implemented in python with numpy
'''Implementation and of K Means Clustering
Requires : python 2.7.x, Numpy 1.7.1+'''
import numpy as np
def kMeans(X, K, maxIters = 10, plot_progress = None):
centroids = X[np.random.choice(np.arange(len(X)), K), :]
for i in range(maxIters):
# Cluster Assignment step
C = np.array([np.argmin([np.dot(x_i-y_k, x_i-y_k) for y_k in centroids]) for x_i in X])
@velicast
velicast / stdc++.h
Created July 17, 2013 17:56
Linux GCC 4.8.0 /bits/stdc++.h header definition.
// C++ includes used for precompiling -*- C++ -*-
// Copyright (C) 2003-2013 Free Software Foundation, Inc.
//
// This file is part of the GNU ISO C++ Library. This library is free
// software; you can redistribute it and/or modify it under the
// terms of the GNU General Public License as published by the
// Free Software Foundation; either version 3, or (at your option)
// any later version.
@yanofsky
yanofsky / LICENSE
Last active June 5, 2024 21:51
A script to download all of a user's tweets into a csv
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
@jboner
jboner / latency.txt
Last active July 6, 2024 06:48
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD