Matthieu Lagacherie lmatthieu

lmatthieu / py4sci_repo.sh
Last active August 29, 2015 14:21
DataScience Python 2.7 - Create repository
## Creating py4sci requirements repository - internet access required
cd /tmp
echo "
Theano==0.7.0
cytoolz==0.7.3
docopt==0.6.2
gdbn==0.1
gnumpy==0.2
lmatthieu / py4sci_install.sh
Last active August 29, 2015 14:21
DataScience Python 2.7 - Packages install
## No-internet access mode
yum install gcc openssl-devel zlib-devel blas-devel lapack-devel gcc-c++ bzip2-devel lzo-devel freetype-devel libpng-devel sqlite-devel
## Python 2.7.9 installation from sources
cd /tmp
wget https://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz
tar -zxvf Python-2.7.9.tgz
cd Python-2.7.9
lmatthieu / spark_read_csv.py
Last active October 5, 2016 15:09
Load a CSV file, infer column types and save the result as a Spark SQL Parquet file
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext, SQLContext
import pandas as pd
# sc: Spark context
# file_name: path of the CSV file to load
# table_name: name of the output table
# sep: CSV field separator
# infer_limit: number of rows pandas reads to infer column types
def read_csv(sc, file_name, table_name, sep=",", infer_limit=10000):
from IPython import embed_kernel
import start_notebook
def main():
    p = start_notebook.main()
    localDict = { 'a':1, 'b':2 }
    embed_kernel()
    p.kill()