@mbostock
mbostock / .block
Last active November 22, 2022 23:32
Line Transition
license: gpl-3.0
@catawbasam
catawbasam / pandas_dbms.py
Last active May 26, 2024 05:32
Python PANDAS : load and save Dataframes to sqlite, MySQL, Oracle, Postgres
# -*- coding: utf-8 -*-
"""
LICENSE: BSD (same as pandas)
example use of pandas with oracle mysql postgresql sqlite
- updated 9/18/2012 with better column name handling; couple of bug fixes.
- used ~20 times for various ETL jobs. Mostly MySQL, but some Oracle.
to do:
save/restore index (how to check table existence? just do select count(*)?),
finish odbc,
@greeness
greeness / gist:4041029
Created November 8, 2012 19:38
sentiment analysis
http://danzambonini.com/self-improving-bayesian-sentiment-analysis-for-twitter/
http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/
[PDF]
Sentiment Analysis of Twitter Data - Department of Computer ...
www.cs.columbia.edu/~julia/papers/Agarwaletal11.pdf
@greeness
greeness / gist:4273631
Created December 13, 2012 02:53
fast way of loading data into mysql table
cat mysql-statement
> insert into mytable (k, v) values (1,2);
> insert into mytable (k, v) values (2,2);
> insert into mytable (k, v) values (3,2);
scp mysql-statement to an ec2 machine
ssh to the ec2 machine
$ mysql -h hostname -u username -p database_name < mysql-statement
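The statement file above can also be generated programmatically. A minimal Python sketch, assuming the same example table `mytable` with integer columns `k` and `v`; the helper name `write_insert_statements` is illustrative and not part of the gist:

```python
from pathlib import Path
import tempfile


def write_insert_statements(rows, path, table="mytable"):
    """Write one INSERT statement per (k, v) row to `path`.

    Assumes integer values only; anything else would need proper
    escaping, which this sketch does not attempt.
    """
    lines = []
    for k, v in rows:
        lines.append(f"insert into {table} (k, v) values ({int(k)},{int(v)});")
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


# Reproduce the three example statements in a temporary file.
out_path = Path(tempfile.gettempdir()) / "mysql-statement"
count = write_insert_statements([(1, 2), (2, 2), (3, 2)], out_path)
```

The resulting file can then be copied to the remote machine and piped into the `mysql` client in the same way.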
@rjurney
rjurney / ntf_idf.pig
Created February 2, 2013 07:57
Implements NTF-IDF, shout outs to Mat Kelcey who recommended this. See: http://nlp.stanford.edu/IR-book/html/htmledition/maximum-tf-normalization-1.html
DEFINE tf_idf(token_records, id_field, token_field) RETURNS out_relation {
  /* Calculate the term count per document */
  doc_word_totals = foreach (group $token_records by ($id_field, $token_field)) generate
    FLATTEN(group) as ($id_field, token),
    COUNT_STAR($token_records) as doc_total;
  /* Calculate the document size */
  pre_term_counts = foreach (group doc_word_totals by $id_field) generate
    group AS $id_field,
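For readers who do not use Pig, the idea of maximum tf normalization can be sketched in Python. This is not a translation of the macro, which is only partly shown here; it assumes the formulation ntf = a + (1 - a) * tf / max_tf with a smoothing term `a` (0.4 is a commonly quoted value), a plain log(N / df) for idf, and non-empty documents. The function name and sample data are made up for illustration.

```python
import math
from collections import Counter


def ntf_idf(docs, a=0.4):
    """Return {doc_id: {token: score}} using maximum tf normalization.

    ntf(t, d) = a + (1 - a) * tf(t, d) / max_tf(d)
    idf(t)    = log(N / df(t))
    """
    # Term counts per document.
    term_counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Number of documents each token appears in.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        max_tf = max(counts.values())  # assumes the document is not empty
        scores[doc_id] = {
            token: (a + (1 - a) * tf / max_tf) * math.log(n_docs / doc_freq[token])
            for token, tf in counts.items()
        }
    return scores


# Hypothetical two-document corpus.
docs = {"d1": ["pig", "pig", "hadoop"], "d2": ["pig", "python"]}
scores = ntf_idf(docs)
```

A token that occurs in every document (here `pig`) gets an idf of zero, so its score is zero regardless of how often it appears.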
@rjurney
rjurney / classify.pig
Created April 19, 2013 06:40
TF-IDF.pig uses tfidf.maro.pig to compute TF-IDF scores for the lyric words. After that, classify.pig does a naive bayesian classification using the funcs.py Jython UDF. I spliced TF-IDF in where previously there was MPE. Note: lyrics are top 5,000 words only.
register /me/Software/elephant-bird/pig/target/elephant-bird-pig-3.0.6-SNAPSHOT.jar
register /me/Software/pig/build/ivy/lib/Pig/json-simple-1.1.jar
set elephantbird.jsonloader.nestedLoad 'true'
set default_parallel 4
/* Remove files from previous runs */
rmf /tmp/prior_words.txt
rmf /tmp/prior_genres.txt
rmf /tmp/p_word_given_genre.txt
@stucchio
stucchio / bayesian_ab_test.py
Last active April 2, 2023 03:17
Bayesian A/B test code
from matplotlib import use
from pylab import *
from scipy.stats import beta, norm, uniform
from random import random
from numpy import *
import numpy as np
import os
# Input data
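As a rough illustration of what a Bayesian A/B test computes (not the gist's own code, which is truncated here), the sketch below estimates the probability that variant B converts better than variant A. It assumes a uniform Beta(1, 1) prior on each conversion rate; the function name and the visitor counts are invented, and it uses only the standard library rather than the SciPy imports above.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior for a conversion rate is
        # Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical input data: 100/1000 conversions for A, 150/1000 for B.
p = prob_b_beats_a(100, 1000, 150, 1000)
```

Sampling from the two posteriors and counting how often B wins avoids any closed-form integration, at the cost of some Monte Carlo noise.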
@ctokheim
ctokheim / cython_tricks.md
Last active March 4, 2024 23:27
cython tricks

Cython

Cython has two major benefits:

  1. Making Python code faster, particularly for things that can't be done in SciPy/NumPy
  2. Wrapping/interfacing with C/C++ code

Cython gains most of its benefit from statically typing arguments. However, static typing is not required; in fact, regular Python code is valid Cython (but don't expect much of a speedup). By incrementally adding more type information, the code can run several times faster. This gist provides only a very basic usage of Cython.
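To illustrate the point that regular Python is valid Cython, here is a plain Python function that could be compiled unchanged. The docstring indicates where static types might be added incrementally in a `.pyx` file. The function is an illustrative example, not taken from the gist.

```python
def sum_of_squares(n):
    """Sum i*i for i in range(n), in plain Python.

    Compiled as-is by Cython this is valid but gains little. In a .pyx
    file, declaring C types is what usually pays off, for example:

        def sum_of_squares(int n):
            cdef long long total = 0
            cdef int i
            ...
    """
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(10)
```

Starting from the untyped version and adding declarations one at a time makes it easy to check that each change preserves the result.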

@drmalex07
drmalex07 / celeryconfig.py
Last active August 31, 2023 03:53
A quickstart example with celery queue. #celery
# This is a quickstart! In the real world use a real broker (message queue)
# such as Redis or RabbitMQ !!
BROKER_URL = 'sqlalchemy+sqlite:///tasks.db'
CELERY_RESULT_BACKEND = "db+sqlite:///results.db"
CELERY_IMPORTS = ['tasks']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
@jcchurch
jcchurch / lcal.py
Last active August 13, 2016 15:43
Adds line to let bash know that this is a python 3 script.
#!/usr/bin/env python3
import datetime
import optparse
import re
def createDate(stringDate):
    # Reject input that does not start with a YYYY-MM-DD pattern.
    if re.match(r"\d\d\d\d-\d\d-\d\d", stringDate) is None:
        raise ValueError("This is not in the correct date format. Use YYYY-MM-DD")
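The listing stops after the format check. One way such a function might continue, offered only as a sketch since the rest of the gist is not shown, is to hand the validated string to `datetime.date`. The name `create_date` and the use of `re.fullmatch` are choices made here, not the author's.

```python
import datetime
import re


def create_date(string_date):
    """Parse a YYYY-MM-DD string into a datetime.date."""
    # fullmatch rejects trailing characters that re.match would let through.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", string_date) is None:
        raise ValueError("This is not in the correct date format. Use YYYY-MM-DD")
    year, month, day = (int(part) for part in string_date.split("-"))
    # datetime.date raises ValueError itself for impossible dates such as month 13.
    return datetime.date(year, month, day)


parsed = create_date("2016-08-13")
```

Both a malformed string and an impossible date surface as `ValueError`, so callers need only one exception handler.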