Using defaultdict from Python's standard library we can easily define a tree data structure:
from collections import defaultdict
def tree(): return defaultdict(tree)
That's it!
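As a minimal usage sketch: assigning to a nested key creates every intermediate level on demand. The `to_dict` helper and the sample keys are hypothetical additions for illustration, not part of the original snippet.

```python
from collections import defaultdict


def tree():
    """Return a dict whose missing keys are created as nested trees."""
    return defaultdict(tree)


def to_dict(t):
    """Recursively convert a tree into plain dicts (handy for printing)."""
    return {
        key: to_dict(value) if isinstance(value, defaultdict) else value
        for key, value in t.items()
    }


# Hypothetical example data: no intermediate level is created by hand.
taxonomy = tree()
taxonomy["Animalia"]["Chordata"]["Mammalia"] = "mammals"
taxonomy["Plantae"]["Magnoliophyta"] = "flowering plants"

plain = to_dict(taxonomy)
print(plain)
```

Note that merely reading a missing key also creates it, so use `in` to test for membership.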
Perl and PHP Regular Expressions
PHP regexes are based on PCRE (Perl-Compatible Regular Expressions), so a regex that works in one generally works in the other, and in any other language that uses PCRE. Here are some commonly needed regular expressions for both PHP and Perl. Each regex is given in string format and includes delimiters.
All Major Credit Cards
This regular expression validates the number format of all major credit cards: American Express (Amex), Discover, Mastercard, and Visa. It also accepts Diners Club numbers.
//All major credit cards regex
'/^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|622((12[6-9]|1[3-9][0-9])|([2-8][0-9][0-9])|(9(([0-1][0-9])|(2[0-5]))))[0-9]{10}|64[4-9][0-9]{13}|65[0-9]{14}|3(?:0[0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13})$/'
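The same alternation can be tried outside PHP and Perl. The sketch below uses Python's `re` module, drops the delimiters and anchors, and relies on `re.fullmatch` so the alternation must match exactly once (an empty string or two concatenated numbers is rejected). The function name `looks_like_card_number` is hypothetical, and the check covers digit layout only, not whether a card actually exists.

```python
import re

# The alternation from the regex above, without delimiters or anchors.
CARD_PATTERN = re.compile(
    r"(?:4[0-9]{12}(?:[0-9]{3})?"      # Visa: 13 or 16 digits
    r"|5[1-5][0-9]{14}"                # Mastercard
    r"|6011[0-9]{12}"                  # Discover
    r"|622((12[6-9]|1[3-9][0-9])|([2-8][0-9][0-9])|(9(([0-1][0-9])|(2[0-5]))))[0-9]{10}"
    r"|64[4-9][0-9]{13}"
    r"|65[0-9]{14}"
    r"|3(?:0[0-5]|[68][0-9])[0-9]{11}"  # Diners Club
    r"|3[47][0-9]{13})"                # American Express
)


def looks_like_card_number(number: str) -> bool:
    """Return True if the whole string matches one card-number layout."""
    return CARD_PATTERN.fullmatch(number) is not None


# Widely published test numbers, not real accounts.
for candidate in ["4111111111111111", "5555555555554444",
                  "378282246310005", "6011111111111117", "1234", ""]:
    print(repr(candidate), looks_like_card_number(candidate))
```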
#!/usr/bin/env python
#
# More of a reference for using jinja2 without actual template files.
# This is great for a simple output transformation to standard out.
#
# Of course you will need to "sudo pip install jinja2" first!
#
# I like to refer to the following to remember how to use jinja2 :)
# http://jinja.pocoo.org/docs/templates/
#
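A minimal sketch of the idea described in the comments above: the template lives in a Python string, is rendered with `jinja2.Template`, and the result goes to standard out. The `hosts` data and the template text are hypothetical, chosen only to show the pattern; jinja2 must be installed.

```python
from jinja2 import Template

# The template is an ordinary string, so no template file is needed.
template = Template(
    "{% for host in hosts %}{{ host.name }}: {{ host.ip }}\n{% endfor %}"
)

# Hypothetical input data for the transformation.
hosts = [
    {"name": "web1", "ip": "10.0.0.1"},
    {"name": "web2", "ip": "10.0.0.2"},
]

output = template.render(hosts=hosts)
print(output, end="")
```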
# =============
# Introduction
# =============
# I've been doing some data mining lately, looking especially into `Gradient
# Boosting Trees <http://en.wikipedia.org/wiki/Gradient_boosting>`_, since it is
# claimed to be one of the techniques with the best performance out of the
# box. In order to have a better understanding of the technique I've reproduced
# the example of section *10.14.1 California Housing* in the book `The Elements of Statistical Learning <http://www-stat.stanford.edu/~tibs/ElemStatLearn/>`_.
# Each point of this dataset represents the house value of a property with some
# attributes of that house. You can get the data and the description of those
A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
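One way to do this conversion is `pandas.to_numeric`, sketched below. The sample values are hypothetical, and the original diary entry may have used a different call. By default the function raises a `ValueError` on non-numeric values, which matches the caveat above; passing `errors="coerce"` turns them into NaN instead.

```python
import pandas as pd

# Hypothetical column of numbers stored as strings.
s = pd.Series(["1", "2", "3.5"])
numeric = pd.to_numeric(s)
print(numeric.dtype)

# A non-numeric value raises by default...
bad = pd.Series(["1", "oops"])
try:
    pd.to_numeric(bad)
    raised = False
except ValueError:
    raised = True

# ...or becomes NaN when coercion is requested explicitly.
coerced = pd.to_numeric(bad, errors="coerce")
```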
#!/usr/bin/python
import sys
import re
import slate
import pickle
import nltk
import glob
import os
from scipy.spatial.distance import pdist, squareform
import numpy as np
import copy
def distcorr(Xval, Yval, pval=True, nruns=500):
    """ Compute the distance correlation function, returning the p-value.
    Based on Satra/distcorr.py (gist aa3d19a12b74e9ab7941)
    >>> a = [1,2,3,4,5]
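For orientation, the statistic itself can be sketched as follows. This is a simplified NumPy-only version under stated assumptions: 1-D inputs of equal length, no permutation test, and therefore no p-value. The name `distance_correlation` is hypothetical and this is not the function above.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D and of equal length")

    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double-center each distance matrix.
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()

    # Squared distance covariance and variances.
    dcov2_xy = max((A * B).mean(), 0.0)
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()

    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0:
        return 0.0  # at least one input is constant
    return float(np.sqrt(dcov2_xy / denom))


r_linear = distance_correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_constant = distance_correlation([1, 2, 3, 4, 5], [7, 7, 7, 7, 7])
print(r_linear, r_constant)
```

A perfectly linear relationship gives a value of 1 up to floating-point error, and a constant input gives 0.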
def splitDataFrameList(df,target_column,separator):
    ''' df = dataframe to split,
    target_column = the column containing the values to split
    separator = the symbol used to perform the split
    returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
    The values in the other columns are duplicated across the newly divided rows.
    '''
    def splitListToRows(row,row_accumulator,target_column,separator):
        split_row = row[target_column].split(separator)
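The behaviour described in the docstring can also be obtained with pandas built-ins. The sketch below swaps in `Series.str.split` followed by `DataFrame.explode` (pandas 0.25 or later) instead of the row-accumulator approach above; the function name `split_rows` and the sample frame are hypothetical.

```python
import pandas as pd


def split_rows(df, target_column, separator):
    """Split target_column on separator, giving one row per element.

    Values in the other columns are repeated for each new row.
    """
    split = df.assign(**{target_column: df[target_column].str.split(separator)})
    return split.explode(target_column).reset_index(drop=True)


# Hypothetical example data.
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b", "c"]})
result = split_rows(df, "tags", ",")
print(result)
```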
#!/usr/bin/env python
"""
Tropical Cyclone Risk Model (TCRM) - Version 1.0 (beta release)
Copyright (C) 2011 Geoscience Australia
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.