
@mmatkinson
mmatkinson / nltk-intro.py
Last active January 28, 2016 15:37 — forked from alexbowe/nltk-intro.py
Demonstration of extracting key phrases with NLTK in Python
import nltk
#python 3.4.0
#nltk==3.0.4
#numpy==1.10.4
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
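The preview stops after the sample text. A common way to pull key phrases out of text like this is to part-of-speech tag it and keep runs of adjectives and nouns that end in a noun (the chunk pattern `<NN.*|JJ>*<NN.*>`). The sketch below implements only that chunking step in plain Python, so it runs without NLTK's tagger models. The Penn Treebank tags on the sample tokens are hand-assigned assumptions, and `noun_phrases` is a hypothetical helper, not the gist's code. With NLTK and its data installed, `nltk.pos_tag` would produce the tags and `nltk.RegexpParser` could apply the same pattern.

```python
from typing import List, Tuple


def noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect maximal runs of JJ/NN* tokens that end in a noun.

    Approximates the chunk pattern <NN.*|JJ>*<NN.*> applied left to right.
    """
    phrases: List[str] = []
    run: List[Tuple[str, str]] = []  # current candidate run of (word, tag)

    def flush() -> None:
        # Drop trailing adjectives so every phrase ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            phrases.append(" ".join(word.lower() for word, _ in run))
        run.clear()

    for word, tag in tagged:
        if tag.startswith("NN") or tag == "JJ":
            run.append((word, tag))
        else:
            flush()  # any other tag ends the current run
    flush()
    return phrases


# Hand-tagged opening of the sample text (tags are assumed, not tagger output).
sample = [
    ("The", "DT"), ("Buddha", "NNP"), (",", ","), ("the", "DT"),
    ("Godhead", "NNP"), (",", ","), ("resides", "VBZ"), ("quite", "RB"),
    ("as", "RB"), ("comfortably", "RB"), ("in", "IN"), ("the", "DT"),
    ("circuits", "NNS"), ("of", "IN"), ("a", "DT"), ("digital", "JJ"),
    ("computer", "NN"),
]
print(noun_phrases(sample))  # ['buddha', 'godhead', 'circuits', 'digital computer']
```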
mmatkinson / useful_pandas_snippets.py
Last active April 29, 2016 00:01 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
import pandas as pd
# List unique values in a DataFrame column
pd.unique(df['column_name'])
# Convert Series datatype to numeric, turning any non-numeric values into NaN
df['col'] = pd.to_numeric(df['col'], errors='coerce')
# Grab DataFrame rows where column has certain values
value_list = ['value1', 'value2', 'value3']
df = df[df.column.isin(value_list)]
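On current pandas, `Series.ravel` is deprecated and `DataFrame.convert_objects` no longer exists. A minimal check of the three operations on a toy DataFrame (column names and values are made up for illustration), using `Series.unique`, `pd.to_numeric(..., errors='coerce')` and `Series.isin`:

```python
import pandas as pd

df = pd.DataFrame({
    "column_name": ["a", "b", "a", "c"],
    "col": ["1", "2.5", "n/a", "4"],
})

# Unique values, in order of first appearance
uniques = df["column_name"].unique()

# Non-numeric strings become NaN instead of raising
df["col"] = pd.to_numeric(df["col"], errors="coerce")

# Keep only rows whose column_name is in the list
value_list = ["a", "c"]
subset = df[df["column_name"].isin(value_list)]
print(subset)
```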
mmatkinson / lda_vec.py
Last active May 2, 2016 23:03
Helper class for using sklearn vectorizers with gensim lda.
# For gensim
from itertools import groupby
import gensim
class VectorizedCorpus(object):
"""
Helper class for using sklearn vectorizers with gensim's LDA model.
Handles transformations between gensim corpus (bag-of-words) representations and sklearn matrices.
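The class body is cut off in this preview. The core conversion such a helper needs can be sketched as follows, assuming the sklearn vectorizer output is a SciPy sparse matrix with one document per row. gensim expects a corpus as an iterable of documents, each a list of `(token_id, count)` tuples, so every non-zero entry in a row becomes one tuple. `csr_to_bow` is a hypothetical name; gensim's own `gensim.matutils.Sparse2Corpus(X, documents_columns=False)` covers the same ground.

```python
from typing import Iterator, List, Tuple

import numpy as np
from scipy.sparse import csr_matrix


def csr_to_bow(matrix) -> Iterator[List[Tuple[int, float]]]:
    """Yield one gensim-style bag-of-words document per matrix row.

    Each document is a list of (token_id, value) tuples, where token_id is
    the column index in the sklearn vectorizer's vocabulary.
    """
    m = csr_matrix(matrix, copy=True)  # accepts dense arrays or any sparse format
    m.sum_duplicates()  # merge repeated (row, col) entries
    m.sort_indices()    # deterministic token order within each row
    for row in range(m.shape[0]):
        start, end = m.indptr[row], m.indptr[row + 1]
        yield [
            (int(j), v.item())  # .item() turns numpy scalars into Python numbers
            for j, v in zip(m.indices[start:end], m.data[start:end])
        ]


X = np.array([[2, 0, 1], [0, 0, 0], [0, 3, 0]])
docs = list(csr_to_bow(X))
print(docs)  # [[(0, 2), (2, 1)], [], [(1, 3)]]
```

A matching `id2word` mapping can be built from a fitted `CountVectorizer` as `{i: w for w, i in vectorizer.vocabulary_.items()}`.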
mmatkinson / table_comparison.py
Created August 5, 2016 19:52
compare two tables & all of their values
import pandas as pd
def df_diff(index_cols, data1, data2, lsuffix='_1'):
"""
usage:
comparisondf= df_diff( ['unique_id','date'], current_df, new_df, lsuffix='_curr')
returns:
single DataFrame indexed by index_cols plus a level holding every other column name (stacked onto the index),
with the values from each input DataFrame in its own column.
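Only the docstring survives in this preview. One way to produce the described layout is to melt both tables into long form on `index_cols`, merge them, and keep the rows whose values disagree. This is a sketch under assumptions, not the gist's implementation: `df_diff_sketch`, the `rsuffix` parameter, the `variable` index level and the `value` prefix on the output columns are all hypothetical, and rows present in only one table show up with NaN on the other side.

```python
import pandas as pd


def df_diff_sketch(index_cols, data1, data2, lsuffix="_1", rsuffix="_2"):
    """Line up every non-index value of two tables and keep the mismatches."""
    left_col, right_col = "value" + lsuffix, "value" + rsuffix
    # Long form: one row per (index_cols..., column name, value)
    left = data1.melt(id_vars=index_cols, var_name="variable", value_name=left_col)
    right = data2.melt(id_vars=index_cols, var_name="variable", value_name=right_col)
    merged = left.merge(right, on=[*index_cols, "variable"], how="outer")

    a, b = merged[left_col], merged[right_col]
    same = (a == b) | (a.isna() & b.isna())  # treat NaN vs NaN as equal
    return merged.loc[~same].set_index([*index_cols, "variable"])
```

Usage mirrors the docstring: `df_diff_sketch(['unique_id', 'date'], current_df, new_df, lsuffix='_curr', rsuffix='_new')`.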
mmatkinson / df_to_ddl.py
Created November 4, 2016 13:39
Take in a DataFrame and output (Redshift) DDL for creating a table with the same columns and types.
def df_to_ddl(df, tablename='test.mytable'):
data_dtypes = df.dtypes.reset_index().rename(columns = {'index':'colname',0:'datatype'})
# Map pandas datatypes into SQL
data_dtypes['sql_dtype'] = data_dtypes.datatype.astype(str).map(
{'object':'varchar(24)',
'float64':'float',
'int64':'int',
'bool':'boolean'} )
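The preview ends after the dtype mapping. A minimal sketch of the remaining step, assembling a `CREATE TABLE` statement from column names and dtype strings, is below. It takes a plain mapping so the string-building logic can be checked without pandas; the `datetime64[ns]` entry, the `varchar(256)` fallback for unmapped dtypes, and the name `ddl_from_dtypes` are assumptions added here.

```python
from typing import Mapping

# pandas dtype string -> Redshift column type. The first four entries mirror
# the gist's mapping; the datetime entry is an assumed addition.
SQL_TYPES = {
    "object": "varchar(24)",
    "float64": "float",
    "int64": "int",
    "bool": "boolean",
    "datetime64[ns]": "timestamp",
}
FALLBACK_TYPE = "varchar(256)"  # assumed default for dtypes not listed above


def ddl_from_dtypes(dtypes: Mapping[str, str], tablename: str = "test.mytable") -> str:
    """Build a CREATE TABLE statement from {column name: dtype string}."""
    lines = []
    for name, dtype in dtypes.items():
        # Double-quote identifiers, escaping any embedded double quotes.
        quoted = '"' + str(name).replace('"', '""') + '"'
        sql_type = SQL_TYPES.get(str(dtype), FALLBACK_TYPE)
        lines.append(f"    {quoted} {sql_type}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {tablename} (\n{body}\n);"
```

With a DataFrame, pass `df.dtypes.astype(str).to_dict()`. In the gist's `map` call, a dtype missing from the dictionary becomes NaN in `sql_dtype`; the explicit fallback avoids emitting a column with no type.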
mmatkinson / google_spreadsheets_create_update_example.py
Created November 5, 2016 02:06 — forked from pahaz/google_spreadsheets_create_update_example.py
Python Google spreadsheets v4 API example. Google spreadsheet access management example. Use google drive v3 API for access management
"""Google spreadsheet related.
Packages required: oauth2client, google-api-python-client
* https://gist.github.com/miohtama/f988a5a83a301dd27469
"""
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient import discovery
def get_credentials(scopes: list) -> ServiceAccountCredentials:
mmatkinson / geopandas_tour.ipynb
Created January 16, 2018 17:41 — forked from ocefpaf/geopandas_tour.ipynb
explore shapefile
**Convert .ipynb to Slides**
cd "test"
jupyter nbconvert "test.ipynb" --to slides --reveal-prefix "http://cdn.jsdelivr.net/reveal.js/2.6.2" --post serve --config slides_config.py
* To print slides add ?print-pdf at the end of the URL and print
**Convert .ipynb to LaTeX/PDF**
jupyter nbconvert MyFirstNotebook.ipynb --to latex
jupyter nbconvert MyFirstNotebook.ipynb --to pdf
**Convert .ipynb to HTML**
mmatkinson / postgres_queries_and_commands.sql
Created June 7, 2018 20:54 — forked from rgreenjr/postgres_queries_and_commands.sql
Useful PostgreSQL Queries and Commands
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-- show running queries (9.2+; idle sessions are marked in the state column)
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
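The pre-9.2 and 9.2+ queries differ in column names (`procpid`/`pid`, `current_query`/`query`) and in how idle sessions are detected, so a script that must work against both older and newer servers can pick one from the integer server version. `SHOW server_version_num` returns that number, and with psycopg2, `connection.server_version` holds the same value. A sketch, with the hypothetical helper name `running_queries_sql`:

```python
def running_queries_sql(server_version_num: int) -> str:
    """Return a 'show running queries' statement for the server's version.

    server_version_num is PostgreSQL's integer version, e.g. 90105 for 9.1.5
    or 90200 for 9.2.0.
    """
    if server_version_num < 90200:
        # pre-9.2: procpid / current_query; idle sessions report '<IDLE>'
        pid_col, query_col = "procpid", "current_query"
        active = "current_query != '<IDLE>'"
    else:
        # 9.2+: pid / query; idleness moved to the separate state column
        pid_col, query_col = "pid", "query"
        active = "state != 'idle'"
    return (
        f"SELECT {pid_col}, age(clock_timestamp(), query_start), usename, {query_col}\n"
        "FROM pg_stat_activity\n"
        f"WHERE {active} AND {query_col} NOT ILIKE '%pg_stat_activity%'\n"
        "ORDER BY query_start DESC;"
    )


print(running_queries_sql(90600))
```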