Daniel Pipkin (ppkn), GitHub profile
@larsmans
larsmans / gist:3745866
Created September 18, 2012 21:00
Inspecting scikit-learn CountVectorizer output with a Pandas DataFrame
>>> from pandas import DataFrame
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> docs = ["You can catch more flies with honey than you can with vinegar.",
... "You can lead a horse to water, but you can't make him drink."]
>>> vect = CountVectorizer(min_df=0., max_df=1.0)
>>> X = vect.fit_transform(docs)
>>> print(DataFrame(X.A, columns=vect.get_feature_names()).to_string())
   but  can  catch  drink  flies  him  honey  horse  lead  make  more  than  to  vinegar  water  with  you
0    0    2      1      0      1    0      1      0     0     0     1     1   0        1      0     2    2
1    1    2      0      1      0    1      0      1     1     1     0     0   1        0      1     0    2
@traumverloren
traumverloren / switch-to-rbenv.md
Last active March 29, 2024 00:54
Switch from RVM to rbenv

Switch from RVM to rbenv:

Get rid of RVM if it's installed:

`rvm implode`

Clean up your .bash_profile/.bashrc/.zshrc file to remove RVM from the path:

You should have a line like this left over from RVM. Delete it from the file: `[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"`
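With RVM gone, a typical rbenv setup looks like the sketch below. This assumes Homebrew; the `rbenv init` line is what rbenv's own instructions tell you to add to your shell rc file (run these once, adjusting the rc file name to your shell):

```shell
# install rbenv and the ruby-build plugin
brew install rbenv ruby-build

# add rbenv's shim initialization to your shell startup file
echo 'eval "$(rbenv init -)"' >> ~/.zshrc

# install and activate a Ruby (version number is illustrative)
rbenv install 3.2.2
rbenv global 3.2.2
```

After opening a new shell, `which ruby` should point at an rbenv shim rather than the system Ruby.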

# -*- coding: utf-8 -*-
## EXPORTING TO PDF FROM revealjs OR jupyter notebook slides
## using nbconvert and decktape (https://github.com/astefanutti/decktape)
## to export PDF and/or HTML (reveal.js)
## from a Jupyter notebook / reveal.js HTML file
## phantomjs must be on the PATH, and the decktape directory must be placed beside this export_reveal.py file
## for more details, please check:
## nbconvert - https://github.com/jupyter/nbconvert
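The pipeline this script automates can be sketched with the two tools' CLIs. The file names are illustrative, and the exact decktape invocation varies by version (older releases were driven through phantomjs, as the comment above assumes):

```shell
# convert the notebook into a reveal.js HTML slideshow
# (nbconvert writes talk.slides.html next to the notebook)
jupyter nbconvert talk.ipynb --to slides

# render the slideshow to PDF with decktape's reveal plugin
phantomjs decktape/decktape.js reveal talk.slides.html talk.pdf
```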

Introduction to Installing PySpark & Jupyter Notebooks on Mac OSX

Spark is used for large-scale distributed data processing and has become the go-to standard at many technology companies. The framework computes at high speed and processes massive resilient distributed datasets (RDDs), all in a highly distributed manner.

Jupyter Notebooks, commonly called "Jupyter", has been a popular application within the data science community for many years. It lets you edit, run, and share Python code in a web view, executing it step by step, which makes it very flexible for data-analysis work. This is why Jupyter is a great tool to prototype in, and a good fit for any data-centric company.

Why use PySpark in a Jupyter Notebook?

Most data engineers argue that the Scala programming language version is more performant than Python version, and it is. Howev
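A common way to wire PySpark into Jupyter on macOS is through environment variables in your shell rc file. This is a sketch: the Spark path is a placeholder you must adjust, but `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` are the variables the `pyspark` launcher itself reads:

```shell
# in ~/.bash_profile or ~/.zshrc; adjust SPARK_HOME to your install location
export SPARK_HOME=/path/to/spark
export PATH="$SPARK_HOME/bin:$PATH"

# make the pyspark command launch inside a Jupyter notebook server
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
```

With these set, running `pyspark` opens a notebook whose kernels already have a `spark` session available.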

@tobyshooters
tobyshooters / interface.html
Last active January 18, 2024 03:54
web bootstrap
<script>
const $ = ({
  el,        // existing element, or a tag-name string to create
  pr = null, // parent node
  at = {},   // attributes
  st = {},   // style
  ev = {},   // events, element injected as first parameter
  ih = ""    // innerHTML
}) => {
  let n = el;