- dplyr
- pandas
- ggplot2
- plotly
- seaborn
- mlr
- scikit-learn
- yellowbrick
- xgboost
- keras
from __future__ import division  # __future__ imports must come first; a no-op on Python 3
from urllib.request import urlopen
from bs4 import BeautifulSoup
from collections import Counter
import pickle
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import FreqDist
from tqdm import tqdm
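The imports above point to a scrape, tokenize, and count workflow, but the body of the gist is not shown here. The following is only a minimal sketch of the counting step, written against the standard library alone. The sample text, the tiny stop-word set, and the helper name `word_frequencies` are hypothetical stand-ins; the original presumably relies on `word_tokenize` and NLTK's stop-word list instead.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set (the gist imports NLTK's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased alphabetic tokens in text, skipping stop words."""
    # A simple regex tokenizer stands in for nltk.tokenize.word_tokenize.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if tok not in stop_words)


sample = "The cat and the hat: the cat sat in a hat."
freqs = word_frequencies(sample)
print(freqs.most_common(2))
```

A `Counter` is used because `most_common` gives the ranked list directly, which is roughly what `FreqDist` offers in NLTK.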
# R stuff
.Rproj.user
.Rhistory
.RData
.Ruserdata
# datatypes
*csv
*tsv
*xls
plot.nnet<-function(mod.in,nid=T,all.out=T,all.in=T,bias=T,wts.only=F,rel.rsc=5,
                    circle.cex=5,node.labs=T,var.labs=T,x.lab=NULL,y.lab=NULL,
                    line.stag=NULL,struct=NULL,cex.val=1,alpha.val=1,
                    circle.col='lightblue',pos.col='black',neg.col='grey',
                    bord.col='lightblue', max.sp = F,...){
  require(scales)
  #sanity checks
  if('mlp' %in% class(mod.in)) warning('Bias layer not applicable for rsnns object')
Tested with Apache Spark 1.3.1, Python 2.7.9 and Java 1.8.0_45 + workaround for Spark 1.4.x from @enahwe.
Download and install Java from oracle.com.
import numpy as np
import pylab as pl
import pandas as pd
from sklearn import svm
from sklearn import linear_model
from sklearn import tree
from sklearn.metrics import confusion_matrix
function(y, x = NULL,
         est = c("mean", "median", "proportion"),
         success = NULL, order = NULL,
         method = c("theoretical","simulation"),
         type = c("ci","ht"),
         alternative = c("less","greater","twosided"),
         null = NULL,
         boot_method = c("perc","se"),
         conflevel = 0.95, siglevel = 0.05,
         nsim = 10000, simdist = FALSE, seed = NULL,
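The signature above is cut off, so how the function uses `boot_method`, `nsim`, `conflevel`, and `seed` is not visible. As a rough illustration only, here is a Python sketch of a percentile bootstrap confidence interval, assuming that `"perc"` refers to the percentile method. The function name `bootstrap_ci` and the sample data are hypothetical; the parameter names merely mirror the R signature and this is not the original implementation.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, nsim=10000, conflevel=0.95, seed=None):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # a seeded generator makes the result reproducible
    n = len(values)
    # Resample with replacement nsim times and record the statistic each time.
    sims = sorted(stat(rng.choices(values, k=n)) for _ in range(nsim))
    # Take the empirical quantiles that cut off (1 - conflevel) / 2 in each tail.
    tail = (1.0 - conflevel) / 2.0
    lower = sims[int(tail * nsim)]
    upper = sims[min(nsim - 1, int((1.0 - tail) * nsim))]
    return lower, upper


# Hypothetical sample data, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
low, high = bootstrap_ci(data, nsim=2000, seed=42)
print(low, high)
```

Passing `stat=statistics.median` would give the analogous interval for the median, in the spirit of the `est` argument.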
%pylab inline
from pylab import *
pylab.rcParams['figure.figsize'] = (8.0, 6.4)
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
map = Basemap(projection='ortho', lat_0=50, lon_0=-100,
              resolution='l', area_thresh=1000.0)
If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?
I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:
- Statistical knowledge
- Programming/hacking skills
- Domain expertise