@jermspeaks
Last active August 3, 2021 06:09
12-WEEK DATA SCIENCE BOOTCAMP

12-Week Data Science Bootcamp in NYC

Original Link

WEEKS 1 & 2 - Data Science with R: Data Analysis


Week 1 Morning: Data Science with R: Data Analysis Part I

Day 1-3: Basic Programming Elements

  • What is R?
  • Why R?
  • How to get help
  • R language resources
  • RStudio
  • Installing and using packages
  • Workspace
  • Data Objects: Vectors, Matrices, Data Frames, and Lists
  • Local data import/export
  • Functions
  • Control Statements

Day 4-5: Primary Statistical Methods

  • Descriptive statistics
  • Hypothesis testing
  • Linear Regression
  • Logistic Regression
  • Introducing non-parametric statistics

Week 1 Afternoon: Source Code Control with Git and GitHub

Day 1: Learn the Git commands init, add, push, pull, and merge

Day 2: Learn GitHub features

Day 3: Work in teams using branching and merging

Day 4: Build a student portfolio page

Week 2: Data Science with R: Data Analysis Part II

Day 1-3: Data Manipulation

  • Data sorting
  • Merging Data
  • Reshaping Data
  • String manipulation
  • Dates and time stamps
  • Web data capture
  • API data sources
  • Connecting to an external database

Day 4-5: Data Visualization

  • Histograms
  • Point graphics
  • Columnar graphics
  • Line charts
  • Pie charts
  • Box Plots
  • Scatter plots
  • Visualizing multivariate data
  • Matrix-based visualizations
  • Maps

WEEKS 3 & 4 - Data Science with R: Machine Learning


Week 3: Data Science with R: Machine Learning Part I

Day 1-2: Introducing Data Mining

  • What is data mining and how to do it
  • Steps to apply data mining to your data
  • Supervised versus unsupervised learning
  • Regression versus classification problems
  • Review of linear models
  • Simple linear regression
  • Logistic regression
  • Generalized linear models

Day 3: Performance Measures and Dimension Reduction

  • Evaluating model performance
  • Confusion matrices
  • Beyond accuracy
  • Estimating future performance
  • Extension of linear models
  • Subset selection
  • Shrinkage methods
  • Dimension reduction methods
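The Day 3 topics on confusion matrices and measures "beyond accuracy" can be illustrated with a small, library-free sketch. The course itself works in R. The function names `confusion_counts` and `summary_metrics` and the toy labels below are hypothetical and serve only as illustration. The sketch treats 1 as the positive class of a binary problem.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives/negatives for a binary problem (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summary_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = sum(counts.values())
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        # Guard against division by zero when nothing is predicted (or is) positive.
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical imbalanced data: only 2 of 10 cases are positive.
actual = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
c = confusion_counts(actual, predicted)
m = summary_metrics(c)
print(c)  # {'tp': 1, 'fp': 0, 'fn': 1, 'tn': 8}
print(m)  # {'accuracy': 0.9, 'precision': 1.0, 'recall': 0.5}
```

Here the model reaches 90% accuracy but finds only half of the positive cases. That gap is why accuracy alone can mislead on imbalanced data.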

Day 4-5: KNN and Naive Bayes models

  • The k-Nearest Neighbors model
  • Understanding the kNN algorithm
  • Calculating distance
  • Choosing an appropriate k
  • Case study
  • Naive Bayes models
  • Understanding joint probability
  • The Naive Bayes algorithm
  • The Laplace estimator
  • Case study
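The kNN topics above (calculating distance, choosing k) can be sketched in a few lines. This example assumes Euclidean distance and a simple majority vote, which are one common choice; the course may use other distance measures or R packages. The names `euclidean` and `knn_predict` and the training points are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort every training example by its distance to the query; keep the k closest.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common keeps first-seen order, so the closer neighbor's label wins.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
train = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.5), "b"),
    ((5.5, 6.0), "b"),
]
print(knn_predict(train, (1.2, 1.4), k=3))  # a
```

A small k follows local structure closely but is sensitive to noise. A larger k smooths the decision boundary and can blur small classes. An odd k avoids most ties in two-class problems.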

Week 4: Data Science with R: Machine Learning Part II

Day 1-3: Tree models and SVMs

  • Tree models
  • Regression trees and classification trees
  • Tree models with party
  • Tree models with rpart
  • Random Forest models
  • GBM models
  • Support Vector Machines
  • Maximal margin classifiers
  • Support vector classifiers
  • Support vector machines

Day 4-5: Association Rules and More Models

  • Market Basket Analysis
  • Understanding association rules
  • The Apriori algorithm
  • Case study
  • Unsupervised learning
  • K-means clustering
  • Hierarchical clustering
  • Case study
  • Time series models
  • Stationary time series
  • The ARIMA model
  • The seasonal model
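Market basket analysis scores association rules with two basic measures, support and confidence. The Apriori algorithm then prunes its search using the fact that every subset of a frequent itemset must itself be frequent. The sketch below covers only the two measures, not Apriori's candidate generation. The basket data and helper names are hypothetical.

```python
from typing import FrozenSet, List, Set

# Hypothetical transactions: each basket is the set of items in one purchase.
baskets: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def count_containing(data: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in data if itemset <= basket)


def support(data: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Support: fraction of all baskets that contain the itemset."""
    return count_containing(data, itemset) / len(data)


def confidence(data: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Confidence of lhs -> rhs: share of baskets with lhs that also contain rhs."""
    lhs_count = count_containing(data, lhs)
    if lhs_count == 0:
        return 0.0
    return count_containing(data, lhs | rhs) / lhs_count


rule_lhs, rule_rhs = frozenset({"diapers"}), frozenset({"beer"})
print(support(baskets, rule_lhs | rule_rhs))    # 0.6
print(confidence(baskets, rule_lhs, rule_rhs))  # 0.75
```

In this toy data, {diapers, beer} appears in 3 of 5 baskets. Of the 4 baskets containing diapers, 3 also contain beer.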

Week 4 Afternoon (Optional): Data Visualization with D3.js

Day 1: Development tools, scatter plots

Day 2: Loading data, bar charts (grouped and stacked)

Day 3: Line charts, brushing, reusable charts

Day 4: Choropleth maps, projections

D3 Class


The D3.js library is one of the more exciting visualization libraries released in the last few years. Built around the concept of data-driven documents, D3 is a highly useful skill for any data scientist who wants to build top-quality interactive visualizations for the web. This class covers the basics of designing good visualizations and using the browser to communicate data effectively. Students will explore a variety of data sets with D3.js, including plotting dynamic geographic data with a variety of map projections. Finally, we'll explore other libraries built on top of D3 that make building time-series visualizations simple.

WEEK 5 - Most Popular and Useful R Toolkits


Day 1: knitr – Dynamic and Reproducible Reporting
Day 2: Shiny – Make Web Applications
Day 3: rCharts – Bring R and D3.js Together
Day 4: quantmod – R for Finance
Day 5: Slidify – Make HTML5 Slides with R

WEEK 6 - Data Science with Python: Data Analysis


Week 6: Data Science with Python: Data Analysis Part I

Day 1: The Python Programming Language

  • Overview of syntax, built-in functions, and data structures
  • Introduction to the standard library
  • Object oriented programming

Day 2: Computational Statistics

  • Review of probability and statistics
  • Hypothesis testing
  • Introduction to Pandas

Day 3: Data Analysis with Pandas

  • The exploratory data analysis process
  • Working with real-world data
  • Data visualization with Matplotlib

Day 4: Getting Data from the Web

  • Web scraping
  • Accessing APIs
  • Building web applications

Day 5: Introduction to Machine Learning

  • What is machine learning?
  • The Scikit-Learn API
  • Image Processing / Text Classification

WEEK 7 - Data Science with Python: Machine Learning


Day 1 – Introduction

  • Mathematics review
  • Linear regression
  • Multivariate linear regression
  • Lab: NumPy/scikit-learn

Day 2 – Regression and Classification

  • Naive Bayes classifiers
  • k-Nearest Neighbors
  • Logistic regression
  • Linear discriminant analysis
  • Lab: Supervised learning

Day 3 – Resampling and Model selection

  • Cross-validation
  • Bootstrap
  • Feature selection
  • Lab: Model selection and regularization
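The cross-validation topic in Day 3 rests on one idea: split the rows into k folds and let each fold serve once as the validation set. The minimal sketch below shows how k-fold splitting partitions row indices. The lab itself uses scikit-learn's utilities; `k_fold_indices` is a hypothetical helper, and it skips the shuffling that real pipelines usually apply first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, non-overlapping folds.

    When n_samples is not divisible by k, the first n_samples % k folds get
    one extra index, so fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), test_idx)
```

Each row lands in exactly one test fold. Averaging a model's score over the k folds therefore uses every observation for validation once.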

Day 4 – Support Vector Machines and Decision Trees

  • Support vector machines
  • Decision trees
  • Forests
  • Lab: Decision trees and SVMs

Day 5 – Unsupervised Learning

  • Principal component analysis
  • Clustering with k-means
  • State estimation
  • Lab: PCA and clustering

Algorithms we cover in Week 7, grouped by method (scikit-learn module and class names):

  • Regression: linear_model.LinearRegression, linear_model.Ridge, linear_model.Lasso, linear_model.ElasticNet
  • Classification (discriminant analysis): lda.LDA, qda.QDA
  • Classification (tree-based models): tree.DecisionTreeClassifier, ensemble.RandomForestClassifier
  • Classification (others): linear_model.LogisticRegression, svm.SVC
  • Classification (nearest neighbors): neighbors.KNeighborsClassifier, neighbors.RadiusNeighborsClassifier
  • Classification (naive Bayes): naive_bayes.GaussianNB, naive_bayes.MultinomialNB, naive_bayes.BernoulliNB
  • Unsupervised learning: decomposition.PCA, cluster.KMeans, cluster.AgglomerativeClustering
  • Feature selection: feature_selection.VarianceThreshold, feature_selection.SelectKBest, feature_selection.SelectPercentile
  • Cross-validation: cross_validation.KFold, cross_validation.StratifiedKFold, cross_validation.cross_val_score, cross_validation.train_test_split
  • Model selection: linear_model.RidgeCV, linear_model.LassoCV, linear_model.ElasticNetCV, grid_search.GridSearchCV

WEEK 8 - Big Data with Hadoop: Data Engineering Professionals


Day 1

  • Introduction to the origin and functions of Hadoop
  • How to build a Hadoop cluster on the Amazon cloud

Day 2

  • The principal operations of the Hadoop Distributed File System (HDFS)
  • HDFS API programming

Day 3

  • The basic principles and working mechanisms of Map-Reduce
  • Hadoop data flow
  • Map-Reduce programming
  • Connecting Eclipse to a Hadoop cluster

Day 4

  • Advanced Hadoop applications
  • Installation and applications of Pig
  • Architecture and installation of Hive
  • Applications of HiveQL
  • Data Mining with Mahout

Day 5

  • Architecture of HBase and Zookeeper
  • Installation and management of HBase
  • The data model of HBase

WEEK 9 - Big Data with Hadoop: 5 Real World Applications


Day 1

  • Review of Hadoop basics
  • Summary of Hadoop applications
  • Analysis of high-volume website log systems
  • Retrieving KPI data (using Map-Reduce)

Day 2

  • Location-based service (LBS) applications for telecommunication companies
  • Analysis of users’ mobile phone traces (using Map-Reduce)
  • User analysis for telecommunication companies
  • Labeling duplicate users by their call fingerprints (using Map-Reduce)
  • Recommendation systems for e-commerce companies (using Map-Reduce)

Day 3

  • Complex recommendation system applications (using Mahout)
  • Social networks
  • Distance between users
  • Community detection (using Pig)
  • Importance of nodes in a social network (using Map-Reduce)

Day 4

  • Application of clustering algorithms
  • Analysis of VIP (using Map-Reduce, Mahout)
  • Financial data analysis
  • Retrieving reverse repurchase information from historical data (using Hive)
  • Developing stock trading strategies with data analysis (using Map-Reduce, Hive)

Day 5

  • GPS applications
  • Sign-in data analysis (using Pig)
  • Implementation and optimization of sorting (using Map-Reduce)
  • Middleware development
  • Cooperation between multiple Hadoop clusters

WEEKS 1 - 9 - Afternoon Hardware Project

Raspberry Pi Class


Alan Perlis once wrote, "I think that it's extraordinarily important that we in computer science keep fun in computing." Learning about the Raspberry Pi is all about that: it’s supposed to be FUN! It's also an inexpensive ticket to discovering more about hardware hacking, operating systems, and programming languages. Need a small web server? Done. Want to build a small amateur weather station? Done. Want to watch your home DVD collection? Done, all from the same device. We'll cover the basics of this credit-card-sized computer and explore fun applications in software and hardware. The series focuses on the RPi Model B+, with kits provided for students. The first few classes cover setup and installation, and later classes cover installing new programs and packages. Finally, students will interface the RPi with hobby electronics and sensors. You will build your own data-collecting machine and use your new data science skills to make sense of the data it gathers. Raspberry Pi is a trademark of the Raspberry Pi Foundation. This class is not officially endorsed by the Raspberry Pi Foundation.

WEEKS 10 & 11 - Capstone Project

A 2-week student project guided by instructors and TAs

WEEK 12 - Interview Preparation, Students Virtual Job Fair & Interview Arrangements

Network with and promote yourself to our many hiring partners in New York City on our digital hiring platform. Leverage a network of mentors, alumni, and partner companies. If a firm is interested in your projects, you will be scheduled to interview with it through our platform.

We will focus on practice interviews, professional resume feedback, and presentation coaching.
