- Regularization and variable selection method (a short R sketch follows this list).
- Produces sparse representations.
- Exhibits a grouping effect: strongly correlated predictors tend to be selected or dropped together.
- Particularly useful when the number of predictors (p) is much larger than the number of observations (n).
- The LARS-EN algorithm computes the entire elastic net regularization path.
- Link to paper: Zou & Hastie (2005), "Regularization and variable selection via the elastic net," JRSS-B.
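
A minimal sketch of the method in R with glmnet, which fits the elastic net by coordinate descent rather than the paper's LARS-EN algorithm; the simulated data and parameter values below are illustrative assumptions, not from the paper:

# simulate a p >> n problem with two nearly identical predictors
library(glmnet)
set.seed(1)
n <- 50; p <- 500
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 1] + rnorm(n, sd = 0.01)   # a correlated pair, to show grouping
y <- 3 * x[, 1] + rnorm(n)

# alpha mixes the penalties: 1 = lasso, 0 = ridge, in between = elastic net
fit <- glmnet(x, y, alpha = 0.5)
cv  <- cv.glmnet(x, y, alpha = 0.5)      # cross-validate to choose lambda
coef(fit, s = cv$lambda.min)             # sparse coefficients at the chosen lambda
plot(fit)                                # coefficient paths, i.e. the regularization path

The grouping effect shows up in the fitted coefficients: the correlated pair x1, x2 tends to enter the model together with similar weights, where the lasso alone would arbitrarily keep just one of them.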
An example of using the Watson Speech to Text API to transcribe a ProPublica podcast episode: How a Reporter Pierced the Hype Behind Theranos

This is a simpler demo of the same technique I use to make automated video supercuts in this repo: https://github.com/dannguyen/watson-word-watcher

The transcription takes just a few minutes (less if you parallelize the requests to IBM) and is free... but it isn't perfect by any means. It doesn't fare super well on proper nouns:
- Charles Ornstein's last name is transcribed as "Orenstein"
- John Carreyrou's last name becomes "John Kerry Roo"
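
For reference, a hedged sketch of what one recognition request can look like from R with httr (the actual project does this in Python; the endpoint, credentials, and file name below are my assumptions, and IBM's auth scheme has changed over time):

library(httr)
resp <- POST(
  "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize",
  authenticate("USERNAME", "PASSWORD"),      # placeholder credentials
  content_type("audio/flac"),
  query = list(timestamps = "true"),         # ask for word-level timestamps
  body = upload_file("podcast_chunk.flac")   # placeholder audio chunk
)
result <- content(resp)  # parsed JSON: transcript plus per-word timings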
# load_packages -----------------------------------------------------------
packages <-
  c('nbastatR',            # devtools::install_github("abresler/nbastatR")
    'explodingboxplotR',   # devtools::install_github("timelyportfolio/explodingboxplotR")
    'ggplot2',
    'dplyr',
    'purrr',
    'magrittr')
# attach every package in the vector
invisible(lapply(packages, library, character.only = TRUE))
""" | |
Minimal character-level demo. Written by Andrej Karpathy (@karpathy) | |
BSD License | |
""" | |
import numpy as np | |
# data I/O | |
data = open('data.txt', 'r').read() # should be simple plain text file | |
chars = list(set(data)) | |
print '%d unique characters in data.' % (len(chars), ) |
Notes from reading through R Packages by Hadley Wickham. This is meant to review, not replace, a thorough read-through. I mainly wrote this as a personal review, since writing summaries and attempting to teach others are some of the best ways to learn things.
- Packages are used to organize code so that it can be reused and shared with others.
- A lot of work with packages is done via the devtools package (a sketch of the typical workflow follows).
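
The core devtools loop from the book, for quick reference (these are real devtools functions; the package name is a placeholder):

library(devtools)
create("mypkg")    # scaffold a new package skeleton
load_all()         # reload the package's code for interactive development
document()         # rebuild man/ pages and NAMESPACE from roxygen2 comments
test()             # run the testthat suite
check()            # run R CMD check
install()          # build and install the package locally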
If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?
I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:
- Statistical knowledge
- Programming/hacking skills
- Domain expertise