Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
from matplotlib import pyplot
import random
import time

pyplot.style.use("ggplot")
now = time.time()

def generate_user(censor=now):
    # Pick some point in time the user was created
    t_created = t = now - random.random() * 1e7
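generate_user draws each user's creation time uniformly from the last 1e7 seconds (about 116 days), and its censor argument suggests that observations are cut off at a fixed time. Below is a hypothetical continuation, not the original code. It assumes an exponentially distributed time to some event, and the function name generate_user_observation, the event itself and the rate parameter are all assumptions. If the event would happen after censor, the user is right-censored, and we only record how long they have been observed so far.

```python
import random
import time

now = time.time()


def generate_user_observation(censor=now, rate=1e-6, rng=random):
    """Hypothetical sketch: simulate one user and censor the observation.

    Returns (observed_duration, converted). converted is False when the
    (assumed) event had not happened by `censor`.
    """
    # Creation time drawn uniformly from the past 1e7 seconds (~116 days),
    # mirroring the first step of generate_user.
    t_created = now - rng.random() * 1e7
    # Assumption: time from creation to the event is exponential with `rate`.
    t_event = t_created + rng.expovariate(rate)
    if t_event <= censor:
        return t_event - t_created, True
    # Right-censored: we only know the event takes longer than this.
    return censor - t_created, False
```

Passing a seeded random.Random instance as rng makes a simulated dataset reproducible.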
library("tidyverse")
library("sparklyr")
library("sparklyr.nested")
library("cowplot")
library("ggsci")
# Spark config
config <- spark_config()
# Allow access to GCP datasets
If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?
I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:
- Statistical knowledge
- Programming/hacking skills
- Domain expertise
import luigi
import time

class TimeTaskMixin(object):
    '''
    A mixin that, when added to a luigi task, prints the task's
    execution time to standard output when the task finishes.
    '''
    @luigi.Task.event_handler(luigi.Event.PROCESSING_TIME)
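The decorator above registers a handler for luigi's PROCESSING_TIME event, which reports how long a task took to run. The sketch below shows the same mixin idea in plain Python. It is a dependency-free analogue, not luigi's API and not the original handler. TimedRunMixin, timed_run, processing_time and SleepyTask are hypothetical names.

```python
import time


class TimedRunMixin:
    """Hypothetical plain-Python analogue of TimeTaskMixin; not luigi's API.

    A class that mixes this in and defines run() can call timed_run() to
    execute run(), store the elapsed wall-clock seconds in
    self.processing_time, and print them to standard output.
    """

    def timed_run(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return self.run(*args, **kwargs)
        finally:
            # The time is recorded and printed even if run() raises.
            self.processing_time = time.perf_counter() - start
            print(f"{type(self).__name__} took {self.processing_time:.3f}s")


class SleepyTask(TimedRunMixin):
    """Hypothetical example task used only to exercise the mixin."""

    def run(self):
        time.sleep(0.01)
        return "done"


task = SleepyTask()
result = task.timed_run()  # prints the elapsed time for SleepyTask
```

In luigi itself the worker fires the event for you. Here the caller has to use timed_run() instead of run().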
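# Keep a stable, per-host symlink that points at the current SSH agent socket
_ssh_auth_save() {
    ln -sf "$SSH_AUTH_SOCK" "$HOME/.ssh/ssh-auth-sock.$HOSTNAME"
}
# Set HOSTNAME before _ssh_auth_save uses it in the link name
alias screen='export HOSTNAME=$(hostname) ; _ssh_auth_save ; screen'
alias tmux='export HOSTNAME=$(hostname) ; _ssh_auth_save ; tmux'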
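The functions above keep a stable path ($HOME/.ssh/ssh-auth-sock.$HOSTNAME) that always points at the current agent socket. A reattached screen or tmux session can then reach the agent through that path instead of a stale $SSH_AUTH_SOCK. The sessions would typically set SSH_AUTH_SOCK to the stable path, for example in the multiplexer's config, but that part is not shown here. Below is a sketch of the link update in Python. save_auth_sock and the example paths are hypothetical. The helper creates the new link under a temporary name and renames it over the old one, so anyone resolving the path sees either the old link or the new one.

```python
import os
import tempfile


def save_auth_sock(auth_sock, link_path):
    """Hypothetical helper: point link_path at auth_sock via a single rename."""
    tmp_path = link_path + ".tmp"
    # Clear a temporary link left behind by an interrupted earlier call.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # os.symlink does not require the target to exist.
    os.symlink(auth_sock, tmp_path)
    # On POSIX, rename(2) replaces the old link (not its target) atomically.
    os.replace(tmp_path, link_path)


# Usage with illustrative socket paths in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    link = os.path.join(d, "ssh-auth-sock.example-host")
    save_auth_sock("/tmp/ssh-old/agent.1", link)
    save_auth_sock("/tmp/ssh-new/agent.2", link)
    print(os.readlink(link))  # /tmp/ssh-new/agent.2
```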
# Intro
Extremely simple and unsophisticated cross-process data sharing.
Supports one read-write master process and an arbitrary number of read-only processes.
Please consider using pickle/cPickle/ctypes to store complex data.
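The intro describes one read-write process and many read-only processes sharing data, with pickle suggested for complex objects. Below is a minimal sketch of that arrangement. It assumes Python 3.8+ and its multiprocessing.shared_memory module, and it is not the original implementation, which is not shown. Writer, Reader and the 8-byte length header are hypothetical. The writer pickles an object into a named shared-memory segment. Readers attach by name and unpickle whatever is currently stored. The sketch also assumes a freshly created segment reads as zeros, so a length of 0 means "nothing written yet".

```python
import pickle
import struct
from multiprocessing import shared_memory

# Hypothetical layout: 8-byte little-endian payload length, then the pickle.
HEADER = struct.Struct("<Q")


class Writer:
    """The single read-write process: creates, writes and finally unlinks."""

    def __init__(self, size, name=None):
        # name=None asks for a randomly generated, unique segment name.
        self.shm = shared_memory.SharedMemory(name=name, create=True, size=size)

    def write(self, obj):
        payload = pickle.dumps(obj)
        end = HEADER.size + len(payload)
        if end > self.shm.size:
            raise ValueError("object does not fit in the shared segment")
        # Copy the payload first, then publish its length in the header.
        self.shm.buf[HEADER.size:end] = payload
        HEADER.pack_into(self.shm.buf, 0, len(payload))

    def close(self):
        self.shm.close()
        self.shm.unlink()  # remove the segment once the writer is done


class Reader:
    """A read-only process: attaches to an existing segment by name."""

    def __init__(self, name):
        self.shm = shared_memory.SharedMemory(name=name)

    def read(self):
        (length,) = HEADER.unpack_from(self.shm.buf, 0)
        if length == 0:
            return None  # nothing has been written yet
        # bytes(...) copies the data out of shared memory before unpickling.
        return pickle.loads(bytes(self.shm.buf[HEADER.size:HEADER.size + length]))

    def close(self):
        self.shm.close()
```

There is no locking, so a reader can observe a half-finished update. That fits the "unsophisticated" scope, but real use would need a lock or a sequence counter around each write.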
# References |