
Andrew Musselman andrewmusselman

andrewmusselman / data-sci-eng.md
Last active March 7, 2016 19:23
Data Scientist and Engineer Job Descriptions

Job title – Data Scientist

Basic Qualifications

  1. Bachelor’s degree in Mathematics, Computer Science, Engineering, or a technical science (or equivalent)
  2. Minimum 1 year of programming experience with at least one item from each of the following groups:
  • R, SAS, Mathematica, MATLAB, SageMath
  • Python, Ruby, Perl, Java, Scala
  • Linux
  • Bash scripting, including sed, awk, cut, uniq, sort, tr
$ mahout
Running on hadoop, using /home/akm/hadoop-2.4.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-0.10.1-SNAPSHOT-job.jar
An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
buildforest: : Build the random forest classifier
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
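import numpy as np

# 3x4 example matrix
mp = np.array([[0.00, 0.25, 0.25, 0.8], [0.75, 0.00, 0.25, 0.9], [0.25, 0.75, 0.50, 0.3]])
np.linalg.svd(mp)
# Full SVD: U is 3x3, s holds the 3 singular values, V is 4x4 (already transposed)
U, s, V = np.linalg.svd(mp, full_matrices=True)
U.shape, V.shape, s.shape
# Place the singular values on the diagonal of a 3x4 matrix
S = np.zeros((3, 4))
S[:3, :3] = np.diag(s)
# Confirm that U * S * V reconstructs the original matrix
np.allclose(mp, np.dot(U, np.dot(S, V)))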
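
As a follow-up to the snippet above, here is a minimal sketch of the same decomposition using the reduced form (`full_matrices=False`), which avoids building the zero-padded 3x4 matrix by hand. The rank-2 truncation and the names `k`, `approx`, and `err` are assumptions added for illustration and are not part of the original gist.

```python
import numpy as np

# Same 3x4 matrix as in the snippet above
mp = np.array([[0.00, 0.25, 0.25, 0.8],
               [0.75, 0.00, 0.25, 0.9],
               [0.25, 0.75, 0.50, 0.3]])

# Reduced SVD: U is 3x3, s has 3 singular values, Vt is 3x4
U, s, Vt = np.linalg.svd(mp, full_matrices=False)

# With the reduced form, np.diag(s) is already the right shape
reconstructed = U @ np.diag(s) @ Vt
exact = np.allclose(mp, reconstructed)

# Hypothetical choice for illustration: keep the 2 largest singular values
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the residual; for a rank-k truncation this equals
# the square root of the sum of squares of the dropped singular values
err = np.linalg.norm(mp - approx)
```

Because only one singular value is dropped here, `err` should match `s[2]` up to floating-point tolerance, which makes for a quick sanity check on the truncation.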