Venu Kanaparthy vsingh58

  • ESRI
  • California
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

vsingh58 / file.md
Last active August 29, 2015 14:07 — forked from nicolewhite/file.md

Datasets for Graph Hack 2014.

All stores are Neo4j 2.1.3.

Transportation

What is related, and how?

Flight	ORIGIN Airport
vsingh58 / Graphing
Last active August 29, 2015 14:07 — forked from msund/Graphing
{
"metadata": {
"name": "Three new matplotlib plots"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
  1. General Background and Overview
vsingh58 / kmeans.py
Last active August 29, 2015 14:06 — forked from ceteri/kmeans.py
print(__doc__)
from time import time
import numpy as np
import pylab as pl
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
vsingh58 / build.log
Last active August 29, 2015 14:06 — forked from ceteri/build.log
bash-3.2$ git show | head
commit d78b48fffff32898a9f76e94923d45a84d7e330e
Author: Paco Nathan <ceteri@gmail.com>
Date: Sat Mar 16 19:11:46 2013 -0700
fixed cmd line opts to allow for a different label field, for the confusion matrix calculation
diff --git a/README.md b/README.md
index ed10626..2e8996e 100644
--- a/README.md