Venu Kanaparthy vsingh58

  • ESRI
  • California
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

vsingh58 / file.md
Last active August 29, 2015 14:07 — forked from nicolewhite/file.md

Datasets for Graph Hack 2014.

All stores are Neo4j 2.1.3.

Transportation

What is related, and how?

Flight	ORIGIN Airport
vsingh58 / Graphing
Last active August 29, 2015 14:07 — forked from msund/Graphing
{
"metadata": {
"name": "Three new matplotlib plots"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
  1. General Background and Overview
vsingh58 / 0.setup.sh
Last active August 29, 2015 14:06 — forked from ceteri/0.setup.sh
# concatenate part files 00001-00003 to construct "minitweets"
# (the [1-3] glob matches three files; use [0-3] to also include part-00000)
cat rawtweets/part-0000[1-3] > minitweets
# change log4j properties to WARN to reduce noise during demo
mv conf/log4j.properties.template conf/log4j.properties
vim conf/log4j.properties # Change to WARN
# launch Spark shell REPL
./bin/spark-shell
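
The manual vim step in the setup above can also be scripted. The following is a minimal sketch, not part of the original gist. It assumes the root logger line in the template reads `log4j.rootCategory=INFO, console` (true of the Spark 1.x templates I am aware of; check your own copy), and it works on a temporary stand-in file (the hypothetical `conf_dir`) rather than the real `conf/log4j.properties`.

```shell
# Hypothetical stand-in for conf/log4j.properties so the sketch runs anywhere.
conf_dir="$(mktemp -d)"
printf 'log4j.rootCategory=INFO, console\n' > "$conf_dir/log4j.properties"

# Switch the root logger level from INFO to WARN in place.
# -i.bak keeps a backup copy and works with both GNU and BSD sed.
sed -i.bak 's/^log4j\.rootCategory=INFO/log4j.rootCategory=WARN/' "$conf_dir/log4j.properties"

# Show the result so the change can be checked by eye.
cat "$conf_dir/log4j.properties"
```

To apply it to a real Spark checkout, point the `sed` command at `conf/log4j.properties` after creating that file from the template.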
import nltk
nltk.download()
## use nltk.download() within a Python prompt to
## download the `punkt` data
## Anaconda is recommended, to pick up NumPy, NLTK, etc.
## http://continuum.io/downloads
## this also requires TextBlob/PerceptronTagger