Venu Kanaparthy vsingh58

## LDA_SparkDocs
/*
This example uses Scala.  Please see the MLlib documentation for a Java example.

Try running this code in the Spark shell.  It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.

This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/

import scala.collection.mutable

## ds-training.md

      
vsingh58 / ds-training.md (last active August 29, 2015 14:22; forked from hadley/ds-training.md)
            
          
If you were to give recommendations to your "little brother/sister" on things
that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

Statistical knowledge
Programming/hacking skills
Domain expertise

Statistical knowledge


## FB_quicklook.ipynb

      
vsingh58 / FB_quicklook.ipynb (last active August 29, 2015 14:19; forked from kevindavenport/FB_quicklook.ipynb)
            
          
      
    
## Dynamic-Time-Series-Modeling.ipynb

      
vsingh58 / Dynamic-Time-Series-Modeling.ipynb (last active August 29, 2015 14:19; forked from kevindavenport/Dynamic-Time-Series-Modeling.ipynb)
            
          
      
    
## 35-hour_workweek_with_python.ipynb

      
vsingh58 / 35-hour_workweek_with_python.ipynb (last active August 29, 2015 14:19; forked from kevindavenport/35-hour_workweek_with_python.ipynb)
            
          
      
    
## file.md

      
vsingh58 / file.md (last active August 29, 2015 14:07; forked from nicolewhite/file.md)
            
          
Datasets for Graph Hack 2014.

All stores are Neo4j 2.1.3.
Transportation

Flights

What is related, and how?
Flight	ORIGIN Airport


## Graphing
{
 "metadata": {
  "name": "Three new matplotlib plots"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {

## gist:17d7c3b5af2c2d31d3ba

      
vsingh58 / gist:17d7c3b5af2c2d31d3ba (last active August 29, 2015 14:07; forked from debasishg/gist:8172796)
            
          
General Background and Overview


Probabilistic Data Structures for Web Analytics and Data Mining: A great overview of probabilistic data structures and how they are used to implement approximation algorithms.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms: A presentation by Jérémie Lumbroso that revisits the historical background and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani: One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&amp;rep


## kmeans.py
print(__doc__)

from time import time
import numpy as np
import pylab as pl

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

## build.log
bash-3.2$ git show | head
commit d78b48fffff32898a9f76e94923d45a84d7e330e
Author: Paco Nathan <ceteri@gmail.com>
Date:   Sat Mar 16 19:11:46 2013 -0700

    fixed cmd line opts to allow for a different label field, for the confusion matrix calculation

diff --git a/README.md b/README.md
index ed10626..2e8996e 100644
--- a/README.md