Evan Zamir EvanZ

  • Pinpoint Predictive
  • San Francisco
  • X @thecity2

Configuring Spark 1.6.1 to work with Jupyter 4.x Notebooks on Mac OS X with Homebrew

I've looked around in a number of places and found several blog entries on setting up IPython notebooks to work with Spark. However, since most of those posts were written, both IPython and Spark have been updated: IPython has become Jupyter, and Spark is nearing release 1.6.2. Most of the information needed to get things working is out there, but I thought I'd capture this point in time with a working configuration and a description of how I set it up.

I rely completely on Homebrew to manage packages on my Mac, so Spark, Jupyter, Python, jEnv, and other tools are all installed via Homebrew. You should be able to achieve the same thing with Anaconda, but I am not familiar with that package manager.

Install Java

Make sure your Java installation is up to date. I use jEnv to manage Java installations on my Mac, which adds another layer that has to be set up correctly. You can download/update Java from Oracle, have Homebrew

@bishboria
bishboria / springer-free-maths-books.md
Last active April 25, 2024 06:27
Springer made a bunch of books available for free, these were the direct links
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._
val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000
@staltz
staltz / introrx.md
Last active April 25, 2024 04:18
The introduction to Reactive Programming you've been missing
@pulse-
pulse- / gist:8655893
Created January 27, 2014 19:44
Django migrate sqlite3 db to postgres - The easy way.
I had a small problem today: I wanted to migrate one of my small Django apps to Postgres, just to make everything easy to manage. SQLite3 is perfectly fine for the amount of load, but I am much faster at administering Postgres than SQLite3, so I decided to migrate the data over.
I tried a few approaches, but what ultimately worked best and fastest for my particular problem was the following.
Use original SQLITE3 connection in settings.py
1. python manage.py dumpdata > dump.json
(I read some things here about options you can pass; in the end, what just worked was the following.)
2. Change DB connection string in settings.py to POSTGRES
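As a rough illustration of what step 2 can look like, here is a minimal sketch of the two connection settings as Python dictionaries. The database name, user, password, host, and port are placeholders that do not come from the original gist, and the `SQLITE_DATABASES` / `POSTGRES_DATABASES` names are hypothetical helpers used only to show the before and after side by side. Older Django releases name the Postgres backend `django.db.backends.postgresql_psycopg2` instead.

```python
# Hypothetical settings.py fragment; every value below is a placeholder.

# Before: the original SQLite3 connection used for `dumpdata`.
SQLITE_DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",  # assumed file name
    }
}

# After: the Postgres connection that the app should use from step 2 onward.
POSTGRES_DATABASES = {
    "default": {
        # On older Django versions this is "django.db.backends.postgresql_psycopg2".
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",          # assumed database name
        "USER": "myapp_user",     # assumed role
        "PASSWORD": "change-me",  # placeholder, do not commit real secrets
        "HOST": "localhost",
        "PORT": "5432",
    }
}

# Django reads the module-level name DATABASES from settings.py.
DATABASES = POSTGRES_DATABASES
```

Switching the single `DATABASES` assignment back to `SQLITE_DATABASES` restores the original connection if the dump needs to be taken again.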
@pssguy
pssguy / global.R
Created March 19, 2013 19:06
Upload Example in Shiny App. Takes Pupil's term marks and does some analyses
# load required libraries
library(shiny)
library(plyr)
library(ggplot2)
library(googleVis)
library(reshape2)
####creation of example data on local directory for uploading####
@mblondel
mblondel / lda_gibbs.py
Last active October 9, 2023 11:31
Latent Dirichlet Allocation with Gibbs sampler
"""
(C) Mathieu Blondel - 2010
License: BSD 3 clause
Implementation of the collapsed Gibbs sampler for
Latent Dirichlet Allocation, as described in
Finding scientific topics (Griffiths and Steyvers)
"""
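For readers unfamiliar with the technique named in the docstring above, the core step of a collapsed Gibbs sampler for LDA can be sketched as follows. This is not the gist's code: the function names, argument layout, and symmetric hyperparameters `alpha` and `beta` are assumptions made for illustration. It uses the standard conditional, in which the probability of assigning a token to topic k is proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), with all counts excluding the token being resampled.

```python
import random


def topic_probabilities(n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """Return the normalized conditional distribution over topics for one token.

    n_dk[k]: tokens in the current document assigned to topic k
    n_kw[k]: occurrences of the current word assigned to topic k
    n_k[k]:  total tokens assigned to topic k
    All counts must already exclude the token being resampled.
    """
    weights = [
        (n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        for k in range(len(n_k))
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_topic(probabilities, rng=random):
    """Draw a topic index from a discrete distribution by inverse CDF sampling."""
    r = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return k
    # Guard against floating-point round-off in the cumulative sum.
    return len(probabilities) - 1
```

A full sampler would loop over every token, decrement its counts, call these two functions, and increment the counts for the newly drawn topic.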