View TikaParquetParser.scala
import java.io.{File, FileOutputStream, IOException, InputStream}
import java.util
import scala.collection.JavaConverters._
import org.xml.sax.{ContentHandler, SAXException}
import org.apache.tika.metadata.Metadata
import org.apache.tika.metadata.HttpHeaders.CONTENT_TYPE
import org.apache.tika.mime.MediaType
import org.apache.tika.parser.{AbstractParser, ParseContext}
View gist:29247ec102a011f917d4f56a9f719685
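def avgTime(message: String, f: => Any): Unit = {
  var avg = 0L
  val c = 42
  1 to c foreach { _ =>
    val t0 = System.nanoTime()
    f // evaluate the by-name block being timed
    val t1 = System.nanoTime()
    avg += t1 - t0
  }
  // Assumed completion (the preview is cut off here): report the mean per run.
  println(s"$message: ${avg / c} ns")
}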

Squashing Git Commits

The easy and flexible way

This method avoids merge conflicts if you have periodically pulled master into your branch. It also lets you squash into more than one commit, or rearrange your code into completely different commits (e.g. if you ended up working on three different features but the commits were not consecutive).

Note: You cannot use this method if you intend to open a pull request to merge your feature branch. This method requires committing directly to master.

Switch to the master branch and make sure you are up to date:
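
As a rough sketch of this first step only, the script below builds a throwaway sandbox and then runs the two commands that matter, `git checkout master` and `git pull`. The sandbox layout (`upstream`, `work`), the `feature` branch name and the `demo` identity are all made up for illustration; in a real project you would run only the last two git commands, and your remote and branch names may differ.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical "upstream" repository with one commit on master.
git init --quiet "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master   # force the branch name to master
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "first"

# Your working clone, currently on a feature branch.
git clone --quiet "$sandbox/upstream" "$sandbox/work"
cd "$sandbox/work"
git checkout --quiet -b feature

# Meanwhile, master moves ahead upstream.
git -C "$sandbox/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "second"

# The step described above: switch to master and bring it up to date.
git checkout --quiet master
git pull --quiet origin master

git log --oneline   # should now list both commits
```

Because local master has no commits of its own here, the pull is a plain fast-forward.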



Sendy is a self-hosted email newsletter application that lets you send trackable emails via Amazon Simple Email Service (SES).


You can deploy Sendy on Heroku using the following instructions (I assume you've already installed the Heroku Toolbelt).

  1. On Heroku, create a new app.
  2. Clone that app to your desktop.
View date.sql
/* Adapted from Tom Cunningham's 'Data Warehousing with MySql' */
###### small-numbers table
DROP TABLE IF EXISTS numbers_small;
CREATE TABLE numbers_small (number INT);
INSERT INTO numbers_small VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
###### main numbers table
CREATE TABLE numbers (number BIGINT);
View gist:fa75903000899293cefd83c18566849e
/* Caveat: I'm not a PHP programmer, so this may or may
 * not be the most idiomatic code...
 * FPDF is a free PHP library for creating PDFs.
 */
class PDF extends FPDF {
View gist:d73329202559b7e3c083aadb45334729
  1. General Background and Overview
from __future__ import print_function
from pyspark import SparkContext, SparkConf
from pyspark.mllib.linalg import DenseVector, VectorUDT
from pyspark.sql import SQLContext
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType
View LDA_SparkDocs
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark:
import scala.collection.mutable