Olalekan Fuad Elesin (OElesin)

Examples for Python and Spark

  • Word Count
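import sys
from operator import add
from pyspark import SparkContext

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("Usage: wordcount <file>")  # expect exactly one input path
    sc = SparkContext(appName="PythonWordCount")
    # Split lines into words, pair each word with 1, then sum the 1s per word
    counts = sc.textFile(sys.argv[1]).flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(add)
    for word, count in counts.collect():
        print("%s: %i" % (word, count))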
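A Spark word count is typically a three-step pipeline: `flatMap` splits lines into words, `map` pairs each word with 1, and `reduceByKey(add)` sums the pairs per word. To show what each step does without a Spark cluster, here is a hedged plain-Python sketch of the same pipeline; the function name `word_count` is hypothetical and not part of the gist.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def word_count(lines):
    """Plain-Python analogue of flatMap -> map -> reduceByKey(add)."""
    # flatMap: split every line into words and flatten the results
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey(add): group the pairs by word and fold their counts with add
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return {word: reduce(add, ones) for word, ones in grouped.items()}


if __name__ == "__main__":
    counts = word_count(["to be or", "not to be"])
    for word, count in sorted(counts.items()):
        print("%s: %i" % (word, count))
```

Spark performs the grouping step in parallel across partitions, which is why `reduceByKey` needs an associative function such as `add`.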
-- This is a Hive program. Hive is an SQL-like language that compiles
-- into Hadoop Map/Reduce jobs. It's very popular among analysts at
-- Facebook, because it allows them to query enormous Hadoop data
-- stores using a language much like SQL.
-- Our logs are stored on the Hadoop Distributed File System, in the
-- directory /logs/randomhacks.net/access. They're ordinary Apache
-- logs in *.gz format.
--
-- We want to pretend that these gzipped log files are a database table,
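The Hive program treats each access-log line as a table row with named columns. As a minimal plain-Python sketch of that mapping (not the gist's Hive code, which is truncated here), the regular expression below assumes the Apache Common Log Format; the names `LOG_PATTERN`, `parse_line`, and `read_rows` are hypothetical. The gzipped files are read with `gzip.open` in text mode.

```python
import gzip
import re

# Assumed Apache Common Log Format: host ident user [time] "request" status bytes
# (lines in the Combined format also match; the trailing fields are ignored)
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Turn one log line into a dict of column values, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # Apache writes "-" when no response body was sent
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def read_rows(path):
    """Yield parsed rows from a gzipped access log, skipping malformed lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            row = parse_line(line)
            if row is not None:
                yield row
```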
OElesin / README.md
Last active August 29, 2015 14:24 — forked from mikedfunk/README.md

This uses Twitter Bootstrap classes for CodeIgniter pagination.

Drop this file into application/config.

OElesin / introrx.md
Last active August 29, 2015 14:24 — forked from staltz/introrx.md

The introduction to Reactive Programming you've been missing

(by @andrestaltz)

So you're curious about learning this new thing called Reactive Programming, particularly its variants such as Rx, Bacon.js, RAC, and others.

Learning it is hard, made even harder by the lack of good material. When I started, I tried looking for tutorials. I found only a handful of practical guides, but they just scratched the surface and never tackled the challenge of building a whole architecture around it. Library documentation often doesn't help when you're trying to understand some function. I mean, honestly, look at this:

Rx.Observable.prototype.flatMapLatest(selector, [thisArg])

Projects each element of an observable sequence into a new sequence of observable sequences by incorporating the element's index and then transforms an observable sequence of observable sequences into an observable sequence producing values only from the most recent observable sequence.
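In plainer terms, `flatMapLatest` (roughly what later RxJS versions call `switchMap`) maps each outer event to an inner stream and forwards values only from the inner stream started most recently; when a new outer event arrives, the previous inner stream is dropped. The sketch below models this over discrete timestamps in plain Python rather than using a real Rx library. The function `flat_map_latest` and the `(time, value)` event representation are assumptions made for illustration, and an inner value scheduled at exactly the moment of the next outer event is treated as dropped.

```python
def flat_map_latest(outer, selector):
    """Simulate flatMapLatest on timestamped events.

    outer: list of (time, value) pairs, sorted by time.
    selector: maps an outer value to a list of (delay, inner_value) pairs,
              where delay is measured from the outer event's time.
    Returns the (time, inner_value) pairs the resulting stream would emit.
    """
    emitted = []
    for i, (start, value) in enumerate(outer):
        # Each inner sequence is active only until the next outer event arrives
        next_start = outer[i + 1][0] if i + 1 < len(outer) else float("inf")
        for delay, inner_value in selector(value):
            emit_time = start + delay
            if emit_time < next_start:
                emitted.append((emit_time, inner_value))
    return sorted(emitted, key=lambda event: event[0])


if __name__ == "__main__":
    # Search-as-you-type: each query yields two delayed result batches
    queries = [(0, "a"), (5, "ab")]
    results = flat_map_latest(queries, lambda q: [(3, q + "-1"), (7, q + "-2")])
    print(results)  # [(3, 'a-1'), (8, 'ab-1'), (12, 'ab-2')]
```

The stale result `a-2` (due at time 7) never appears, because the query `ab` at time 5 switched the output to its own inner stream. A plain `flatMap` would have interleaved it.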

OElesin / Mail.scala
Created October 9, 2015 05:41 — forked from mariussoutier/Mail.scala
Sending mails fluently in Scala
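import scala.language.implicitConversions

package object mail {
  // Accept a single address wherever a Seq of addresses is expected
  implicit def stringToSeq(single: String): Seq[String] = Seq(single)
  // Accept a bare value wherever an Option is expected
  implicit def liftToOption[T](t: T): Option[T] = Some(t)

  // The kind of message body being sent
  sealed abstract class MailType
  case object Plain extends MailType
  case object Rich extends MailType
  case object MultiPart extends MailType
}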
OElesin / codeigniter-rating-lib.php
Created December 5, 2015 14:00 — forked from escapeboy/codeigniter-rating-lib.php
CodeIgniter Rating Library + Microdata (optional)
<?php if ( ! defined('BASEPATH')) exit('No direct script access allowed');
/**
* Rating Library
* Using jQuery Raty plugin to rate products
* @author Nikola Katsarov
* @website http://katsarov.biz
*/
class Rating {
    // ... (class body not included in this preview)
}
package botkop.sparti.receiver
import com.rabbitmq.client._
// Spark 1.x only: Spark 2.0 moved this trait to the private org.apache.spark.internal.Logging
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import scala.reflect.ClassTag
OElesin / install_scala_centos.sh
Created March 9, 2016 18:12 — forked from Antwnis/install_scala_centos.sh
Install Scala CentOS
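# Install Scala into /usr/local/scala and register SCALA_HOME for login shells
export SCALA_VERSION=scala-2.11.5
wget http://www.scala-lang.org/files/archive/${SCALA_VERSION}.tgz
# "sudo echo ... > file" fails because the redirect runs as the invoking user; write through sudo tee instead
echo "SCALA_HOME=/usr/local/scala/${SCALA_VERSION}" | sudo tee /etc/profile.d/scala.sh > /dev/null
echo 'export SCALA_HOME' | sudo tee -a /etc/profile.d/scala.sh > /dev/null
echo 'export PATH=$PATH:$SCALA_HOME/bin' | sudo tee -a /etc/profile.d/scala.sh > /dev/null
sudo mkdir -p /usr/local/scala
sudo cp ${SCALA_VERSION}.tgz /usr/local/scala/
cd /usr/local/scala/
sudo tar xzf ${SCALA_VERSION}.tgz
sudo rm -f ${SCALA_VERSION}.tgz
sudo chown -R root:root /usr/local/scala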
OElesin / IpServices.scala
Created July 4, 2016 14:21
This service class converts IPv4 addresses to Long values and back. It is useful for generating user IDs from IP addresses when your data has no user IDs.
// Pack a dotted-quad IPv4 string (e.g. "192.168.0.1") into a Long, most significant octet first
def ipToLong(ipAddress: String): Long =
  ipAddress.split("\\.").foldLeft(0L)((acc, octet) => acc * 256 + octet.toLong)

// Unpack a Long produced by ipToLong back into dotted-quad notation
def longToIP(ip: Long): String =
  (3 to 0 by -1).map(i => (ip >> (8 * i)) & 0xFF).mkString(".")
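The conversion treats the four octets as digits of a base-256 number. A hedged Python port of the same idea follows, cross-checked against the standard library's `ipaddress` module; the names `ip_to_long` and `long_to_ip` are hypothetical. As a caveat, users who share one public address (for example behind NAT) would collapse into a single derived ID.

```python
import ipaddress


def ip_to_long(ip_address):
    """Pack a dotted-quad IPv4 string into one integer, most significant octet first."""
    value = 0
    for octet in ip_address.split("."):
        value = value * 256 + int(octet)
    return value


def long_to_ip(value):
    """Unpack an integer produced by ip_to_long back into dotted-quad notation."""
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


if __name__ == "__main__":
    n = ip_to_long("192.168.0.1")
    print(n)              # 3232235521
    print(long_to_ip(n))  # 192.168.0.1
    # Cross-check against the standard library's own conversion
    assert n == int(ipaddress.IPv4Address("192.168.0.1"))
```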
OElesin / LDA_SparkDocs
Created July 22, 2016 15:33 — forked from jkbradley/LDA_SparkDocs
LDA Example: Modeling topics in the Spark documentation
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable
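LDA in MLlib consumes each document as a vector of term counts over a fixed vocabulary, so raw text has to be turned into such vectors before a model is fitted. A minimal plain-Python sketch of that preprocessing step follows. The tokenizer (lowercase, split on whitespace), the choice to drop the most frequent terms as stop words, and the function name `term_count_vectors` are all assumptions; the gist's own Scala code may differ.

```python
from collections import Counter


def term_count_vectors(documents, num_stopwords=1):
    """Build a vocabulary and per-document term-count vectors for LDA.

    The num_stopwords most frequent terms across the corpus are dropped
    as a crude stop-word filter (an assumption of this sketch).
    """
    tokenized = [doc.lower().split() for doc in documents]
    corpus_counts = Counter(token for tokens in tokenized for token in tokens)
    # Treat the most common terms as stop words and exclude them from the vocabulary
    stopwords = {term for term, _ in corpus_counts.most_common(num_stopwords)}
    vocab = sorted(term for term in corpus_counts if term not in stopwords)
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vector = [0] * len(vocab)
        for token in tokens:
            if token in index:
                vector[index[token]] += 1
        vectors.append(vector)
    return vocab, vectors


if __name__ == "__main__":
    docs = ["the spark shell", "the spark docs", "the mllib docs"]
    vocab, vectors = term_count_vectors(docs)
    print(vocab)
    print(vectors)
```

Each row of `vectors` is what an LDA implementation would receive for one document, with column `i` counting occurrences of `vocab[i]`.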