from PIL import Image, ImageOps
import subprocess, sys, os, glob
# minimum run of adjacent pixels to call something a line
H_THRESH = 300
V_THRESH = 300
def get_hlines(pix, w, h):
    """Get start/end pixels of lines containing horizontal runs of at least H_THRESH black pixels"""
    hlines = []
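The preview stops at `hlines = []`. Below is a minimal sketch of the run detection the docstring describes, written as a hypothetical helper `find_hruns` (not the author's code). It assumes a binarized image in which black pixels have the value 0, and it accepts anything indexable as `pix[x, y]`, such as the object returned by Pillow's `Image.load()`.

```python
def find_hruns(pix, w, h, thresh, black=0):
    """Return (y, x_start, x_end) for every horizontal run of at least
    `thresh` consecutive pixels equal to `black`.

    `pix` is anything indexable as pix[x, y] (hypothetical helper; assumes
    a binarized image where black == 0).
    """
    runs = []
    for y in range(h):
        start = None  # x where the current black run began, if any
        for x in range(w):
            if pix[x, y] == black:
                if start is None:
                    start = x
            elif start is not None:
                # run just ended at x - 1; keep it only if long enough
                if x - start >= thresh:
                    runs.append((y, start, x - 1))
                start = None
        # close a run that reaches the right edge of the image
        if start is not None and w - start >= thresh:
            runs.append((y, start, w - 1))
    return runs
```

With Pillow, `img = Image.open(path).convert("1")`, then `w, h = img.size` and `find_hruns(img.load(), w, h, H_THRESH)`; mode "1" images yield 0 for black and 255 for white. Vertical runs follow the same pattern with the loops swapped and `V_THRESH`.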
package com.avira.ds.sparser.spark
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkContext, SparkConf}
import scala.language.implicitConversions
sealed trait Event
case class ClickEvent(blaBla: String) extends Event
case class ViewEvent(blaBla: String) extends Event
# -*- mode: ruby -*-
# vi: set ft=ruby :
ipythonPort = 8001 # IPython port to forward (also set in the IPython notebook config)
Vagrant.configure(2) do |config|
  config.ssh.insert_key = true
  config.vm.define "sparkvm" do |master|
    master.vm.box = "sparkmooc/base"
    master.vm.box_download_insecure = true
# set 8 cores and 6GB of RAM
Vagrant.configure(2) do |config|
  config.vm.define "myvm" do |master|
    master.vm.provider :virtualbox do |v|
      v.customize ["modifyvm", :id, "--ioapic", "on"] # VirtualBox needs the I/O APIC enabled to give a VM more than one CPU
      v.customize ["modifyvm", :id, "--cpus", 8]
      v.customize ["modifyvm", :id, "--memory", 6144] # in MB
    end
  end
end
# insert wherever your code has direct access to the existing SparkContext `sc`
sc.stop()
from pyspark import SparkContext
SparkContext.setSystemProperty('spark.executor.memory', '6g')  # JVM heap per executor (in local mode the executor runs inside the driver JVM)
SparkContext.setSystemProperty('spark.python.worker.memory', '6g')  # memory per Python worker during aggregation before spilling to disk
SparkContext.setSystemProperty('spark.shuffle.spill', 'false')  # deprecated and ignored since Spark 1.6
SparkContext.setSystemProperty('spark.driver.memory', '2g')  # likely ignored here: the driver JVM is already running
SparkContext.setSystemProperty('spark.io.compression.codec', 'snappy')  # just to be sure
sc = SparkContext("local[8]", "Simple App")  # set to your number of cores
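The same settings can also be collected for a `SparkConf` instead of being set as global system properties, and the core count can come from the machine rather than a hard-coded 8. A sketch under those assumptions; `local_spark_settings` is a hypothetical helper, and `spark.shuffle.spill` is left out because Spark 1.6+ ignores it.

```python
import os

def local_spark_settings(executor_mem="6g", driver_mem="2g", cores=None):
    """Return (master_url, [(key, value), ...]) for a local-mode SparkContext.

    Hypothetical helper: `cores` defaults to the machine's CPU count, and
    memory sizes use Spark's size strings such as "512m" or "6g".
    """
    cores = cores or os.cpu_count() or 1
    master = "local[%d]" % cores
    settings = [
        ("spark.executor.memory", executor_mem),
        ("spark.python.worker.memory", executor_mem),
        ("spark.driver.memory", driver_mem),
        ("spark.io.compression.codec", "snappy"),
    ]
    return master, settings
```

To use it: `master, settings = local_spark_settings()`, then `conf = SparkConf().setMaster(master).setAppName("Simple App").setAll(settings)` and `sc = SparkContext(conf=conf)`. As with the original, `spark.driver.memory` only takes effect if no driver JVM is running yet in that Python process.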
# maven: run `mvn clean` in every directory that contains a pom.xml (safe for paths with spaces)
find . -name pom.xml -type f -execdir mvn clean \;
# sbt: run `sbt clean` in every directory that contains a build.sbt
find . -name build.sbt -type f -execdir sbt clean \;
# Hint: replace `mvn clean` / `sbt clean` with 'git pull' and you will get all your repos updated
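The loop above can also be sketched in Python, for example where `find -execdir` isn't available. This is a hypothetical helper, not part of the original; like `xargs`, it keeps going when one project's command fails and records each exit code.

```python
import os
import subprocess

def run_in_project_dirs(root, marker, cmd, dry_run=False):
    """Run argv list `cmd` in each directory under `root` containing a file named `marker`.

    Returns (directory, exit_code) pairs; exit_code is None on a dry run.
    A failing project does not stop the loop.
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order (os.walk honours in-place edits)
        if marker in filenames:
            code = None if dry_run else subprocess.run(cmd, cwd=dirpath).returncode
            results.append((dirpath, code))
    return results
```

For example, `run_in_project_dirs(".", "build.sbt", ["sbt", "clean"])`, or `run_in_project_dirs(".", "pom.xml", ["git", "pull"])` to update every Maven repo.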
# add these lines to .bashrc or another shell startup script
export SEARCH_MR_JOB_JAR="/opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar"
alias dfsFind="hadoop jar ${SEARCH_MR_JOB_JAR} org.apache.solr.hadoop.HdfsFindTool"
#alias MapReduceIndexerTool="hadoop jar ${SEARCH_MR_JOB_JAR} org.apache.solr.hadoop.MapReduceIndexerTool"
# use it like the regular find:
# dfsFind / -name "*.snappy" | grep flume
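Where the Solr contrib jar is not installed, a rough substitute is to filter the recursive listing from `hdfs dfs -ls -R`. The sketch below is a hypothetical helper that assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path) and matches the glob against the file name only, like `find -name`.

```python
import fnmatch
import posixpath

def find_in_listing(lines, name_glob):
    """Yield paths from `hdfs dfs -ls -R` output whose file name matches name_glob.

    Assumes eight whitespace-separated columns with the path last; splitting
    at most 7 times keeps paths that contain spaces intact.
    """
    for line in lines:
        parts = line.rstrip("\n").split(None, 7)
        if len(parts) < 8:
            continue  # skip blank lines and "Found N items" headers
        path = parts[7]
        if fnmatch.fnmatchcase(posixpath.basename(path), name_glob):
            yield path
```

Feed it `subprocess.run(["hdfs", "dfs", "-ls", "-R", "/"], capture_output=True, text=True).stdout.splitlines()` and filter for "flume" as in the example above. Unlike HdfsFindTool, this lists the namespace from a single client process, so it will be slow on large clusters.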
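# collect the hostnames of all YARN nodes (3rd column is Node-Http-Address, host:port) as a comma-separated list
nodes=$(yarn node -list -all 2>/dev/null | cut -f3 | grep -v "Total Nodes" | grep -P ":\d{2,}$" | cut -d':' -f1 | paste -sd, -)
# stage ${dest} in HDFS (set tmp_dir and dest first), then have every node pull it down
hadoop fs -mkdir /tmp/${tmp_dir}
hadoop fs -put ${dest} /tmp/${tmp_dir}/
pdsh -w "${nodes}" hadoop fs -get /tmp/${tmp_dir}/${dest}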
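The host-extraction step of the pipeline, sketched in Python for readability. This hypothetical helper assumes, as the original `cut -f3` does, that `yarn node -list -all` separates its columns with tabs and that the third column is the node's HTTP address in `host:port` form.

```python
import re

def yarn_node_hosts(listing):
    """Extract hostnames from `yarn node -list -all` output.

    Mirrors the shell pipeline: take the 3rd tab-separated column
    (Node-Http-Address) and keep only entries ending in ":<port>".
    """
    hosts = []
    for line in listing.splitlines():
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # "Total Nodes:N" and blank lines have no 3rd column
        addr = cols[2].strip()
        if re.search(r":\d{2,}$", addr):  # skips the header row, which has no port
            hosts.append(addr.rsplit(":", 1)[0])
    return hosts
```

Joining the result with `",".join(...)` gives the value that `pdsh -w` expects.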
# requires the mencoder and x264 packages
# usage: run inside the directory with the .jpg frames and append the output file name, e.g. jpgtoh264 output.mp4
alias jpgtoh264="mencoder mf://*.jpg -nosound -of lavf -lavfopts format=mp4 -ovc x264 -x264encopts pass=1:bitrate=2000:crf=24 -mf type=jpg:fps=30 -o"
#!/bin/bash
# set to your Hadoop installation path
HADOOP_HOME="/usr/lib/hadoop"
condition=""
fs="\t"
words=""
lines=""
chars=""