David Portabella dportabella

  • Lausanne, Switzerland
dportabella / dstat
Last active August 29, 2015 13:56
Command-line tool to list all files, with their properties and checksums
#! /bin/bash
# Use this script as follows: find . -mount -exec dstat {} \;
# This produces a list like the following:
# Regular File -rw-r--r-- david staff 2013-11-09 01:33:24 2013-11-09 01:33:24 14787 c3a7afd9e3cf89543352ee58e26cfb10 Invoice_41010102336895558_6601081486112013.pdf pdf ./accounting/files/Invoice_41010102336895557_6601081486112013.pdf
# Regular File -rw-r--r-- david staff 2013-09-01 00:41:05 2013-09-01 00:41:05 13636 55b47d2a41d5d6a072439ef2dabacac4 Invoice_41010102336895558_6601108809092013.pdf pdf ./accounting/files/Invoice_41010102336895557_6601108809092013.pdf
# ...
# see http://linux.die.net/man/1/stat for more stat options
if [ -z "$1" ] ; then
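For comparison, here is a minimal Scala sketch of the same idea using Java NIO instead of find, stat and an md5 tool. The object and method names (ListFiles, md5Hex, describeTree) are hypothetical, and the output columns (size, last-modified time, MD5, path) are only a subset of what dstat prints.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest
import scala.collection.JavaConverters._

object ListFiles {
  // Hex-encoded MD5 of a file's contents.
  // Reads the whole file into memory, which is acceptable for a sketch.
  def md5Hex(file: Path): String =
    MessageDigest.getInstance("MD5")
      .digest(Files.readAllBytes(file))
      .map("%02x".format(_))
      .mkString

  // One line per regular file under `root`: size in bytes, last-modified time, MD5, path.
  def describeTree(root: Path): Seq[String] = {
    val stream = Files.walk(root) // must be closed to release directory handles
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(p => s"${Files.size(p)} ${Files.getLastModifiedTime(p)} ${md5Hex(p)} $p")
        .toList
    } finally stream.close()
  }
}
```

Calling ListFiles.describeTree(java.nio.file.Paths.get(".")) prints nothing by itself; map the result through println to get output similar in spirit to the find/dstat pipeline.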
dportabella / gist:9388448
Last active August 29, 2015 13:57
Ruby: convert an array of hashes into a hash keyed by the value of a given hash key
def hasharray2hash(array, hash_key)
  if (!array.kind_of?(Array) || !hash_key.kind_of?(String))
    raise 'invalid parameters'
  end
  array.reduce({}){|cumulate,entry|
    if !entry.kind_of?(Hash)
      raise 'expecting a hash: ' + entry.to_s
    end
    key = entry[hash_key]
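A minimal Scala sketch of the same conversion: index a sequence of maps by the value each one stores under a given key. The names are hypothetical, and because the Ruby preview is truncated, the handling of duplicate values (the last entry wins) and missing keys (an exception) are assumptions rather than the gist's documented behavior.

```scala
object HashArrayToMap {
  // Hypothetical helper: maps each entry's value under `key` to the entry itself.
  // Throws IllegalArgumentException if an entry lacks `key`;
  // if two entries share the same value, the later one wins (toMap semantics).
  def hashArrayToMap[V](entries: Seq[Map[String, V]], key: String): Map[V, Map[String, V]] =
    entries.map { entry =>
      val k = entry.getOrElse(key, throw new IllegalArgumentException(s"missing key '$key' in: $entry"))
      k -> entry
    }.toMap
}
```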
dportabella / biemond-orawls-vagrant-javaexec-log.txt
Created April 10, 2014 11:41
Install WebLogic with WLST using the biemond-orawls Puppet module. This is the log of all Java executions, captured with this patched Vagrant setup: https://github.com/dportabella/biemond-orawls-vagrant/tree/javaexec_log
+++++++++++++++++++++++++++++
+++ EXECUTING JAVA. CMD: java -version +++ start provisioning +++
+++ EXECUTING JAVA. time: 20140410-093331
+++ EXECUTING JAVA. java: /usr/java/jdk1.7.0_51/jre/bin/java
+++ EXECUTING JAVA. cp:
+++ EXECUTING JAVA. user: root
+++ EXECUTING JAVA. running. log: /tmp/log_puppet_weblogic/log-20140410-093331.txt
+++ EXECUTING JAVA. cat /tmp/log_puppet_weblogic/log-20140410-093331.txt. START.
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
dportabella / setup_puppet_developer_env_and_run_spec_tests.sh
Last active August 29, 2015 13:59
Set up an environment to run Puppet as a developer and run the spec tests
# Here is a guide about contributing to Puppet:
# https://github.com/puppetlabs/puppet/blob/master/docs/quickstart.md
# It says:
# Quick Start to Developing on Puppet
# Before diving into the code, you should first take the time to make sure you have an environment where you can run puppet as a developer.
# In a nutshell you need: the puppet codebase, ruby versions, and dependencies.
# Once you've got all of that in place you can make sure that you have a working development system by running the puppet spec tests.
#
# I couldn't install all of this on my OS X machine without breaking other dependencies, so I created a Vagrant box with Ubuntu.
# I'm sharing this setup here so that anyone can reproduce it easily.
dportabella / gist:5766099
Last active December 18, 2015 10:09
Transform a callback function into an iterator/list (in Scala or Java). For an example use case, see http://scalaenthusiast.wordpress.com/2013/06/12/transform-a-callback-function-to-an-iteratorlist-in-scala/
import java.util.concurrent.ArrayBlockingQueue

trait OptionNextToIterator[T] extends Iterator[T] {
  def getOptionNext: Option[T]
  var answerReady: Boolean = false
  var eof: Boolean = false
  var element: T = _
  def hasNext = {
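A minimal sketch of the general technique, assuming the callback-based API can be wrapped as a function that invokes its callback once per element and returns when done. It runs that function on a background thread, hands elements over through an ArrayBlockingQueue, and uses None as an end-of-stream marker. The names callbackToIterator and capacity are hypothetical; this is not the gist's OptionNextToIterator implementation.

```scala
import java.util.concurrent.ArrayBlockingQueue

object CallbackToIterator {
  // Hypothetical helper. `produce` must call its callback once per element and
  // return when it has no more elements; it runs on a background thread.
  def callbackToIterator[T](produce: (T => Unit) => Unit, capacity: Int = 16): Iterator[T] = {
    val queue = new ArrayBlockingQueue[Option[T]](capacity)
    val producer = new Thread(new Runnable {
      def run(): Unit =
        try produce(elem => queue.put(Some(elem))) // blocks while the queue is full
        finally queue.put(None) // end-of-stream marker, also sent if produce throws
    })
    producer.setDaemon(true) // do not keep the JVM alive for an abandoned iterator
    producer.start()

    new Iterator[T] {
      // Holds the queue item taken by hasNext but not yet consumed by next.
      private var lookahead: Option[Option[T]] = None

      private def peek(): Option[T] = {
        if (lookahead.isEmpty) lookahead = Some(queue.take())
        lookahead.get
      }

      def hasNext: Boolean = peek().isDefined

      def next(): T = peek() match {
        case Some(elem) =>
          lookahead = None
          elem
        case None =>
          throw new NoSuchElementException("end of callback stream")
      }
    }
  }
}
```

For example, with a hypothetical callback API def countTo(n: Int)(callback: Int => Unit), calling callbackToIterator[Int](cb => countTo(5)(cb)).toList yields List(1, 2, 3, 4, 5). Because the queue is bounded, the producer blocks when the consumer falls behind; if the consumer abandons the iterator early, the daemon producer thread stays blocked on put until the JVM exits.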
dportabella / build.sbt
Last active May 31, 2016 02:07
sbt project for the Spark distribution examples
val sparkVersion = "1.6.1"
val hbaseVersion = "0.98.7-hadoop2"
name := "spark-examples"
version := sparkVersion
javacOptions ++= Seq("-source", "1.8", "-target", "1.8", "-Xlint")
initialize := {
dportabella / RunTestOnMultipleGithubRepos
Created November 8, 2016 21:13
An example Scala script that runs a test on all GitHub projects with a given name, including their forks and branches (requires Ammonite: brew install ammonite-repl)
#!/usr/bin/env amm
/* To run this script:
* $ chmod +x ./RunTestOnMultipleGithubRepos
* $ ./RunTestOnMultipleGithubRepos
*/
import ammonite.ops._
import scalaj.http._
import $ivy.`org.eclipse.jgit:org.eclipse.jgit:4.5.0.201609210915-r`, org.eclipse.jgit.api.Git
dportabella / deserialize_hadoop_sequence_file.scala
Last active November 8, 2016 21:42
How to deserialize a Hadoop result sequence file outside Hadoop (or the output of Spark's saveAsObjectFile outside Spark)
// libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3"
import java.io.{ByteArrayInputStream, ObjectInputStream}
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._
import org.apache.hadoop.io._
val f = "/path/to/part-00000"
val reader = new SequenceFile.Reader(new Configuration(), SequenceFile.Reader.file(new Path(f)))
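Spark's saveAsObjectFile writes a SequenceFile whose values are BytesWritable blobs produced by Java serialization (in Spark's implementation, each blob is typically an array of up to 10 elements), so once a value's bytes are in hand, the remaining step is ordinary ObjectInputStream deserialization. The sketch below shows only that step with java.io. The JavaSerde object is hypothetical, and its serialize helper exists only to produce test bytes. When taking bytes from a BytesWritable, copyBytes() (or getBytes truncated to getLength) avoids the padding that getBytes can include.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object JavaSerde {
  // Serializes an object with plain Java serialization (used here only to build sample bytes).
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close() // close flushes the object stream
    bytes.toByteArray
  }

  // Deserializes bytes written by ObjectOutputStream, e.g. the payload of a BytesWritable value.
  // The classes of the serialized objects must be on the classpath.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

For a saveAsObjectFile output of strings, each value would be decoded with deserialize[Array[String]], and the resulting arrays flattened.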
dportabella / DeserializeHadoopSequenceFileWithoutClassDeclaration.scala
Last active November 8, 2016 22:32
How to deserialize a Hadoop result sequence file outside Hadoop (or the output of Spark's saveAsObjectFile outside Spark) without access to the class declarations
// resolvers += "dportabella-3rd-party-mvn-repo-releases" at "https://github.com/dportabella/3rd-party-mvn-repo/raw/master/releases/"
// libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3"
// libraryDependencies += "com.github.dportabella.3rd-party-mvn-repo" % "jdeserialize" % "1.0.0"
import java.io._
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._
import org.apache.hadoop.io._
import org.unsynchronized.jdeserialize
dportabella / FilterArchive.scala
Created February 8, 2017 09:39
Example of filtering a WARC archive using Spark and storing the result back in a WARC archive
package application
import java.io._
import java.util
import org.apache.spark.rdd.RDD
import org.archive.format.warc.WARCConstants.WARCRecordType
import org.archive.io.warc.WARCRecordInfo
import org.warcbase.spark.archive.io.ArchiveRecord
import org.warcbase.spark.matchbox.RecordLoader