Clemens Neudecker cneud
cneud / eunp-dir-cleaner.bat
Created April 18, 2014 15:11
Batch script for cleaning of data transfer directories in Europeana Newspapers
@ECHO OFF
REM recursively traverse through directories and delete all instances of JPG|PNG|TIF|JP2 image files
CHOICE /C:12345 /M "Really delete all images of type (1) JPG, (2) JP2, (3) TIF, (4) PNG or (5) Cancel?"
IF ERRORLEVEL 5 GOTO Cancel
IF ERRORLEVEL 4 GOTO PNG
IF ERRORLEVEL 3 GOTO TIF
IF ERRORLEVEL 2 GOTO JP2
IF ERRORLEVEL 1 GOTO JPG
GOTO END
:JPG
cneud / eunp-img-conversion.sh
Last active August 29, 2015 14:00
Bash script for image conversion for Europeana Newspapers
#!/bin/bash
# Usage:
# ./eunp-img-conversion.sh input.tif temp.tif output.jp2
# 1. Invoke GraphicsMagick command line to convert master images to uncompressed 150ppi TIF with unsharp mask
# 2. Invoke Kakadu kdu_compress command line to convert uncompressed TIF to JPEG 2000
# The two steps are chained with && (not a pipe) so kdu_compress only starts once the temp TIF is fully written
gm convert "$1" -resample 150x150 -unsharp 1.5 -compress None "ptif:$2" && kdu_compress -i "$2" -o "$3" -rate 1.0,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,0.088,0.075,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,0.01858,0.015625 Clevels=6 Stiles=\{1024,1024\} Cmodes=\{BYPASS\} Corder=RLCP Cblk=\{64,64\} -no_palette
cneud / scape-opf-pig-script
Created April 18, 2014 15:55
Pig script for ARC analysis using WarcBase and Tika UDFs
register './warcbase_kb/target/warcbase-0.1.0-SNAPSHOT-fatjar.jar';
raw = load '/tmp/IAH-20080430204825-00000-blackbook.arc.gz' using
org.warcbase.pig.ArcLoader() as (url: chararray, date:chararray, mime:chararray, content:chararray);
a = foreach raw generate url,mime,content,SUBSTRING(date,0,12) as date,org.warcbase.pig.piggybank.DetectMimeType(content) as tikaMime;
b = filter a by (tikaMime == 'text/html');
c = foreach b generate url,mime,tikaMime,date,org.warcbase.pig.piggybank.ExtractRawText(content) as txt;
d = foreach c generate url,mime,tikaMime,date,org.warcbase.pig.piggybank.DetectLanguage(txt) as lang;
e = group d by (lang,date);
cneud / countChars.bsh
Created April 18, 2014 15:59
Beanshell for counting chars in Taverna
BufferedReader getReader (String fileUrl) throws IOException {
    InputStreamReader reader;
    try {
        reader = new FileReader(fileUrl);
    }
    catch (FileNotFoundException e) {
        // try a real URL instead
        URL url = new URL(fileUrl);
        reader = new InputStreamReader (url.openStream());
    }
cneud / levenshtein.bsh
Created April 18, 2014 16:02
Beanshell for calculating Levenshtein distance of two input strings
import org.apache.commons.lang.StringUtils;
import java.text.DecimalFormat;
double ld = StringUtils.getLevenshteinDistance(text1, text2);
double avglen = ((double)text1.length()+(double)text2.length())/2.0;
double m = 1.0-(ld/avglen);
double normVal = (m<0)?0.0:m;
float f = (float) normVal * 100;
DecimalFormat s = new DecimalFormat("##.##");
normalized_levenshtein_distance = s.format(f);
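
The normalisation above (one minus the edit distance divided by the mean input length, floored at zero and scaled to a percentage) can be sketched in Python without external libraries. This is only an illustrative sketch: the function names are hypothetical, the edit distance is a plain dynamic-programming version rather than the Commons Lang implementation, and returning 100 for two empty strings is an assumption (the snippet above does not handle that case).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_similarity(text1: str, text2: str) -> float:
    """Percentage in [0, 100]: 1 - distance / mean length, floored at 0."""
    avg_len = (len(text1) + len(text2)) / 2.0
    if avg_len == 0:
        return 100.0  # assumption: two empty strings count as identical
    return max(0.0, 1.0 - levenshtein(text1, text2) / avg_len) * 100.0
```

Formatting the result to two decimals, as the BeanShell does with `DecimalFormat`, could be done with `f"{normalized_similarity(a, b):.2f}"`.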
cneud / csv2list.bsh
Created April 18, 2014 16:05
Beanshell for csv -> list conversion
List leftList = new ArrayList();
List rightList = new ArrayList();
String[] lines = csv.split("\n");
for(line : lines) {
    String[] urls = line.split("\"");
    leftList.add(urls[1]);
    rightList.add(urls[3]);
}
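
The same quote-splitting idea can be sketched in Python. It assumes each line holds two double-quoted fields, as the indices 1 and 3 above imply. The function name is hypothetical, and skipping lines with fewer than two quoted fields is an added assumption (the BeanShell version would raise an index error on such lines).

```python
def csv_to_lists(csv_text: str):
    """Split each line on double quotes and collect the 1st and 2nd quoted fields."""
    left, right = [], []
    for line in csv_text.splitlines():
        fields = line.split('"')
        if len(fields) < 4:
            continue  # assumption: ignore blank or malformed lines
        left.append(fields[1])   # text between the first pair of quotes
        right.append(fields[3])  # text between the second pair of quotes
    return left, right
```

For input with embedded quotes or commas inside fields, Python's standard `csv` module would be the more robust choice.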
cneud / rename.rb
Created April 18, 2014 16:13
Ruby script for recursively renaming files and directories
#!/usr/bin/ruby
def rename(dir, map)
  Dir.foreach(dir) do |filename|
    next if filename =~ /^\.+$/ or File.directory?("#{dir}/#{filename}")
    (entry, extension) = filename.sub("file", "").split(".")
    entry.sub!(/^0+/, "")
    if map[entry].nil?
      raise "PROBLEM: no entry for file #{dir}/#{filename} with id #{entry}"
    else
cneud / cluster-up.sh
Last active June 11, 2017 12:23
Bash script containing all steps required to fire up a CDH cluster
#!/bin/bash
# Hadoop cluster start-up script
#
# 1. Format the namenode (only required on 1st start!)
# sudo -u hdfs hdfs namenode -format
# 2. Start HDFS
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
# 3. Create the /tmp directory
cneud / XMLtoJSON.js
Last active August 29, 2015 14:17
XML to JSON converter function
// Converts XML to JSON
// from: http://coursesweb.net/javascript/convert-xml-json-javascript_s2
function XMLtoJSON() {
  var me = this; // stores the object instance
  // gets the content of an XML file and returns it in
  me.fromFile = function(xml, rstr) {
    // Creates an instance of a XMLHttpRequest object
    var xhttp = (window.XMLHttpRequest) ? new XMLHttpRequest() : new ActiveXObject("Microsoft.XMLHTTP");
    // sets and sends the request for calling "xml"
cneud / traverse_dir_python_call.bat
Created February 17, 2016 12:44
Recursively traverse dirs & call Python program
FOR /R %%a IN (*.foo) DO python foo.py "%%a" > "%%~dpna.foo"
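
A cross-platform sketch of the same traversal in Python, for comparison. The function names and the `.out` output extension are hypothetical. A distinct output extension is assumed on purpose: if the redirect target has the same path as the input file, the shell truncates that file before the called program can read it.

```python
import subprocess
import sys
from pathlib import Path


def output_path(source: Path, out_ext: str = ".out") -> Path:
    """Same drive, directory and base name as the input, with a new extension."""
    return source.with_suffix(out_ext)


def run_on_tree(root: Path, script: str, pattern: str = "*.foo", out_ext: str = ".out"):
    """Run `script` on every file under `root` matching `pattern`,
    writing each run's stdout next to the input file."""
    written = []
    for source in sorted(root.rglob(pattern)):
        target = output_path(source, out_ext)
        with target.open("w") as handle:
            # sys.executable is the interpreter running this sketch
            subprocess.run([sys.executable, script, str(source)],
                           stdout=handle, check=True)
        written.append(target)
    return written
```

Calling `run_on_tree(Path("."), "foo.py")` would process every `*.foo` file below the current directory; `check=True` stops the run at the first failing call instead of silently continuing.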