Clemens Neudecker cneud

@cneud
cneud / countChars.bsh
Created April 18, 2014 15:59
Beanshell for counting chars in Taverna
BufferedReader getReader (String fileUrl) throws IOException {
InputStreamReader reader;
try {
reader = new FileReader(fileUrl);
}
catch (FileNotFoundException e) {
// try a real URL instead
URL url = new URL(fileUrl);
reader = new InputStreamReader (url.openStream());
}
@cneud
cneud / scape-opf-pig-script
Created April 18, 2014 15:55
Pig script for ARC analysis using WarcBase and Tika UDFs
register './warcbase_kb/target/warcbase-0.1.0-SNAPSHOT-fatjar.jar';
raw = load '/tmp/IAH-20080430204825-00000-blackbook.arc.gz' using
org.warcbase.pig.ArcLoader() as (url: chararray, date:chararray, mime:chararray, content:chararray);
a = foreach raw generate url,mime,content,SUBSTRING(date,0,12) as date,org.warcbase.pig.piggybank.DetectMimeType(content) as tikaMime;
b = filter a by (tikaMime == 'text/html');
c = foreach b generate url,mime,tikaMime,date,org.warcbase.pig.piggybank.ExtractRawText(content) as txt;
d = foreach c generate url,mime,tikaMime,date,org.warcbase.pig.piggybank.DetectLanguage(txt) as lang;
e = group d by (lang,date);
@cneud
cneud / eunp-img-conversion.sh
Last active August 29, 2015 14:00
Bash script for image conversion for Europeana Newspapers
#!/bin/bash
# Usage:
# ./eunp-img-conversion.sh input.tif temp.tif output.jp2
# 1. Invoke the GraphicsMagick command line to convert the master image to an uncompressed 150 ppi TIF with an unsharp mask
# 2. Invoke the Kakadu kdu_compress command line to convert the uncompressed TIF to JPEG 2000
# The steps are chained with && rather than a pipe: kdu_compress reads the temp file from disk, so it must only start after gm has finished writing it
gm convert "$1" -resample 150x150 -unsharp 1.5 -compress None "ptif:$2" && kdu_compress -i "$2" -o "$3" -rate 1.0,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,0.088,0.075,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,0.01858,0.015625 Clevels=6 Stiles=\{1024,1024\} Cmodes=\{BYPASS\} Corder=RLCP Cblk=\{64,64\} -no_palette
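The two conversion steps above take three positional arguments and rely on both `gm` and `kdu_compress` being on the PATH. A minimal sketch of a pre-flight check could look like the following; the function name `check_conversion_inputs` and its return codes are hypothetical and not part of the original gist.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): validate the three
# positional arguments and the required tools before converting.
check_conversion_inputs() {
    # Expect exactly: input.tif temp.tif output.jp2
    if [ "$#" -ne 3 ]; then
        echo "Usage: eunp-img-conversion.sh input.tif temp.tif output.jp2" >&2
        return 2
    fi
    # The master image must exist and be readable
    if [ ! -r "$1" ]; then
        echo "Cannot read input file: $1" >&2
        return 1
    fi
    # Both command line tools used by the script must be installed
    local tool
    for tool in gm kdu_compress; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Required tool not found on PATH: $tool" >&2
            return 127
        fi
    done
    return 0
}
```

One way to use it would be to call `check_conversion_inputs "$@" || exit $?` at the top of the script, so a missing argument or tool stops the run before any temp file is written.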
@cneud
cneud / eunp-dir-cleaner.bat
Created April 18, 2014 15:11
Batch script for cleaning of data transfer directories in Europeana Newspapers
@ECHO OFF
REM recursively traverse through directories and delete all instances of JPG|PNG|TIF|JP2 image files
CHOICE /C:12345 /M "Really delete all images of type (1) JPG, (2) JP2, (3) TIF, (4) PNG or (5) Cancel?"
IF ERRORLEVEL 5 GOTO Cancel
IF ERRORLEVEL 4 GOTO PNG
IF ERRORLEVEL 3 GOTO TIF
IF ERRORLEVEL 2 GOTO JP2
IF ERRORLEVEL 1 GOTO JPG
GOTO END
:JPG