Clemens Neudecker cneud

cneud /
Last active October 26, 2023 11:15
import click
import json
import requests
import os
from tqdm import tqdm
def get_text(obj, sub_entries=None):
    if sub_entries is None:

Command-line invocations for CASDMIT 2022, OCR module

If necessary, adjust the screen resolution in the virtual machine

xrandr --output VGA-1 --mode 1280x800

(replace 1280x800 with the desired screen resolution)

Installing the Sublime Text editor

cneud / wintess.bat
Created February 25, 2021 16:09
Windows batch processing with Tesseract (because I always forget)
@Echo off
Set _SourcePath=C:\path\to\images\*.tif
Set _OutputPath=C:\path\to\output\
Set _Tesseract=C:\path\to\tesseract\tesseract.exe
Set _TesseractLang=lang
Set _TesseractOutputFormat=alto
For %%A in (%_SourcePath%) Do Echo Processing %%A...&%_Tesseract% -l %_TesseractLang% %%A %_OutputPath%\%%~nA %_TesseractOutputFormat%
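For comparison, the same loop can be sketched in Python. This is a minimal, hypothetical equivalent and not part of the gist: the function names, the default `lang="eng"` and the directory arguments are assumptions, and it expects a `tesseract` executable on the PATH whose version supports the `alto` output config.

```python
from pathlib import Path
import subprocess


def build_tesseract_commands(source_dir, output_dir, lang="eng",
                             output_format="alto", tesseract="tesseract"):
    """Return one Tesseract argument list per .tif file in source_dir."""
    commands = []
    for image in sorted(Path(source_dir).glob("*.tif")):
        # Output base is the image name without its extension,
        # which is what %%~nA yields in the batch file.
        output_base = Path(output_dir) / image.stem
        commands.append([tesseract, "-l", lang, str(image),
                         str(output_base), output_format])
    return commands


def run_all(source_dir, output_dir, **options):
    """Run Tesseract on every image, stopping at the first failure."""
    for cmd in build_tesseract_commands(source_dir, output_dir, **options):
        print(f"Processing {cmd[3]}...")
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to inspect the argument lists before anything is run. Passing a list to `subprocess.run` also avoids quoting problems with paths that contain spaces.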
cneud / ocrd.js
Last active February 26, 2021 16:57
simple-keyboard with OCR-D special characters (Unicode subset)
const ocrd = {
default: [
"\uF1AC \u00AD \u00AC \u00BD \u00C0 \u00C3 \u00C4 \u00C6 \u00E0 \u00E3 \u00E4 \u00E6 \u0101 \u023A \u2C65 \uE42C",
"\uEFA1 \uF500 \uF532 \u0253 \uF524 \u00C7 \u00E7 \u0107 \uEEC4 \uEEC5 \uF501 \uF502 \uF517 \uF520 \uF522 \uF531",
"\uF50A \uF51B \u00C8 \u00C9 \u00CB \u00E8 \u00E9 \u00EB \u0113 \u0118 \u0119 \u0256 \u0247 \u1EBD \u204A \uE4E1",
"\uF158 \uF219 \uF515 \uFB00 \uFB01 \uFB02 \uFB03 \uA7A0 \uA7A1 \uF504 \uF505 \uF506 \uF521 \uF525 \u00CD \u00ED",
"\u00EF \u0129 \u012B \u0133 \uA76D \uF220 \uF533 \uEBE3 \uA742 \uA743 \uA7A2 \uA7A3 \u0141 \u0142 \uF4F9 \uF50B",
"\uE5B8 \uF519 \u00D1 \u00F1 \uA7A4 \uA7A5 \uE1DC \uE5DC \u00D2 \u00D5 \u00D6 \u00D8 \u00F2 \u00F5 \u00F6 {shift}"
shift: [
// jQuery arrow keys + escape key + enter key
$(document).keydown(function(e) {
switch(e.which) {
case 13: // enter
case 27: // escape
cneud /
Created May 11, 2019 00:38
Google Cloud Vision OCR Python
#!/usr/bin/env python
# Copyright 2017 Google Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
cneud / pdf2tif.bat
Created August 22, 2018 16:29
PDF to TIF conversion for OCR on Windows (using imagemagick & ghostscript)
convert -density 300 -depth 8 -alpha Off -limit area 1 foo.pdf foo_%%04d.tif
cneud / hocr2text.bat
Created August 22, 2018 16:28
hocr to text conversion on Windows
FOR /R %%G IN (*.hocr) DO java -jar saxon9he.jar -s:"%%G" -xsl:hocr2text.xsl -o:"%%~nG.txt"
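The `hocr2text.xsl` stylesheet is not shown here, so the following is not a port of it. As a rough alternative without Saxon, a minimal Python sketch using only the standard library could pull the text out of hOCR line elements. It assumes that lines are `<span>` elements with class `ocr_line` and ignores other hOCR line types such as `ocr_caption`. The names `HocrTextExtractor` and `hocr_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class HocrTextExtractor(HTMLParser):
    """Collect the words of each hOCR line (class 'ocr_line') as plain text."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished lines of text
        self._stack = []   # class attribute of each open <span>
        self._words = []   # words of the line currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            # Closing an ocr_line span finishes the current line.
            if "ocr_line" in self._stack.pop().split():
                self.lines.append(" ".join(self._words))
                self._words = []

    def handle_data(self, data):
        # Keep text only while inside a line span; skip layout whitespace.
        inside_line = any("ocr_line" in c.split() for c in self._stack)
        if inside_line and data.strip():
            self._words.extend(data.split())


def hocr_to_text(hocr):
    """Return the text of an hOCR document, one output line per ocr_line."""
    parser = HocrTextExtractor()
    parser.feed(hocr)
    parser.close()
    return "\n".join(parser.lines)
```

To mirror the batch loop, one could read each `*.hocr` file found with `pathlib.Path.rglob` and write the result next to it with a `.txt` suffix.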
cneud /
Last active August 23, 2019 15:56
cneud / voyant_this.html
Last active November 22, 2017 20:52
Voyant embedder
<title>Voyant This!</title>
<style type=text/css>
body {
text-align: center;
input[value] {
font-family: Verdana;