Skip to content

Instantly share code, notes, and snippets.

View cneud's full-sized avatar
🐙

Clemens Neudecker cneud

🐙
View GitHub Profile
@cneud
cneud / blm_extract.py
Last active October 26, 2023 11:15
blm_extract.py
import click
import json
import requests
import os
from tqdm import tqdm
def get_text(obj, sub_entries=None):
if sub_entries is None:

Kommandozeilenaufrufe für CASDMIT 2022 Modul OCR

Ggf. anpassen der Bildschirmauflösung in der Virtuellen Maschine

xrandr --output VGA-1 --mode 1280x800

(1280x800 durch gewünschte Bildschirmauflösung ersetzen)

Installation des Texteditors sublime

@cneud
cneud / wintess.bat
Created February 25, 2021 16:09
Windows batch processing with Tesseract (because I always forget)
:Start
@Echo off
Set _SourcePath=C:\path\to\images\*.tif
Set _OutputPath=C:\path\to\output\
Set _Tesseract=C:\path\to\tesseract\tesseract.exe
Set _TesseractLang=lang
Set _TesseractOutputFormat=alto
:Convert
For &&A in (%_SourcePath%) Do Echo Processing %%A...&%_Tesseract% -l %_TesseractLang% %%A %_OutputPath%\%%~nA %_TesseractOutputFormat%
:End
@cneud
cneud / ocrd.js
Last active February 26, 2021 16:57
simple-keyboard with OCR-D special characters (Unicode subset)
const ocrd = {
default: [
"\uF1AC \u00AD \u00AC \u00BD \u00C0 \u00C3 \u00C4 \u00C6 \u00E0 \u00E3 \u00E4 \u00E6 \u0101 \u023A \u2C65 \uE42C",
"\uEFA1 \uF500 \uF532 \u0253 \uF524 \u00C7 \u00E7 \u0107 \uEEC4 \uEEC5 \uF501 \uF502 \uF517 \uF520 \uF522 \uF531",
"\uF50A \uF51B \u00C8 \u00C9 \u00CB \u00E8 \u00E9 \u00EB \u0113 \u0118 \u0119 \u0256 \u0247 \u1EBD \u204A \uE4E1",
"\uF158 \uF219 \uF515 \uFB00 \uFB01 \uFB02 \uFB03 \uA7A0 \uA7A1 \uF504 \uF505 \uF506 \uF521 \uF525 \u00CD \u00ED",
"\u00EF \u0129 \u012B \u0133 \uA76D \uF220 \uF533 \uEBE3 \uA742 \uA743 \uA7A2 \uA7A3 \u0141 \u0142 \uF4F9 \uF50B",
"\uE5B8 \uF519 \u00D1 \u00F1 \uA7A4 \uA7A5 \uE1DC \uE5DC \u00D2 \u00D5 \u00D6 \u00D8 \u00F2 \u00F5 \u00F6 {shift}"
],
shift: [
// jQuery arrow keys + escape key + enter key
$(document).keydown(function(e) {
switch(e.which) {
case 13: // enter
break;
case 27: // escape
break;
@cneud
cneud / gcvocr.py
Created May 11, 2019 00:38
Google Cloud Vision OCR Python
#!/usr/bin/env python
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
@cneud
cneud / pdf2tif.bat
Created August 22, 2018 16:29
PDF to TIF conversion for OCR on Windows (using imagemagick & ghostscript)
convert -density 300 -depth 8 -alpha Off -limit area 1 foo.pdf foo_%04d.tif
@cneud
cneud / hocr2text.bat
Created August 22, 2018 16:28
hocr to text conversion on Windows
FOR /R %%G IN (*.hocr) DO java -jar saxon9he.jar -s:"%%G" -xsl:hocr2text.xsl -o:"%%~nG.txt"
@cneud
cneud / ocros.md
Last active August 23, 2019 15:56
@cneud
cneud / voyant_this.html
Last active November 22, 2017 20:52
Voyant embedder
<html>
<head>
<title>Voyant This!</title>
<style type=text/css>
body {
text-align: center;
}
input[value] {
font-family: Verdana;