
@lefth
lefth / ocrpdf.sh
Last active November 3, 2021 15:04 — forked from wcaleb/ocrpdf.sh
Take a PDF, OCR it, and add OCR Text as background layer to original PDF to make it searchable
#!/bin/bash
# NOTE: I recommend pdfsandwich instead of this script, partly because ImageMagick (and pdftoppm) fail on large, detailed images.
# pdfsandwich does not preserve the original graphics exactly, but with the options below it can come close.
# To preserve color:
# pdfsandwich -rgb input.pdf
# To preserve grey tones:
# pdfsandwich -gray input.pdf
# To disable all preprocessing:
# pdfsandwich -nopreproc input.pdf
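The pdfsandwich invocations above can be wrapped in a loop to OCR a whole folder of PDFs. This is a minimal sketch, assuming pdfsandwich is on your PATH and accepts the flag and input file exactly as shown above. The helper names `pdfsandwich_flag` and `ocr_dir` and the `DRY_RUN` switch are hypothetical and not part of the original gist.

```shell
#!/bin/bash
# Sketch: batch-run pdfsandwich over every PDF in one directory.
# Assumptions: pdfsandwich is installed and on PATH. The function names
# pdfsandwich_flag/ocr_dir and the DRY_RUN variable are hypothetical helpers.
set -euo pipefail

# Map a preservation mode to the pdfsandwich option listed in the notes above.
pdfsandwich_flag() {
  case "$1" in
    color) echo "-rgb" ;;        # preserve color
    gray)  echo "-gray" ;;       # preserve grey tones
    none)  echo "-nopreproc" ;;  # disable all preprocessing
    *)     echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Run pdfsandwich on each *.pdf in a directory.
# With DRY_RUN=1 the commands are only printed, not executed.
ocr_dir() {
  local dir="$1" mode="$2" flag f
  flag=$(pdfsandwich_flag "$mode")
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # skip the literal pattern when no PDF matches
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "pdfsandwich $flag $f"
    else
      pdfsandwich "$flag" "$f"
    fi
  done
}
```

For example, `DRY_RUN=1 ocr_dir scans gray` prints the commands it would run; drop `DRY_RUN=1` to actually run them. Keeping the mode-to-flag mapping in one function means the three preprocessing choices above are spelled out in a single place.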
lefth / testcase.p6
Created January 23, 2018 06:14
Concurrent Path operations Windows error
#!/usr/bin/env perl6
use Test;
my IO::Path $root-dir = ".".IO.absolute.IO;
sub setup {
mkdir 'files/40001/20006/10002';
'files/40001/20006/10002/0111'.IO.open(:w).close;
mkdir 'files/40009/20032';
#!/usr/bin/env perl6
# I run this in a directory with 30k files.
# I suggest using an SSD if you have one, for your sanity.
use v6;
sub gather-files(Str $start-path --> List) {
# Gather a list of all files. Run this in a huge directory.
my $files = Channel.new;