Dave M. Giglio dmgig

@dmgig
dmgig / ocr.py
Created September 21, 2021 01:27
OCR for Document Search
#!/usr/bin/python3
import os
import sys
from datetime import datetime
import pytz
import logging
import ohhocr
import getopt
import argparse
@dmgig
dmgig / main.py
Created September 12, 2021 00:08
Images to PDF
#!/bin/python3
import os
import sys
import shutil
import glob
from PIL import Image, ImageEnhance
import re
from fpdf import FPDF
@dmgig
dmgig / README.md
Last active September 7, 2021 17:39
Pull YouTube Subtitles

YouTube Subtitles to PDF

Pull machine-generated transcript files from YouTube, convert them into something more readable, and then save them as Markdown and PDF files.

Example output: Witness to War YouTube Channel Transcripts.
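The "convert them into something more readable" step could look roughly like the sketch below. It is not the gist's own code: it assumes the transcripts arrive as WebVTT files, and the function name `vtt_to_text` is made up for illustration. Auto-generated captions tend to repeat each line across consecutive cues, so the sketch drops timing lines, inline tags, and consecutive duplicates.

```python
import re

# Cue timing lines look like "00:00:01.000 --> 00:00:03.000 ...".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> ")
# Inline markup such as <c> or <00:00:01.500> inside caption text.
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Reduce WebVTT captions to plain text (hypothetical helper)."""
    lines = []
    for raw in vtt.splitlines():
        line = TAG.sub("", raw).strip()
        # Skip blanks, the file header, and cue timing lines.
        if not line or line == "WEBVTT" or TIMING.match(line):
            continue
        # Skip the metadata lines that can follow the header.
        if line.startswith(("Kind:", "Language:")):
            continue
        # Drop a line that merely repeats the previous one.
        if lines and lines[-1] == line:
            continue
        lines.append(line)
    return " ".join(lines)
```

The result is one long string; splitting it into paragraphs and writing Markdown or PDF is left out here.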

Pull subtitles, annotations, and description from YouTube using youtube-dl.

You can pull a single video, a playlist, or an entire channel. For a channel, I find that the Videos tab URL (the channel URL with /videos at the end) works best.
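As a rough illustration of the pull step, here is a minimal sketch that builds a youtube-dl command line without downloading any video. The helper names, the output template, the `en` subtitle language, and the example channel URL are all assumptions for illustration; the gist's actual invocation may differ.

```python
import shlex


def videos_tab_url(channel_url: str) -> str:
    """Return the channel URL ending in /videos (hypothetical helper)."""
    base = channel_url.rstrip("/")
    return base if base.endswith("/videos") else base + "/videos"


def build_command(url: str, out_dir: str = "transcripts") -> list[str]:
    """Build a youtube-dl argv that fetches text metadata only."""
    return [
        "youtube-dl",
        "--skip-download",      # do not fetch the video itself
        "--write-auto-sub",     # machine-generated subtitles
        "--sub-lang", "en",     # assumption: English transcripts
        "--write-description",  # save the video description
        "--write-annotations",  # save annotations where available
        "-o", f"{out_dir}/%(title)s-%(id)s.%(ext)s",  # assumed naming scheme
        url,
    ]


if __name__ == "__main__":
    # Placeholder channel URL, not taken from the gist.
    url = videos_tab_url("https://www.youtube.com/c/ExampleChannel")
    print(shlex.join(build_command(url)))
```

The same command works for a single video or a playlist URL; only the channel case needs the `/videos` suffix.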

@dmgig
dmgig / ddev_and_docker-sync.md
Last active August 19, 2020 14:59
Getting DDEV and docker-sync to work together

Getting DDEV and docker-sync to work together

Here is a basic setup for running DDEV and docker-sync on a Mac.

With it, the previously slow performance improves dramatically.

I struggled with the documentation a bit; it seemed to show more of the complexities than the basics.

As I now understand it, docker-sync sets up a new Docker volume (our docker-sync volume) that acts as an intermediary between the Mac and the Docker container.

@dmgig
dmgig / _ocr2.py
Last active December 3, 2020 11:43
Multithreaded OCR Process with Tesseract, TEXTCLEANER, and imagemagick
#!/usr/bin/python
import os
import sys
import getopt
import subprocess
import time
import pytesseract
import argparse
import cv2
@dmgig
dmgig / README.md
Last active August 31, 2018 16:19
Reduce duplicated images into md5 named image bank with symlinks from original location

Image Banker

A method for de-duplicating a large number of images stored in a complicated folder structure that must be preserved.

Given a directory containing duplicate jpg images with any folder structure, img-banker.sh will:

  1. Recursively find all jpgs.
  2. Calculate each file's MD5 hash.
  3. Copy the file to the img-bank directory, using the MD5 hash as the new name (the "banked image").
  4. Replace the original image with a symlink to the banked image.
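The four steps above can be sketched in Python as follows. This is an illustration only, not a port of img-banker.sh: the function names are hypothetical, the symlinks use absolute targets, and only the lowercase `.jpg` extension is matched.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bank_images(root: Path, bank: Path) -> int:
    """Bank every .jpg under root; return how many files became symlinks."""
    bank.mkdir(parents=True, exist_ok=True)
    bank = bank.resolve()
    replaced = 0
    # Step 1: find all jpgs; sorted() lists them before anything is changed.
    for jpg in sorted(root.rglob("*.jpg")):
        if jpg.is_symlink() or not jpg.is_file():
            continue  # already banked on an earlier run, or not a regular file
        if bank in jpg.resolve().parents:
            continue  # the bank itself may live under root
        # Steps 2 and 3: hash the file and copy it into the bank once.
        banked = bank / f"{md5_of(jpg)}.jpg"
        if not banked.exists():
            shutil.copy2(jpg, banked)
        # Step 4: replace the original with a symlink to the banked image.
        jpg.unlink()
        jpg.symlink_to(banked)
        replaced += 1
    return replaced
```

Because duplicates share a hash, each distinct image is stored once in the bank while every original path keeps working through its symlink.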
@dmgig
dmgig / intro-outro.sh
Last active July 23, 2018 00:19
Append overlapping intro and outro audio to another audio file with sox and ffmpeg
#!/bin/bash
IFS=$'\n'
IntroLen=9
OutroLen=12
OverlapLen=8
IntroSilenceLen=3
OutroSilenceLen=10
@dmgig
dmgig / FineReader.txt
Last active September 14, 2022 15:58
Abbyy FineReader Applescript Dictionary
You can choose File > Open Dictionary in Script Editor to examine the scripting dictionary of a scriptable application or scripting addition on your computer. Or you can drag an application icon to the Script Editor icon to display its dictionary (if it has one). You can also open scripting dictionaries in Xcode.
https://developer.apple.com/library/content/documentation/AppleScript/Conceptual/AppleScriptX/Concepts/work_with_as.html#//apple_ref/doc/uid/TP40001568-1153006
# finereader
finereader n :
get languages count v : Returns the number of recognition languages.
get languages count
@dmgig
dmgig / readme.json
Created December 7, 2017 15:24
Sample D3 Data for Chord Diagram (https://bl.ocks.org/mbostock/1046712)
[
{"name":"flare.analytics.cluster.AgglomerativeCluster","size":3938,"imports":["flare.animate.Transitioner","flare.vis.data.DataList","flare.util.math.IMatrix","flare.analytics.cluster.MergeEdge","flare.analytics.cluster.HierarchicalCluster","flare.vis.data.Data"]},
{"name":"flare.analytics.cluster.CommunityStructure","size":3812,"imports":["flare.analytics.cluster.HierarchicalCluster","flare.animate.Transitioner","flare.vis.data.DataList","flare.analytics.cluster.MergeEdge","flare.util.math.IMatrix"]},
{"name":"flare.analytics.cluster.HierarchicalCluster","size":6714,"imports":["flare.vis.data.EdgeSprite","flare.vis.data.NodeSprite","flare.vis.data.DataList","flare.vis.data.Tree","flare.util.Arrays","flare.analytics.cluster.MergeEdge","flare.util.Sort","flare.vis.operator.Operator","flare.util.Property","flare.vis.data.Data"]},
{"name":"flare.analytics.cluster.MergeEdge","size":743,"imports":[]},
{"name":"flare.analytics.graph.BetweennessCentrality","size":3534,"imports":["flare.animate.Transition
@dmgig
dmgig / batchProcess.scpt
Created November 4, 2017 18:41 — forked from duhaime/batchProcess.scpt
Batch process files with ABBYY FineReader using AppleScript
-- specify input and output directories
set infile_directory to "/Users/doug/Desktop/inputs/"
set outfile_directory to "/Users/doug/Desktop/outputs/"
-- get the basenames of each input file
tell application "System Events"
set infile_list to files of folder infile_directory
end tell
-- process each input file