Andrew Janco apjanco

## 1-vision_for_ocr.md

6 files, last active October 12, 2017 14:05

Google Vision for OCR

This is a step-by-step guide to using Google Vision to detect and recognize text in document images. There are many ways to do OCR; this is simply the best method I have found so far.

The general documentation for Vision can be found here: https://cloud.google.com/vision/
Before using any of the scripts below, you'll need to create a Google Cloud account. You'll also need to create a project and enable the Vision and Natural Language APIs on that project.
Under APIs & Services, you can create credentials in the Credentials tab. Select Service account key, your project name, and JSON, then click Create.

A file should download to your machine. If you get stuck, there's more information here: https://cloud.google.com/docs/authentication/api-keys

Move the JSON key to a safe place and remember the path to that file. I find it helpful to navigate in the terminal
to the directory containing the file and then enter `pwd`. This will show the location of the file (suc
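The credential step above can be sketched in Python. This is a minimal sketch, not one of the author's scripts. It assumes the `google-cloud-vision` package (version 2 or later) is installed, and it relies on Application Default Credentials, which read the key path from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The helper names `use_service_account_key` and `ocr_image` are hypothetical. `document_text_detection` is Vision's OCR feature for dense text such as scanned pages.

```python
import json
import os
from pathlib import Path


def use_service_account_key(key_path: str) -> str:
    """Point Google client libraries at a downloaded service-account JSON key.

    Application Default Credentials look for the key file named in the
    GOOGLE_APPLICATION_CREDENTIALS environment variable.
    (Hypothetical helper name.)
    """
    path = Path(key_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No key file at {path}")
    with path.open(encoding="utf-8") as fh:
        info = json.load(fh)
    # Service-account keys carry "type": "service_account".
    if info.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service-account key")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)


def ocr_image(image_path: str) -> str:
    """Return the text Vision's document OCR finds in one image file."""
    # Imported lazily so the helper above works without the package installed.
    # Assumes google-cloud-vision >= 2.0 (pip install google-cloud-vision).
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as fh:
        image = vision.Image(content=fh.read())
    # document_text_detection is tuned for dense text such as scanned pages.
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text
```

For example, call `use_service_account_key` with the path that `pwd` reported plus the key's file name, then `print(ocr_image("page_001.jpg"))`, where the image name is a placeholder. Exporting `GOOGLE_APPLICATION_CREDENTIALS` in the shell works just as well; the helper only adds a sanity check on the key file first.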

  
## Variable Fonts.md

1 file, last active November 13, 2018 01:00

Variable Fonts

http://bit.ly/variable_fonts

Mozilla:
- Variable Fonts Guide
- Firefox Edit Fonts

Examples:

  
## spaCy.md

1 file, created March 4, 2019 20:30

Workshop proposal for DH2019
Introduction to natural language processing for DH research with spaCy: a fast and accessible library that integrates modern machine learning technology.

This half-day tutorial will introduce DH scholars to spaCy, a free and open-source library for text analysis. Developed by Matthew Honnibal and Ines Montani in Berlin, spaCy offers a suite of tools for applied natural language processing (NLP) that are fast, practical, and allow for quick experimentation and evaluation of language models. These tools make it possible for individual scholars to quickly train models that infer customized categories in named entity recognition tasks, match phrases, and visualize model performance. While comparable to the Natural Language Toolkit (NLTK), spaCy offers neural network models, integrated word vectors, dependency parsing, and a variety of newer features not found in NLTK. Participants will learn how to use spaCy for common research tasks in the Digital Humanities and gain an understanding of how
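The abstract mentions custom entity categories and phrase matching. A minimal sketch of both, assuming spaCy v3. It uses rule-based components (`entity_ruler` and `PhraseMatcher`) on a blank English pipeline rather than training a model, so nothing needs to be downloaded. The labels `SHIP`, `PLACE` and `VOYAGE_TERM` and the sample sentence are made up for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank English pipeline: tokenizer only, no trained model to download.
nlp = spacy.blank("en")

# Rule-based NER: the entity_ruler assigns custom labels from patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "SHIP", "pattern": "Golden Hind"},               # exact phrase pattern
    {"label": "PLACE", "pattern": [{"LOWER": "plymouth"}]},    # token-attribute pattern
])

# Phrase matching: find multi-token terms from a list, ignoring case.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
terms = ["principal navigations", "voyage"]
matcher.add("VOYAGE_TERM", [nlp.make_doc(t) for t in terms])

text = "Hakluyt's Principal Navigations describes the voyage of the Golden Hind from Plymouth."
doc = nlp(text)

entities = [(ent.text, ent.label_) for ent in doc.ents]
phrases = [doc[start:end].text for _, start, end in matcher(doc)]
print(entities)
print(phrases)
```

Rules like these are a quick way to bootstrap custom categories; the same labels could later serve as training data for a statistical NER model.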

  
## gist:1932be2c9d7adaf6e8f2eae9e9388099
```python
# https://gist.github.com/zupo/5849843
import argparse
import os
import shutil

N = 1000000  # the number of files in each subfolder


def make_files_list(abs_dirname):
    files = []
```
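The preview stops inside `make_files_list`, so the script's exact logic is not shown. Judging by the `N` comment, its purpose is to move files from one large directory into subfolders of at most `N` files each. This is a minimal sketch of that idea, not a reconstruction of the original. It assumes zero-padded subfolder names (`001`, `002`, ...), and the function name `split_into_subfolders` is hypothetical.

```python
import argparse
import os
import shutil

N = 1000000  # files per subfolder, same default as the gist above


def split_into_subfolders(abs_dirname: str, per_folder: int = N) -> list:
    """Move the regular files in abs_dirname into numbered subfolders.

    Subfolders are named 001, 002, ... and each receives at most
    `per_folder` files. Returns the paths of the subfolders used.
    """
    if per_folder < 1:
        raise ValueError("per_folder must be at least 1")
    # Snapshot the file list first so newly created subfolders are not included.
    files = sorted(
        name for name in os.listdir(abs_dirname)
        if os.path.isfile(os.path.join(abs_dirname, name))
    )
    subdirs = []
    for i, name in enumerate(files):
        if i % per_folder == 0:
            # Start a new zero-padded subfolder every `per_folder` files.
            subdir = os.path.join(abs_dirname, f"{i // per_folder + 1:03d}")
            os.makedirs(subdir, exist_ok=True)
            subdirs.append(subdir)
        shutil.move(os.path.join(abs_dirname, name), os.path.join(subdirs[-1], name))
    return subdirs


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Split a directory's files into subfolders of N files each."
    )
    parser.add_argument("src_dir", help="directory containing the files to split")
    parser.add_argument("-n", type=int, default=N, help="files per subfolder")
    args = parser.parse_args(argv)
    for subdir in split_into_subfolders(os.path.abspath(args.src_dir), args.n):
        print(subdir)
```

Saved as, say, `split_dir.py` with `if __name__ == "__main__": main()` at the bottom, it could be run as `python split_dir.py /path/to/dir -n 1000`. Sorting the names first keeps the split deterministic, so consecutively numbered scans stay together.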

## dh_budapest_schedule.md

1 file, created July 29, 2019 10:56

Schedule

Survey of features most relevant to working with TEI

9:00-10:45

- Intro to spaCy (Andy)
  - Linguistic Features
  - Rule-based Matching
  - NER (with pre-trained models)
  - displaCy
- Comparison of available models
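The NER and displaCy items on the schedule can be tried together without downloading a model. A minimal sketch, assuming spaCy v3: a blank English pipeline with an `entity_ruler` stands in for the pre-trained models the schedule mentions. Replacing `spacy.blank("en")` with `spacy.load("en_core_web_sm")` would use a trained NER instead, if that package is installed. The sample sentence and the `GPE` pattern are made up for illustration.

```python
import spacy
from spacy import displacy

# Blank pipeline (tokenizer only) plus a rule-based entity component.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GPE", "pattern": "Budapest"}])  # hypothetical pattern

doc = nlp("The spaCy and TEI workshop meets in Budapest.")

# style="ent" highlights entities; jupyter=False returns the HTML markup
# as a string instead of trying to display it in a notebook.
html = displacy.render(doc, style="ent", jupyter=False)
```

In a notebook, `displacy.render(doc, style="ent")` displays the result inline; from a script, `displacy.serve(doc, style="ent")` starts a local web viewer instead.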


## gist:4d7b2adbe780d7fb7d029da78827933a

1 file, last active February 27, 2020 15:04

https://bit.ly/spacyischool

- The Programming Historian
- DH 2019 spaCy Workshop

- Working with TEI and NLP libraries: David Lassner's standoff converter
- Working with Prodigy versus spaCy

Test-drive spaCy's entity linking capabilities with Hakluyt's Principal Navigations.
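The note on TEI and NLP libraries refers to a standoff converter. David Lassner's tool has its own API, which is not reproduced here. As an illustration of the general standoff idea only, this standard-library sketch flattens a TEI/XML fragment to plain text and records each element's character span. Annotations that an NLP library computes on the plain text, such as entity character offsets, can then be lined up with the original markup. The function name `to_standoff` is hypothetical, and the sketch does no whitespace normalization.

```python
import xml.etree.ElementTree as ET


def to_standoff(xml_string):
    """Flatten an XML/TEI fragment to plain text plus (tag, start, end) spans.

    Offsets index into the returned text, so character offsets produced by
    an NLP library on that text can be matched against element spans.
    """
    root = ET.fromstring(xml_string)
    parts, spans = [], []
    pos = 0

    def walk(el):
        nonlocal pos
        start = pos
        if el.text:
            parts.append(el.text)
            pos += len(el.text)
        for child in el:
            walk(child)
            # Text after a child's closing tag belongs to the parent element.
            if child.tail:
                parts.append(child.tail)
                pos += len(child.tail)
        tag = el.tag.split("}")[-1]  # strip a "{namespace}" prefix such as TEI's
        spans.append((tag, start, pos))

    walk(root)
    return "".join(parts), spans


text, spans = to_standoff("<p>Sailed from <placeName>Plymouth</placeName> in 1577.</p>")
print(text)
print(spans)
```

Spans are collected in post-order, so nested elements appear before the elements that contain them.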


## PH_proposal.md

1 file, last active April 20, 2020 18:02

Proposal for Find all the Places in Text with the World Historical Gazetteer

Programming Historian Lesson Proposal

If you are interested in writing a lesson and submitting it to the Programming Historian, please fill in this form to give the Editorial Board enough detail to comment on your idea. If you are experiencing difficulty with the form, you can contact the Managing Editor for your language directly:

- English: Anandi Silva Knuppel (anandi.silva.knuppel@emory.edu)
- Spanish: Maria José Afanador Llach (mj.afanador28@uniandes.co)
- French: Sofia Papastamkou (spapastamkou@gmail.com)

About You


## gunicorn.service
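```ini
[Unit]
Description=gunicorn daemon
After=network.target

[Service]
# Replace the bracketed placeholders with your own values.
User=[your_user, say www-data]
Group=www-data
WorkingDirectory=[path to app directory]
Environment="PATH=[myvenv/bin]"
# Gunicorn manages 4 Uvicorn worker processes serving the ASGI object `app`
# from main.py, listening on a Unix socket instead of a TCP port.
ExecStart=[myvenv/bin/gunicorn] --access-logfile - --workers 4 -k uvicorn.workers.UvicornWorker --bind unix:/tmp/myapp.sock main:app

# Without an [Install] section, `systemctl enable` cannot start the service at boot.
[Install]
WantedBy=multi-user.target
```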
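The unit's `ExecStart` loads `main:app`, that is, an object named `app` in `main.py` inside `WorkingDirectory`, and runs it with Uvicorn workers, which serve ASGI applications. The gist does not show `main.py`. As an assumption, here is a dependency-free ASGI app that such a unit could serve; a FastAPI `app = FastAPI()` object would fill the same role.

```python
# main.py (hypothetical): the unit only tells us the module is "main" and the object "app".
async def app(scope, receive, send):
    """A bare ASGI application that answers every HTTP request with plain text."""
    if scope["type"] == "lifespan":
        # Acknowledge server startup and shutdown events from Uvicorn.
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                await send({"type": "lifespan.shutdown.complete"})
                return
    elif scope["type"] == "http":
        body = b"Hello from gunicorn and uvicorn\n"
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [
                (b"content-type", b"text/plain; charset=utf-8"),
                (b"content-length", str(len(body)).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})
```

Assuming the unit is saved as `/etc/systemd/system/gunicorn.service`, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now gunicorn` starts it, and `curl --unix-socket /tmp/myapp.sock http://localhost/` should return the greeting.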

## andy.md

1 file, last active July 1, 2020 11:50

Andrew Janco Professional Interactions

9 Rich Freedman, presentation at DH2019, regular CRIM project meetings

4 Darin Hayton, regular project meetings, GreekPal

1 Yvette Granata, consultation

3 Kathryne Corbin, consultation, class instruction on Omeka

2 Jane Chandlee, taught class session on applied NLP

8 Nimisha Ladva, writing program instruction sessions

4 Sarah Watson, S.C. Kaplan, regular project meetings, Books of Duchesses

8 Jake Culbertson, regular co-teaching

  
## culbertson.md

1 file, last active August 17, 2020 14:56

Tuppawaka and Taniwha database

- WordPress Site
- Dashboard
- Palladio