- Sample: Curate a batch of document pages (say 100). The corpus should reflect common kinds of documents in the collection. Split into train and test sets.
- Predict: Auto-transcribe with the current best model using Trainer (starting with Vision or Araucania?)
- Upload: Upload the image files and transcriptions with Fetcher
- Correct the errors in eScriptorium
- Fine-tune the current best model on the new data
- Assess improvement using test data. Generate character error rate (CER) and word error rate (WER) metrics.
- Evaluate model transcriptions for research tasks. Record issues and areas that require improvement.
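The evaluation step above can be sketched in pure Python as follows. This is a minimal illustration of CER and WER via edit distance; in practice you would likely use a library such as jiwer or the reporting built into your HTR toolkit:

```python
# CER and WER via Levenshtein edit distance (pure-Python sketch).
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits per reference character."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

Comparing the corrected test-set transcriptions (reference) against the model output (hypothesis) before and after fine-tuning gives a direct measure of improvement.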
title: "Demo Project Workflow. From images to research data in Obsidian"
description: >
  This project offers a workflow to process historical documents from the Circuit Court of Istmina, Chocó, Colombia.
  https://eap.bl.uk/project/EAP1477
  In this project, we will:
  - Fetch the IIIF Images and metadata from the British Library
  - Segment the images with Kraken
  - Transcribe the images using Google Vision
  - Upload the images to eScriptorium where the transcriptions can be corrected
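The fetch step relies on the IIIF Image API, where each image request is a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. A minimal sketch of building such a URL (the base URL and identifier below are placeholders, not the actual British Library endpoints):

```python
# Build a IIIF Image API request URL. The base URL and identifier here
# are illustrative placeholders; "max" is the IIIF 3.0 full-size keyword
# (IIIF 2.x uses "full" instead).
from urllib.parse import quote

def iiif_image_url(base: str, identifier: str,
                   region: str = "full", size: str = "max",
                   rotation: str = "0", quality: str = "default",
                   fmt: str = "jpg") -> str:
    return (f"{base}/{quote(identifier, safe='')}"
            f"/{region}/{size}/{rotation}/{quality}.{fmt}")

url = iiif_image_url("https://example.org/iiif", "EAP1477_page_001")
```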
Log in here with your PennKey (without @upenn.edu) and your PennKey password:
https://rdp-lab.library.upenn.edu/maps
On the remote computer, open the browser and search for ExcelAlmaLookup; it should give you this URL:
https://github.com/pulibrary/ExcelAlmaLookup/#readme | |
In the README, find the link to download the .exe file.
Tinker with Windows to download and open the .exe file to install the app.
September 2023
This is a tutorial on how to create a local web server to serve static websites. We will repurpose a Wi-Fi router to serve data over Wi-Fi to the browser on local devices such as phones and tablets. This is a great way to share a digital archive with people in locations with limited internet access.
At the end of this tutorial we will have:
- A working Wi-Fi router running OpenWrt (Linux)
- A static website with search using Pagefind
App URL: https://nexis.pennds.org/UpennWSK/homepage/
Repo: https://github.com/upenn-libraries/lexis-wsk
To run the app:
docker-compose up
To access logs:
docker logs app
- The main problem is that textFileDict (a dictionary that maps the human-readable titles of texts to the Python file for that text, in either data/Greek or data/Latin) gets deleted
- textFileDict is needed for the selection of text sections.
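A sketch of how such a dictionary might be rebuilt from the data directories. The directory layout matches the description above, but the naming convention (file stem to title) is an assumption for illustration, not the app's actual code:

```python
# Hypothetical reconstruction of textFileDict: map human-readable titles
# to their files under data/Greek and data/Latin. The stem-to-title
# convention is an assumption, not the app's real logic.
from pathlib import Path

def build_text_file_dict(data_dir: str = "data") -> dict:
    text_file_dict = {}
    for lang in ("Greek", "Latin"):
        for path in sorted(Path(data_dir, lang).glob("*.py")):
            # e.g. "Iliad_Book_1.py" -> "Iliad Book 1"
            title = path.stem.replace("_", " ")
            text_file_dict[title] = str(path)
    return text_file_dict
```

Rebuilding the dictionary at startup (rather than persisting it) would make the deletion bug harmless, at the cost of a directory scan on each launch.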
Good housekeeping
- To connect to the Bridge server, open terminal and enter:
ssh bmulliga@64.227.97.179
- The server is just like any other computer. You need to keep it up to date so that people can't hack into it and use it to mine bitcoin. Whenever you log in, it's good practice to run two commands:
sudo apt update
to refresh the package lists, and
sudo apt upgrade
to install the available updates.
This tutorial is a quick introduction to FastAPI, a simple Python web framework for creating REST APIs, static HTML pages, and many other web applications. Sebastián Ramírez, the creator of FastAPI, maintains excellent documentation and a Gitter forum.
FastAPI, in many respects, is an updated version of Flask. It's built with the features and capabilities of Python 3 in mind, particularly type hints for data validation. It also embraces asynchronous functions and other features of modern web design.
In the following sections, I'll share several use cases for FastAPI. I am particularly fond of FastAPI as a general toolkit that can be used for building simple static HTML or serving advanced machine learning models. It's minimal and simple, but capable of growing as your project evolves and becomes more complex.
I am currently working on issue 291 to highlight the search query in the search results. Search uses DocumentSearchView, which has a get_queryset method that returns the results and renders corpus/document_list.html.
The result's description is rendered at lines 27-28 of document_result.html:
{# description #}
<p class="description">{{ document.description.0|truncatewords:25 }}</p>
Given that document.description is just a string, the simplest solution would be to add `<mark>` tags around the query in the description.
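A sketch of that approach as a standalone helper (not the project's actual code; in Django this would likely be registered as a template filter, and the result marked safe only because the text is escaped first):

```python
# Wrap each occurrence of the query in <mark> tags. The text is
# HTML-escaped first so user content can't inject markup. Hypothetical
# helper for illustration, not the project's real implementation.
import html
import re

def highlight(description: str, query: str) -> str:
    escaped = html.escape(description)
    if not query:
        return escaped
    pattern = re.compile(re.escape(html.escape(query)), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", escaped)
```

Note the case-insensitive match preserves the original casing of each hit by wrapping the matched text itself rather than the query.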