@apjanco
apjanco / flujo-de-trabajo.md
Created February 20, 2024 17:03
flujo-de-trabajo

eScriptorium Iterative Workflow

Iterative workflow:

  • Sample: Curate a batch of document pages (say 100). The corpus should reflect common kinds of documents in the collection. Split into train and test sets.
  • Predict: Auto-transcribe with the current best model using Trainer (starting with Vision or Araucania?)
  • Upload: Upload the image files and transcriptions with Fetcher
  • Correct the errors in eScriptorium
  • Fine-tune the current best model on the new data
  • Assess improvement using test data. Generate character error rate (CER) and word error rate (WER) metrics.
  • Evaluate model transcriptions for research tasks. Record issues and areas that require improvement.
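
The assessment step above can be sketched in plain Python. This is a minimal illustration, assuming CER and WER are computed as Levenshtein edit distance divided by the length of the corrected reference; the function names `edit_distance`, `cer`, and `wer` are hypothetical and are not part of eScriptorium or any tool named in the workflow.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Here `reference` would be the hand-corrected transcription of a test page and `hypothesis` the model output for the same page. Comparing the scores before and after fine-tuning on the same test set gives a rough measure of improvement.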
title: "Demo Project Workflow. From images to research data in Obsidian"
description: >
  This project offers a workflow to process historical documents from the Circuit Court of Istmina, Chocó, Colombia.
  https://eap.bl.uk/project/EAP1477
  In this project, we will:
  - Fetch the IIIF Images and metadata from the British Library
  - Segment the images with Kraken
  - Transcribe the images using Google Vision
  - Upload the images to eScriptorium where the transcriptions can be corrected
Log in here with your PennKey (without @upenn.edu)
Use your PennKey password
https://rdp-lab.library.upenn.edu/maps
On the remote computer, open the browser and search for ExcelAlmaLookup; it should give you this URL:
https://github.com/pulibrary/ExcelAlmaLookup/#readme
In the README, find the link to download the exe file.
Tinker with Windows as needed to download and open the exe file, which installs the app.

How to create a tiny server

September 2023

Overview

This is a tutorial on how to create a local web server to serve static websites. We will re-purpose a wifi router to serve data over wifi to the browser on local devices such as phones and tablets. This is a great way to share a digital archive with people in locations with limited internet access.

At the end of this tutorial we will have:

  • A working wifi router running OpenWRT (Linux)
  • A static website with search using PageFind
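
Before copying a static site to the router, it can help to preview it on a laptop. The sketch below is not the router setup itself; it only assumes the site has already been built into a local folder and uses Python's standard `http.server` module to serve that folder. The function name `serve_directory` and the folder name in the usage note are illustrative assumptions.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve the files in `directory` over HTTP on localhost, in a background thread."""
    # Bind the handler to the chosen folder instead of the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # A daemon thread keeps the server from blocking the caller or program exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

For example, `server = serve_directory("public")` would make the site available at http://127.0.0.1:8000/ until `server.shutdown()` is called, where `public` stands in for whatever folder holds the built site.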

Care and Feeding of the Bridge Server

Here are the steps currently needed to fix texts.py after a new text is imported.

  • The main problem is that textFileDict (a dictionary that connects the human-readable titles of texts to the Python file for that text, in either data/Greek or data/Latin) gets deleted.
  • textFileDict is needed for the selection of text sections.

Good housekeeping

  1. To connect to the Bridge server, open a terminal and enter: ssh bmulliga@64.227.97.179
  2. The server is just like any other computer. You need to keep it up to date so that people can't hack into it and use it to mine bitcoin. Whenever you log in, it's good practice to run two commands: sudo apt update to refresh the list of available packages, and sudo apt upgrade to install the updates.

⚡FastAPI

Introduction

This tutorial is a quick introduction to FastAPI, a simple Python web framework for creating REST APIs, static HTML pages, and many other web applications. Sebastián Ramírez, the creator of FastAPI, maintains excellent documentation and a Gitter forum.

In many respects, FastAPI can be thought of as an updated take on Flask. It's built with the features and capabilities of Python 3 in mind, particularly type hints for data validation. It also embraces asynchronous functions and other features of modern web design.

In the following sections, I'll share several use cases for FastAPI. I am particularly fond of FastAPI as a general toolkit that can be used for building simple static HTML or serving advanced machine learning models. It's minimal and simple, but capable of growing as your project evolves and becomes more complex.

@apjanco
apjanco / confusion.md
Last active November 3, 2021 13:31
corpus/snippets/document_result.html

I am currently working on issue 291 to highlight the search query in the search results. Search uses DocumentSearchView, which has a get_queryset method that returns the results and renders corpus/document_list.html.

The result's description is on lines 27-28 of document_result.html:

        {# description #}
        <p class="description">{{ document.description.0|truncatewords:25 }}</p>

Given that document.description is just a string, the simplest solution would be to add mark tags around the query in the description.
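
One way to sketch that idea in plain Python is shown below. It is only an illustration under stated assumptions: the helper name `highlight` is hypothetical, the match is a case-insensitive search for the whole query string, and the text is HTML-escaped piece by piece so that the inserted mark tags are the only markup in the result.

```python
import html
import re


def highlight(text: str, query: str) -> str:
    """Wrap each case-insensitive occurrence of `query` in <mark> tags."""
    query = query.strip()
    if not query:
        return html.escape(text)
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches, then the match itself inside the tag.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

In a Django template the result would still be auto-escaped unless it is marked safe (for example with `django.utils.safestring.mark_safe`), which is why the helper escapes the description itself. How this would interact with the existing truncatewords filter is left open here.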