Alix Chagué alix-tz

alix-tz / hocr_to_kraken_transcribe.xsl
Created March 11, 2019 16:28 — forked from PonteIneptique/hocr_to_kraken_transcribe.xsl
XSL for transforming hOCR output from Tesseract into a transcription file for Kraken (à la ketos prefill); needs Saxon-EE > 9.8
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
xmlns:my="foo.bar"
exclude-result-prefixes="xs my saxon uuid"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0"
xmlns:uuid="java:java.util.UUID">
alix-tz / cremma-print-badges-chars.json
Created August 26, 2021 12:43 — forked from PonteIneptique/cremma-print-badges-chars.json
This gist exists only to feed the CREMMA HTR-United badges
{"schemaVersion":1,"label":"Transcribed Characters","message":"83728","color":"informational","style":"flat-square"}
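
The JSON above appears to follow the shape of a Shields.io endpoint badge payload. As a minimal sketch of how such a file could be regenerated from a character count, assuming a hypothetical helper named `make_badge` (not part of the gist) and reusing the field values shown above:

```python
import json


def make_badge(label: str, count: int) -> str:
    """Build a badge payload with the same fields as the gist's JSON file.

    `make_badge` is a hypothetical helper written for illustration only.
    """
    payload = {
        "schemaVersion": 1,
        "label": label,
        "message": str(count),  # the message is stored as a string
        "color": "informational",
        "style": "flat-square",
    }
    # Compact separators reproduce the single-line layout of the gist file.
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    print(make_badge("Transcribed Characters", 83728))
```

Writing the returned string to `cremma-print-badges-chars.json` would then refresh the value the badge displays.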

General transcription rules for the CREMMA corpora

The CREMMA corpora are a set of ground-truth corpora produced as part of the CREMMA project (Consortium pour la Reconnaissance des Écritures Manuscrites des Matériaux Anciens).

alix-tz / lexical_exploration_duvalais.py
Created November 23, 2023 16:07
Python script to explore the lexical variety of the French Du Valais Recensement
import os
import unicodedata
from collections import Counter
from spacy.lang.fr import French
from tqdm import tqdm
import pandas as pd
import lxml.etree as ET