Working on a minimal academic stack

Alex Gil elotroalex

elotroalex / get-url-list-from-html.js
Created October 26, 2023 14:38
Get a list of URLs from a page
console.log([...new Set([...document.querySelectorAll('a')].filter(a=>a.href.match('pdf')).map(a=>a.href.replace(/\/[^\/]+\.pdf.*$/,'')))].join('\n'))
console.log([...new Set([...document.querySelectorAll('a')].filter(a=>a.href.match('155-publications')).map(a=>a.href.replace(/155\-publications.*$/,'155-publications')))].join('\n'))
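The same extraction can be run outside the browser console. What follows is a minimal Python sketch of the idea behind the two one-liners above: collect every link's href, keep those containing a marker string, trim each to a prefix, and de-duplicate in first-seen order. It is not part of the gist. The names `LinkCollector` and `parent_urls_of_pdfs` and the sample HTML are hypothetical, and unlike `a.href` in the browser this version does not resolve relative URLs.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def parent_urls_of_pdfs(html):
    """Return the unique parent URLs of links whose href contains 'pdf'."""
    parser = LinkCollector()
    parser.feed(html)
    trimmed = [
        # Same pattern as the JavaScript version: drop "/<file>.pdf..." at the end.
        re.sub(r"/[^/]+\.pdf.*$", "", href)
        for href in parser.hrefs
        if "pdf" in href
    ]
    # dict.fromkeys de-duplicates while keeping first-seen order,
    # mirroring what [...new Set(...)] does in the JavaScript version.
    return list(dict.fromkeys(trimmed))


# Made-up sample page, for illustration only.
sample = """
<a href="http://example.org/docs/a.pdf">A</a>
<a href="http://example.org/docs/b.pdf?dl=1">B</a>
<a href="http://example.org/about">About</a>
"""
print("\n".join(parent_urls_of_pdfs(sample)))
```

Both PDF links live in the same directory, so the sample prints a single line; the non-PDF link is filtered out.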
county,link,name,state
Indianapolis,http://www.hspa.com/,Hoosier State Press Association,Indiana
Anderson,http://www.heraldbulletin.com/,Anderson Herald Bulletin,Indiana
Muncie,http://www.dailynews.bsu.edu/,Ball State Daily News,Indiana
Greencastle,http://www.bannergraphic.com/,Banner-Graphic,Indiana
Bloomington,http://www.indepen.com/,Bloomington Independent,Indiana
Nashville,http://www.browncountyindiana.com/,Brown County Democrat,Indiana
Chesterton,http://chestertontribune.com/,Chesterton Tribune,Indiana
Connersville,http://www.connersvillein.com/news-examiner/,Connersville News-Examiner,Indiana
Corydon,http://www.corydondemocrat.com/,The Corydon Democrat,Indiana
elotroalex / non-euro-tei.md
Last active April 10, 2023 22:44
A list of projects and articles on TEI in Non-European Languages
elotroalex / gist:8e2773a7d9850cb9177b
Created September 27, 2014 15:29
African Diaspora 2.0 Links
Curatescape | A web and mobile app framework for curating the landscape
http://curatescape.org/
Amara - Caption, translate, subtitle and transcribe video.
http://amara.org/en/
Pop Up Archive
https://www.popuparchive.com/
The Mulka Project
elotroalex / i18n.feature
Created March 14, 2012 18:09
Cucumber test for i18n
Feature: Check language
  In order to test the selected language is right
  As a cosmopolitan developer
  I want to make sure the right words are present in the header

  Scenario: English pages
    Given the language is set to English
    When I visit the homepage
    Then the header should have the words 'Prism is a tool'
elotroalex / wordsToChars.py
Created March 10, 2012 07:39
A necessary step for the Google diff to work at the word level. The original code was written for lines; this revision builds an array of words instead. The main challenge for me was incorporating the regex so that iteration continues beyond the \n.
import re

def diff_wordsToChars(text1, text2):
  """Split two texts into an array of words. Reduce the texts to a string
  of hashes where each Unicode character represents one word.

  Args:
    text1: First string.
    text2: Second string.
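The preview above is cut off before the function body. As an illustration only, here is a minimal sketch of the encoding the docstring describes: each distinct word becomes one Unicode character, so a character-level diff of the encoded strings amounts to a word-level diff of the texts. This is not the gist's implementation. The function name `words_to_chars` and the tokenizing regex are assumptions, chosen so that tokens carry their trailing whitespace (newlines included) and joining them restores the text exactly.

```python
import re


def words_to_chars(text1, text2):
    """Encode each distinct token of two texts as one Unicode character.

    Returns (chars1, chars2, word_array): the two encoded texts, plus a
    list where word_array[ord(c)] recovers the token for character c.
    """
    word_array = [""]  # index 0 is left unused, so no token maps to "\x00"
    word_index = {}    # token -> its position in word_array

    def encode(text):
        chars = []
        # Assumed tokenization: a run of non-whitespace plus any whitespace
        # that follows it (newlines included), or leading whitespace alone.
        for token in re.findall(r"\S+\s*|\s+", text):
            if token not in word_index:
                word_array.append(token)
                word_index[token] = len(word_array) - 1
            chars.append(chr(word_index[token]))
        return "".join(chars)

    return encode(text1), encode(text2), word_array


chars1, chars2, words = words_to_chars("the cat\nsat", "the dog\nsat")
print([ord(c) for c in chars1], [ord(c) for c in chars2])
# Decoding restores the original text, newline and all.
print(repr("".join(words[ord(c)] for c in chars2)))
```

Because the two texts share one word table, identical words in either text map to the same character, which is what lets a character diff line up matching words.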
elotroalex / iter-diff.py
Created March 7, 2012 14:40
An iterative diff
import diff

def diff_wordMode(text1, text2):
  dmp = diff.diff_match_patch()
  a = dmp.diff_linesToWords(text1, text2)
  lineText1 = a[0]
  lineText2 = a[1]
  lineArray = a[2]
  diffs = dmp.diff_main(lineText1, lineText2)
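The preview stops after the call to `diff_main`, so the decoding step is not visible here. The sketch below shows one way the whole round trip could look: encode words as characters, diff the encoded strings, then map each character back to its word. To stay self-contained it swaps in the standard library's `difflib.SequenceMatcher` for `diff_match_patch`, so its output will not necessarily match what the gist's code produces. The name `word_mode_diff` and the tokenizing regex are assumptions.

```python
import difflib
import re


def word_mode_diff(text1, text2):
    """Return a word-level diff as a list of (op, text) pairs.

    op is "equal", "delete", or "insert".
    """
    words = [""]  # words[i] is the token encoded as chr(i); 0 is unused
    index = {}    # token -> its position in words

    def encode(text):
        out = []
        # Assumed tokenization: each word keeps its trailing whitespace.
        for token in re.findall(r"\S+\s*|\s+", text):
            if token not in index:
                index[token] = len(words)
                words.append(token)
            out.append(chr(index[token]))
        return "".join(out)

    enc1, enc2 = encode(text1), encode(text2)

    # Diff the encoded strings: one character stands for one word.
    # difflib is a stand-in here for diff_match_patch's diff_main.
    matcher = difflib.SequenceMatcher(None, enc1, enc2, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Decode each span of characters back into the words it stands for.
        old = "".join(words[ord(c)] for c in enc1[i1:i2])
        new = "".join(words[ord(c)] for c in enc2[j1:j2])
        if tag == "equal":
            diffs.append(("equal", old))
        else:
            # A "replace" opcode is reported as a delete, then an insert.
            if old:
                diffs.append(("delete", old))
            if new:
                diffs.append(("insert", new))
    return diffs


for op, text in word_mode_diff("the cat\nsat", "the dog\nsat"):
    print(op, repr(text))
```

Whole words are reported as deleted or inserted, never fragments of them, which is the point of encoding before diffing.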
elotroalex / diff_match_patch.py
Created March 6, 2012 14:06
the Google diff
#!/usr/bin/python2.4
from __future__ import division
"""Diff Match and Patch
Copyright 2006 Google Inc.
http://code.google.com/p/google-diff-match-patch/
Licensed under the Apache License, Version 2.0 (the "License");