@xflr6
xflr6 / glottolog.ipynb
Last active December 17, 2018 20:34
Glottolog with Python
@xflr6
xflr6 / feedsizes.py
Last active December 18, 2021 00:13
Compare RSS feed enclosure length with content-length header of file when downloading the URL
"""Compare feed enclosure length with content-length of file url."""
import urllib.request
import xml.etree.ElementTree as etree
URL = 'https://feeds.feedburner.com/thebuglefeed?format=xml'
with urllib.request.urlopen(URL) as f:
    tree = etree.parse(f)
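The preview above stops after parsing the feed. A minimal sketch of how the comparison described in the gist title might continue, assuming each item carries an `<enclosure>` element with `url` and `length` attributes (as in RSS 2.0). The helper names `iter_enclosures`, `content_length`, and `mismatches` are hypothetical and not taken from the gist:

```python
"""Sketch: compare enclosure lengths in a feed with Content-Length headers."""
import urllib.request
import xml.etree.ElementTree as etree


def iter_enclosures(root):
    """Yield (url, declared_length) for each <enclosure> element (hypothetical helper)."""
    for enclosure in root.iter('enclosure'):
        length = enclosure.get('length')
        # Real feeds may omit the length or fill in something non-numeric.
        declared = int(length) if length and length.isdigit() else None
        yield enclosure.get('url'), declared


def content_length(url):
    """Return the Content-Length reported for a HEAD request, or None if absent."""
    request = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(request) as response:
        value = response.headers.get('Content-Length')
    return int(value) if value is not None else None


def mismatches(root, get_length=content_length):
    """Yield (url, declared, actual) for enclosures whose two sizes differ."""
    for url, declared in iter_enclosures(root):
        actual = get_length(url)
        if declared != actual:
            yield url, declared, actual
```

With the `tree` parsed above, `list(mismatches(tree.getroot()))` would issue one HEAD request per enclosure. Passing a different `get_length` callable makes the comparison usable without network access.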
@xflr6
xflr6 / pl_pgsql.py
Last active December 22, 2021 23:42
SQL injection safe dynamic query execution via PL/pgSQL quote_ident() and format('%I')
"""SQL-injection safe dynamic query with pl/pgsql."""
import sqlalchemy as sa
UNIQUE_NULL = [('contributioncontributor', ['contribution_pk', 'contributor_pk'], []),
               ('contributionreference', ['contribution_pk', 'source_pk', 'description'], []),
               ('editor', ['dataset_pk', 'contributor_pk'], []),
               ('languageidentifier', ['language_pk', 'identifier_pk'], []),
               ('languagesource', ['language_pk', 'source_pk'], []),
               ('sentencereference', ['sentence_pk', 'source_pk', 'description'], []),
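The gist title names PL/pgSQL `quote_ident()` and `format('%I')`, which quote identifiers on the server side. The rest of the file is not shown, so the following is only a client-side illustration of the quoting rule, not the gist's method: wrap the name in double quotes and double any embedded double quote. Unlike `quote_ident()`, which adds quotes only when needed, this sketch always quotes. The names `quote_ident_sketch` and `count_nulls_query` are hypothetical:

```python
def quote_ident_sketch(name: str) -> str:
    """Return name as a double-quoted SQL identifier (hypothetical helper, always quotes)."""
    if '\x00' in name:
        raise ValueError('identifiers cannot contain NUL characters')
    # Doubling embedded quotes keeps the whole name inside one identifier token.
    return '"' + name.replace('"', '""') + '"'


def count_nulls_query(table: str, column: str) -> str:
    """Build a query text with quoted identifiers instead of raw interpolation."""
    return 'SELECT count(*) FROM {} WHERE {} IS NULL'.format(
        quote_ident_sketch(table), quote_ident_sketch(column))
```

For example, `count_nulls_query('editor', 'dataset_pk')` yields `SELECT count(*) FROM "editor" WHERE "dataset_pk" IS NULL`, and a hostile name such as `x"; DROP TABLE y; --` stays a single (nonexistent) identifier rather than becoming a second statement.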
@xflr6
xflr6 / common_prefix.py
Last active December 22, 2021 23:43
Case-insensitive longest common prefix of two strings
"""Longest common prefix."""
import itertools
def common_prefix(left: str, right: str) -> str:
    """Return the case-insensitive longest common prefix of two strings.

    >>> common_prefix('spam', 'spameggs')
    'spam'
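The function body is cut off in the preview. One way such a function could be written, as a sketch under stated assumptions: characters are compared pairwise with `str.casefold()`, and the prefix is returned with the casing of `left` (the preview does not say which casing the original returns). The name `common_prefix_sketch` is hypothetical:

```python
import itertools


def common_prefix_sketch(left: str, right: str) -> str:
    """Return the prefix of left that matches right when case is ignored."""
    pairs = zip(left, right)  # stops at the shorter string
    same = itertools.takewhile(
        lambda pair: pair[0].casefold() == pair[1].casefold(), pairs)
    # Keep the characters from left, so its casing is preserved.
    return ''.join(l for l, _ in same)
```

This agrees with the doctest shown above: `common_prefix_sketch('spam', 'spameggs')` returns `'spam'`.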
@xflr6
xflr6 / langdoc_csv.py
Last active January 8, 2022 14:06
Download and combine https://glottolog.org/glottolog/language.csv parts using pandas
"""Combine https://glottolog.org/langdoc.csv parts."""
import urllib.parse
import pandas as pd
ENDPOINT = urllib.parse.urlparse('https://glottolog.org/langdoc.csv')
QUERY = {'sEcho': 1,
         'iSortingCols': 1,
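The remaining query parameters are not visible in the preview. A sketch of how per-part URLs could be built from the parsed `ENDPOINT`, assuming the server pages its output with the legacy DataTables parameters `iDisplayStart` and `iDisplayLength` (an assumption suggested by `sEcho` and `iSortingCols`, not confirmed by the gist). The helpers `part_url` and `part_urls` are hypothetical:

```python
import urllib.parse

ENDPOINT = urllib.parse.urlparse('https://glottolog.org/langdoc.csv')


def part_url(start: int, length: int) -> str:
    """Return the URL for one part of the CSV (paging parameter names are assumed)."""
    query = {'sEcho': 1,
             'iSortingCols': 1,
             'iDisplayStart': start,    # assumed: offset of the first row
             'iDisplayLength': length}  # assumed: number of rows per part
    return ENDPOINT._replace(query=urllib.parse.urlencode(query)).geturl()


def part_urls(total: int, length: int) -> list:
    """Return one URL per part needed to cover total rows."""
    return [part_url(start, length) for start in range(0, total, length)]
```

Each URL could then be read with `pandas.read_csv` and the resulting frames combined with `pandas.concat`, which is presumably what "combine ... parts using pandas" refers to.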
@xflr6
xflr6 / languoids_csv.py
Last active January 8, 2022 14:08
Dump basic https://glottolog.org languoid info into CSV file
"""Dump basic https://glottolog.org languoid info to CSV file."""
import pandas as pd
ENGINE = 'postgresql://postgres@/glottolog3'
QUERY = '''
SELECT
    l.id AS glottocode,
    l.name,
@xflr6
xflr6 / Wikidata.ipynb
Last active May 22, 2022 09:56
Check Glottolog -> Wikidata mapping
@xflr6
xflr6 / ControlCharacters.ipynb
Last active May 22, 2022 09:56
Drop Glottolog bibfiles for control characters
@xflr6
xflr6 / Pandas_read_sparql_query.ipynb
Last active May 22, 2022 09:57
Read pandas.DataFrame from SPARQL query
@xflr6
xflr6 / rdfbuild.py
Last active June 4, 2022 07:45
Example building RDF with rdflib
"""Build RDF with rdflib and serialize in turtle format."""
import rdflib
from rdflib.namespace import DCTERMS, RDF, RDFS, SKOS
GOLD = rdflib.Namespace('http://purl.org/linguistics/gold/')
LANGUOID = rdflib.Namespace('http://glottolog.org/resource/languoid/id/')
VOID = rdflib.Namespace('http://rdfs.org/ns/void#')