@GuyAglionby
GuyAglionby / record_adsbexchange_data.py
Created May 29, 2023 17:12
Pull data out from the ADS-B Exchange feed client
import json
import sqlite3
from time import sleep
SCHEMA = """CREATE TABLE IF NOT EXISTS planes (
hex TEXT NOT NULL,
unix_time INT NOT NULL,
flight TEXT,
category TEXT);
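A minimal sketch of how rows might be written against this schema. The preview cuts off before any insert logic, so everything past the table definition is an assumption: the helper name `record_planes` is hypothetical, each aircraft is assumed to be a dict carrying `hex` plus optional `flight` and `category` keys, and an in-memory database stands in for a real file.

```python
import sqlite3
import time

# Same columns as the preview above; the closing quotes are added here
# so that the sketch runs on its own.
SCHEMA = """CREATE TABLE IF NOT EXISTS planes (
    hex TEXT NOT NULL,
    unix_time INT NOT NULL,
    flight TEXT,
    category TEXT);"""


def record_planes(conn, aircraft, unix_time=None):
    """Insert one row per aircraft dict; absent optional fields become NULL.

    Hypothetical helper: the dict keys are an assumption about the feed format.
    """
    if unix_time is None:
        unix_time = int(time.time())
    rows = [
        (a["hex"], unix_time, a.get("flight"), a.get("category"))
        for a in aircraft
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO planes (hex, unix_time, flight, category) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
inserted = record_planes(
    conn,
    [{"hex": "abc123", "flight": "TEST1", "category": "A3"}, {"hex": "def456"}],
    unix_time=1685380320,
)
```

Parameterized `?` placeholders keep the feed values out of the SQL text, and `dict.get` lets records without a callsign or category map onto the nullable columns.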
@GuyAglionby
GuyAglionby / aclanthology_duplicate_videos.py
Created October 18, 2022 12:38
Remove duplicate video entries from ACL Anthology data
import glob
import itertools
from lxml import etree
from networkx.utils import UnionFind
from tqdm import tqdm
def elems_same(elem1, elem2):
    return elem1.attrib == elem2.attrib
import pandas as pd
import re
from datetime import datetime
from collections import defaultdict, OrderedDict
import yaml
re_session_extract = re.compile(r'\w+ (\w+) (\d+), (\d+) (\d+\w) ([\w\d\s:\-.,()]+-\d+) (\d+):(\d\d) UTC(.*)')
def extract_date(x):
@GuyAglionby
GuyAglionby / papers.csv
Last active June 12, 2020 20:57
(incomplete) MiniConf papers.csv for ACL 2020
UID,title,authors,abstract,keywords,session
1,Learning to Understand Child-directed and Adult-directed Speech,Lieke Gelderloos|Grzegorz Chrupała|Afra Alishahi,"Speech directed to children differs from adult-directed speech in linguistic aspects such as repetition, word choice, and sentence length, as well as in aspects of the speech signal itself, such as prosodic and phonemic variation. Human language acquisition research indicates that child-directed speech helps language learners. This study explores the effect of child-directed speech when learning to extract semantic information from speech directly. We compare the task performance of models trained on adult-directed speech (ADS) and child-directed speech (CDS). We find indications that CDS helps in the initial stages of learning, but eventually, models trained on ADS reach comparable task performance, and generalize better. The results suggest that this is at least partially due to linguistic rather than acoustic properties of the two registers, as we s
@GuyAglionby
GuyAglionby / miniconf_paper_convert.py
Last active June 12, 2020 21:13
Convert a CSV with columns (title, author, paper type) into a format that MiniConf accepts
import pandas as pd
import re
from typing import List
re_author_split = re.compile(' and |, ')
re_curly_brace = re.compile('{([A-Za-z0-9 ]+)}')
acceptable_chars = r"['`/:\-()?\w\s\d.,]+"
re_newline = re.compile('[ ]*\n[ ]*')
re_inline_italics = re.compile(r'{\\(?:em|it) (' + acceptable_chars + ')}')
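A minimal sketch of how these patterns might be applied. The preview ends before any of them is used, so the helper names `clean_title` and `to_miniconf_authors` are hypothetical, and the order of the substitutions is an assumption. The pipe-separated author output follows the `authors` column visible in papers.csv.

```python
import re

re_author_split = re.compile(' and |, ')
re_curly_brace = re.compile('{([A-Za-z0-9 ]+)}')
acceptable_chars = r"['`/:\-()?\w\s\d.,]+"
re_newline = re.compile('[ ]*\n[ ]*')
re_inline_italics = re.compile(r'{\\(?:em|it) (' + acceptable_chars + ')}')


def clean_title(title: str) -> str:
    """Strip simple LaTeX markup from a title (hypothetical helper)."""
    title = re_newline.sub(' ', title)           # join wrapped lines
    title = re_inline_italics.sub(r'\1', title)  # {\em text} -> text
    title = re_curly_brace.sub(r'\1', title)     # {BERT} -> BERT
    return title.strip()


def to_miniconf_authors(authors: str) -> str:
    """Turn 'A, B and C' into 'A|B|C' (hypothetical helper)."""
    return '|'.join(name.strip() for name in re_author_split.split(authors))


cleaned = clean_title('{\\em Attention} is all\n  you need: {BERT} revisited')
joined = to_miniconf_authors('Lieke Gelderloos, Grzegorz Chrupała and Afra Alishahi')
```

The italics pattern runs before the brace pattern so that `{\em ...}` groups are unwrapped in one step. The brace pattern cannot match a backslash, so it would leave them untouched anyway.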
@GuyAglionby
GuyAglionby / fix-abstracts.py
Created December 27, 2019 16:42
Downloads ACL PDFs, extracts abstracts, and puts them into the XML used to build the ACL Anthology (cf. https://github.com/acl-org/acl-anthology/issues/714). Slightly more robust than the previous version (relies less on pdfquery). It's still not perfect; manual verification of changes is required. You'll need a slightly modified version of pdfquery …
import pdfquery
import re
from lxml import etree as ET
import urllib.request
import urllib.error
from collections import Counter
import random
import numpy as np
import fileinput
import multiprocessing
@GuyAglionby
GuyAglionby / fix-ws-abstracts.py
Last active December 23, 2019 15:07
Downloads ACL PDFs, extracts abstracts, and puts them into the XML used to build the ACL Anthology (cf. https://github.com/acl-org/acl-anthology/issues/714)
import pdfquery
import re
from lxml import etree as ET
import urllib.request
import urllib.error
# https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt
with open('words_alpha.txt', 'r') as f:
    words = set([x.strip() for x in f])
words.add('embeddings')