@gpiat
gpiat / 01_Prolog_Intro.pl
Created November 25, 2022 18:51
An introduction to CSPs (Constraint Satisfaction Problems) in Prolog
/* Created by Guilhem Piat, 2022
View the interactive notebook version of this guide here:
https://swish.swi-prolog.org/p/intro_to_pl.swinb
##### INTRODUCTION TO PROLOG FOR CSPs #####
Prolog is a logic programming language that is typically meant to
solve logic problems, but can also easily be used as a CSP solver.
To use Prolog, you may use the online editor/console SWISH:
gpiat / aircc.cls
Last active July 19, 2022 14:19
LaTeX document class for AIRCC-formatted article submissions. Use with LuaLaTeX or XeLaTeX for fonts to work. Based on code found in amelentev/java-oo.
% ============================================================================
%% aircc.cls V 1.2, 2012/09/13, (c) 2012 Thomas Zink
%% minor modifications by Artem Melentyev, 2014 and Guilhem Piat, 2022.
%%
%% This is an unofficial LaTeX class for authors of AIRCC papers.
%% It tries to follow the formatting guidelines set in the official
%% template "aircc_template.doc" as closely as possible.
%% Unfortunately, some are not easily applicable in LaTeX. Examples include
%% text style combinations like bold italicized small caps, which are
%% generally not supported. Some font sizes are also not directly supported.
gpiat / .ater_blacklist
Last active April 20, 2022 16:08
Get the current listing of TA job offers in France from the government website and filter out the outdated and non-CS-related ones
0020087J
0141408E
0141720U
0170145R
0171463Y
0211237F
0211139Z
0290119X
0290346U
0310152X
gpiat / get_books_and_wiki_corpora.md
Last active November 19, 2021 15:52
This document describes how I acquire plaintext versions of the books and Wikipedia corpora.

How to: get the books corpus and the Wikipedia corpus

As specified by the authors, the books corpus needs to be downloaded from Smashwords. However, there is no easy download option; it seems that it needs to be scraped.

The Wikipedia dataset can be downloaded from Wikimedia but only as XML.

Hugging Face makes these datasets available, making it easier to acquire them.

The steps are as follows:

gpiat / IOB2_to_IOBES.py
Created February 2, 2021 17:35
Convert IOB2 file to IOBES format. Usage: `python IOB2_to_IOBES.py /path/to/input/file` creates an appropriately named output file in the same directory as the input file.
from sys import argv
from os import path


def parse_line(line):
    if line is None:
        return None, None, None
    semtype = None
    tok, tag = line
    if tag == 'O':
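
The preview above is cut off, so the gist's own conversion logic is not visible here. As a rough illustration of what an IOB2 to IOBES conversion involves, here is a minimal sketch. The function name `iob2_to_iobes` and the per-sentence list interface are assumptions made for this example, not the author's implementation: a `B-` tag with no continuation becomes `S-`, and the last `I-` tag of an entity becomes `E-`.

```python
def iob2_to_iobes(tags):
    """Convert one sentence's IOB2 tags (e.g. 'B-PER', 'I-PER', 'O') to IOBES.

    Hypothetical helper for illustration; not taken from the gist.
    """
    converted = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            converted.append(tag)
            continue
        prefix, _, semtype = tag.partition('-')
        # Treat the end of the sentence like an 'O' tag.
        next_tag = tags[i + 1] if i + 1 < len(tags) else 'O'
        # The entity continues only if the next tag is an I- tag of the same type.
        continues = next_tag == 'I-' + semtype
        if prefix == 'B':
            converted.append(('B-' if continues else 'S-') + semtype)
        else:  # prefix == 'I'
            converted.append(('I-' if continues else 'E-') + semtype)
    return converted


print(iob2_to_iobes(['B-PER', 'I-PER', 'O', 'B-LOC']))
# → ['B-PER', 'E-PER', 'O', 'S-LOC']
```

The sketch looks one tag ahead instead of tracking state, which keeps single-token and multi-token entities in the same code path.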
gpiat / interactive_BELT_training.ipynb
Last active November 19, 2020 16:06
Jupyter notebook for interactively training a BELT model. The necessary pickle files can be found in the comments.
#!/bin/bash
# find the tsv files here: https://drive.google.com/drive/folders/1kC0I2UGl2ltrluI9NqDjaQJGw5iliw_J
cat NER-de-train.tsv \
| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > train.txt.tmp
cat NER-de-dev.tsv \
| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > dev.txt.tmp
cat NER-de-test.tsv \
| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > test.txt.tmp
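
For readers less familiar with the shell tools, each pipeline above drops comment lines, keeps the second and third tab-separated columns, and joins them with a space. A minimal Python sketch of the same per-line transformation follows. The function name `tsv_to_token_tag_lines` and the sample rows are made up for illustration and are not part of the gist.

```python
def tsv_to_token_tag_lines(lines):
    """Mimic: grep -v "^#" | cut -f 2,3 | tr '\\t' ' ' (hypothetical helper)."""
    out = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('#'):
            continue  # grep -v "^#": drop comment lines
        fields = line.split('\t')
        if len(fields) == 1:
            # cut passes lines without a delimiter through unchanged,
            # which preserves the blank lines between sentences.
            out.append(line)
        else:
            # cut -f 2,3 keeps columns 2 and 3; tr turns the tab into a space.
            out.append(' '.join(fields[1:3]))
    return out


# Invented sample rows in a four-column layout, for demonstration only.
sample = ['# comment line\n', '1\tAngela\tB-PER\tO\n', '2\tMerkel\tI-PER\tO\n', '\n']
print(tsv_to_token_tag_lines(sample))
# → ['Angela B-PER', 'Merkel I-PER', '']
```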
gpiat / convert_PubTator_BIO.py
Created November 10, 2020 16:05
This script converts PubTator corpora like MedMentions to BIO format. At the moment, input files must be MedMentionsCorpus objects serialized with pickle.
import os
import pickle
import sys


def convert_targets(mode, targets):
    if mode == 'bin':
        return ['O' if t is None else 'M' for t in targets]
    elif mode == 'cuid':
        return ['O' if t is None else t[0] for t in targets]
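
The preview ends here, so any further modes are not shown. To illustrate the two visible branches, the sketch below reproduces them and runs them on a small example. It assumes that each target is either `None` or a tuple whose first element is a concept identifier. The sample values and the final `raise` are additions for this example only.

```python
def convert_targets(mode, targets):
    """Reproduces only the two branches visible in the preview."""
    if mode == 'bin':
        # Binary tagging: 'M' for any mention, 'O' otherwise.
        return ['O' if t is None else 'M' for t in targets]
    elif mode == 'cuid':
        # Tag each mention with the first element of its target tuple.
        return ['O' if t is None else t[0] for t in targets]
    # Not in the original preview: added so the sketch fails loudly.
    raise ValueError('mode not covered by this sketch: ' + repr(mode))


# Hypothetical targets: None for non-mentions, (identifier, type) for mentions.
targets = [None, ('C0000001', 'T000'), None]
print(convert_targets('bin', targets))   # → ['O', 'M', 'O']
print(convert_targets('cuid', targets))  # → ['O', 'C0000001', 'O']
```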