Pablo Duboue DrDub

DrDub / detag.php
Created June 21, 2023 07:44
Detagging HTML to do pre-trained transformers fine-tuning
// This code Copyright (C) 2023 Textualization Software Ltd. is dual
// licensed PHP and LGPLv2.1 and it comes with NO WARRANTIES.
// v1.0
How to use it to obtain text for fine-tuning transformer models
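The idea of detagging, that is, reducing HTML to plain text suitable for a fine-tuning corpus, can be illustrated with a minimal Python sketch. This is not the PHP code from detag.php: the `Detagger` class and `detag` function are hypothetical names, and skipping `script` and `style` content is an assumption about what a fine-tuning corpus would want.

```python
from html.parser import HTMLParser


class Detagger(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = ("script", "style")  # assumption: non-prose content is unwanted

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities such as &amp; become text
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def detag(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = Detagger()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps words in adjacent elements apart, at the cost
    # of splitting a word that is broken by inline markup (e.g. <b>wor</b>ld).
    return " ".join(" ".join(parser.parts).split())


plain = detag("<p>Hello <b>world</b></p><script>var x = 1;</script>")
```

Each detagged document could then be written out as one training example; how the text is chunked and tokenized depends on the model being fine-tuned.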
DrDub /
Created December 29, 2022 11:37
Python UIMA-CPP Concept code
# this is a concept file showcasing what a deep Python-UIMACPP could enable
from uima import AnalysisEngine, AnalysisEngineType
from uima.framework import buildPipeline, TypeMapper, SetFeature, Remote
from uima.index import Index, AnnotationIndex
from uima.typesystem.fs import (
digraph trellis {
// start
l_0 [label="0"];
l_1_1 [label="1.1"];
l_1_2 [label="1.2"];
l_1_3 [label="1.3"];
// turn 1
l_0 -> l_1_1 [label="♣"];
l_0 -> l_1_2 [label="♦"];
DrDub /
Created January 5, 2022 22:26
Jupyter multi-label text classification widget, ideal for creating few-shot learning annotations
# Annotator Widget
# Copyright (C) 2022 Pablo Duboue - Licensed under MIT license
# define the following variables beforehand:
# classes = list of strings, three character classes display better
# titles = list of strings, one title per document to be annotated
# texts = list of (list of strings), one list of lines (strings) per document to be annotated
# annotations = [ set() for _ in range(len(texts)) ] # annotations
# current = 0 # current document being displayed
# fulltext = False # whether to show full text or top/bottom
from ipywidgets import widgets
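The variables named in the comments above could be prepared along these lines before the widget code runs. This is a minimal sketch: only the variable names and their meanings come from the gist's comments, while the labels, titles, and texts are made-up placeholder data.

```python
# Hypothetical sample data; the variable names follow the gist's comments.
classes = ["POS", "NEG", "NEU"]  # three-character class names display better
titles = ["Document A", "Document B"]  # one title per document
texts = [
    ["First line of A.", "Second line of A."],  # one list of lines per document
    ["Only line of B."],
]
annotations = [set() for _ in range(len(texts))]  # one set of labels per document
current = 0       # index of the document being displayed
fulltext = False  # False: show top/bottom only; True: show the full text

# Multi-label annotation means one document can carry several classes at once.
annotations[current].add("POS")
annotations[current].add("NEU")
```

Using one `set` per document keeps each label unique and makes toggling a class a matter of adding or discarding it.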
DrDub /
Last active August 30, 2018 01:34

Keybase proof

I hereby claim:

  • I am drdub on github.
  • I am drdub on keybase.
  • I have a public key ASDA38Oa7QAhMdpM95jurd0fCFi4giHGKd0u6TmitrYiigo

To claim this, I am signing this object:

DrDub /
Created January 3, 2016 11:44
A file selection class built for ipywidgets without any extra dependencies.
import os
import ipywidgets as widgets
class FileBrowser(object):
    def __init__(self):
        self.path = os.getcwd()
DrDub / tikiqa.php
Last active December 16, 2015 12:09
The one-file-wonder behind
// this file assumes the augmented and interesting files are in ~/augmented and ~/interesting, respectively, and that a clone of Elastica is available in ~/Elastica, where ~ is /home/tikiqa
function __autoload ($class) {
    $path = str_replace('\\', '/', $class);
    if (file_exists('/home/tikiqa/Elastica/lib/' . $path . '.php')) {
        require_once('/home/tikiqa/Elastica/lib/' . $path . '.php');
DrDub /
Created April 22, 2013 01:53
Preprocessing of #tikiwiki logs for use with the chat disentangler available at
#!/usr/bin/env python
#converts a gaim chatlog to a more ethical anonymized version
#format of the output is
#[datestamp timestamp] <name> comment
#[datestamp timestamp] *** name action
from random import shuffle
from sys import argv
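The two output line shapes described in the comments above can be illustrated with a short sketch that swaps speaker names for pseudonyms. It is not the gist's gaim conversion: it assumes the lines are already in the output format, and the `anonymize` function, the regular expressions, and the pseudonym list are hypothetical.

```python
import re
from random import shuffle

# The two line shapes named in the gist's comments:
#   [datestamp timestamp] <name> comment
#   [datestamp timestamp] *** name action
SAY = re.compile(r"^(\[[^\]]+\]) <([^>]+)> (.*)$")
ACT = re.compile(r"^(\[[^\]]+\]) \*\*\* (\S+) (.*)$")


def anonymize(lines, pseudonyms):
    """Replace each speaker name with a pseudonym, consistently per speaker."""
    pool = list(pseudonyms)
    shuffle(pool)  # random assignment, so aliases do not follow order of appearance
    mapping = {}

    def alias(name):
        if name not in mapping:
            # Raises IndexError if there are more speakers than pseudonyms.
            mapping[name] = pool[len(mapping)]
        return mapping[name]

    out = []
    for line in lines:
        said = SAY.match(line)
        acted = ACT.match(line)
        if said:
            stamp, name, rest = said.groups()
            out.append(f"{stamp} <{alias(name)}> {rest}")
        elif acted:
            stamp, name, rest = acted.groups()
            out.append(f"{stamp} *** {alias(name)} {rest}")
        else:
            out.append(line)  # unrecognized lines pass through unchanged
    return out


log = [
    "[2013-04-22 01:53:00] <alice> hello",
    "[2013-04-22 01:53:05] *** bob waves",
    "[2013-04-22 01:53:09] <alice> bye",
]
result = anonymize(log, ["Ann", "Ben", "Cat"])
```

This sketch only rewrites the speaker field; names mentioned inside a comment or action are left untouched and would need a separate pass.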