Marcos Treviso mtreviso

erickrf / tokenizer.py
Last active March 5, 2023 05:12
Portuguese tokenizer
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from nltk.tokenize import RegexpTokenizer
import argparse
import os
"""
Script for tokenizing Portuguese text according to the Universal Dependencies
(UD) tokenization standards. This script was not created by the UD team; it was
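The preview above stops mid-docstring, so the gist's actual tokenization rules are not shown. What follows is a minimal sketch of what UD-style tokenization of Portuguese involves. It is not the gist's code, and it rests on two assumptions not taken from the source. First, a regex keeps hyphenated words together and splits off punctuation, applied with re.findall, which is roughly what nltk's RegexpTokenizer does with its default gaps=False. Second, a small hypothetical CONTRACTIONS table covers preposition + article contractions such as "do" -> "de" + "o", which UD Portuguese treebanks split into two syntactic words.

```python
import re

# Assumed pattern (not from the gist): a word, optionally with internal
# hyphens ("guarda-chuva"), or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")

# Hypothetical, deliberately small table of unambiguous preposition + article
# contractions; UD Portuguese splits these into two syntactic words.
CONTRACTIONS = {
    "do": ("de", "o"),
    "da": ("de", "a"),
    "dos": ("de", "os"),
    "das": ("de", "as"),
    "no": ("em", "o"),
    "na": ("em", "a"),
    "nas": ("em", "as"),
    "ao": ("a", "o"),
    "aos": ("a", "os"),
}


def tokenize(text):
    """Split text into surface tokens, then expand known contractions."""
    words = []
    for token in TOKEN_PATTERN.findall(text):
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            words.append(token)
            continue
        first, second = parts
        # Carry sentence-initial capitalisation over to the first word.
        if token[0].isupper():
            first = first.capitalize()
        words.extend([first, second])
    return words


print(tokenize("A casa do João fica na praia."))
# → ['A', 'casa', 'de', 'o', 'João', 'fica', 'em', 'a', 'praia', '.']
```

Real UD output in CoNLL-U also keeps the surface form as a multiword token range line (for example "3-4 do") above the two split words. This sketch returns only the split words. A real table would also need context to handle ambiguous forms such as "nos", which can be the contraction "em os" or the pronoun "nos".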
paniq / minmaxabssign.txt
Last active June 24, 2024 17:57
useful min/max/abs/sign identities
max(-x,-y) = -min(x,y)
min(-x,-y) = -max(x,y)
abs(x) = abs(-x)
abs(x) = max(x,-x) = -min(x,-x)
abs(x*a) = if (a >= 0) abs(x)*a
              (a < 0) -abs(x)*a
// holds for any commutative operation op: min(x,y) op max(x,y) = x op y
min(x,y) + max(x,y) = x + y
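The identities above can be spot-checked by brute force. The following is a small sketch that asserts each listed identity for every (x, y, a) on a small integer grid. The grid range, the helper name check_identities, and the choice of commutative operations are arbitrary assumptions. Passing a finite check is evidence, not a proof.

```python
from itertools import product
import operator

# Arbitrary small grid: covers negatives, zero and positives.
VALUES = range(-5, 6)

# A few commutative binary operations for the last identity.
COMMUTATIVE_OPS = (operator.add, operator.mul, operator.xor, max, min)


def check_identities(values=VALUES):
    """Assert every identity from the list for all (x, y, a) in the grid."""
    for x, y, a in product(values, repeat=3):
        assert max(-x, -y) == -min(x, y)
        assert min(-x, -y) == -max(x, y)
        assert abs(x) == abs(-x)
        assert abs(x) == max(x, -x) == -min(x, -x)
        # abs(x*a) depends on the sign of a.
        assert abs(x * a) == (abs(x) * a if a >= 0 else -abs(x) * a)
        # min/max only reorder the pair, so a commutative op sees the same inputs.
        for op in COMMUTATIVE_OPS:
            assert op(min(x, y), max(x, y)) == op(x, y)
    return True


print(check_identities())  # → True
```

Python integers are unbounded, so this check says nothing about fixed-width arithmetic, where negating the most negative value overflows. It also says nothing about floating-point NaN inputs.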