
@davidefiocco
davidefiocco / runlength.ex
Created June 8, 2017 21:53
Conor, Fernando, and Davide's solution for the run-length encoder/decoder at github.com/marjaimate/runlength
# Conor, Fernando, Davide + wisdom of the crowd solution for
# https://github.com/marjaimate/runlength
defmodule Runlength do
  def encode(string) do
    encode(string, "")
  end

  def encode("", acc) do
    acc
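For readers less familiar with Elixir, here is a minimal Python sketch of run-length encoding and decoding. It is not the gist's solution: the output format (a count written before every character, so `"WWWB"` becomes `"3W1B"`) is an assumption, since the exercise's exact format is not shown here, and decoding assumes the original text contains no digits. The `encode`/`decode` names are hypothetical.

```python
import re
from itertools import groupby


def encode(text: str) -> str:
    """Run-length encode, assuming a count before every character: 'WWWB' -> '3W1B'."""
    # groupby yields one group per run of consecutive equal characters
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def decode(encoded: str) -> str:
    """Invert encode(): '3W1B' -> 'WWWB'. Assumes the original text had no digits."""
    # Each match is a (count, character) pair, e.g. ('12', 'a')
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))


print(encode("WWWBWW"))  # 3W1B2W
print(decode("3W1B2W"))  # WWWBWW
```

`itertools.groupby` does the run detection that the Elixir version performs with an accumulator in recursive clauses.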
davidefiocco / wdiffedited.txt
Created March 25, 2018 10:12
Checking wdiff output on a long text for https://stackoverflow.com/questions/49466417/how-can-i-invert-the-output-of-wdiff. To run it, execute `wdiff wdifforig.txt wdiffedited.txt > wdiffresult.txt` from the command line.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. In euismod nisl vel tortor dignissim porttitor. Cras vitae auctor diam, sed bibendum magna. Mauris ut ligula tempus, ullamcorper sapien rhoncus, venenatis nisi. Suspendisse scelerisque, dolor eget vestibulum iaculis, tellus turpis eleifend libero, eget scelerisque purus risus vel magna. Nunc sit amet leo luctus, placerat ligula sit amet, elementum ipsum. Mauris consequat nibh vitae erat posuere ultrices. Morbi quis sem ac nisi porta dapibus. Etiam scelerisque non orci non luctus.
Duis in risus quis nibh laoreet ornare ut in urna. Vivamus at turpis egestas, tempor nibh viverra, tristique metus. Aliquam interdum tristique sapien eu interdum. Sed non est efficitur, placerat turpis ultrices, tincidunt nibh. Phasellus nibh arcu, feugiat in arcu quis, egestas blandit velit. Morbi convallis feugiat magna. Etiam iaculis dui faucibus nisl congue efficitur. Aliquam erat volutpat. Proin vitae leo suscipit, lacinia nisi elementum, gravida diam. Suspendisse ornare,
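As a rough illustration of what wdiff computes, the sketch below produces a word-level diff with Python's standard `difflib`. This is a swapped-in technique, not wdiff's own implementation; it marks deletions with `[-...-]` and insertions with `{+...+}` (wdiff's default markers) but, unlike wdiff, collapses all whitespace to single spaces. `word_diff` is a hypothetical helper name.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """wdiff-style word diff: removed words in [-...-], added words in {+...+}."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
            continue
        if i2 > i1:  # words removed ('replace' or 'delete')
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words added ('replace' or 'insert')
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("the quick brown fox", "the slow brown fox"))
# the [-quick-] {+slow+} brown fox
```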
davidefiocco / GetGrammarContributors.sql
Last active March 25, 2018 17:27
Get the Wikipedia contributors with the most grammar-related edits, using BigQuery at https://bigquery.cloud.google.com/
# One can check contributions/edits at
# https://en.wikipedia.org/w/index.php?limit=50&title=Special%3AContributions&contribs=user
# Profile pages are at
# https://en.wikipedia.org/wiki/User:USERNAME
SELECT contributor_username, COUNT(id) AS counts
FROM [bigquery-public-data:samples.wikipedia]
WHERE comment LIKE '%grammar%'
GROUP BY contributor_username
ORDER BY counts DESC LIMIT 10;
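The query's logic (filter on a substring, group by user, count, keep the top 10) can be mirrored in plain Python, which may help when sanity-checking results on a small sample. The `rows` list of `(contributor_username, comment)` pairs is hypothetical and stands in for the `bigquery-public-data:samples.wikipedia` table; the `in` test matches case-sensitively, as `LIKE` does.

```python
from collections import Counter


def top_grammar_contributors(rows, limit=10):
    """Mimic the query: keep rows whose comment contains 'grammar',
    count edits per contributor_username, return the top `limit`."""
    counts = Counter(
        user
        for user, comment in rows
        if comment is not None and "grammar" in comment  # WHERE comment LIKE '%grammar%'
    )
    return counts.most_common(limit)  # ORDER BY counts DESC LIMIT 10


# Hypothetical sample rows: (contributor_username, comment)
rows = [
    ("alice", "fix grammar"),
    ("bob", "grammar"),
    ("alice", "grammar and typos"),
    ("carol", "typo"),
]
print(top_grammar_contributors(rows))  # [('alice', 2), ('bob', 1)]
```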
davidefiocco / gist:eb7e85f64b6dada46aa92bc19d3fc99c
Created May 24, 2018 09:02
Silly trick for inserting line breaks in a Google Sheets cell
Suppose that we want to break
{Hello}, {World}
into
{Hello},
{World}
then, with the text in cell B2, use:
=SUBSTITUTE(B2, "}, {", CONCATENATE("},", CHAR(10), "{"))
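The same substitution in Python, for example to preprocess text before pasting it into the sheet. Here `"\n"` plays the role of `CHAR(10)`; `break_after_braces` is a hypothetical helper name.

```python
def break_after_braces(cell: str) -> str:
    # Same substitution as the formula: "}, {" becomes "}," + newline + "{"
    return cell.replace("}, {", "},\n{")


print(break_after_braces("{Hello}, {World}"))
# {Hello},
# {World}
```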
davidefiocco / imagenet_poppins.py
Created February 16, 2019 23:30
Reorganize image files into an ImageNet-style folder hierarchy
import os
import shutil

import numpy as np
from sklearn.model_selection import train_test_split

# data_folder (the dataset root) is not defined in this excerpt
cats = ['negative', 'positives']
for cat in cats:
    print(cat)
    # Create data_folder/train/<cat> unless it already exists
    os.makedirs(os.path.join(data_folder, "train", cat), exist_ok=True)
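Since the script imports `train_test_split`, a train/validation split presumably follows. A minimal sketch of that step, under stated assumptions: source images sit in `src_dir/<category>/`, output goes to `data_folder/train/<category>/` and `data_folder/valid/<category>/`, and a seeded `random.shuffle` stands in for scikit-learn's `train_test_split`. The names `split_files`, `reorganize`, `src_dir` and the `valid` folder are hypothetical.

```python
import os
import random
import shutil


def split_files(filenames, valid_fraction=0.2, seed=0):
    """Deterministically shuffle filenames and split them into (train, valid)."""
    files = sorted(filenames)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(files)
    n_valid = int(round(len(files) * valid_fraction))
    return files[n_valid:], files[:n_valid]


def reorganize(src_dir, data_folder, cats, valid_fraction=0.2, seed=0):
    """Copy src_dir/<cat>/* into data_folder/{train,valid}/<cat>/ (hypothetical layout).
    Assumes each src_dir/<cat> contains only image files."""
    for cat in cats:
        src = os.path.join(src_dir, cat)
        train, valid = split_files(os.listdir(src), valid_fraction, seed)
        for subset, names in (("train", train), ("valid", valid)):
            dst = os.path.join(data_folder, subset, cat)
            os.makedirs(dst, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(src, name), os.path.join(dst, name))
```

Copying rather than moving keeps the original folder intact, so the split can be rerun with a different seed.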
import prodigy
from prodigy.components.loaders import Images
from prodigy.util import split_string


def add_label_to_stream(stream, label):
    for eg in stream:
        # The 'label' you get from the command line is a list,
        # so let's just assume it's always one and take the first
        eg["label"] = label[0]
        yield eg
davidefiocco / ImdbBERTInterpretation.ipynb
Last active March 31, 2020 22:18
Trying Captum interpretation on a pretrained sentiment classifier