Matt Upson ivyleavedtoadflax

{
  "accountId": null,
  "compute": {
    "accelerator": "cpu",
    "instanceSize": "small",
    "instanceType": "c6i",
    "scaling": {
      "maxReplica": 1,
      "minReplica": 1
    }
ivyleavedtoadflax / dvc.yaml
Created October 7, 2022 18:12
DVC issue example
stages:
  foo:
    cmd: echo '{"f1-score":0.99}' > f1.json
    metrics:
      - f1.json
stages:
  train:
    cmd: >-
      rasa data validate &&
      rasa train --fixed-model-name ./models/model --out ./
    params:
      - config.yml:
          - pipeline
          - policies
    deps:
ivyleavedtoadflax / init.vim
Last active September 28, 2020 14:55
Neovim configuration for remote machines
call plug#begin('~/.vim/plugged')
Plug 'jreybert/vimagit'
Plug 'tpope/vim-fugitive'
Plug 'tpope/vim-unimpaired'
Plug 'tpope/vim-sensible'
Plug 'dracula/vim'
Plug 'Vimjas/vim-python-pep8-indent'
Plug 'chrisbra/csv.vim'
Plug 'scrooloose/nerdtree'
ivyleavedtoadflax / .tmux.conf
Last active October 5, 2020 18:55
Remote .tmux.conf
# Set prefix to capslock
set -g prefix C-b
# Set defaults
set -s escape-time 1
set -g base-index 1
setw -g pane-base-index 1
ivyleavedtoadflax / find_overlapping_spans.py
Last active February 24, 2020 19:32
Find overlapping spans in prodigy documents
# coding: utf-8
import itertools
from itertools import groupby
from operator import itemgetter
from pprint import PrettyPrinter
import plac
from deep_reference_parser.io import read_jsonl, write_jsonl
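The preview above is cut off before the overlap logic itself. As a rough sketch only (not the gist's actual code), the idea can be shown in plain Python, assuming each span is a dict with Prodigy-style "start" and "end" character offsets where "end" is exclusive. The function name find_overlapping_spans and the sample document are hypothetical.

```python
from itertools import combinations


def find_overlapping_spans(spans):
    """Return pairs of spans whose [start, end) character ranges intersect.

    Assumes each span is a dict with integer "start" and "end" keys,
    with "end" exclusive (an assumption about the annotation format).
    """
    ordered = sorted(spans, key=lambda s: (s["start"], s["end"]))
    overlaps = []
    for a, b in combinations(ordered, 2):
        # Two half-open ranges intersect when each starts before the other ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            overlaps.append((a, b))
    return overlaps


# Hypothetical document, for illustration only.
doc = {
    "text": "Smith J. (2019) Nature",
    "spans": [
        {"start": 0, "end": 8, "label": "author"},
        {"start": 6, "end": 15, "label": "year"},
        {"start": 16, "end": 22, "label": "title"},
    ],
}
overlapping = find_overlapping_spans(doc["spans"])
```

The pairwise comparison is quadratic in the number of spans, which is usually fine for the handful of spans in a single annotated document.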
.DEFAULT_GOAL := files
MATCH_PATH := s3://datalabs-dev/reach-airflow/output/match_annotated_titles
EVAL_PATH := s3://datalabs-dev/reach-airflow/output/policy-test/evaluation/results
eval = evaluation-results.json
PRODIGY_PATH = s3://datalabs-data/reach_evaluation/data/sync
prodigy = 2019.10.8_valid_TITLE.jsonl \
ivyleavedtoadflax / bash_file_iterate.sh
Created August 6, 2019 15:19
Iterate through files in bash and do some things
for i in raw/*.json;
do
    # Create new filename
    filename=$(basename -- "$i")
    extension="${filename##*.}"
    filename="${filename%.*}"
    new_filename=processed/refs_${filename}.txt
ivyleavedtoadflax / spacy_doc_vectors.py
Last active July 7, 2019 10:35
Get document vectors from spacy
# Need to run:
# python -m spacy download en
# from console first to get the model
import spacy
import pandas as pd
nlp = spacy.load("en")
ivyleavedtoadflax / customer_tokenizer.py
Last active November 9, 2018 13:28
Custom date tokenizer
from spacy.util import (compile_prefix_regex, compile_infix_regex, compile_suffix_regex)
def _custom_tokenizer(self, nlp, regex=[r"[-/,.\n\s]"]):
    """Custom tokenizer to split date formats like 05-05-2015
    and 05/05/2015
    """
    # Use the default prefixes and suffixes
    prefix_re = compile_prefix_regex(nlp.Defaults.prefixes)
    suffix_re = compile_suffix_regex(nlp.Defaults.suffixes)
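The preview stops before the infix pattern is used. To show what the default pattern [-/,.\n\s] would match inside a date string, here is a small standard-library sketch. It is not spaCy's tokenizer and not the gist's code. The helper name split_on_separators is hypothetical, and keeping the separators as their own tokens is an assumption about the intended behaviour.

```python
import re

# Same character class as the default `regex` argument in the preview,
# wrapped in a capture group so re.split keeps the separators.
SEPARATORS = re.compile(r"([-/,.\n\s])")


def split_on_separators(text):
    """Split text at each separator, keeping separators as their own tokens."""
    return [piece for piece in SEPARATORS.split(text) if piece]


dashed = split_on_separators("05-05-2015")
slashed = split_on_separators("05/05/2015")
```

With this pattern both date formats split into day, month and year pieces with the separator kept between them.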