W.P. McNeill wpm

@wpm
wpm / log.txt
Created October 5, 2021 19:09
Pip backtracking when installing spaCy from source
$ pip install --no-build-isolation --editable ".[transformers,ray]"
Obtaining file:///Users/wmcneill/Documents/src/spaCy
Preparing wheel metadata ... done
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.8 in /Users/wmcneill/opt/anaconda3/envs/spaCy/lib/python3.9/site-packages (from spacy==3.1.3) (3.0.8)
Requirement already satisfied: setuptools in /Users/wmcneill/opt/anaconda3/envs/spaCy/lib/python3.9/site-packages (from spacy==3.1.3) (58.0.4)
Requirement already satisfied: typer<0.5.0,>=0.3.0 in /Users/wmcneill/opt/anaconda3/envs/spaCy/lib/python3.9/site-packages (from spacy==3.1.3) (0.4.0)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /Users/wmcneill/opt/anaconda3/envs/spaCy/lib/python3.9/site-packages (from spacy==3.1.3) (2.26.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /Users/wmcneill/opt/anaconda3/envs/spaCy/lib/python3.9/site-packages (from spacy==3.1.3) (2.0.5)
Requirement already satisfied: packaging>=20.0 in /Users/wmcneill/opt/anaconda3/envs/spaCy/lib/python3.9/s
@wpm
wpm / celery_copy_directory.py
Last active December 22, 2019 00:11
Use Celery workers to copy a directory of files in parallel, with error tracking and a progress bar.
#!/usr/bin/env python
import shutil
from pathlib import Path
from typing import Iterable, Tuple, Optional
import click
from celery import Celery
from celery.utils.log import get_task_logger
from tqdm import tqdm
@wpm
wpm / maximal_fully_ordered_sublists.py
Created March 1, 2019 22:48
Maximal Fully-Ordered Sublists
def maximal_fully_ordered_sublists(s: List[T]) -> List[List[T]]:
    """
    Find maximum-length sequences of in-order items in a list.
    Let s be a list of items over which there exists a total ordering defined by the < operator.
    Let a fully-ordered sublist s' of s be s with elements removed so that the elements of s' are monotonically
    increasing.
    The maximal fully-ordered sublists of s are the set of fully-ordered sublists such that no sublist is contained in
    another one.
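
The gist body is truncated in this listing, so the following is only a brute-force sketch of the definition in the docstring, not the author's implementation. The names `brute_force_maximal_sublists` and `is_subsequence` are hypothetical. It assumes hashable elements, reads "monotonically increasing" as strictly increasing under `<`, and enumerates every subsequence, so it is only practical for short lists.

```python
from itertools import combinations
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def is_subsequence(a: Sequence[T], b: Sequence[T]) -> bool:
    """Return True if a can be obtained from b by deleting elements."""
    it = iter(b)
    # Each element of a must be found in the part of b not yet consumed.
    return all(any(x == y for y in it) for x in a)


def brute_force_maximal_sublists(s: List[T]) -> List[List[T]]:
    """Hypothetical helper: enumerate, then filter by containment."""
    # combinations() preserves input order, so each combo is a subsequence of s.
    increasing = set()
    for r in range(1, len(s) + 1):
        for combo in combinations(s, r):
            if all(x < y for x, y in zip(combo, combo[1:])):
                increasing.add(combo)
    # Keep the increasing subsequences not contained in a different one.
    maximal = [
        list(c)
        for c in increasing
        if not any(c != d and is_subsequence(c, d) for d in increasing)
    ]
    return sorted(maximal)
```

For example, `brute_force_maximal_sublists([1, 3, 2, 4])` returns `[[1, 2, 4], [1, 3, 4]]`: both sublists are increasing, and neither is contained in the other.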
@wpm
wpm / birthday_corpus.py
Created January 3, 2018 20:37
Generate a corpus of texts mentioning birthdays that can be used to train a Prodigy named entity recognizer.
import json
import re
import time
from random import choice, random
from typing import TextIO, Callable, Sequence, Tuple, Optional
import click
NAME = DATE = str
SPAN_OFFSET = Tuple[int, int]
@wpm
wpm / json_to_jsonl.py
Created December 22, 2017 19:33
Tool to convert a JSON list into a JSONL file.
import json
from json import JSONDecodeError
from typing import Sequence
import click
class JSONList(click.ParamType):
    def convert(self, value: str, _, __) -> Sequence:
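
The preview above stops before the conversion logic. As a minimal sketch of the idea in the description (one JSON list in, one JSON value per line out), assuming the input is a single top-level list, the core step might look like this. The function name `json_list_to_jsonl_lines` is hypothetical and is not taken from the gist.

```python
import json
from typing import Iterator


def json_list_to_jsonl_lines(json_text: str) -> Iterator[str]:
    """Yield one compact JSON document per item of a top-level JSON list."""
    items = json.loads(json_text)
    if not isinstance(items, list):
        # JSONL has one record per line, so a non-list input has no obvious mapping.
        raise ValueError("expected a JSON list at the top level")
    for item in items:
        # json.dumps never emits raw newlines by default, so each item fits on one line.
        yield json.dumps(item)
```

Writing the result is then a matter of joining the lines, for example `"\n".join(json_list_to_jsonl_lines(text))`.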
@wpm
wpm / spacy_pattern_match.py
Created December 22, 2017 19:30
Utility that matches text patterns in spaCy/Prodigy training data
import json
from json import JSONDecodeError
from typing import Sequence, Iterable, List
import click
import spacy
from spacy.matcher import Matcher
def match_patterns(nlp, patterns: Sequence[dict], corpus: Iterable[str]) -> Iterable[str]:
@wpm
wpm / spacy_paragraph_segmenter.py
Created December 20, 2017 16:58
Segment a spaCy document into "paragraphs", treating whitespace tokens containing more than one newline as a paragraph delimiter.
def paragraphs(document):
    start = 0
    for token in document:
        if token.is_space and token.text.count("\n") > 1:
            yield document[start:token.i]
            start = token.i
    yield document[start:]
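
The same delimiter rule can be tried on plain strings without installing spaCy. The sketch below swaps spaCy tokens for regular-expression whitespace runs, so it is an analogy rather than the gist's code; `text_paragraphs` is a hypothetical name. Like the generator above, it starts each later paragraph at the delimiter itself, so callers may want to strip the pieces.

```python
import re
from typing import Iterator


def text_paragraphs(text: str) -> Iterator[str]:
    """Split text wherever a whitespace run contains more than one newline."""
    start = 0
    for match in re.finditer(r"\s+", text):
        if match.group().count("\n") > 1:
            yield text[start:match.start()]
            # Mirror the spaCy version: the delimiter stays with the next piece.
            start = match.start()
    yield text[start:]


sample = "First para.\n\nSecond para\nstill second.\n\n\nThird."
pieces = [p.strip() for p in text_paragraphs(sample)]
```

Here `pieces` is `["First para.", "Second para\nstill second.", "Third."]`: the single newline inside the second paragraph does not split it.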
@wpm
wpm / Entity Highlighting in Context.ipynb
Created December 4, 2017 16:12
Entity Highlighting in Context
@wpm
wpm / stanford_sentiment_to_csv.py
Created December 3, 2017 19:24
Create CSV files from the Stanford Sentiment Treebank
"""
Put all the Stanford Sentiment Treebank phrase data into test, training, and dev CSVs.
Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive Deep Models
for Semantic Compositionality Over a Sentiment Treebank. Presented at the Conference on Empirical Methods in Natural
Language Processing (EMNLP).
https://nlp.stanford.edu/sentiment/
"""
@wpm
wpm / simple_mnist.py
Last active June 22, 2016 20:59
Minimal TensorFlow Example
"""
A minimal implementation of the MNIST handwritten digits classification task in TensorFlow.
This runs MNIST images through a single hidden layer and a softmax loss function.
It demonstrates in a single Python source file the basics of creating a model, training and evaluating data sets, and
writing summaries that can be visualized by TensorBoard.
"""
from __future__ import division