@witt3rd
Last active July 16, 2018 13:59
Explosion AI's Prodigy

Install

# Linux
pip install prodigy-1.5.1-cp35.cp36-cp35m.cp36m-linux_x86_64.whl
conda install spacy
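The `cp35.cp36` and `linux_x86_64` tags in the wheel's filename mean it only installs on 64-bit Linux under CPython 3.5 or 3.6. On any other interpreter, pip rejects it as "not a supported wheel on this platform". The minimal sketch below checks the running interpreter against those tags before you install. The constants are read straight off the filename, the function name `wheel_matches` is hypothetical, and it ignores the `cp35m`/`cp36m` ABI tags:

```python
import platform
import sys

# Tags read off the wheel filename prodigy-1.5.1-cp35.cp36-cp35m.cp36m-linux_x86_64.whl
WHEEL_PYTHONS = {(3, 5), (3, 6)}
WHEEL_PLATFORM = ("Linux", "x86_64")


def wheel_matches(version=None, system=None, machine=None, implementation=None):
    """Return True if the given (or current) interpreter fits the wheel's tags.

    Hypothetical helper: checks the CPython version, OS and CPU architecture,
    but not the ABI suffix (cp35m/cp36m).
    """
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()
    implementation = implementation or platform.python_implementation()
    return (
        implementation == "CPython"  # "cp" tags mean CPython only
        and version[:2] in WHEEL_PYTHONS
        and (system, machine) == WHEEL_PLATFORM
    )


if __name__ == "__main__":
    print("wheel compatible with this interpreter:", wheel_matches())
```

If it prints False, create a Python 3.5 or 3.6 environment first (for example with conda) and install the wheel there.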

Download spaCy Models

python -m spacy download en_core_web_lg
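`python -m spacy download` pip-installs the model as an ordinary Python package named `en_core_web_lg`. To confirm the step worked, check that the package can be found, then load it. The helper name below is illustrative, and the load step assumes spaCy itself is installed:

```python
import importlib.util

MODEL = "en_core_web_lg"


def model_installed(name=MODEL):
    """True if the model package is importable (spacy download installs it via pip)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        import spacy

        nlp = spacy.load(MODEL)
        # The large English model ships with word vectors, which terms.teach uses.
        print("loaded", MODEL, "with vectors of shape", nlp.vocab.vectors.shape)
    else:
        print(MODEL, "is missing; run: python -m spacy download", MODEL)
```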

Teach Terms

prodigy dataset govt_terms "Seed terms for GOVT_AGENCY label"
prodigy terms.teach govt_terms en_core_web_lg --seeds "ICE, DHS, CBP, USCIS, TSA"

In a browser, open the URL that the server prints, then accept or reject a dozen or so suggested terms. Stop the server with Ctrl-C; the annotations are saved to the govt_terms dataset.
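terms.teach works from the word vectors in en_core_web_lg. It suggests vocabulary entries whose vectors are close to those of the seed terms. The toy sketch below shows only that core idea: it ranks words by cosine similarity to the average of the seed vectors. The 3-d vectors and the helper names are made up for illustration, and Prodigy's actual scoring and update logic may differ:

```python
import math

# Made-up 3-d vectors for illustration; en_core_web_lg provides 300-d vectors.
VECTORS = {
    "ice": [0.9, 0.1, 0.0],
    "dhs": [0.8, 0.2, 0.1],
    "fbi": [0.85, 0.15, 0.05],
    "fema": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "guitar": [0.1, 0.0, 0.8],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_terms(seeds, vectors, top_n=3):
    """Rank non-seed words by similarity to the mean of the seed vectors."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        raise ValueError("none of the seed terms has a vector")
    dims = len(seed_vecs[0])
    # Target vector: element-wise average of the seed vectors.
    target = [sum(vec[i] for vec in seed_vecs) / len(seed_vecs) for i in range(dims)]
    scored = [(word, cosine(target, vec)) for word, vec in vectors.items() if word not in seeds]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for word, score in suggest_terms({"ice", "dhs"}, VECTORS):
    print(f"{word}\t{score:.3f}")
```

Here fbi's toy vector equals the seed average exactly, so it ranks first. This is why a handful of good seeds such as "ICE, DHS, CBP, USCIS, TSA" is enough to surface related agency names.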

Terms to Patterns

prodigy terms.to-patterns govt_terms govt_patterns.jsonl -l GOVT_AGENCY

This writes the accepted terms from the govt_terms dataset to govt_patterns.jsonl, a JSONL file with one GOVT_AGENCY match pattern per line.
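Each line of the patterns file is a JSON object that pairs a label with a spaCy token pattern. For single-word terms, the entries look roughly like {"label": "GOVT_AGENCY", "pattern": [{"lower": "ice"}]}. The sketch below approximates that conversion from accepted answers to JSONL. It is an assumption-based re-implementation, not Prodigy's code: the function names are hypothetical, and the annotation records are simplified to just "text" and "answer":

```python
import json


def terms_to_patterns(examples, label):
    """Keep accepted terms and turn each into a case-insensitive single-token pattern."""
    patterns = []
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # rejected and ignored suggestions are dropped
        patterns.append({"label": label, "pattern": [{"lower": eg["text"].lower()}]})
    return patterns


def to_jsonl(records):
    """Serialise records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical, simplified annotations (real records carry more fields).
annotations = [
    {"text": "FBI", "answer": "accept"},
    {"text": "FEMA", "answer": "accept"},
    {"text": "banana", "answer": "reject"},
]
print(to_jsonl(terms_to_patterns(annotations, "GOVT_AGENCY")), end="")
# {"label": "GOVT_AGENCY", "pattern": [{"lower": "fbi"}]}
# {"label": "GOVT_AGENCY", "pattern": [{"lower": "fema"}]}
```

The "lower" attribute makes each pattern match regardless of case, so "ice", "ICE" and "Ice" would all hit the same entry.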

Train a NER Model

prodigy dataset govt_ner "Train GOVT_AGENCY label"