@aparrish
Last active April 24, 2019 13:16
Links and resources for CMU DH Literacy Workshop 2018

Creative Writing with Natural Language Processing

Computational tools and statistical analysis are often deployed as a method to “read” texts. But what about using these same techniques to write them? In this workshop, we’ll investigate the state of the art of natural language processing with an eye toward using the sometimes-unintuitive abstractions of language produced by computational models to make programs that create surprising and poetic creative writing. Topics include: a whirlwind tour of spaCy for parsing English into syntactic constituents; a discussion of techniques for classifying and summarizing documents; and an explanation and demonstration of “word vectors” (like Google’s word2vec), an innovative language technology that allows computers to process written language less as discrete units and more like a continuous signal. Workshop participants will develop a number of small projects in text analysis and poetics using a public domain text of their choice. In becoming familiar with contemporary techniques for computational language analysis, critics and researchers will be able to reason better about language-based media on the Internet. Artists and writers, meanwhile, might just learn a few new techniques to add to their creative palette.

Instructions

To install spaCy with Anaconda, open a Terminal window (or the equivalent on your operating system) and type:

conda install -c conda-forge spacy==2.0.11

This line installs the library. You'll also need to download a language model. For that, type:

python -m spacy download en_core_web_md

(Replace en with the language code for your desired language, if there's a model available for it.) The language model contains the statistical information necessary to parse text into sentences and sentences into parts of speech. Note that this download is several hundred megabytes, so it might take a while!
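Once both commands finish, a quick sanity check can confirm that the library and the model are visible to Python. The following is only a sketch: the function name `check_spacy_setup` and its status strings are made up for illustration and are not part of spaCy. It assumes the model name from the command above.

```python
def check_spacy_setup(model_name="en_core_web_md"):
    """Return a short status string describing the local spaCy setup.

    The name and return values are hypothetical, chosen for this sketch.
    """
    try:
        import spacy
    except ImportError:
        # The library itself is not installed in this environment.
        return "spacy-missing"

    try:
        nlp = spacy.load(model_name)
    except (OSError, IOError):
        # spaCy raises an I/O error when it cannot find the named model.
        return "model-missing"

    # Parse a short text, then print each sentence as (word, part-of-speech) pairs.
    doc = nlp("The workshop starts at noon. Bring a public domain text.")
    for sent in doc.sents:
        print([(token.text, token.pos_) for token in sent])
    return "ok"
```

Calling `print(check_spacy_setup())` should print the tagged sentences followed by `ok` when everything is in place. A result of `model-missing` suggests the download step did not complete, or that it ran in a different environment from the one Python is using.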

Notes

If time:

Further resources
