Getting Stanford NLP and MaltParser to work in NLTK for Windows Users

Firstly, I strongly believe that if you're working with NLP/ML/AI related tools, getting things to work on Linux or Mac OS is much easier and saves you quite a lot of time.

Disclaimer: I am not affiliated with Continuum (conda), Git, Java, Windows OS, Stanford NLP or the MaltParser group. The steps presented below are how I, IMHO, would set up a Windows computer if I owned one.

Please, please, please understand the solution; don't just copy and paste!!! We're not monkeys typing Shakespeare ;P


Step 1: Install Conda on your machine

To make sure that you get an NLTK version that works properly on Windows with Stanford / Malt, install Python through Conda:

Step 1a: Install Conda for Python 3.5 from https://www.continuum.io/downloads#_windows

[Screenshots 3 to 12]

Step 1b: Now check that Anaconda is installed on your machine.

[Screenshots 13 to 18]

Step 1c: Check that it works in PowerShell too

[Screenshots 19 to 22]


Step 2 (optional): Install Git on your machine from https://git-scm.com/download/win

You can skip this if you're not going to use Git, but I've left the screenshots here just in case.

[Screenshots 23 to 36]

Step 2b: Check that Git works in PowerShell

[Screenshots 37 to 40]


Step 3: Install Java from http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html

[Screenshots 47 to 52]
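
The Stanford tools and MaltParser are Java programs that NLTK starts as a subprocess, so it is worth confirming that `java` can be found before going further. A minimal sketch, assuming the installer put Java on your PATH or that JAVA_HOME is set; the helper name `find_java` is my own and not part of NLTK:

```python
import os
import shutil

def find_java():
    """Return the path to a java executable, or None if none is found."""
    # shutil.which searches the directories listed in PATH.
    found = shutil.which("java")
    if found:
        return found
    # Fall back to JAVA_HOME (assumption: if set, it points at a JRE/JDK root).
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        for name in ("java.exe", "java"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate
    return None

print(find_java() or "java not found; check PATH or JAVA_HOME")
```

If this prints "java not found", reopening PowerShell after the Java installation is usually the first thing to try, since PATH changes only reach newly started shells.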


Step 4: Install NLTK

Step 4a: Open up PowerShell

[Screenshots 41 and 42]

Step 4b: Install NLTK using Anaconda

Use ONLY one of the commands below in PowerShell to install NLTK (NOT all of them).

Install NLTK in PowerShell using

conda install nltk

or, to install the bleeding-edge version (also through PowerShell):

pip install -U https://github.com/nltk/nltk/archive/develop.zip

or through git:

pip install -U git+https://github.com/nltk/nltk.git

[Screenshots 43 to 46]
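
Whichever command you used, you can check from the Python interpreter that NLTK is importable and which version you got. A small sketch; the function name `nltk_version` is only for illustration:

```python
import importlib
import importlib.util

def nltk_version():
    """Return the installed NLTK version string, or None if NLTK is missing."""
    # find_spec looks the package up without importing it.
    if importlib.util.find_spec("nltk") is None:
        return None
    nltk = importlib.import_module("nltk")
    return nltk.__version__

print(nltk_version() or "NLTK is not installed in this environment")
```

If the message says NLTK is missing even though the install succeeded, the interpreter you opened is probably not the Anaconda one.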


Step 5: Download and Extract Stanford NLP tools and MaltParser

Stay in PowerShell; don't close it yet. Open the Python 3.5 interpreter within PowerShell and run the following code:

Step 5a: Install MaltParser (the cheater way)

The code below will automatically download the files needed for MaltParser and the pre-trained English model.

REMEMBER TO CHANGE the C:\Users\Thu\Desktop\ path to your own user's Desktop path, e.g. if your user name is "Alvas" on Windows, then the path is most probably C:\Users\Alvas\Desktop\.

The following code snippets were tested within Windows PowerShell (I suppose they should also work in modern Python IDEs).
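
If you'd rather not hard-code the user name, the Desktop path can be built from the current user's home directory. This is a sketch under the assumption that Desktop sits directly under the home directory, which may not hold if the folder has been redirected (for example to OneDrive); the variable names are mine:

```python
from pathlib import Path

# Path.home() resolves to the current user's home directory,
# e.g. C:\Users\Alvas on a typical Windows setup.
desktop = Path.home() / "Desktop"

# Assumption: the Desktop folder lives directly under the home directory.
model_path = desktop / "engmalt.poly-1.7.mco"
zip_path = desktop / "maltparser-1.8.1.zip"

print(desktop)
print(model_path)
```

On Python 3.5, pass `str(model_path)` rather than the Path object itself to functions that expect a file name.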

In Python3:

import urllib.request
import zipfile

# First we retrieve the model file from the website.
urllib.request.urlretrieve(r'http://www.maltparser.org/mco/english_parser/engmalt.poly-1.7.mco', r'C:\Users\Thu\Desktop\engmalt.poly-1.7.mco')

# Then we retrieve the parser zip file from the website.
urllib.request.urlretrieve(r'http://maltparser.org/dist/maltparser-1.8.1.zip', r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip')

# Then we create a ZipFile object by initializing it with the full path to the zip file.
zfile = zipfile.ZipFile(r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip')
# And ask Python to extract the files to the directory: C:\Users\Thu\Desktop\maltparser-1.8.1
zfile.extractall(r'C:\Users\Thu\Desktop\maltparser-1.8.1')

from nltk.parse import malt
# We initialize the MaltParser API with the DIRECT PATH to the malt parser DIRECTORY (not the jar file) and the .mco file.
mp = malt.MaltParser(r'C:\Users\Thu\Desktop\maltparser-1.8.1',  r'C:\Users\Thu\Desktop\engmalt.poly-1.7.mco')
mp.parse_one('I shot an elephant in my pajamas .'.split()).tree()

[Screenshot: maltout]
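
Note that urlretrieve prints nothing while it runs, so a slow download can look as if the interpreter has hung. A possible variant of the download step that reports progress through urlretrieve's `reporthook` argument and skips the download when the zip is already there; `show_progress` and `download_and_extract` are hypothetical helper names, not part of NLTK or the standard library:

```python
import urllib.request
import zipfile
from pathlib import Path

def show_progress(block_num, block_size, total_size):
    """reporthook for urlretrieve: print a running progress line."""
    done = block_num * block_size
    if total_size > 0:
        percent = min(100, done * 100 // total_size)
        print("\r{}% of {} bytes".format(percent, total_size), end="")
    else:
        # The server did not report a size, so only the running total is known.
        print("\r{} bytes".format(done), end="")

def download_and_extract(url, zip_path, extract_dir):
    """Download url to zip_path (skipped if already there), then unzip it."""
    zip_path, extract_dir = Path(zip_path), Path(extract_dir)
    if not zip_path.exists():
        urllib.request.urlretrieve(url, str(zip_path), reporthook=show_progress)
        print()  # finish the progress line
    with zipfile.ZipFile(str(zip_path)) as zfile:
        zfile.extractall(str(extract_dir))
    return extract_dir
```

You would call it with the same URL and paths as in the code above, e.g. `download_and_extract('http://maltparser.org/dist/maltparser-1.8.1.zip', r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip', r'C:\Users\Thu\Desktop\maltparser-1.8.1')`.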

Step 5b: Install Stanford NER (the cheater way)

The code below will automatically download the files needed for Stanford NER.

import urllib.request
import zipfile
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip', r'C:\Users\Thu\Desktop\stanford-ner-2015-04-20.zip')
zfile = zipfile.ZipFile(r'C:\Users\Thu\Desktop\stanford-ner-2015-04-20.zip')
zfile.extractall(r'C:\Users\Thu\Desktop\stanford-ner')

from nltk.tag.stanford import StanfordNERTagger
# First we set the direct path to the NER Tagger.
_model_filename = r'C:\Users\Thu\Desktop\stanford-ner\classifiers\english.all.3class.distsim.crf.ser.gz'
_path_to_jar = r'C:\Users\Thu\Desktop\stanford-ner\stanford-ner.jar'
# Then we initialize the NLTK's Stanford NER Tagger API with the DIRECT PATH to the model and .jar file.
st = StanfordNERTagger(model_filename=_model_filename, path_to_jar=_path_to_jar)
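
To check that the tagger actually works, you can tag a sentence with it. The sketch below first locates the jar and the model under the extraction directory, since, depending on how the zip is laid out, they may sit one folder deeper than the paths above assume (I have not verified the layout). `find_ner_files` and `tag_with_stanford_ner` are hypothetical helper names; `StanfordNERTagger` and its `tag` method are the NLTK API already used above:

```python
from pathlib import Path

def find_ner_files(ner_dir):
    """Locate the NER model and jar anywhere under ner_dir."""
    ner_dir = Path(ner_dir)
    # rglob searches ner_dir and all of its subdirectories.
    jars = sorted(ner_dir.rglob("stanford-ner.jar"))
    models = sorted(ner_dir.rglob("english.all.3class.distsim.crf.ser.gz"))
    if not jars or not models:
        raise FileNotFoundError("Stanford NER files not found under {}".format(ner_dir))
    return models[0], jars[0]

def tag_with_stanford_ner(sentence, ner_dir):
    """Tag a whitespace-tokenized sentence; needs NLTK and Java."""
    # Imported here so find_ner_files can be used without NLTK installed.
    from nltk.tag.stanford import StanfordNERTagger
    model, jar = find_ner_files(ner_dir)
    tagger = StanfordNERTagger(model_filename=str(model), path_to_jar=str(jar))
    # tag() takes a list of tokens and returns (token, label) pairs.
    return tagger.tag(sentence.split())
```

For example, `tag_with_stanford_ner('Rami Eid is studying at Stony Brook University in NY', r'C:\Users\Thu\Desktop\stanford-ner')` should return a list of (token, label) tuples if everything is in place.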

Step 5c: Install Stanford POS (the cheater way)

Gotcha, there won't be a spoon-fed answer here but the idea is the same as the above steps.

As said at the beginning of this gist, understand the solution; don't just copy and paste!!! We're not monkeys typing Shakespeare ;P

Now, using the knowledge from Steps 5a and 5b, follow the same steps to get the Stanford POS tagger from http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip

If you need some hints, see:

Step 5d: Install Stanford Parser

Do the same for Stanford Parser but do note that the API in NLTK for Stanford Parser is a little different and there will be a code overhaul once https://github.com/nltk/nltk/pull/1249 is merged.

Hint: Reading this carefully will help a lot.


Unsolicited Advice

Disclaimer: Skip this to avoid hate, anger, suffering, etc.; this is just my personal opinion =)

Now Stanford + MaltParser work in NLTK in PowerShell. But you need a proper environment so that you can code happily and enjoy the Python + NLP awesomeness, so here's some unsolicited advice ;P

  • TRY NOT to use Python IDLE for NLP development (Python IDLE is a great tool to learn and start your Python journey, but if you're going to do NLP work, you're better off using Notepad and the command prompt, or another IDE). Also, I encourage you to try https://try.jupyter.org/ instead of IDLE if you're moving on from the basic lessons.
  • Make sure that you get NLTK v3.2 (it has quite a lot of bugfixes, esp. better Python 3.5 support and better Windows support)
  • TRY to use an IDE other than IDLE!! (There are lots of them out there: Atom, Vim, Emacs, PyCharm, Eclipse+PyDev, etc.)
  • Try IPython Notebooks (https://ipython.org/ipython-doc/2/install/install.html#windows)
  • Get Unix or Mac.
@ghost

commented May 8, 2018

Hi,
Thanks for sharing.
I am stuck at 5a, as the first picture shows:
[image]
It has been like this for about 10 minutes!
The second picture shows my adjusted script:
[image]

I cannot see any difference.
Can someone please help?
