alvations/NLTK_StanfordTools_MaltParser_Windows.md

Getting Stanford NLP and MaltParser to work in NLTK for Windows Users

Firstly, I strongly think that if you're working with NLP/ML/AI-related tools, getting things to work on Linux or Mac OS is much easier and saves you quite a lot of time.
Disclaimer: I am not affiliated with Continuum (conda), Git, Java, Windows OS, Stanford NLP or the MaltParser group. The steps presented below are how I, IMHO, would set up a Windows computer if I owned one.
Please, please, please understand the solution; don't just copy and paste!!! We're not monkeys typing Shakespeare ;P

Step 1: Install Conda on your machine

To make sure that you get an NLTK version that works properly on Windows when using the Stanford tools and MaltParser, install Python through Conda first.
Step 1a: Install Conda for Python 3.5 from https://www.continuum.io/downloads#_windows


Step 1b: Now check that Anaconda is installed on your machine.


Step 1c: Check that it works in PowerShell too


Step 2: Install Git on your Machine from https://git-scm.com/download/win (Optional)

You can skip this if you're not going to use Git but I've left the screenshots here, just in case.


Step 2b: Check that Git works in PowerShell


Step 3: Install Java from http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
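Before moving on, it can help to confirm that the tools from Steps 1 to 3 are actually visible on your PATH. Here is a minimal sketch using only the Python standard library. The helper name `find_tools` is made up for this example, and the executable names `conda`, `git` and `java` are assumed to be the ones the installers put on your PATH.

```python
import shutil

def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    # Assumed executable names; adjust them if your installation differs.
    for name, path in find_tools(["conda", "git", "java"]).items():
        print(name, "->", path if path else "NOT FOUND (check the installation and PATH)")
```

If a tool shows up as not found right after installing it, try opening a fresh PowerShell window first, since an already open window may still hold the old PATH.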


Step 4: Install NLTK

Step 4a: Open up PowerShell


Step 4b: Install NLTK using Anaconda

Use ONLY ONE of the commands below in PowerShell to install NLTK (NOT ALL of them).
To install NLTK through conda:
conda install nltk

or, to install the bleeding-edge version (also through PowerShell):
pip install -U https://github.com/nltk/nltk/archive/develop.zip

or through git:
pip install -U git+https://github.com/nltk/nltk.git
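Whichever command you picked, it is worth checking which NLTK version you ended up with. The sketch below is one way to do that from the Python interpreter; `version_tuple` and `nltk_is_at_least` are hypothetical helper names for this example, and the 3.2 minimum is simply the version recommended in the advice at the end of this gist.

```python
import re

def version_tuple(version):
    """Turn a version string such as '3.2.1' into (3, 2, 1), ignoring suffixes like 'dev0'."""
    numbers = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if not match:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)

def nltk_is_at_least(minimum="3.2"):
    """Return True or False once NLTK is importable, or None if it is not installed."""
    try:
        import nltk
    except ImportError:
        return None
    return version_tuple(nltk.__version__) >= version_tuple(minimum)

print(nltk_is_at_least("3.2"))
```

A result of `None` means the interpreter you opened cannot import NLTK at all, which usually points to a different Python than the one conda or pip installed into.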


Step 5: Download and Extract Stanford NLP tools and MaltParser

Stay in PowerShell; don't close it yet. Open the Python 3.5 interpreter within PowerShell and run the following code:
Step 5a: Install MaltParser (the cheater way)

The code below will automatically download the files needed for MaltParser and the pre-trained English model.
REMEMBER TO CHANGE THE C:\Users\Thu\Desktop\ path to your user's Desktop path, e.g. if your user name is "Alvas" on Windows then most probably the path is C:\Users\Alvas\Desktop\:
The following code snippets are tested within Windows Powershell (I suppose it should also work in other modern Python IDEs).
In Python3:
import urllib.request
import zipfile

# First we retrieve the model file from the website.
urllib.request.urlretrieve(r'http://www.maltparser.org/mco/english_parser/engmalt.poly-1.7.mco', r'C:\Users\Thu\Desktop\engmalt.poly-1.7.mco')

# Then we retrieve the parser zip file from the website.
urllib.request.urlretrieve(r'http://maltparser.org/dist/maltparser-1.8.1.zip', r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip')

# Then we create a ZipFile object by initializing it with the full path to the zip file.
zfile = zipfile.ZipFile(r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip')
# And ask Python to extract the files to the directory: C:\Users\Thu\Desktop\maltparser-1.8.1
zfile.extractall(r'C:\Users\Thu\Desktop\maltparser-1.8.1')

from nltk.parse import malt
# We initialize the MaltParser API with the DIRECT PATH to the malt parser DIRECTORY (not the jar file) and the .mco file.
mp = malt.MaltParser(r'C:\Users\Thu\Desktop\maltparser-1.8.1',  r'C:\Users\Thu\Desktop\engmalt.poly-1.7.mco')
mp.parse_one('I shot an elephant in my pajamas .'.split()).tree()
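Instead of editing the hard-coded `C:\Users\Thu\Desktop\` path by hand, you could let Python work out the Desktop folder of whoever is logged in. This is only a sketch: it assumes the Desktop folder sits directly inside the user's home directory (which may not hold if the Desktop is redirected elsewhere), and `desktop_path` and `fetch_once` are hypothetical helpers, not part of NLTK.

```python
import urllib.request
from pathlib import Path

def desktop_path(*parts):
    """Build a path under the current user's Desktop folder (assumed to be <home>/Desktop)."""
    return Path.home().joinpath("Desktop", *parts)

def fetch_once(url, destination):
    """Download url to destination unless a file is already there."""
    destination = Path(destination)
    if not destination.exists():
        destination.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(destination))
    return destination

# Example (commented out so that nothing is downloaded by accident):
# fetch_once('http://www.maltparser.org/mco/english_parser/engmalt.poly-1.7.mco',
#            desktop_path('engmalt.poly-1.7.mco'))
print(desktop_path("maltparser-1.8.1"))
```

Skipping files that already exist means you can re-run the snippet after a typo without downloading everything again.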

Step 5b: Install Stanford NER (the cheater way)

The code below will automatically download the files needed for Stanford NER.
import urllib.request
import zipfile
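# Retrieve the Stanford NER zip file from the website.
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip', r'C:\Users\Thu\Desktop\stanford-ner-2015-04-20.zip')
# Open the zip file and extract it to the directory: C:\Users\Thu\Desktop\stanford-ner
zfile = zipfile.ZipFile(r'C:\Users\Thu\Desktop\stanford-ner-2015-04-20.zip')
zfile.extractall(r'C:\Users\Thu\Desktop\stanford-ner')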

from nltk.tag.stanford import StanfordNERTagger
# First we set the direct paths to the NER model and the .jar file.
_model_filename = r'C:\Users\Thu\Desktop\stanford-ner\classifiers\english.all.3class.distsim.crf.ser.gz'
_path_to_jar = r'C:\Users\Thu\Desktop\stanford-ner\stanford-ner.jar'
# Then we initialize NLTK's Stanford NER Tagger API with the DIRECT PATH to the model and .jar file.
st = StanfordNERTagger(model_filename=_model_filename, path_to_jar=_path_to_jar)
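If the tagger complains that it cannot find the model or the jar, one possible cause is that the zip unpacked into an extra top-level folder inside your target directory. That is an assumption worth checking on your own machine, not something this gist confirms. The sketch below searches the extracted tree for the two files instead of hard-coding their location, and then tags a sentence; `find_file` and `tag_with_stanford_ner` are hypothetical helper names.

```python
import os

def find_file(root, filename):
    """Return the first path under root whose base name equals filename, else None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None

def tag_with_stanford_ner(tokens, root):
    """Tag tokens with Stanford NER, or return None if the files are not under root."""
    model = find_file(root, 'english.all.3class.distsim.crf.ser.gz')
    jar = find_file(root, 'stanford-ner.jar')
    if model is None or jar is None:
        return None
    # Imported here so that the file search still works without NLTK installed.
    from nltk.tag.stanford import StanfordNERTagger
    tagger = StanfordNERTagger(model_filename=model, path_to_jar=jar)
    return tagger.tag(tokens)

# Hypothetical usage; change the path to your own extraction folder:
# tag_with_stanford_ner('Rami Eid is studying at Stony Brook University in NY'.split(),
#                       r'C:\Users\Thu\Desktop\stanford-ner')
```

The call to `tag` needs a working Java installation from Step 3, so a Java-related error at this point is more likely a PATH problem than an NLTK problem.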
Step 5c: Install Stanford POS (the cheater way)

Gotcha, there won't be a spoon-fed answer here, but the idea is the same as in the steps above.
As said at the beginning of this gist: understand the solution, don't just copy and paste!!! We're not monkeys typing Shakespeare ;P
Now, using the knowledge from Steps 5a and 5b, follow the same steps to get the Stanford POS tagger from http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
If you need some hints, see:

https://gist.github.com/alvations/e1df0ba227e542955a8a
https://github.com/alvations/nltk_cli/blob/master/stanford.py

Step 5d: Install Stanford Parser

Do the same for Stanford Parser but do note that the API in NLTK for Stanford Parser is a little different and there will be a code overhaul once nltk/nltk#1249 is merged.
Hint: Reading this carefully will help a lot.

Unsolicited Advice

Disclaimer: Skip this to avoid hate, anger, suffering, etc.; it's just my personal opinion =)
Now the Stanford tools and MaltParser work in NLTK from PowerShell. But you need a proper environment so that you can code happily and enjoy the Python + NLP awesomeness, so here's some unsolicited advice ;P

TRY NOT to use Python IDLE for NLP development (Python IDLE is a great tool to learn and start your Python journey, but if you're going to do NLP work, you're better off using Notepad and the command prompt or another IDE). Also, I encourage you to try https://try.jupyter.org/ instead of IDLE if you're moving on from the basic lessons.
Make sure that you get NLTK v3.2 (it has quite a lot of bugfixes, esp. better Python 3.5 support and better Windows support)
TRY to use an IDE other than IDLE!! (There are lots of them out there: Atom, Vim, Emacs, PyCharm, Eclipse+PyDev, etc.)
Try IPython Notebooks (https://ipython.org/ipython-doc/2/install/install.html#windows)
Get Unix or Mac.