How to get almost equivalent Porter stemmers between Java and Python.
One remaining difference is how they handle non-dictionary words ending in "s":
Assume you lowercase the following string and then tokenize it with the regular expression [a-z]\w+
...
"EMI"s request for an appeal should be denied until a final judgment is entered. Document filed by Michael Robertson. (js) (Entered: 02/01/2012)"
NLTK will map the token js -> j, whereas OpenNLP maps js -> js.
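The tokenization step described above can be sketched in Python as follows. This is a minimal sketch using only the standard re module; the names TOKEN_RE, tokenize, and sample are illustrative and do not come from either library.

```python
import re

# Pattern from the text: one lowercase letter followed by one or more word characters.
TOKEN_RE = re.compile(r"[a-z]\w+")

def tokenize(text):
    """Lowercase the text, then return all regex matches as tokens."""
    return TOKEN_RE.findall(text.lower())

# The sample string from the text, copied as-is.
sample = ('EMI"s request for an appeal should be denied until a final '
          'judgment is entered. Document filed by Michael Robertson. '
          '(js) (Entered: 02/01/2012)')

tokens = tokenize(sample)
```

Because the pattern requires at least two characters, single-letter tokens such as "a" are dropped, while the parenthesized (js) survives as the token js, which is the one that exposes the difference.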
Note: There may be other differences yet to be discovered.
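One way to look for further differences is to stem the same token list with each library, save the results, and compare them. The sketch below assumes the stems have already been collected into two plain dictionaries keyed by token. The function name diff_stems and the dictionary layout are hypothetical, and only the js row of the sample data comes from the example above.

```python
def diff_stems(nltk_stems, opennlp_stems):
    """Return {token: (nltk_stem, opennlp_stem)} for tokens whose stems differ.

    Only tokens present in both mappings are compared.
    """
    shared = nltk_stems.keys() & opennlp_stems.keys()
    return {
        tok: (nltk_stems[tok], opennlp_stems[tok])
        for tok in sorted(shared)
        if nltk_stems[tok] != opennlp_stems[tok]
    }

# Sample data: the js row is from the text; "request" is an illustrative agreeing row.
nltk_stems = {"js": "j", "request": "request"}
opennlp_stems = {"js": "js", "request": "request"}

mismatches = diff_stems(nltk_stems, opennlp_stems)
```

Running this over a larger corpus would surface any remaining tokens on which the two stemmers disagree.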