Created January 3, 2020 23:21
==============================================================================
file: /mnt/Vancouver/apps/CoreNLP/_victoria/gist_for_SO44910934.txt
title: "CoreNLP" {Java | Python} Gist for StackOverflow #44910934
author: Victoria A. Stuart
created: 2020-01-03
version: 01
last modified: 2020-01-03
Versions:
* v01 : this
==============================================================================
To accompany code described in https://stackoverflow.com/a/59549039/1904943
==============================================================================
JAVA
==============================================================================
[victoria@victoria _victoria]$ cd /mnt/Vancouver/apps/CoreNLP/src-local/stanford-corenlp-full-2018-10-05/
[victoria@victoria stanford-corenlp-full-2018-10-05]$ date; pwd; echo; ls -l
Fri 03 Jan 2020 02:42:29 PM PST
/mnt/Vancouver/apps/CoreNLP/src-local/stanford-corenlp-full-2018-10-05
total 1400680
-rw-r--r-- 1 victoria victoria 3340 Dec 31 14:15 BasicPipelineExample.class
-rw-r--r-- 1 victoria victoria 4666 Dec 31 13:33 BasicPipelineExample.java
-rw-r--r-- 1 victoria victoria 6103 Oct 8 2018 build.xml
...
-rw-r--r-- 1 victoria victoria 8146873 Oct 8 2018 stanford-corenlp-3.9.2.jar
-rw-r--r-- 1 victoria victoria 9687426 Oct 8 2018 stanford-corenlp-3.9.2-javadoc.jar
-rw-r--r-- 1 victoria victoria 362565193 Oct 8 2018 stanford-corenlp-3.9.2-models.jar
-rw-r--r-- 1 victoria victoria 5370905 Oct 8 2018 stanford-corenlp-3.9.2-sources.jar
-rw-r--r-- 1 victoria victoria 7240 Oct 8 2018 StanfordCoreNlpDemo.java
-rw-r--r-- 1 victoria victoria 199885 Oct 8 2018 StanfordDependenciesManual.pdf
-rw-r--r-- 1 victoria victoria 1038970602 Dec 31 14:07 stanford-english-corenlp-2018-10-05-models.jar
...
[victoria@victoria stanford-corenlp-full-2018-10-05]$ time java -cp .:* BasicPipelineExample
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [0.9 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.5 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585573 unique entries from 2 files
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 7.547 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [11.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref model edu/stanford/nlp/models/coref/neural/english-model-default.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref embeddings edu/stanford/nlp/models/coref/neural/english-embeddings.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator kbp
[main] INFO edu.stanford.nlp.pipeline.KBPAnnotator - Loading KBP classifier from: edu/stanford/nlp/models/kbp/english/tac-re-lr.ser.gz
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator quote
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.286 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.3 sec].
[main] INFO edu.stanford.nlp.pipeline.QuoteAnnotator - Setting quotes.
Example: token
he-4
Example: sentence
Joe Smith was born in California.
Example: pos tags
[IN, CD, ,, PRP, VBD, TO, NNP, ,, NNP, IN, DT, NN, .]
Example: ner tags
[O, DATE, O, O, O, O, CITY, O, COUNTRY, O, O, DATE, O]
Example: constituency parse
(ROOT (S (PP (IN In) (NP (CD 2017))) (, ,) (NP (PRP he)) (VP (VBD went) (PP (TO to) (NP (NNP Paris) (, ,) (NNP France))) (PP (IN in) (NP (DT the) (NN summer)))) (. .)))
Example: dependency parse
-> went/VBD (root)
-> 2017/CD (nmod:in)
-> In/IN (case)
-> ,/, (punct)
-> he/PRP (nsubj)
-> Paris/NNP (nmod:to)
-> to/TO (case)
-> ,/, (punct)
-> France/NNP (appos)
-> summer/NN (nmod:in)
-> in/IN (case)
-> the/DT (det)
-> ./. (punct)
Example: relation
1.0 Jane Smith per:siblings Joe Smith
Example: entity mentions
[2017, Paris, France, summer, he]
Example: original entity mention
Joe
Example: canonical entity mention
Joe Smith
Example: coref chains for document
{23=CHAIN23-["Joe Smith" in sentence 1, "he" in sentence 2, "His" in sentence 3, "Joe" in sentence 4, "He" in sentence 5, "his" in sentence 5, "Joe 's" in sentence 6], 26=CHAIN26-["his sister Jane Smith" in sentence 5, "Jane" in sentence 6, "she" in sentence 6], 12=CHAIN12-["2017" in sentence 2, "2017" in sentence 3]}
Example: quote
"That was delicious!"
Example: original speaker of quote
Joe
Example: canonical speaker of quote
Joe Smith
0:47.68
[victoria@victoria stanford-corenlp-full-2018-10-05]$
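The POS and NER tag lists printed above are token-aligned with the second example sentence, "In 2017, he went to Paris, France in the summer." (its tokens can be read off the constituency parse). As a minimal sketch, with data copied from this run and hypothetical names (`tokens`, `pos`, `ner`, `ner_spans`), consecutive tokens sharing a non-O tag can be grouped into spans. This is a simplification, not CoreNLP's own entity-mention logic: it recovers 2017, Paris, France and summer, but not the pronoun "he" that the pipeline's entity-mention list also contains, and it would merge two adjacent entities that happen to share a tag.

```python
from itertools import groupby

# Second sentence of the Java example, tokenized as in the constituency
# parse above, with the POS and NER tag lists printed by the run.
tokens = ["In", "2017", ",", "he", "went", "to", "Paris", ",", "France",
          "in", "the", "summer", "."]
pos = ["IN", "CD", ",", "PRP", "VBD", "TO", "NNP", ",", "NNP",
       "IN", "DT", "NN", "."]
ner = ["O", "DATE", "O", "O", "O", "O", "CITY", "O", "COUNTRY",
       "O", "O", "DATE", "O"]

# The three lists are parallel: one entry per token.
assert len(tokens) == len(pos) == len(ner)


def ner_spans(tokens, tags):
    """Group consecutive tokens sharing the same non-'O' tag into spans.

    Simplification: two adjacent entities with the same tag would be
    merged into a single span.
    """
    spans = []
    start = 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if tag != "O":
            spans.append((" ".join(tokens[start:start + length]), tag))
        start += length
    return spans


for text, tag in ner_spans(tokens, ner):
    print(f"{tag:8} {text}")
```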
==============================================================================
PYTHON
==============================================================================
[victoria@victoria ~]$ p37
[Python 3.7 venv (source ~/venv/py3.7/bin/activate)]
(py3.7) [victoria@victoria ~]$ env | grep -i virtual
VIRTUAL_ENV=/home/victoria/venv/py3.7
(py3.7) [victoria@victoria ~]$ python --version
Python 3.7.4
(py3.7) [victoria@victoria ~]$ date
Fri 03 Jan 2020 02:49:42 PM PST
(py3.7) [victoria@victoria ~]$ python
Python 3.7.4 (default, Nov 20 2019, 11:36:53)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import stanfordnlp
>>> from stanfordnlp.server import CoreNLPClient
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-25ebbde9a1ad4065.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> sentence = ann.sentence[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
>>> client.server.terminate()
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-9043ef7d7a744b78.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> sentence = ann.sentence[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
>>> [Ctrl-D]
now exiting EditableBufferInteractiveConsole...
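In both attempts above, `annotate()` returned a plain `str`, as the traceback shows: that is what a client created with `output_format='text'` hands back, and a `str` has no `.sentence` field. A hedged sketch of a small guard that turns this into a clearer error; the name `first_sentence` is hypothetical (not part of stanfordnlp), and it assumes, as the later part of this transcript shows, that with the default output format `annotate()` returns a protobuf document whose `.sentence` field is indexable.

```python
def first_sentence(ann):
    """Return the first sentence of a CoreNLPClient annotation result.

    Hypothetical helper. It assumes `ann` is the protobuf document returned
    with the default output format, whose `.sentence` field is a sequence;
    a plain str (what output_format='text' yields) is rejected with a
    clearer message than the bare AttributeError.
    """
    if isinstance(ann, str):
        raise TypeError(
            "annotate() returned a str; create the client without "
            "output_format='text' to get a document with a .sentence field"
        )
    if not ann.sentence:
        raise ValueError("the annotated document contains no sentences")
    return ann.sentence[0]
```

In the session above this would read `sentence = first_sentence(client.annotate(text))`.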
(py3.7) [victoria@victoria ~]$ psgrep -l corenlp
UID PID PPID C STIME TTY TIME CMD
victoria 321300 296292 0 Jan02 pts/2 00:02:09 java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-55bcad5a4c00431e.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
(py3.7) [victoria@victoria ~]$ pgrep -l -f corenlp
321300 java
(py3.7) [victoria@victoria ~]$ kill -9 321300
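The leftover server in the `psgrep` listing had been started the previous day (STIME Jan02) and was still running with `-port 9000`, the same port each new client tried to start its own server on. A minimal standard-library sketch for checking whether anything already accepts connections on that port before creating a client; `port_in_use` is a hypothetical helper, not part of stanfordnlp, and 9000 is simply the port shown in the command lines above.

```python
import socket


def port_in_use(host="localhost", port=9000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only detects a listening socket; it does not
    check that the listener is actually a CoreNLP server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use("localhost", 9000)` returning `True` before `CoreNLPClient(...)` is created would have flagged the stale Jan02 server; `pgrep`/`kill` as above is still needed to find and stop it.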
(py3.7) [victoria@victoria ~]$ python
Python 3.7.4 (default, Nov 20 2019, 11:36:53)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import stanfordnlp
>>> from stanfordnlp.server import CoreNLPClient
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-ba065446f2fa404d.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> ## [took ~20 seconds or so to start]
>>> sentence = ann.sentence[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
>>> ## deleted `output_format='text'` argument:
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', timeout=30000, memory='16G')
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-423b84293ffe47f3.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> sentence = ann.sentence[0]
>>> print(sentence)
token {
  word: "Breast"
  pos: "NN"
  value: "Breast"
  before: ""
  after: " "
  originalText: "Breast"
  ner: "CAUSE_OF_DEATH"
  lemma: "breast"
  beginChar: 0
  endChar: 6
  utterance: 0
  speaker: "PER0"
  beginIndex: 0
  endIndex: 1
  tokenBeginIndex: 0
  tokenEndIndex: 1
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "CAUSE_OF_DEATH"
  corefMentionIndex: 0
  corefMentionIndex: 3
  entityMentionIndex: 0
}
token {
  word: "cancer"
  pos: "NN"
  value: "cancer"
  before: " "
  after: " "
  originalText: "cancer"
  ner: "CAUSE_OF_DEATH"
  lemma: "cancer"
  beginChar: 7
  endChar: 13
  utterance: 0
  speaker: "PER0"
  beginIndex: 1
  endIndex: 2
  tokenBeginIndex: 1
  tokenEndIndex: 2
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "CAUSE_OF_DEATH"
  corefMentionIndex: 0
  corefMentionIndex: 3
  entityMentionIndex: 0
}
token {
  word: "susceptibility"
  pos: "NN"
  value: "susceptibility"
  before: " "
  after: " "
  originalText: "susceptibility"
  ner: "O"
  lemma: "susceptibility"
  beginChar: 14
  endChar: 28
  utterance: 0
  speaker: "PER0"
  beginIndex: 2
  endIndex: 3
  tokenBeginIndex: 2
  tokenEndIndex: 3
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 3
}
token {
  word: "gene"
  pos: "NN"
  value: "gene"
  before: " "
  after: " "
  originalText: "gene"
  ner: "O"
  lemma: "gene"
  beginChar: 29
  endChar: 33
  utterance: 0
  speaker: "PER0"
  beginIndex: 3
  endIndex: 4
  tokenBeginIndex: 3
  tokenEndIndex: 4
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 3
}
token {
  word: "1"
  pos: "CD"
  value: "1"
  before: " "
  after: " "
  originalText: "1"
  ner: "NUMBER"
  normalizedNER: "1.0"
  lemma: "1"
  beginChar: 34
  endChar: 35
  utterance: 0
  speaker: "PER0"
  beginIndex: 4
  endIndex: 5
  tokenBeginIndex: 4
  tokenEndIndex: 5
  hasXmlContext: false
  isNewline: false
  coarseNER: "NUMBER"
  fineGrainedNER: "NUMBER"
  corefMentionIndex: 1
  corefMentionIndex: 3
  entityMentionIndex: 1
}
token {
  word: "-LRB-"
  pos: "-LRB-"
  value: "-LRB-"
  before: " "
  after: ""
  originalText: "("
  ner: "O"
  lemma: "-lrb-"
  beginChar: 36
  endChar: 37
  utterance: 0
  speaker: "PER0"
  beginIndex: 5
  endIndex: 6
  tokenBeginIndex: 5
  tokenEndIndex: 6
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 3
}
token {
  word: "BRCA1"
  pos: "NN"
  value: "BRCA1"
  before: ""
  after: ""
  originalText: "BRCA1"
  ner: "O"
  lemma: "brca1"
  beginChar: 37
  endChar: 42
  utterance: 0
  speaker: "PER0"
  beginIndex: 6
  endIndex: 7
  tokenBeginIndex: 6
  tokenEndIndex: 7
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 3
  corefMentionIndex: 4
}
token {
  word: "-RRB-"
  pos: "-RRB-"
  value: "-RRB-"
  before: ""
  after: " "
  originalText: ")"
  ner: "O"
  lemma: "-rrb-"
  beginChar: 42
  endChar: 43
  utterance: 0
  speaker: "PER0"
  beginIndex: 7
  endIndex: 8
  tokenBeginIndex: 7
  tokenEndIndex: 8
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 3
}
token {
  word: "is"
  pos: "VBZ"
  value: "is"
  before: " "
  after: " "
  originalText: "is"
  ner: "O"
  lemma: "be"
  beginChar: 44
  endChar: 46
  utterance: 0
  speaker: "PER0"
  beginIndex: 8
  endIndex: 9
  tokenBeginIndex: 8
  tokenEndIndex: 9
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
}
token {
  word: "a"
  pos: "DT"
  value: "a"
  before: " "
  after: " "
  originalText: "a"
  ner: "O"
  lemma: "a"
  beginChar: 47
  endChar: 48
  utterance: 0
  speaker: "PER0"
  beginIndex: 9
  endIndex: 10
  tokenBeginIndex: 9
  tokenEndIndex: 10
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 5
}
token {
  word: "tumor"
  pos: "NN"
  value: "tumor"
  before: " "
  after: " "
  originalText: "tumor"
  ner: "CAUSE_OF_DEATH"
  lemma: "tumor"
  beginChar: 49
  endChar: 54
  utterance: 0
  speaker: "PER0"
  beginIndex: 10
  endIndex: 11
  tokenBeginIndex: 10
  tokenEndIndex: 11
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "CAUSE_OF_DEATH"
  corefMentionIndex: 2
  corefMentionIndex: 5
  entityMentionIndex: 2
}
token {
  word: "suppressor"
  pos: "NN"
  value: "suppressor"
  before: " "
  after: " "
  originalText: "suppressor"
  ner: "O"
  lemma: "suppressor"
  beginChar: 55
  endChar: 65
  utterance: 0
  speaker: "PER0"
  beginIndex: 11
  endIndex: 12
  tokenBeginIndex: 11
  tokenEndIndex: 12
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 5
}
token {
  word: "protein"
  pos: "NN"
  value: "protein"
  before: " "
  after: ""
  originalText: "protein"
  ner: "O"
  lemma: "protein"
  beginChar: 66
  endChar: 73
  utterance: 0
  speaker: "PER0"
  beginIndex: 12
  endIndex: 13
  tokenBeginIndex: 12
  tokenEndIndex: 13
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
  corefMentionIndex: 5
}
token {
  word: "."
  pos: "."
  value: "."
  before: ""
  after: ""
  originalText: "."
  ner: "O"
  lemma: "."
  beginChar: 73
  endChar: 74
  utterance: 0
  speaker: "PER0"
  beginIndex: 13
  endIndex: 14
  tokenBeginIndex: 13
  tokenEndIndex: 14
  hasXmlContext: false
  isNewline: false
  coarseNER: "O"
  fineGrainedNER: "O"
}
tokenOffsetBegin: 0
tokenOffsetEnd: 14
sentenceIndex: 0
characterOffsetBegin: 0
characterOffsetEnd: 74
parseTree {
  child {
    child {
      child {
        child {
          child {
            child {
              value: "Breast"
            }
            value: "NN"
            score: -13.085748672485352
          }
          child {
            child {
              value: "cancer"
            }
            value: "NN"
            score: -7.361298084259033
          }
          child {
            child {
              value: "susceptibility"
            }
            value: "NN"
            score: -12.832098960876465
          }
          value: "NP"
          score: -39.81563186645508
        }
        child {
          child {
            child {
              value: "gene"
            }
            value: "NN"
            score: -7.761730194091797
          }
          child {
            child {
              value: "1"
            }
            value: "CD"
            score: -4.178682804107666
          }
          value: "NP"
          score: -19.19379997253418
        }
        value: "NP"
        score: -62.36488342285156
      }
      child {
        child {
          child {
            value: "-LRB-"
          }
          value: "-LRB-"
          score: -0.06566064804792404
        }
        child {
          child {
            child {
              value: "BRCA1"
            }
            value: "NN"
            score: -13.365689277648926
          }
          value: "NP"
          score: -16.57198715209961
        }
        child {
          child {
            value: "-RRB-"
          }
          value: "-RRB-"
          score: -0.06669137626886368
        }
        value: "PRN"
        score: -17.963926315307617
      }
      value: "NP"
      score: -86.23522186279297
    }
    child {
      child {
        child {
          value: "is"
        }
        value: "VBZ"
        score: -0.14657023549079895
      }
      child {
        child {
          child {
            value: "a"
          }
          value: "DT"
          score: -1.4235451221466064
        }
        child {
          child {
            value: "tumor"
          }
          value: "NN"
          score: -9.49818229675293
        }
        child {
          child {
            value: "suppressor"
          }
          value: "NN"
          score: -10.207574844360352
        }
        child {
          child {
            value: "protein"
          }
          value: "NN"
          score: -9.312461853027344
        }
        value: "NP"
        score: -36.75123977661133
      }
      value: "VP"
      score: -42.08717727661133
    }
    child {
      child {
        value: "."
      }
      value: "."
      score: -0.003481106134131551
    }
    value: "S"
    score: -131.2326202392578
  }
  value: "ROOT"
  score: -131.38381958007812
}
basicDependencies {
  node {
    sentenceIndex: 0
    index: 1
  }
  node {
    sentenceIndex: 0
    index: 2
  }
  node {
    sentenceIndex: 0
    index: 3
  }
  node {
    sentenceIndex: 0
    index: 4
  }
  node {
    sentenceIndex: 0
    index: 5
  }
  node {
    sentenceIndex: 0
    index: 6
  }
  node {
    sentenceIndex: 0
    index: 7
  }
  node {
    sentenceIndex: 0
    index: 8
  }
  node {
    sentenceIndex: 0
    index: 9
  }
  node {
    sentenceIndex: 0
    index: 10
  }
  node {
    sentenceIndex: 0
    index: 11
  }
  node {
    sentenceIndex: 0
    index: 12
  }
  node {
    sentenceIndex: 0
    index: 13
  }
  node {
    sentenceIndex: 0
    index: 14
  }
  edge {
    source: 4
    target: 1
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 2
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 3
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 5
    dep: "nummod"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 7
    dep: "appos"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 6
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 8
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 4
    dep: "nsubj"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 9
    dep: "cop"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 10
    dep: "det"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 11
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 12
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 14
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  root: 13
}
collapsedDependencies {
  node {
    sentenceIndex: 0
    index: 1
  }
  node {
    sentenceIndex: 0
    index: 2
  }
  node {
    sentenceIndex: 0
    index: 3
  }
  node {
    sentenceIndex: 0
    index: 4
  }
  node {
    sentenceIndex: 0
    index: 5
  }
  node {
    sentenceIndex: 0
    index: 6
  }
  node {
    sentenceIndex: 0
    index: 7
  }
  node {
    sentenceIndex: 0
    index: 8
  }
  node {
    sentenceIndex: 0
    index: 9
  }
  node {
    sentenceIndex: 0
    index: 10
  }
  node {
    sentenceIndex: 0
    index: 11
  }
  node {
    sentenceIndex: 0
    index: 12
  }
  node {
    sentenceIndex: 0
    index: 13
  }
  node {
    sentenceIndex: 0
    index: 14
  }
  edge {
    source: 4
    target: 1
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 2
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 3
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 5
    dep: "nummod"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 7
    dep: "appos"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 6
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 8
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 4
    dep: "nsubj"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 9
    dep: "cop"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 10
    dep: "det"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 11
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 12
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 14
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  root: 13
}
collapsedCCProcessedDependencies {
  node {
    sentenceIndex: 0
    index: 1
  }
  node {
    sentenceIndex: 0
    index: 2
  }
  node {
    sentenceIndex: 0
    index: 3
  }
  node {
    sentenceIndex: 0
    index: 4
  }
  node {
    sentenceIndex: 0
    index: 5
  }
  node {
    sentenceIndex: 0
    index: 6
  }
  node {
    sentenceIndex: 0
    index: 7
  }
  node {
    sentenceIndex: 0
    index: 8
  }
  node {
    sentenceIndex: 0
    index: 9
  }
  node {
    sentenceIndex: 0
    index: 10
  }
  node {
    sentenceIndex: 0
    index: 11
  }
  node {
    sentenceIndex: 0
    index: 12
  }
  node {
    sentenceIndex: 0
    index: 13
  }
  node {
    sentenceIndex: 0
    index: 14
  }
  edge {
    source: 4
    target: 1
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 2
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 3
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 5
    dep: "nummod"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 7
    dep: "appos"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 6
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 8
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 4
    dep: "nsubj"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 9
    dep: "cop"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 10
    dep: "det"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 11
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 12
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 14
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  root: 13
}
paragraph: 1
enhancedDependencies {
  node {
    sentenceIndex: 0
    index: 1
  }
  node {
    sentenceIndex: 0
    index: 2
  }
  node {
    sentenceIndex: 0
    index: 3
  }
  node {
    sentenceIndex: 0
    index: 4
  }
  node {
    sentenceIndex: 0
    index: 5
  }
  node {
    sentenceIndex: 0
    index: 6
  }
  node {
    sentenceIndex: 0
    index: 7
  }
  node {
    sentenceIndex: 0
    index: 8
  }
  node {
    sentenceIndex: 0
    index: 9
  }
  node {
    sentenceIndex: 0
    index: 10
  }
  node {
    sentenceIndex: 0
    index: 11
  }
  node {
    sentenceIndex: 0
    index: 12
  }
  node {
    sentenceIndex: 0
    index: 13
  }
  node {
    sentenceIndex: 0
    index: 14
  }
  edge {
    source: 4
    target: 1
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 2
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 3
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 5
    dep: "nummod"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 4
    target: 7
    dep: "appos"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 6
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 7
    target: 8
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 4
    dep: "nsubj"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 9
    dep: "cop"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 10
    dep: "det"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 11
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 12
    dep: "compound"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  edge {
    source: 13
    target: 14
    dep: "punct"
    isExtra: false
    sourceCopy: 0
    targetCopy: 0
    language: UniversalEnglish
  }
  root: 13
}
enhancedPlusPlusDependencies { | |
node { | |
sentenceIndex: 0 | |
index: 1 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 2 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 3 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 4 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 5 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 6 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 7 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 8 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 9 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 10 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 11 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 12 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 13 | |
} | |
node { | |
sentenceIndex: 0 | |
index: 14 | |
} | |
edge { | |
source: 4 | |
target: 1 | |
dep: "compound" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 4 | |
target: 2 | |
dep: "compound" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 4 | |
target: 3 | |
dep: "compound" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 4 | |
target: 5 | |
dep: "nummod" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 4 | |
target: 7 | |
dep: "appos" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 7 | |
target: 6 | |
dep: "punct" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 7 | |
target: 8 | |
dep: "punct" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 13 | |
target: 4 | |
dep: "nsubj" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 13 | |
target: 9 | |
dep: "cop" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 13 | |
target: 10 | |
dep: "det" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 13 | |
target: 11 | |
dep: "compound" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 13 | |
target: 12 | |
dep: "compound" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
edge { | |
source: 13 | |
target: 14 | |
dep: "punct" | |
isExtra: false | |
sourceCopy: 0 | |
targetCopy: 0 | |
language: UniversalEnglish | |
} | |
root: 13 | |
} | |
binarizedParseTree { | |
child { | |
child { | |
child { | |
child { | |
child { | |
child { | |
value: "Breast" | |
} | |
value: "NN" | |
} | |
child { | |
child { | |
child { | |
value: "cancer" | |
} | |
value: "NN" | |
} | |
child { | |
child { | |
value: "susceptibility" | |
} | |
value: "NN" | |
} | |
value: "@NP" | |
} | |
value: "NP" | |
} | |
child { | |
child { | |
child { | |
value: "gene" | |
} | |
value: "NN" | |
} | |
child { | |
child { | |
value: "1" | |
} | |
value: "CD" | |
} | |
value: "NP" | |
} | |
value: "NP" | |
} | |
child { | |
child { | |
child { | |
value: "-LRB-" | |
} | |
value: "-LRB-" | |
} | |
child { | |
child { | |
child { | |
child { | |
value: "BRCA1" | |
} | |
value: "NN" | |
} | |
value: "NP" | |
} | |
child { | |
child { | |
value: "-RRB-" | |
} | |
value: "-RRB-" | |
} | |
value: "@PRN" | |
} | |
value: "PRN" | |
} | |
value: "NP" | |
} | |
child { | |
child { | |
child { | |
child { | |
value: "is" | |
} | |
value: "VBZ" | |
} | |
child { | |
child { | |
child { | |
value: "a" | |
} | |
value: "DT" | |
} | |
child { | |
child { | |
child { | |
value: "tumor" | |
} | |
value: "NN" | |
} | |
child { | |
child { | |
child { | |
value: "suppressor" | |
} | |
value: "NN" | |
} | |
child { | |
child { | |
value: "protein" | |
} | |
value: "NN" | |
} | |
value: "@NP" | |
} | |
value: "@NP" | |
} | |
value: "NP" | |
} | |
value: "VP" | |
} | |
child { | |
child { | |
value: "." | |
} | |
value: "." | |
} | |
value: "@S" | |
} | |
value: "S" | |
} | |
value: "ROOT" | |
} | |
hasRelationAnnotations: false | |
hasNumerizedTokensAnnotation: true | |
mentions { | |
sentenceIndex: 0 | |
tokenStartInSentenceInclusive: 0 | |
tokenEndInSentenceExclusive: 2 | |
ner: "CAUSE_OF_DEATH" | |
entityType: "CAUSE_OF_DEATH" | |
entityMentionIndex: 0 | |
canonicalEntityMentionIndex: 0 | |
entityMentionText: "Breast cancer" | |
} | |
mentions { | |
sentenceIndex: 0 | |
tokenStartInSentenceInclusive: 4 | |
tokenEndInSentenceExclusive: 5 | |
ner: "NUMBER" | |
normalizedNER: "1.0" | |
entityType: "NUMBER" | |
entityMentionIndex: 1 | |
canonicalEntityMentionIndex: 1 | |
entityMentionText: "1" | |
} | |
mentions { | |
sentenceIndex: 0 | |
tokenStartInSentenceInclusive: 10 | |
tokenEndInSentenceExclusive: 11 | |
ner: "CAUSE_OF_DEATH" | |
entityType: "CAUSE_OF_DEATH" | |
entityMentionIndex: 2 | |
canonicalEntityMentionIndex: 2 | |
entityMentionText: "tumor" | |
} | |
mentionsForCoref { | |
mentionID: 0 | |
mentionType: "NOMINAL" | |
number: "SINGULAR" | |
gender: "NEUTRAL" | |
animacy: "INANIMATE" | |
person: "UNKNOWN" | |
startIndex: 0 | |
endIndex: 2 | |
headIndex: 1 | |
headString: "cancer" | |
nerString: "O" | |
originalRef: 4294967295 | |
goldCorefClusterID: -1 | |
corefClusterID: 0 | |
mentionNum: 1 | |
sentNum: 0 | |
utter: 0 | |
paragraph: 1 | |
isSubject: false | |
isDirectObject: false | |
isIndirectObject: false | |
isPrepositionObject: false | |
hasTwin: false | |
generic: false | |
isSingleton: false | |
hasBasicDependency: true | |
hasEnhancedDepenedncy: true | |
hasContextParseTree: true | |
headIndexedWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
copyCount: 0 | |
} | |
dependingVerb { | |
sentenceNum: 4294967295 | |
tokenIndex: 4294967295 | |
} | |
headWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 8 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 13 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
} | |
mentionsForCoref { | |
mentionID: 1 | |
mentionType: "PROPER" | |
number: "SINGULAR" | |
gender: "UNKNOWN" | |
animacy: "INANIMATE" | |
person: "UNKNOWN" | |
startIndex: 4 | |
endIndex: 5 | |
headIndex: 4 | |
headString: "1" | |
nerString: "NUMBER" | |
originalRef: 4294967295 | |
goldCorefClusterID: -1 | |
corefClusterID: 1 | |
mentionNum: 2 | |
sentNum: 0 | |
utter: 0 | |
paragraph: 1 | |
isSubject: false | |
isDirectObject: false | |
isIndirectObject: false | |
isPrepositionObject: false | |
hasTwin: false | |
generic: false | |
isSingleton: false | |
hasBasicDependency: true | |
hasEnhancedDepenedncy: true | |
hasContextParseTree: true | |
headIndexedWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
copyCount: 0 | |
} | |
dependingVerb { | |
sentenceNum: 4294967295 | |
tokenIndex: 4294967295 | |
} | |
headWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 8 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 13 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
} | |
mentionsForCoref { | |
mentionID: 2 | |
mentionType: "NOMINAL" | |
number: "SINGULAR" | |
gender: "NEUTRAL" | |
animacy: "INANIMATE" | |
person: "UNKNOWN" | |
startIndex: 10 | |
endIndex: 11 | |
headIndex: 10 | |
headString: "tumor" | |
nerString: "O" | |
originalRef: 4294967295 | |
goldCorefClusterID: -1 | |
corefClusterID: 2 | |
mentionNum: 5 | |
sentNum: 0 | |
utter: 0 | |
paragraph: 1 | |
isSubject: false | |
isDirectObject: false | |
isIndirectObject: false | |
isPrepositionObject: false | |
hasTwin: false | |
generic: false | |
isSingleton: false | |
hasBasicDependency: true | |
hasEnhancedDepenedncy: true | |
hasContextParseTree: true | |
headIndexedWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
copyCount: 0 | |
} | |
dependingVerb { | |
sentenceNum: 4294967295 | |
tokenIndex: 4294967295 | |
} | |
headWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 8 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 13 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
} | |
mentionsForCoref { | |
mentionID: 3 | |
mentionType: "NOMINAL" | |
number: "SINGULAR" | |
gender: "UNKNOWN" | |
animacy: "INANIMATE" | |
person: "UNKNOWN" | |
startIndex: 0 | |
endIndex: 8 | |
headIndex: 3 | |
headString: "gene" | |
nerString: "O" | |
originalRef: 4294967295 | |
goldCorefClusterID: -1 | |
corefClusterID: 3 | |
mentionNum: 0 | |
sentNum: 0 | |
utter: 0 | |
paragraph: 1 | |
isSubject: false | |
isDirectObject: false | |
isIndirectObject: false | |
isPrepositionObject: false | |
hasTwin: false | |
generic: false | |
isSingleton: false | |
hasBasicDependency: true | |
hasEnhancedDepenedncy: true | |
hasContextParseTree: true | |
headIndexedWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
copyCount: 0 | |
} | |
dependingVerb { | |
sentenceNum: 4294967295 | |
tokenIndex: 4294967295 | |
} | |
headWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 8 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 13 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
} | |
mentionsForCoref { | |
mentionID: 4 | |
mentionType: "NOMINAL" | |
number: "SINGULAR" | |
gender: "UNKNOWN" | |
animacy: "UNKNOWN" | |
person: "UNKNOWN" | |
startIndex: 6 | |
endIndex: 7 | |
headIndex: 6 | |
headString: "brca1" | |
nerString: "O" | |
originalRef: 4294967295 | |
goldCorefClusterID: -1 | |
corefClusterID: 4 | |
mentionNum: 3 | |
sentNum: 0 | |
utter: 0 | |
paragraph: 1 | |
isSubject: false | |
isDirectObject: false | |
isIndirectObject: false | |
isPrepositionObject: false | |
hasTwin: false | |
generic: false | |
isSingleton: false | |
hasBasicDependency: true | |
hasEnhancedDepenedncy: true | |
hasContextParseTree: true | |
headIndexedWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
copyCount: 0 | |
} | |
dependingVerb { | |
sentenceNum: 4294967295 | |
tokenIndex: 4294967295 | |
} | |
headWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 8 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 13 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
appositions: 3 | |
} | |
mentionsForCoref { | |
mentionID: 5 | |
mentionType: "NOMINAL" | |
number: "SINGULAR" | |
gender: "NEUTRAL" | |
animacy: "INANIMATE" | |
person: "UNKNOWN" | |
startIndex: 9 | |
endIndex: 13 | |
headIndex: 12 | |
headString: "protein" | |
nerString: "O" | |
originalRef: 4294967295 | |
goldCorefClusterID: -1 | |
corefClusterID: 5 | |
mentionNum: 4 | |
sentNum: 0 | |
utter: 0 | |
paragraph: 1 | |
isSubject: false | |
isDirectObject: false | |
isIndirectObject: false | |
isPrepositionObject: false | |
hasTwin: false | |
generic: false | |
isSingleton: false | |
hasBasicDependency: true | |
hasEnhancedDepenedncy: true | |
hasContextParseTree: true | |
headIndexedWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
copyCount: 0 | |
} | |
dependingVerb { | |
sentenceNum: 4294967295 | |
tokenIndex: 4294967295 | |
} | |
headWord { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 0 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 1 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 2 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 3 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 4 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 5 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 6 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 7 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 8 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
sentenceWords { | |
sentenceNum: 4294967295 | |
tokenIndex: 13 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 9 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 10 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 11 | |
} | |
originalSpan { | |
sentenceNum: 4294967295 | |
tokenIndex: 12 | |
} | |
predicateNominatives: 3 | |
} | |
hasCorefMentionsAnnotation: true | |
hasEntityMentionsAnnotation: true | |
>>> ## ALL OF THAT (ABOVE) WAS FOR ONE SENTENCE! :-O | |
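Many index fields in the dump above hold the value 4294967295. Examples are `sentenceNum` in every `headIndexedWord`/`sentenceWords` entry, `originalRef`, and `dependingVerb`'s `tokenIndex`. That number is 2^32 - 1. My reading, which is an assumption and is not confirmed anywhere in this transcript, is that these are unsigned 32-bit protobuf fields into which -1 ("not set") was written. The sketch below shows that reinterpretation and a hypothetical helper, `clean_indices`, that maps the sentinel to `None` before the values are used as indices.

```python
import struct

# 2**32 - 1: the value that recurs in sentenceNum, originalRef, etc. above.
UINT32_MAX = (1 << 32) - 1


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed (two's complement) one."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def is_unset(value):
    """Assumption: UINT32_MAX (i.e. -1 stored in an unsigned field) means 'no value'."""
    return value == UINT32_MAX


def clean_indices(fields):
    """Hypothetical helper: replace sentinel values in a dict of index fields with None."""
    return {name: (None if is_unset(value) else value) for name, value in fields.items()}


if __name__ == "__main__":
    print(as_signed32(4294967295))
    print(clean_indices({"sentenceNum": 4294967295, "tokenIndex": 1}))
```

Under this reading, `dependingVerb { tokenIndex: 4294967295 }` would simply mean that no depending verb was recorded for that mention.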
>>> ## SAME OUTPUT: | |
>>> print(ann) | |
text: "Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein." | |
sentence { | |
token { | |
word: "Breast" | |
pos: "NN" | |
value: "Breast" | |
before: "" | |
after: " " | |
[ ... snip ... ] | |
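Each dependency graph in the protobuf dump is a list of `node` entries plus `edge { source, target, dep }` entries and a `root` index. Token indices in these graphs are 1-based, and 0 stands for the artificial ROOT. The sketch below rebuilds CoreNLP-style `dep(head-i, dependent-j)` lines from those fields. The tokens and the `enhancedPlusPlusDependencies` edges are copied from the dump, so the example runs without a server. With a live `ann` object, I assume the same data could be read from `ann.sentence[0].token[i].word` and `ann.sentence[0].enhancedPlusPlusDependencies.edge`, but those attribute paths are untested here.

```python
# Tokens of sentence 0, copied from the transcript (graph indices are 1-based).
TOKENS = ["Breast", "cancer", "susceptibility", "gene", "1", "-LRB-", "BRCA1",
          "-RRB-", "is", "a", "tumor", "suppressor", "protein", "."]

# (source, target, dep) triples copied from the enhancedPlusPlusDependencies edges.
EDGES = [
    (4, 1, "compound"), (4, 2, "compound"), (4, 3, "compound"),
    (4, 5, "nummod"), (4, 7, "appos"), (7, 6, "punct"), (7, 8, "punct"),
    (13, 4, "nsubj"), (13, 9, "cop"), (13, 10, "det"),
    (13, 11, "compound"), (13, 12, "compound"), (13, 14, "punct"),
]
ROOT = 13  # "root: 13" in the dump


def render_dependencies(tokens, edges, root):
    """Render a dependency graph as 'dep(head-i, dependent-j)' lines."""
    def word(i):
        # Index 0 is the artificial ROOT node; real tokens are 1-based.
        return "ROOT-0" if i == 0 else f"{tokens[i - 1]}-{i}"

    lines = [f"root({word(0)}, {word(root)})"]
    # Sort by dependent (target) index so the lines follow sentence order.
    for source, target, dep in sorted(edges, key=lambda edge: edge[1]):
        lines.append(f"{dep}({word(source)}, {word(target)})")
    return lines


if __name__ == "__main__":
    for line in render_dependencies(TOKENS, EDGES, ROOT):
        print(line)
```

Sorting by dependent index reproduces the same order as the "Dependency Parse (enhanced plus plus dependencies)" block printed by the text output format.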
>>> ## **MUCH** MORE COMPACT: | |
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G') | |
>>> ann = client.annotate(text) | |
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-163b9ecb6a9947a8.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
>>> print(ann) | |
Sentence #1 (14 tokens): | |
Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein. | |
Tokens: | |
[Text=Breast CharacterOffsetBegin=0 CharacterOffsetEnd=6 PartOfSpeech=NN Lemma=breast NamedEntityTag=CAUSE_OF_DEATH] | |
[Text=cancer CharacterOffsetBegin=7 CharacterOffsetEnd=13 PartOfSpeech=NN Lemma=cancer NamedEntityTag=CAUSE_OF_DEATH] | |
[Text=susceptibility CharacterOffsetBegin=14 CharacterOffsetEnd=28 PartOfSpeech=NN Lemma=susceptibility NamedEntityTag=O] | |
[Text=gene CharacterOffsetBegin=29 CharacterOffsetEnd=33 PartOfSpeech=NN Lemma=gene NamedEntityTag=O] | |
[Text=1 CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=CD Lemma=1 NamedEntityTag=NUMBER NormalizedNamedEntityTag=1.0] | |
[Text=-LRB- CharacterOffsetBegin=36 CharacterOffsetEnd=37 PartOfSpeech=-LRB- Lemma=-lrb- NamedEntityTag=O] | |
[Text=BRCA1 CharacterOffsetBegin=37 CharacterOffsetEnd=42 PartOfSpeech=NN Lemma=brca1 NamedEntityTag=O] | |
[Text=-RRB- CharacterOffsetBegin=42 CharacterOffsetEnd=43 PartOfSpeech=-RRB- Lemma=-rrb- NamedEntityTag=O] | |
[Text=is CharacterOffsetBegin=44 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=be NamedEntityTag=O] | |
[Text=a CharacterOffsetBegin=47 CharacterOffsetEnd=48 PartOfSpeech=DT Lemma=a NamedEntityTag=O] | |
[Text=tumor CharacterOffsetBegin=49 CharacterOffsetEnd=54 PartOfSpeech=NN Lemma=tumor NamedEntityTag=CAUSE_OF_DEATH] | |
[Text=suppressor CharacterOffsetBegin=55 CharacterOffsetEnd=65 PartOfSpeech=NN Lemma=suppressor NamedEntityTag=O] | |
[Text=protein CharacterOffsetBegin=66 CharacterOffsetEnd=73 PartOfSpeech=NN Lemma=protein NamedEntityTag=O] | |
[Text=. CharacterOffsetBegin=73 CharacterOffsetEnd=74 PartOfSpeech=. Lemma=. NamedEntityTag=O] | |
Constituency parse: | |
(ROOT | |
(S | |
(NP | |
(NP | |
(NP (NN Breast) (NN cancer) (NN susceptibility)) | |
(NP (NN gene) (CD 1))) | |
(PRN (-LRB- -LRB-) | |
(NP (NN BRCA1)) | |
(-RRB- -RRB-))) | |
(VP (VBZ is) | |
(NP (DT a) (NN tumor) (NN suppressor) (NN protein))) | |
(. .))) | |
Dependency Parse (enhanced plus plus dependencies): | |
root(ROOT-0, protein-13) | |
compound(gene-4, Breast-1) | |
compound(gene-4, cancer-2) | |
compound(gene-4, susceptibility-3) | |
nsubj(protein-13, gene-4) | |
nummod(gene-4, 1-5) | |
punct(BRCA1-7, -LRB--6) | |
appos(gene-4, BRCA1-7) | |
punct(BRCA1-7, -RRB--8) | |
cop(protein-13, is-9) | |
det(protein-13, a-10) | |
compound(protein-13, tumor-11) | |
compound(protein-13, suppressor-12) | |
punct(protein-13, .-14) | |
Extracted the following NER entity mentions: | |
Breast cancer CAUSE_OF_DEATH | |
1 NUMBER | |
tumor CAUSE_OF_DEATH | |
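`CharacterOffsetBegin`/`CharacterOffsetEnd` in the token listing index into the original input string as a half-open range [begin, end). The protobuf `mentions` entries use the same convention at the token level: `tokenStartInSentenceInclusive`/`tokenEndInSentenceExclusive`. The sketch below checks both against the values printed in this transcript. The bracket mapping covers only the two Penn Treebank escapes seen here (`-LRB-`, `-RRB-`). `mention_text` is a hypothetical helper, not a CoreNLP function.

```python
TEXT = "Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein."

# (Text, CharacterOffsetBegin, CharacterOffsetEnd) copied from the Tokens listing.
TOKENS = [
    ("Breast", 0, 6), ("cancer", 7, 13), ("susceptibility", 14, 28),
    ("gene", 29, 33), ("1", 34, 35), ("-LRB-", 36, 37), ("BRCA1", 37, 42),
    ("-RRB-", 42, 43), ("is", 44, 46), ("a", 47, 48), ("tumor", 49, 54),
    ("suppressor", 55, 65), ("protein", 66, 73), (".", 73, 74),
]

# Penn Treebank escapes the tokenizer used for brackets (only those seen here).
PTB_ESCAPES = {"-LRB-": "(", "-RRB-": ")"}


def surface_forms(text, tokens):
    """Slice the original text with each token's [begin, end) character offsets."""
    return [text[begin:end] for _, begin, end in tokens]


def offsets_consistent(text, tokens):
    """True if every slice equals the token text, after undoing bracket escapes."""
    return all(text[begin:end] == PTB_ESCAPES.get(word, word)
               for word, begin, end in tokens)


def mention_text(text, tokens, start, end):
    """Recover a mention from a 0-based token span [start, end), end exclusive."""
    return text[tokens[start][1]:tokens[end - 1][2]]


if __name__ == "__main__":
    print(offsets_consistent(TEXT, TOKENS))
    # Token spans taken from the three protobuf `mentions` entries.
    for start, end in [(0, 2), (4, 5), (10, 11)]:
        print(mention_text(TEXT, TOKENS, start, end))
```

Slicing between the first token's begin offset and the last token's end offset keeps the original spacing. Joining the token strings would instead turn the brackets into `-LRB-`/`-RRB-`.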
# ============================================================================ | |
>>> import stanfordnlp | |
>>> stanfordnlp.download('en') | |
Using the default treebank "en_ewt" for language "en". | |
Would you like to download the models for: en_ewt now? (Y/n) Y | |
Default download directory: /home/victoria/stanfordnlp_resources | |
Hit enter to continue or type an alternate directory. | |
Downloading models for: en_ewt | |
Download location: /home/victoria/stanfordnlp_resources/en_ewt_models.zip | |
100%|█████████████████████████████████████| 235M/235M [01:15<00:00, 3.09MB/s] | |
Download complete. Models saved to: /home/victoria/stanfordnlp_resources/en_ewt_models.zip | |
Extracting models file for: en_ewt | |
Cleaning up...Done. | |
>>> nlp = stanfordnlp.Pipeline() | |
Use device: cpu | |
--- | |
Loading: tokenize | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
--- | |
Loading: pos | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
--- | |
Loading: lemma | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
Building an attentional Seq2Seq model... | |
Using a Bi-LSTM encoder | |
Using soft attention for LSTM. | |
Finetune all embeddings. | |
[Running seq2seq lemmatizer with edit classifier] | |
--- | |
Loading: depparse | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
Done loading processors! | |
--- | |
>>> text = 'Bananas are an excellent source of potassium.' | |
>>> text_nlp = nlp(text) | |
>>> text_nlp.sentences[0].print_dependencies() | |
('Bananas', '5', 'nsubj') | |
('are', '5', 'cop') | |
('an', '5', 'det') | |
('excellent', '5', 'amod') | |
('source', '0', 'root') | |
('of', '7', 'case') | |
('potassium', '5', 'nmod') | |
('.', '5', 'punct') | |
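`print_dependencies()` prints one `(word, head, relation)` tuple per word. The head is a 1-based word index printed as a string, and `'0'` marks the root. The sketch below resolves those indices to head words, using the tuples copied from the output above. If I recall correctly, stanfordnlp 0.2.x exposes the same values on each word as `governor` and `dependency_relation`, and its successor Stanza renamed them `head` and `deprel`. Treat those attribute names as assumptions to check against your installed version.

```python
# (word, head index as printed, relation) copied from print_dependencies() above.
DEPS = [
    ("Bananas", "5", "nsubj"), ("are", "5", "cop"), ("an", "5", "det"),
    ("excellent", "5", "amod"), ("source", "0", "root"), ("of", "7", "case"),
    ("potassium", "5", "nmod"), (".", "5", "punct"),
]


def head_words(deps):
    """Replace each 1-based head index with the head word ('ROOT' for index 0)."""
    words = [word for word, _, _ in deps]
    resolved = []
    for word, head, relation in deps:
        index = int(head)  # printed as a string
        resolved.append((word, "ROOT" if index == 0 else words[index - 1], relation))
    return resolved


if __name__ == "__main__":
    for word, head, relation in head_words(DEPS):
        print(f"{relation}({head}, {word})")
```

This makes the structure easier to read: "source" is the root, most words attach to it, and "of" attaches to "potassium" as its `case` marker.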
# ============================================================================ | |
>>> import stanfordnlp | |
>>> from spacy_stanfordnlp import StanfordNLPLanguage | |
>>> snlp = stanfordnlp.Pipeline(lang="en") | |
Use device: cpu | |
--- | |
Loading: tokenize | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
--- | |
Loading: pos | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
--- | |
Loading: lemma | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
Building an attentional Seq2Seq model... | |
Using a Bi-LSTM encoder | |
Using soft attention for LSTM. | |
Finetune all embeddings. | |
[Running seq2seq lemmatizer with edit classifier] | |
--- | |
Loading: depparse | |
With settings: | |
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
Done loading processors! | |
--- | |
>>> nlp = StanfordNLPLanguage(snlp) | |
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.") | |
>>> for token in doc: | |
... print(token.text, token.lemma_, token.pos_, token.dep_) | |
... | |
Barack Barack PROPN nsubj:pass | |
Obama Obama PROPN flat | |
was be AUX aux:pass | |
born bear VERB root | |
in in ADP case | |
Hawaii Hawaii PROPN obl | |
. . PUNCT punct | |
He he PRON nsubj:pass | |
was be AUX aux:pass | |
elected elect VERB root | |
president president PROPN xcomp | |
in in ADP case | |
2008 2008 NUM obl | |
. . PUNCT punct | |
>>> | |
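The spaCy wrapper prints Universal Dependencies labels such as `nsubj:pass` in `token.dep_`. That makes simple pattern queries straightforward, for example finding each sentence's passive subject and the lemma of its root verb. The sketch below runs over rows copied from the loop output above, split into the two sentences by hand. With a real `doc`, the subject filter would be roughly `[t.text for t in doc if t.dep_ == "nsubj:pass"]`. `passive_clauses` is a hypothetical helper, not part of spaCy or spacy_stanfordnlp.

```python
# (text, lemma_, pos_, dep_) rows copied from the loop output above,
# split into the two sentences by hand.
SENTENCES = [
    [("Barack", "Barack", "PROPN", "nsubj:pass"), ("Obama", "Obama", "PROPN", "flat"),
     ("was", "be", "AUX", "aux:pass"), ("born", "bear", "VERB", "root"),
     ("in", "in", "ADP", "case"), ("Hawaii", "Hawaii", "PROPN", "obl"),
     (".", ".", "PUNCT", "punct")],
    [("He", "he", "PRON", "nsubj:pass"), ("was", "be", "AUX", "aux:pass"),
     ("elected", "elect", "VERB", "root"), ("president", "president", "PROPN", "xcomp"),
     ("in", "in", "ADP", "case"), ("2008", "2008", "NUM", "obl"),
     (".", ".", "PUNCT", "punct")],
]


def passive_clauses(sentences):
    """Return (passive subject, root lemma) for each sentence that has both."""
    results = []
    for rows in sentences:
        subjects = [text for text, _, _, dep in rows if dep == "nsubj:pass"]
        roots = [lemma for _, lemma, _, dep in rows if dep == "root"]
        if subjects and roots:
            results.append((subjects[0], roots[0]))
    return results


if __name__ == "__main__":
    print(passive_clauses(SENTENCES))
```

Only "Barack" is returned for the first sentence, because "Obama" attaches to it via `flat`. Recovering the full name would need each token's head (`token.head` in spaCy), which this printout does not show.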
============================================================================== | |
============================================================================== | |
END OF FILE | |
============================================================================== | |
============================================================================== |