@victoriastuart
Created January 3, 2020 23:21
==============================================================================
file: /mnt/Vancouver/apps/CoreNLP/_victoria/gist_for_SO44910934.txt
title: "CoreNLP" {Java | Python} Gist for StackOverflow #44910934
author: Victoria A. Stuart
created: 2020-01-03
version: 01
last modified: 2020-01-03
Versions:
* v01 : this
==============================================================================
To accompany code described in https://stackoverflow.com/a/59549039/1904943
==============================================================================
JAVA
==============================================================================
[victoria@victoria _victoria]$ cd /mnt/Vancouver/apps/CoreNLP/src-local/stanford-corenlp-full-2018-10-05/
[victoria@victoria stanford-corenlp-full-2018-10-05]$ date; pwd; echo; ls -l
Fri 03 Jan 2020 02:42:29 PM PST
/mnt/Vancouver/apps/CoreNLP/src-local/stanford-corenlp-full-2018-10-05
total 1400680
-rw-r--r-- 1 victoria victoria 3340 Dec 31 14:15 BasicPipelineExample.class
-rw-r--r-- 1 victoria victoria 4666 Dec 31 13:33 BasicPipelineExample.java
-rw-r--r-- 1 victoria victoria 6103 Oct 8 2018 build.xml
...
-rw-r--r-- 1 victoria victoria 8146873 Oct 8 2018 stanford-corenlp-3.9.2.jar
-rw-r--r-- 1 victoria victoria 9687426 Oct 8 2018 stanford-corenlp-3.9.2-javadoc.jar
-rw-r--r-- 1 victoria victoria 362565193 Oct 8 2018 stanford-corenlp-3.9.2-models.jar
-rw-r--r-- 1 victoria victoria 5370905 Oct 8 2018 stanford-corenlp-3.9.2-sources.jar
-rw-r--r-- 1 victoria victoria 7240 Oct 8 2018 StanfordCoreNlpDemo.java
-rw-r--r-- 1 victoria victoria 199885 Oct 8 2018 StanfordDependenciesManual.pdf
-rw-r--r-- 1 victoria victoria 1038970602 Dec 31 14:07 stanford-english-corenlp-2018-10-05-models.jar
...
[victoria@victoria stanford-corenlp-full-2018-10-05]$ time java -cp .:* BasicPipelineExample
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [0.9 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.5 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585573 unique entries from 2 files
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 7.547 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [11.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref model edu/stanford/nlp/models/coref/neural/english-model-default.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref embeddings edu/stanford/nlp/models/coref/neural/english-embeddings.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator kbp
[main] INFO edu.stanford.nlp.pipeline.KBPAnnotator - Loading KBP classifier from: edu/stanford/nlp/models/kbp/english/tac-re-lr.ser.gz
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator quote
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.286 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.3 sec].
[main] INFO edu.stanford.nlp.pipeline.QuoteAnnotator - Setting quotes.
Example: token
he-4
Example: sentence
Joe Smith was born in California.
Example: pos tags
[IN, CD, ,, PRP, VBD, TO, NNP, ,, NNP, IN, DT, NN, .]
Example: ner tags
[O, DATE, O, O, O, O, CITY, O, COUNTRY, O, O, DATE, O]
Example: constituency parse
(ROOT (S (PP (IN In) (NP (CD 2017))) (, ,) (NP (PRP he)) (VP (VBD went) (PP (TO to) (NP (NNP Paris) (, ,) (NNP France))) (PP (IN in) (NP (DT the) (NN summer)))) (. .)))
Example: dependency parse
-> went/VBD (root)
-> 2017/CD (nmod:in)
-> In/IN (case)
-> ,/, (punct)
-> he/PRP (nsubj)
-> Paris/NNP (nmod:to)
-> to/TO (case)
-> ,/, (punct)
-> France/NNP (appos)
-> summer/NN (nmod:in)
-> in/IN (case)
-> the/DT (det)
-> ./. (punct)
Example: relation
1.0 Jane Smith per:siblings Joe Smith
Example: entity mentions
[2017, Paris, France, summer, he]
Example: original entity mention
Joe
Example: canonical entity mention
Joe Smith
Example: coref chains for document
{23=CHAIN23-["Joe Smith" in sentence 1, "he" in sentence 2, "His" in sentence 3, "Joe" in sentence 4, "He" in sentence 5, "his" in sentence 5, "Joe 's" in sentence 6], 26=CHAIN26-["his sister Jane Smith" in sentence 5, "Jane" in sentence 6, "she" in sentence 6], 12=CHAIN12-["2017" in sentence 2, "2017" in sentence 3]}
Example: quote
"That was delicious!"
Example: original speaker of quote
Joe
Example: canonical speaker of quote
Joe Smith
0:47.68
[victoria@victoria stanford-corenlp-full-2018-10-05]$
==============================================================================
PYTHON
==============================================================================
[victoria@victoria ~]$ p37
[Python 3.7 venv (source ~/venv/py3.7/bin/activate)]
(py3.7) [victoria@victoria ~]$ env | grep -i virtual
VIRTUAL_ENV=/home/victoria/venv/py3.7
(py3.7) [victoria@victoria ~]$ python --version
Python 3.7.4
(py3.7) [victoria@victoria ~]$ date
Fri 03 Jan 2020 02:49:42 PM PST
(py3.7) [victoria@victoria ~]$ python
Python 3.7.4 (default, Nov 20 2019, 11:36:53)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import stanfordnlp
>>> from stanfordnlp.server import CoreNLPClient
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-25ebbde9a1ad4065.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> sentence = ann.sentence[0]
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
>>> client.server.terminate()
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-9043ef7d7a744b78.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> sentence = ann.sentence[0]
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
>>> [Ctrl-D]
now exiting EditableBufferInteractiveConsole...
(py3.7) [victoria@victoria ~]$ psgrep -l corenlp
UID PID PPID C STIME TTY TIME CMD
victoria 321300 296292 0 Jan02 pts/2 00:02:09 java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-55bcad5a4c00431e.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
(py3.7) [victoria@victoria ~]$ pgrep -l -f corenlp
321300 java
(py3.7) [victoria@victoria ~]$ kill -9 321300
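To avoid orphaned CoreNLP server JVMs like the one just killed, it helps to guarantee that `client.server.terminate()` runs even when the interactive session errors out. A minimal sketch of such a wrapper (the `corenlp_session` helper is my own, not part of stanfordnlp; the transcript above uses the same `client.server.terminate()` call manually):

```python
from contextlib import contextmanager

@contextmanager
def corenlp_session(client):
    """Yield the client, then terminate its background CoreNLP server on exit."""
    try:
        yield client
    finally:
        # Same call issued interactively above; runs even on exceptions.
        client.server.terminate()
```

Usage would be `with corenlp_session(CoreNLPClient(...)) as client: ann = client.annotate(text)`, so no stray `java ... StanfordCoreNLPServer` process survives the session.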
(py3.7) [victoria@victoria ~]$ python
Python 3.7.4 (default, Nov 20 2019, 11:36:53)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import stanfordnlp
>>> from stanfordnlp.server import CoreNLPClient
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-ba065446f2fa404d.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> ## [server took roughly 20 seconds to start]
>>> sentence = ann.sentence[0]
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
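The traceback above is the crux of the Stack Overflow issue: with `output_format='text'`, `annotate()` returns the server's plain-text rendering as a Python `str`, which has no `.sentence` attribute. Omitting `output_format` (so the client returns the default serialized protobuf `Document`) is what fixes it below. A small guard that makes the failure mode explicit (the helper name is mine, not part of the stanfordnlp API):

```python
def get_sentences(ann):
    """Return the sentence list from a CoreNLP annotation result.

    client.annotate() returns a protobuf Document (with a .sentence
    field) by default, but a plain str when output_format='text'
    (or another textual format) is requested.
    """
    if isinstance(ann, str):
        raise TypeError(
            "annotate() returned a str; drop output_format='text' "
            "so the client returns a protobuf Document"
        )
    return ann.sentence
```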
>>> ## deleted `output_format='text'` argument:
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', timeout=30000, memory='16G')
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-423b84293ffe47f3.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> sentence = ann.sentence[0]
>>> print(sentence)
token {
word: "Breast"
pos: "NN"
value: "Breast"
before: ""
after: " "
originalText: "Breast"
ner: "CAUSE_OF_DEATH"
lemma: "breast"
beginChar: 0
endChar: 6
utterance: 0
speaker: "PER0"
beginIndex: 0
endIndex: 1
tokenBeginIndex: 0
tokenEndIndex: 1
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "CAUSE_OF_DEATH"
corefMentionIndex: 0
corefMentionIndex: 3
entityMentionIndex: 0
}
token {
word: "cancer"
pos: "NN"
value: "cancer"
before: " "
after: " "
originalText: "cancer"
ner: "CAUSE_OF_DEATH"
lemma: "cancer"
beginChar: 7
endChar: 13
utterance: 0
speaker: "PER0"
beginIndex: 1
endIndex: 2
tokenBeginIndex: 1
tokenEndIndex: 2
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "CAUSE_OF_DEATH"
corefMentionIndex: 0
corefMentionIndex: 3
entityMentionIndex: 0
}
token {
word: "susceptibility"
pos: "NN"
value: "susceptibility"
before: " "
after: " "
originalText: "susceptibility"
ner: "O"
lemma: "susceptibility"
beginChar: 14
endChar: 28
utterance: 0
speaker: "PER0"
beginIndex: 2
endIndex: 3
tokenBeginIndex: 2
tokenEndIndex: 3
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 3
}
token {
word: "gene"
pos: "NN"
value: "gene"
before: " "
after: " "
originalText: "gene"
ner: "O"
lemma: "gene"
beginChar: 29
endChar: 33
utterance: 0
speaker: "PER0"
beginIndex: 3
endIndex: 4
tokenBeginIndex: 3
tokenEndIndex: 4
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 3
}
token {
word: "1"
pos: "CD"
value: "1"
before: " "
after: " "
originalText: "1"
ner: "NUMBER"
normalizedNER: "1.0"
lemma: "1"
beginChar: 34
endChar: 35
utterance: 0
speaker: "PER0"
beginIndex: 4
endIndex: 5
tokenBeginIndex: 4
tokenEndIndex: 5
hasXmlContext: false
isNewline: false
coarseNER: "NUMBER"
fineGrainedNER: "NUMBER"
corefMentionIndex: 1
corefMentionIndex: 3
entityMentionIndex: 1
}
token {
word: "-LRB-"
pos: "-LRB-"
value: "-LRB-"
before: " "
after: ""
originalText: "("
ner: "O"
lemma: "-lrb-"
beginChar: 36
endChar: 37
utterance: 0
speaker: "PER0"
beginIndex: 5
endIndex: 6
tokenBeginIndex: 5
tokenEndIndex: 6
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 3
}
token {
word: "BRCA1"
pos: "NN"
value: "BRCA1"
before: ""
after: ""
originalText: "BRCA1"
ner: "O"
lemma: "brca1"
beginChar: 37
endChar: 42
utterance: 0
speaker: "PER0"
beginIndex: 6
endIndex: 7
tokenBeginIndex: 6
tokenEndIndex: 7
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 3
corefMentionIndex: 4
}
token {
word: "-RRB-"
pos: "-RRB-"
value: "-RRB-"
before: ""
after: " "
originalText: ")"
ner: "O"
lemma: "-rrb-"
beginChar: 42
endChar: 43
utterance: 0
speaker: "PER0"
beginIndex: 7
endIndex: 8
tokenBeginIndex: 7
tokenEndIndex: 8
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 3
}
token {
word: "is"
pos: "VBZ"
value: "is"
before: " "
after: " "
originalText: "is"
ner: "O"
lemma: "be"
beginChar: 44
endChar: 46
utterance: 0
speaker: "PER0"
beginIndex: 8
endIndex: 9
tokenBeginIndex: 8
tokenEndIndex: 9
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
}
token {
word: "a"
pos: "DT"
value: "a"
before: " "
after: " "
originalText: "a"
ner: "O"
lemma: "a"
beginChar: 47
endChar: 48
utterance: 0
speaker: "PER0"
beginIndex: 9
endIndex: 10
tokenBeginIndex: 9
tokenEndIndex: 10
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 5
}
token {
word: "tumor"
pos: "NN"
value: "tumor"
before: " "
after: " "
originalText: "tumor"
ner: "CAUSE_OF_DEATH"
lemma: "tumor"
beginChar: 49
endChar: 54
utterance: 0
speaker: "PER0"
beginIndex: 10
endIndex: 11
tokenBeginIndex: 10
tokenEndIndex: 11
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "CAUSE_OF_DEATH"
corefMentionIndex: 2
corefMentionIndex: 5
entityMentionIndex: 2
}
token {
word: "suppressor"
pos: "NN"
value: "suppressor"
before: " "
after: " "
originalText: "suppressor"
ner: "O"
lemma: "suppressor"
beginChar: 55
endChar: 65
utterance: 0
speaker: "PER0"
beginIndex: 11
endIndex: 12
tokenBeginIndex: 11
tokenEndIndex: 12
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 5
}
token {
word: "protein"
pos: "NN"
value: "protein"
before: " "
after: ""
originalText: "protein"
ner: "O"
lemma: "protein"
beginChar: 66
endChar: 73
utterance: 0
speaker: "PER0"
beginIndex: 12
endIndex: 13
tokenBeginIndex: 12
tokenEndIndex: 13
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
corefMentionIndex: 5
}
token {
word: "."
pos: "."
value: "."
before: ""
after: ""
originalText: "."
ner: "O"
lemma: "."
beginChar: 73
endChar: 74
utterance: 0
speaker: "PER0"
beginIndex: 13
endIndex: 14
tokenBeginIndex: 13
tokenEndIndex: 14
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
}
tokenOffsetBegin: 0
tokenOffsetEnd: 14
sentenceIndex: 0
characterOffsetBegin: 0
characterOffsetEnd: 74
parseTree {
child {
child {
child {
child {
child {
child {
value: "Breast"
}
value: "NN"
score: -13.085748672485352
}
child {
child {
value: "cancer"
}
value: "NN"
score: -7.361298084259033
}
child {
child {
value: "susceptibility"
}
value: "NN"
score: -12.832098960876465
}
value: "NP"
score: -39.81563186645508
}
child {
child {
child {
value: "gene"
}
value: "NN"
score: -7.761730194091797
}
child {
child {
value: "1"
}
value: "CD"
score: -4.178682804107666
}
value: "NP"
score: -19.19379997253418
}
value: "NP"
score: -62.36488342285156
}
child {
child {
child {
value: "-LRB-"
}
value: "-LRB-"
score: -0.06566064804792404
}
child {
child {
child {
value: "BRCA1"
}
value: "NN"
score: -13.365689277648926
}
value: "NP"
score: -16.57198715209961
}
child {
child {
value: "-RRB-"
}
value: "-RRB-"
score: -0.06669137626886368
}
value: "PRN"
score: -17.963926315307617
}
value: "NP"
score: -86.23522186279297
}
child {
child {
child {
value: "is"
}
value: "VBZ"
score: -0.14657023549079895
}
child {
child {
child {
value: "a"
}
value: "DT"
score: -1.4235451221466064
}
child {
child {
value: "tumor"
}
value: "NN"
score: -9.49818229675293
}
child {
child {
value: "suppressor"
}
value: "NN"
score: -10.207574844360352
}
child {
child {
value: "protein"
}
value: "NN"
score: -9.312461853027344
}
value: "NP"
score: -36.75123977661133
}
value: "VP"
score: -42.08717727661133
}
child {
child {
value: "."
}
value: "."
score: -0.003481106134131551
}
value: "S"
score: -131.2326202392578
}
value: "ROOT"
score: -131.38381958007812
}
basicDependencies {
node {
sentenceIndex: 0
index: 1
}
node {
sentenceIndex: 0
index: 2
}
node {
sentenceIndex: 0
index: 3
}
node {
sentenceIndex: 0
index: 4
}
node {
sentenceIndex: 0
index: 5
}
node {
sentenceIndex: 0
index: 6
}
node {
sentenceIndex: 0
index: 7
}
node {
sentenceIndex: 0
index: 8
}
node {
sentenceIndex: 0
index: 9
}
node {
sentenceIndex: 0
index: 10
}
node {
sentenceIndex: 0
index: 11
}
node {
sentenceIndex: 0
index: 12
}
node {
sentenceIndex: 0
index: 13
}
node {
sentenceIndex: 0
index: 14
}
edge {
source: 4
target: 1
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 2
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 3
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 5
dep: "nummod"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 7
dep: "appos"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 6
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 8
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 4
dep: "nsubj"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 9
dep: "cop"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 10
dep: "det"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 11
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 12
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 14
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
root: 13
}
collapsedDependencies {
node {
sentenceIndex: 0
index: 1
}
node {
sentenceIndex: 0
index: 2
}
node {
sentenceIndex: 0
index: 3
}
node {
sentenceIndex: 0
index: 4
}
node {
sentenceIndex: 0
index: 5
}
node {
sentenceIndex: 0
index: 6
}
node {
sentenceIndex: 0
index: 7
}
node {
sentenceIndex: 0
index: 8
}
node {
sentenceIndex: 0
index: 9
}
node {
sentenceIndex: 0
index: 10
}
node {
sentenceIndex: 0
index: 11
}
node {
sentenceIndex: 0
index: 12
}
node {
sentenceIndex: 0
index: 13
}
node {
sentenceIndex: 0
index: 14
}
edge {
source: 4
target: 1
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 2
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 3
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 5
dep: "nummod"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 7
dep: "appos"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 6
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 8
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 4
dep: "nsubj"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 9
dep: "cop"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 10
dep: "det"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 11
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 12
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 14
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
root: 13
}
collapsedCCProcessedDependencies {
node {
sentenceIndex: 0
index: 1
}
node {
sentenceIndex: 0
index: 2
}
node {
sentenceIndex: 0
index: 3
}
node {
sentenceIndex: 0
index: 4
}
node {
sentenceIndex: 0
index: 5
}
node {
sentenceIndex: 0
index: 6
}
node {
sentenceIndex: 0
index: 7
}
node {
sentenceIndex: 0
index: 8
}
node {
sentenceIndex: 0
index: 9
}
node {
sentenceIndex: 0
index: 10
}
node {
sentenceIndex: 0
index: 11
}
node {
sentenceIndex: 0
index: 12
}
node {
sentenceIndex: 0
index: 13
}
node {
sentenceIndex: 0
index: 14
}
edge {
source: 4
target: 1
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 2
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 3
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 5
dep: "nummod"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 7
dep: "appos"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 6
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 8
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 4
dep: "nsubj"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 9
dep: "cop"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 10
dep: "det"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 11
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 12
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 14
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
root: 13
}
paragraph: 1
enhancedDependencies {
node {
sentenceIndex: 0
index: 1
}
node {
sentenceIndex: 0
index: 2
}
node {
sentenceIndex: 0
index: 3
}
node {
sentenceIndex: 0
index: 4
}
node {
sentenceIndex: 0
index: 5
}
node {
sentenceIndex: 0
index: 6
}
node {
sentenceIndex: 0
index: 7
}
node {
sentenceIndex: 0
index: 8
}
node {
sentenceIndex: 0
index: 9
}
node {
sentenceIndex: 0
index: 10
}
node {
sentenceIndex: 0
index: 11
}
node {
sentenceIndex: 0
index: 12
}
node {
sentenceIndex: 0
index: 13
}
node {
sentenceIndex: 0
index: 14
}
edge {
source: 4
target: 1
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 2
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 3
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 5
dep: "nummod"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 7
dep: "appos"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 6
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 8
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 4
dep: "nsubj"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 9
dep: "cop"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 10
dep: "det"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 11
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 12
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 14
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
root: 13
}
enhancedPlusPlusDependencies {
node {
sentenceIndex: 0
index: 1
}
node {
sentenceIndex: 0
index: 2
}
node {
sentenceIndex: 0
index: 3
}
node {
sentenceIndex: 0
index: 4
}
node {
sentenceIndex: 0
index: 5
}
node {
sentenceIndex: 0
index: 6
}
node {
sentenceIndex: 0
index: 7
}
node {
sentenceIndex: 0
index: 8
}
node {
sentenceIndex: 0
index: 9
}
node {
sentenceIndex: 0
index: 10
}
node {
sentenceIndex: 0
index: 11
}
node {
sentenceIndex: 0
index: 12
}
node {
sentenceIndex: 0
index: 13
}
node {
sentenceIndex: 0
index: 14
}
edge {
source: 4
target: 1
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 2
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 3
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 5
dep: "nummod"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 4
target: 7
dep: "appos"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 6
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 7
target: 8
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 4
dep: "nsubj"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 9
dep: "cop"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 10
dep: "det"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 11
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 12
dep: "compound"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
edge {
source: 13
target: 14
dep: "punct"
isExtra: false
sourceCopy: 0
targetCopy: 0
language: UniversalEnglish
}
root: 13
}
binarizedParseTree {
child {
child {
child {
child {
child {
child {
value: "Breast"
}
value: "NN"
}
child {
child {
child {
value: "cancer"
}
value: "NN"
}
child {
child {
value: "susceptibility"
}
value: "NN"
}
value: "@NP"
}
value: "NP"
}
child {
child {
child {
value: "gene"
}
value: "NN"
}
child {
child {
value: "1"
}
value: "CD"
}
value: "NP"
}
value: "NP"
}
child {
child {
child {
value: "-LRB-"
}
value: "-LRB-"
}
child {
child {
child {
child {
value: "BRCA1"
}
value: "NN"
}
value: "NP"
}
child {
child {
value: "-RRB-"
}
value: "-RRB-"
}
value: "@PRN"
}
value: "PRN"
}
value: "NP"
}
child {
child {
child {
child {
value: "is"
}
value: "VBZ"
}
child {
child {
child {
value: "a"
}
value: "DT"
}
child {
child {
child {
value: "tumor"
}
value: "NN"
}
child {
child {
child {
value: "suppressor"
}
value: "NN"
}
child {
child {
value: "protein"
}
value: "NN"
}
value: "@NP"
}
value: "@NP"
}
value: "NP"
}
value: "VP"
}
child {
child {
value: "."
}
value: "."
}
value: "@S"
}
value: "S"
}
value: "ROOT"
}
hasRelationAnnotations: false
hasNumerizedTokensAnnotation: true
mentions {
sentenceIndex: 0
tokenStartInSentenceInclusive: 0
tokenEndInSentenceExclusive: 2
ner: "CAUSE_OF_DEATH"
entityType: "CAUSE_OF_DEATH"
entityMentionIndex: 0
canonicalEntityMentionIndex: 0
entityMentionText: "Breast cancer"
}
mentions {
sentenceIndex: 0
tokenStartInSentenceInclusive: 4
tokenEndInSentenceExclusive: 5
ner: "NUMBER"
normalizedNER: "1.0"
entityType: "NUMBER"
entityMentionIndex: 1
canonicalEntityMentionIndex: 1
entityMentionText: "1"
}
mentions {
sentenceIndex: 0
tokenStartInSentenceInclusive: 10
tokenEndInSentenceExclusive: 11
ner: "CAUSE_OF_DEATH"
entityType: "CAUSE_OF_DEATH"
entityMentionIndex: 2
canonicalEntityMentionIndex: 2
entityMentionText: "tumor"
}
mentionsForCoref {
mentionID: 0
mentionType: "NOMINAL"
number: "SINGULAR"
gender: "NEUTRAL"
animacy: "INANIMATE"
person: "UNKNOWN"
startIndex: 0
endIndex: 2
headIndex: 1
headString: "cancer"
nerString: "O"
originalRef: 4294967295
goldCorefClusterID: -1
corefClusterID: 0
mentionNum: 1
sentNum: 0
utter: 0
paragraph: 1
isSubject: false
isDirectObject: false
isIndirectObject: false
isPrepositionObject: false
hasTwin: false
generic: false
isSingleton: false
hasBasicDependency: true
hasEnhancedDepenedncy: true
hasContextParseTree: true
headIndexedWord {
sentenceNum: 4294967295
tokenIndex: 1
copyCount: 0
}
dependingVerb {
sentenceNum: 4294967295
tokenIndex: 4294967295
}
headWord {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 0
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 2
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 5
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 7
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 8
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 9
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 11
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 13
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 0
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 1
}
}
mentionsForCoref {
mentionID: 1
mentionType: "PROPER"
number: "SINGULAR"
gender: "UNKNOWN"
animacy: "INANIMATE"
person: "UNKNOWN"
startIndex: 4
endIndex: 5
headIndex: 4
headString: "1"
nerString: "NUMBER"
originalRef: 4294967295
goldCorefClusterID: -1
corefClusterID: 1
mentionNum: 2
sentNum: 0
utter: 0
paragraph: 1
isSubject: false
isDirectObject: false
isIndirectObject: false
isPrepositionObject: false
hasTwin: false
generic: false
isSingleton: false
hasBasicDependency: true
hasEnhancedDepenedncy: true
hasContextParseTree: true
headIndexedWord {
sentenceNum: 4294967295
tokenIndex: 4
copyCount: 0
}
dependingVerb {
sentenceNum: 4294967295
tokenIndex: 4294967295
}
headWord {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 0
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 2
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 5
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 7
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 8
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 9
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 11
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 13
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 4
}
}
mentionsForCoref {
mentionID: 2
mentionType: "NOMINAL"
number: "SINGULAR"
gender: "NEUTRAL"
animacy: "INANIMATE"
person: "UNKNOWN"
startIndex: 10
endIndex: 11
headIndex: 10
headString: "tumor"
nerString: "O"
originalRef: 4294967295
goldCorefClusterID: -1
corefClusterID: 2
mentionNum: 5
sentNum: 0
utter: 0
paragraph: 1
isSubject: false
isDirectObject: false
isIndirectObject: false
isPrepositionObject: false
hasTwin: false
generic: false
isSingleton: false
hasBasicDependency: true
hasEnhancedDepenedncy: true
hasContextParseTree: true
headIndexedWord {
sentenceNum: 4294967295
tokenIndex: 10
copyCount: 0
}
dependingVerb {
sentenceNum: 4294967295
tokenIndex: 4294967295
}
headWord {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 0
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 2
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 5
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 7
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 8
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 9
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 11
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 13
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 10
}
}
mentionsForCoref {
mentionID: 3
mentionType: "NOMINAL"
number: "SINGULAR"
gender: "UNKNOWN"
animacy: "INANIMATE"
person: "UNKNOWN"
startIndex: 0
endIndex: 8
headIndex: 3
headString: "gene"
nerString: "O"
originalRef: 4294967295
goldCorefClusterID: -1
corefClusterID: 3
mentionNum: 0
sentNum: 0
utter: 0
paragraph: 1
isSubject: false
isDirectObject: false
isIndirectObject: false
isPrepositionObject: false
hasTwin: false
generic: false
isSingleton: false
hasBasicDependency: true
hasEnhancedDepenedncy: true
hasContextParseTree: true
headIndexedWord {
sentenceNum: 4294967295
tokenIndex: 3
copyCount: 0
}
dependingVerb {
sentenceNum: 4294967295
tokenIndex: 4294967295
}
headWord {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 0
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 2
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 5
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 7
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 8
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 9
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 11
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 13
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 0
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 1
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 2
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 3
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 4
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 5
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 6
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 7
}
}
mentionsForCoref {
mentionID: 4
mentionType: "NOMINAL"
number: "SINGULAR"
gender: "UNKNOWN"
animacy: "UNKNOWN"
person: "UNKNOWN"
startIndex: 6
endIndex: 7
headIndex: 6
headString: "brca1"
nerString: "O"
originalRef: 4294967295
goldCorefClusterID: -1
corefClusterID: 4
mentionNum: 3
sentNum: 0
utter: 0
paragraph: 1
isSubject: false
isDirectObject: false
isIndirectObject: false
isPrepositionObject: false
hasTwin: false
generic: false
isSingleton: false
hasBasicDependency: true
hasEnhancedDepenedncy: true
hasContextParseTree: true
headIndexedWord {
sentenceNum: 4294967295
tokenIndex: 6
copyCount: 0
}
dependingVerb {
sentenceNum: 4294967295
tokenIndex: 4294967295
}
headWord {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 0
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 2
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 5
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 7
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 8
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 9
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 11
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 13
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 6
}
appositions: 3
}
mentionsForCoref {
mentionID: 5
mentionType: "NOMINAL"
number: "SINGULAR"
gender: "NEUTRAL"
animacy: "INANIMATE"
person: "UNKNOWN"
startIndex: 9
endIndex: 13
headIndex: 12
headString: "protein"
nerString: "O"
originalRef: 4294967295
goldCorefClusterID: -1
corefClusterID: 5
mentionNum: 4
sentNum: 0
utter: 0
paragraph: 1
isSubject: false
isDirectObject: false
isIndirectObject: false
isPrepositionObject: false
hasTwin: false
generic: false
isSingleton: false
hasBasicDependency: true
hasEnhancedDepenedncy: true
hasContextParseTree: true
headIndexedWord {
sentenceNum: 4294967295
tokenIndex: 12
copyCount: 0
}
dependingVerb {
sentenceNum: 4294967295
tokenIndex: 4294967295
}
headWord {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 0
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 1
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 2
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 3
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 4
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 5
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 6
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 7
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 8
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 9
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 10
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 11
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 12
}
sentenceWords {
sentenceNum: 4294967295
tokenIndex: 13
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 9
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 10
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 11
}
originalSpan {
sentenceNum: 4294967295
tokenIndex: 12
}
predicateNominatives: 3
}
hasCorefMentionsAnnotation: true
hasEntityMentionsAnnotation: true
>>> ## All of that output (above) was for ONE sentence! :-O
>>> ## print(ann) produces the same output:
>>> print(ann)
text: "Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein."
sentence {
token {
word: "Breast"
pos: "NN"
value: "Breast"
before: ""
after: " "
[ ... snip ... ]
>>> ## The 'text' output format is **much** more compact:
>>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
>>> ann = client.annotate(text)
Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-163b9ecb6a9947a8.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref
>>> print(ann)
Sentence #1 (14 tokens):
Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.
Tokens:
[Text=Breast CharacterOffsetBegin=0 CharacterOffsetEnd=6 PartOfSpeech=NN Lemma=breast NamedEntityTag=CAUSE_OF_DEATH]
[Text=cancer CharacterOffsetBegin=7 CharacterOffsetEnd=13 PartOfSpeech=NN Lemma=cancer NamedEntityTag=CAUSE_OF_DEATH]
[Text=susceptibility CharacterOffsetBegin=14 CharacterOffsetEnd=28 PartOfSpeech=NN Lemma=susceptibility NamedEntityTag=O]
[Text=gene CharacterOffsetBegin=29 CharacterOffsetEnd=33 PartOfSpeech=NN Lemma=gene NamedEntityTag=O]
[Text=1 CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=CD Lemma=1 NamedEntityTag=NUMBER NormalizedNamedEntityTag=1.0]
[Text=-LRB- CharacterOffsetBegin=36 CharacterOffsetEnd=37 PartOfSpeech=-LRB- Lemma=-lrb- NamedEntityTag=O]
[Text=BRCA1 CharacterOffsetBegin=37 CharacterOffsetEnd=42 PartOfSpeech=NN Lemma=brca1 NamedEntityTag=O]
[Text=-RRB- CharacterOffsetBegin=42 CharacterOffsetEnd=43 PartOfSpeech=-RRB- Lemma=-rrb- NamedEntityTag=O]
[Text=is CharacterOffsetBegin=44 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=be NamedEntityTag=O]
[Text=a CharacterOffsetBegin=47 CharacterOffsetEnd=48 PartOfSpeech=DT Lemma=a NamedEntityTag=O]
[Text=tumor CharacterOffsetBegin=49 CharacterOffsetEnd=54 PartOfSpeech=NN Lemma=tumor NamedEntityTag=CAUSE_OF_DEATH]
[Text=suppressor CharacterOffsetBegin=55 CharacterOffsetEnd=65 PartOfSpeech=NN Lemma=suppressor NamedEntityTag=O]
[Text=protein CharacterOffsetBegin=66 CharacterOffsetEnd=73 PartOfSpeech=NN Lemma=protein NamedEntityTag=O]
[Text=. CharacterOffsetBegin=73 CharacterOffsetEnd=74 PartOfSpeech=. Lemma=. NamedEntityTag=O]
Constituency parse:
(ROOT
(S
(NP
(NP
(NP (NN Breast) (NN cancer) (NN susceptibility))
(NP (NN gene) (CD 1)))
(PRN (-LRB- -LRB-)
(NP (NN BRCA1))
(-RRB- -RRB-)))
(VP (VBZ is)
(NP (DT a) (NN tumor) (NN suppressor) (NN protein)))
(. .)))
Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, protein-13)
compound(gene-4, Breast-1)
compound(gene-4, cancer-2)
compound(gene-4, susceptibility-3)
nsubj(protein-13, gene-4)
nummod(gene-4, 1-5)
punct(BRCA1-7, -LRB--6)
appos(gene-4, BRCA1-7)
punct(BRCA1-7, -RRB--8)
cop(protein-13, is-9)
det(protein-13, a-10)
compound(protein-13, tumor-11)
compound(protein-13, suppressor-12)
punct(protein-13, .-14)
Extracted the following NER entity mentions:
Breast cancer CAUSE_OF_DEATH
1 NUMBER
tumor CAUSE_OF_DEATH
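# ----------------------------------------------------------------------------
# Note: the bracketed token lines in the 'text' output above can be parsed
# back into dicts if you want structured data without re-annotating. A minimal
# sketch (pure Python; assumes the "[Key=Value ...]" layout shown above -- in
# practice, output_format='json' on CoreNLPClient is the more robust route):

```python
import re

def parse_token_line(line):
    """Parse one '[Text=... CharacterOffsetBegin=... ...]' line into a dict."""
    # Strip the surrounding brackets, then pull out Key=Value pairs.
    inner = line.strip().lstrip('[').rstrip(']')
    return dict(re.findall(r'(\w+)=(\S+)', inner))

token = parse_token_line(
    '[Text=Breast CharacterOffsetBegin=0 CharacterOffsetEnd=6 '
    'PartOfSpeech=NN Lemma=breast NamedEntityTag=CAUSE_OF_DEATH]'
)
print(token['Text'], token['NamedEntityTag'])  # Breast CAUSE_OF_DEATH
```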
# ============================================================================
>>> import stanfordnlp
>>> stanfordnlp.download('en')
Using the default treebank "en_ewt" for language "en".
Would you like to download the models for: en_ewt now? (Y/n) Y
Default download directory: /home/victoria/stanfordnlp_resources
Hit enter to continue or type an alternate directory.
Downloading models for: en_ewt
Download location: /home/victoria/stanfordnlp_resources/en_ewt_models.zip
100%|█████████████████████████████████████| 235M/235M [01:15<00:00, 3.09MB/s]
Download complete. Models saved to: /home/victoria/stanfordnlp_resources/en_ewt_models.zip
Extracting models file for: en_ewt
Cleaning up...Done.
>>> nlp = stanfordnlp.Pipeline()
Use device: cpu
---
Loading: tokenize
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Done loading processors!
---
>>> text = 'Bananas are an excellent source of potassium.'
>>> text_nlp = nlp(text)
>>> text_nlp.sentences[0].print_dependencies()
('Bananas', '5', 'nsubj')
('are', '5', 'cop')
('an', '5', 'det')
('excellent', '5', 'amod')
('source', '0', 'root')
('of', '7', 'case')
('potassium', '5', 'nmod')
('.', '5', 'punct')
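# ----------------------------------------------------------------------------
# Note: the triples printed by print_dependencies() are (word, head index,
# relation), with '0' meaning the root and 1-based indices otherwise. As a
# sanity check, each token's governor can be recovered from the triples alone;
# a small sketch over the output shown above (pure Python, no stanfordnlp
# needed):

```python
# (word, head_index, relation) triples as printed above.
deps = [
    ('Bananas', '5', 'nsubj'),
    ('are', '5', 'cop'),
    ('an', '5', 'det'),
    ('excellent', '5', 'amod'),
    ('source', '0', 'root'),
    ('of', '7', 'case'),
    ('potassium', '5', 'nmod'),
    ('.', '5', 'punct'),
]

words = [w for w, _, _ in deps]

def head_word(i):
    """Return the governor of 1-based token i ('ROOT' for the root)."""
    h = int(deps[i - 1][1])
    return 'ROOT' if h == 0 else words[h - 1]

print(head_word(1))  # source  ('Bananas' depends on 'source')
print(head_word(5))  # ROOT
```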
# ============================================================================
>>> import stanfordnlp
>>> from spacy_stanfordnlp import StanfordNLPLanguage
>>> snlp = stanfordnlp.Pipeline(lang="en")
Use device: cpu
---
Loading: tokenize
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings:
{'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Done loading processors!
---
>>> nlp = StanfordNLPLanguage(snlp)
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> for token in doc:
... print(token.text, token.lemma_, token.pos_, token.dep_)
...
Barack Barack PROPN nsubj:pass
Obama Obama PROPN flat
was be AUX aux:pass
born bear VERB root
in in ADP case
Hawaii Hawaii PROPN obl
. . PUNCT punct
He he PRON nsubj:pass
was be AUX aux:pass
elected elect VERB root
president president PROPN xcomp
in in ADP case
2008 2008 NUM obl
. . PUNCT punct
>>>
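# ----------------------------------------------------------------------------
# Note: once wrapped in a spaCy Doc, rows like the above can be filtered by
# dependency label just like any spaCy pipeline output. As an illustration
# only, here is that filtering applied to the (text, lemma, pos, dep) tuples
# copied from the output above -- plain Python data, no spaCy or stanfordnlp
# required:

```python
# (text, lemma, pos, dep) rows copied from the output above.
tokens = [
    ('Barack', 'Barack', 'PROPN', 'nsubj:pass'),
    ('Obama', 'Obama', 'PROPN', 'flat'),
    ('was', 'be', 'AUX', 'aux:pass'),
    ('born', 'bear', 'VERB', 'root'),
    ('in', 'in', 'ADP', 'case'),
    ('Hawaii', 'Hawaii', 'PROPN', 'obl'),
    ('.', '.', 'PUNCT', 'punct'),
    ('He', 'he', 'PRON', 'nsubj:pass'),
    ('was', 'be', 'AUX', 'aux:pass'),
    ('elected', 'elect', 'VERB', 'root'),
    ('president', 'president', 'PROPN', 'xcomp'),
    ('in', 'in', 'ADP', 'case'),
    ('2008', '2008', 'NUM', 'obl'),
    ('.', '.', 'PUNCT', 'punct'),
]

# Root verbs (lemmas) and passive subjects, selected by dependency label.
roots = [lemma for _, lemma, pos, dep in tokens if dep == 'root']
subjects = [text for text, _, _, dep in tokens if dep == 'nsubj:pass']
print(roots)     # ['bear', 'elect']
print(subjects)  # ['Barack', 'He']
```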
==============================================================================
==============================================================================
END OF FILE
==============================================================================
==============================================================================