In addition to The Black Hermit and Wizard of the Crow, this author is better known for a novel in which Mugo betrays the revolutionary Kihika, as well as another in which Munira burns down Wanja's brothel.
Annotating the data with gazetteers
Annotating the data with context-aggregation features (if necessary)
Done Annotating the data with expressive features...
Annotating data with the model's tagger, the inference algorithm is: GREEDY
Extracting features for level 2 inference
Done - Extracting features for level 2 inference
Done Annotating data with the model's tagger, the inference algorithm is: GREEDY
Inference time: 32 milliseconds
Constructing a problem for the following text:
For 10 points, name this author of Sketches by Boz, who wrote about a clock-stopping jilted spinster who mindfucks Estelle and Pip in Great Expectations.
Annotating mention view..
2 milliseconds elapsed on constructing the TF-IDF representation of the input text...101469-4.txt
Getting the wikifiable mentions candidates
Getting the Wikifiable entities
Getting the text annotation
Adding NER candidates for 101469-4.txt
Adding SHALLOW_PARSE and subChunk candidates for 101469-4.txt
Done - Getting the text annotation
Adding manually specified mentions
Regex matching...
Matched regex entity Estelle and Pip in Great Expectations[115-152]{21-27}
Matched regex entity Pip in Great Expectations[127-152]{23-27}
Matched regex entity Great Expectations[134-152]{25-27}
Finished adding regex large chunk matching
Extracting the candidate disambiguations for the mentions
Done constructing the Wikifiable entities
---- almost there....
107 milliseconds elapsed on constructing potentially wikifiable entities in the input text...101469-4.txt
Done constructing the problem; running the inference
Inference on the document -- 101469-4.txt
1 milliseconds elapsed extracting features for the level: FeatureExtractorTitlesMatch
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorTitlesMatch
10 milliseconds elapsed extracting features for the level: FeatureExtractorLexical
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorLexical
3 milliseconds elapsed extracting features for the level: FeatureExtractorCoherence
0 milliseconds elapsed ranking the candidates at level...FeatureExtractorCoherence
Relational inference took 0ms
Discarded 0 hypotheses
Annotation at test time--76 milliseconds elapsed to annotate the document 101469-4.txt
Done running the inference
Saving the simplest-form (no nested entities) Wikification output in HTML format
Saving the full annotation in XML
Saving the NER output
Processing the file : data/input/101471-0.txt
Constructing the problem...
Annotating the data with expressive features...
Brown clusters OOV statistics:
Data statistics:
- Total tokens with repetitions =21
- Total unique tokens =20
- Total unique tokens ignore case =20
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brown-english-wikitext.case-intact.txt-c1000-freq10-v3.txt(covers 1288301 unique tokens)
- Total OOV tokens, Case Sensitive =1
- OOV tokens, no repetitions, Case Sensitive =1
- Total OOV tokens even after lowercasing =1
- OOV tokens even after lowercasing, no repetition =1
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brownBllipClusters(covers 95262 unique tokens)
- Total OOV tokens, Case Sensitive =3
- OOV tokens, no repetitions, Case Sensitive =3
- Total OOV tokens even after lowercasing =3
- OOV tokens even after lowercasing, no repetition =3
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/rcv1.clean.tokenized-c1000-p1.paths.txt(covers 85963 unique tokens)
- Total OOV tokens, Case Sensitive =3
- OOV tokens, no repetitions, Case Sensitive =3
- Total OOV tokens even after lowercasing =3
- OOV tokens even after lowercasing, no repetition =3
Annotating the data with gazetteers
Annotating the data with context-aggregation features (if necessary)
Done Annotating the data with expressive features...
Annotating data with the model's tagger, the inference algorithm is: GREEDY
Extracting features for level 2 inference
Done - Extracting features for level 2 inference
Done Annotating data with the model's tagger, the inference algorithm is: GREEDY
Inference time: 41 milliseconds
Constructing a problem for the following text:
In one of this author's novels, Chege's son loves the preacher Joshua's daughter and abandons the Kameno community.
Annotating mention view..
0 milliseconds elapsed on constructing the TF-IDF representation of the input text...101471-0.txt
Getting the wikifiable mentions candidates
Getting the Wikifiable entities
Getting the text annotation
Adding NER candidates for 101471-0.txt
Adding SHALLOW_PARSE and subChunk candidates for 101471-0.txt
Done - Getting the text annotation
Adding manually specified mentions
Regex matching...
Finished adding regex large chunk matching
Extracting the candidate disambiguations for the mentions
Done constructing the Wikifiable entities
---- almost there....
54 milliseconds elapsed on constructing potentially wikifiable entities in the input text...101471-0.txt
Done constructing the problem; running the inference
Inference on the document -- 101471-0.txt
0 milliseconds elapsed extracting features for the level: FeatureExtractorTitlesMatch
0 milliseconds elapsed ranking the candidates at level...FeatureExtractorTitlesMatch
7 milliseconds elapsed extracting features for the level: FeatureExtractorLexical
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorLexical
1 milliseconds elapsed extracting features for the level: FeatureExtractorCoherence
0 milliseconds elapsed ranking the candidates at level...FeatureExtractorCoherence
Could not find WikiMatchData for title Kameno
Relational inference took 0ms
Discarded 0 hypotheses
Annotation at test time--67 milliseconds elapsed to annotate the document 101471-0.txt
Done running the inference
Saving the simplest-form (no nested entities) Wikification output in HTML format
Saving the full annotation in XML
Saving the NER output
Processing the file : data/input/101471-1.txt
Constructing the problem...
Annotating the data with expressive features...
Brown clusters OOV statistics:
Data statistics:
- Total tokens with repetitions =35
- Total unique tokens =29
- Total unique tokens ignore case =29
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brown-english-wikitext.case-intact.txt-c1000-freq10-v3.txt(covers 1288301 unique tokens)
- Total OOV tokens, Case Sensitive =3
- OOV tokens, no repetitions, Case Sensitive =3
- Total OOV tokens even after lowercasing =3
- OOV tokens even after lowercasing, no repetition =3
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brownBllipClusters(covers 95262 unique tokens)
- Total OOV tokens, Case Sensitive =4
- OOV tokens, no repetitions, Case Sensitive =4
- Total OOV tokens even after lowercasing =4
- OOV tokens even after lowercasing, no repetition =4
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/rcv1.clean.tokenized-c1000-p1.paths.txt(covers 85963 unique tokens)
- Total OOV tokens, Case Sensitive =3
- OOV tokens, no repetitions, Case Sensitive =3
- Total OOV tokens even after lowercasing =3
- OOV tokens even after lowercasing, no repetition =3
Annotating the data with gazetteers
Annotating the data with context-aggregation features (if necessary)
Done Annotating the data with expressive features...
Annotating data with the model's tagger, the inference algorithm is: GREEDY
Extracting features for level 2 inference
Done - Extracting features for level 2 inference
Done Annotating data with the model's tagger, the inference algorithm is: GREEDY
Inference time: 61 milliseconds
Annotating mention view..
Constructing a problem for the following text:
Thoni commits suicide thanks to rejection by her husband Remi, the title character, in one of this author's plays, while in another, Gathoni's pregnancy prevents her marriage to John Muhuuni.
0 milliseconds elapsed on constructing the TF-IDF representation of the input text...101471-1.txt
Getting the wikifiable mentions candidates
Getting the Wikifiable entities
Getting the text annotation
Adding NER candidates for 101471-1.txt
Adding SHALLOW_PARSE and subChunk candidates for 101471-1.txt
Done - Getting the text annotation
Adding manually specified mentions
Regex matching...
Finished adding regex large chunk matching
Extracting the candidate disambiguations for the mentions
Done constructing the Wikifiable entities
---- almost there....
66 milliseconds elapsed on constructing potentially wikifiable entities in the input text...101471-1.txt
Done constructing the problem; running the inference
Inference on the document -- 101471-1.txt
0 milliseconds elapsed extracting features for the level: FeatureExtractorTitlesMatch
0 milliseconds elapsed ranking the candidates at level...FeatureExtractorTitlesMatch
21 milliseconds elapsed extracting features for the level: FeatureExtractorLexical
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorLexical
2 milliseconds elapsed extracting features for the level: FeatureExtractorCoherence
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorCoherence
Could not find WikiMatchData for title Reiner_Thoni
[DEBUG][edu.illinois.cs.cogcomp.wikifier.inference.relation.NominalMentionAnalysis] - NOMCOREF DETECTED:Remi[57-61]{9-10} === the title character
[DEBUG][edu.illinois.cs.cogcomp.wikifier.inference.relation.NominalMentionAnalysis] - Force linking Remi[57-61]{9-10} to null
Relational inference took 0ms
Discarded 0 hypotheses
Annotation at test time--117 milliseconds elapsed to annotate the document 101471-1.txt
Done running the inference
Saving the simplest-form (no nested entities) Wikification output in HTML format
Saving the full annotation in XML
Saving the NER output
Processing the file : data/input/101471-2.txt
Constructing the problem...
Annotating the data with expressive features...
Brown clusters OOV statistics:
Data statistics:
- Total tokens with repetitions =23
- Total unique tokens =21
- Total unique tokens ignore case =21
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brown-english-wikitext.case-intact.txt-c1000-freq10-v3.txt(covers 1288301 unique tokens)
- Total OOV tokens, Case Sensitive =2
- OOV tokens, no repetitions, Case Sensitive =2
- Total OOV tokens even after lowercasing =2
- OOV tokens even after lowercasing, no repetition =2
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brownBllipClusters(covers 95262 unique tokens)
- Total OOV tokens, Case Sensitive =3
- OOV tokens, no repetitions, Case Sensitive =3
- Total OOV tokens even after lowercasing =3
- OOV tokens even after lowercasing, no repetition =3
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/rcv1.clean.tokenized-c1000-p1.paths.txt(covers 85963 unique tokens)
- Total OOV tokens, Case Sensitive =4
- OOV tokens, no repetitions, Case Sensitive =4
- Total OOV tokens even after lowercasing =4
- OOV tokens even after lowercasing, no repetition =4
Annotating the data with gazetteers
Annotating the data with context-aggregation features (if necessary)
Done Annotating the data with expressive features...
Annotating data with the model's tagger, the inference algorithm is: GREEDY
Extracting features for level 2 inference
Done - Extracting features for level 2 inference
Done Annotating data with the model's tagger, the inference algorithm is: GREEDY
Inference time: 33 milliseconds
Annotating mention view..
Constructing a problem for the following text:
Another of his novels is set in the Republic of Aburiria, where Kamiti and Nyawira pretend to be the title sorcerer.
1 milliseconds elapsed on constructing the TF-IDF representation of the input text...101471-2.txt
Getting the wikifiable mentions candidates
Getting the Wikifiable entities
Getting the text annotation
Adding NER candidates for 101471-2.txt
Adding SHALLOW_PARSE and subChunk candidates for 101471-2.txt
Done - Getting the text annotation
Adding manually specified mentions
Regex matching...
Matched regex entity Republic of Aburiria[36-56]{8-11}
Matched regex entity Kamiti and Nyawira[64-82]{13-16}
Finished adding regex large chunk matching
Extracting the candidate disambiguations for the mentions
Done constructing the Wikifiable entities
---- almost there....
38 milliseconds elapsed on constructing potentially wikifiable entities in the input text...101471-2.txt
Done constructing the problem; running the inference
Inference on the document -- 101471-2.txt
0 milliseconds elapsed extracting features for the level: FeatureExtractorTitlesMatch
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorTitlesMatch
9 milliseconds elapsed extracting features for the level: FeatureExtractorLexical
0 milliseconds elapsed ranking the candidates at level...FeatureExtractorLexical
1 milliseconds elapsed extracting features for the level: FeatureExtractorCoherence
0 milliseconds elapsed ranking the candidates at level...FeatureExtractorCoherence
Relational inference took 0ms
Discarded 0 hypotheses
Annotation at test time--93 milliseconds elapsed to annotate the document 101471-2.txt
Done running the inference
Saving the simplest-form (no nested entities) Wikification output in HTML format
Saving the full annotation in XML
Saving the NER output
Processing the file : data/input/101471-3.txt
Constructing the problem...
Annotating the data with expressive features...
Brown clusters OOV statistics:
Data statistics:
- Total tokens with repetitions =40
- Total unique tokens =35
- Total unique tokens ignore case =33
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brown-english-wikitext.case-intact.txt-c1000-freq10-v3.txt(covers 1288301 unique tokens)
- Total OOV tokens, Case Sensitive =2
- OOV tokens, no repetitions, Case Sensitive =2
- Total OOV tokens even after lowercasing =2
- OOV tokens even after lowercasing, no repetition =2
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/brownBllipClusters(covers 95262 unique tokens)
- Total OOV tokens, Case Sensitive =5
- OOV tokens, no repetitions, Case Sensitive =5
- Total OOV tokens even after lowercasing =5
- OOV tokens even after lowercasing, no repetition =5
* OOV statistics for the resource: data/NER_Data//BrownHierarchicalWordClusters/rcv1.clean.tokenized-c1000-p1.paths.txt(covers 85963 unique tokens)
- Total OOV tokens, Case Sensitive =5
- OOV tokens, no repetitions, Case Sensitive =5
- Total OOV tokens even after lowercasing =5
- OOV tokens even after lowercasing, no repetition =5
Annotating the data with gazetteers
Annotating the data with context-aggregation features (if necessary)
Done Annotating the data with expressive features...
Annotating data with the model's tagger, the inference algorithm is: GREEDY
Extracting features for level 2 inference
Done - Extracting features for level 2 inference
Done Annotating data with the model's tagger, the inference algorithm is: GREEDY
Inference time: 52 milliseconds
Annotating mention view..
Constructing a problem for the following text:
In addition to The Black Hermit and Wizard of the Crow, this author is better known for a novel in which Mugo betrays the revolutionary Kihika, as well as another in which Munira burns down Wanja's ...
0 milliseconds elapsed on constructing the TF-IDF representation of the input text...101471-3.txt
Getting the wikifiable mentions candidates
Getting the Wikifiable entities
Getting the text annotation
Adding NER candidates for 101471-3.txt
Adding SHALLOW_PARSE and subChunk candidates for 101471-3.txt
Done - Getting the text annotation
Adding manually specified mentions
Regex matching...
Matched regex entity The Black Hermit and Wizard of the Crow[15-54]{3-11}
Matched regex entity Black Hermit and Wizard of the Crow[19-54]{4-11}
Matched regex entity Wizard of the Crow[36-54]{7-11}
Finished adding regex large chunk matching
Extracting the candidate disambiguations for the mentions
Done constructing the Wikifiable entities
---- almost there....
94 milliseconds elapsed on constructing potentially wikifiable entities in the input text...101471-3.txt
Done constructing the problem; running the inference
Inference on the document -- 101471-3.txt
0 milliseconds elapsed extracting features for the level: FeatureExtractorTitlesMatch
1 milliseconds elapsed ranking the candidates at level...FeatureExtractorTitlesMatch
53 milliseconds elapsed extracting features for the level: FeatureExtractorLexical
2 milliseconds elapsed ranking the candidates at level...FeatureExtractorLexical
8 milliseconds elapsed extracting features for the level: FeatureExtractorCoherence
4 milliseconds elapsed ranking the candidates at level...FeatureExtractorCoherence
Could not find WikiMatchData for title The_Black_Hermit
Exception in thread "main" java.lang.NullPointerException
    at edu.illinois.cs.cogcomp.wikifier.models.Mention.getCorefRelationsTo(Mention.java:938)
    at edu.illinois.cs.cogcomp.wikifier.inference.relation.NPAnalysis.forceLinkPERNameChunks(NPAnalysis.java:55)
    at edu.illinois.cs.cogcomp.wikifier.inference.relation.NPAnalysis.infer(NPAnalysis.java:31)
    at edu.illinois.cs.cogcomp.wikifier.models.LinkingProblem.resolveCoherenceRelations(LinkingProblem.java:329)
    at edu.illinois.cs.cogcomp.wikifier.models.LinkingProblem.deepRelationalInference(LinkingProblem.java:285)
    at edu.illinois.cs.cogcomp.wikifier.inference.InferenceEngine.annotate(InferenceEngine.java:125)
    at edu.illinois.cs.cogcomp.wikifier.ReferenceAssistant.main(ReferenceAssistant.java:89)