TextRank source code notes
The TextRank code studied here is https://github.com/davidadamojr/TextRank.git. It is based on a 2004 paper that uses a graph model to rank the words and sentences of a text.
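The ranking idea from the paper can be sketched as a PageRank-style iteration over a word graph. Below is a minimal sketch that assumes an undirected co-occurrence graph (words within a window of 2, i.e. adjacent words, are linked) and the unweighted update S(V_i) = (1 - d) + d * sum over neighbours j of S(V_j) / |Adj(V_j)|, with d = 0.85. The function names `build_cooccurrence_graph` and `textrank_scores` are hypothetical, and the repository may build its graph and edge weights differently, so this illustrates the technique rather than the repo's code.

```python
def build_cooccurrence_graph(words, window=2):
    """Hypothetical helper: link words that occur within `window` positions of each other."""
    graph = {}
    for i, word in enumerate(words):
        graph.setdefault(word, set())
        # With window=2 only the immediately following word is linked.
        for j in range(i + 1, min(i + window, len(words))):
            other = words[j]
            if other != word:
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def textrank_scores(graph, d=0.85, max_iter=100, tol=1e-6):
    """Iterate S(V_i) = (1 - d) + d * sum_j S(V_j) / |Adj(V_j)| until scores stabilise."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    for _ in range(max_iter):
        new_scores = {}
        for node, neighbours in graph.items():
            # Each neighbour shares its score equally among all of its own neighbours.
            rank_sum = sum(scores[nb] / len(graph[nb]) for nb in neighbours)
            new_scores[node] = (1 - d) + d * rank_sum
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores
```

For example, `textrank_scores(build_cooccurrence_graph(["a", "b", "c"]))` gives the middle word `b` the highest score, because it is the only word linked to both others; the top-ranked words can then be read off with `sorted(scores, key=scores.get, reverse=True)`.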
TextRank has two main functions: keyphrase extraction and text summarization.
1. Keyphrase extraction
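(1) Tokenize the text and filter the tokens by part of speech. Here nltk's pos_tag is used, and only words tagged NN, NNP (nouns) or JJ (adjectives) are kept as candidate words.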
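The filtering step can be sketched as follows, assuming the input is the list of `(word, tag)` pairs that `nltk.pos_tag` returns. The helper name `filter_candidates` is hypothetical, and the tagged sentence is written out by hand so the sketch runs without downloading NLTK's tagger models.

```python
# Penn Treebank tags kept as candidates: NN, NNP (nouns) and JJ (adjectives).
CANDIDATE_TAGS = frozenset({"NN", "NNP", "JJ"})


def filter_candidates(tagged_tokens, tags=CANDIDATE_TAGS):
    """Hypothetical helper: keep only (word, tag) pairs whose tag is in `tags`."""
    return [(word, tag) for word, tag in tagged_tokens if tag in tags]


# Hand-written tags standing in for nltk.pos_tag output.
tagged = [
    ("Compatibility", "NN"), ("of", "IN"), ("systems", "NNS"),
    ("of", "IN"), ("linear", "JJ"), ("constraints", "NNS"),
]
candidates = filter_candidates(tagged)
```

With NLTK installed and its tokenizer and tagger data downloaded, the input would come from `nltk.pos_tag(nltk.word_tokenize(text))`. Because the match is on exact tags, plural nouns (NNS) and plural proper nouns (NNPS) are dropped, so "systems" and "constraints" above are not candidates.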