@manmohan24nov
Created February 3, 2021 16:29
>>> import yake
>>> kw_extractor = yake.KeywordExtractor()
>>> text = """spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion."""
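>>> language = "en"                  # language of the input text
>>> max_ngram_size = 3               # longest keyword, in words
>>> deduplication_threshold = 0.9    # similarity limit used to drop near-duplicate keywords
>>> numOfKeywords = 20               # number of keywords to return
>>> custom_kw_extractor = yake.KeywordExtractor(lan=language, n=max_ngram_size, dedupLim=deduplication_threshold, top=numOfKeywords, features=None)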
>>> keywords = custom_kw_extractor.extract_keywords(text)
>>> for kw in keywords:
...     print(kw)
...
(0.009510576306853048, 'python and cython')
(0.01303668666027702, 'programming languages python')
(0.01903487908092154, 'natural language processing')
(0.03313747648989761, 'advanced natural language')
(0.04190317972882479, 'languages python')
(0.06072246587768757, 'language processing')
(0.07193228066565227, 'ines montani')
(0.0795196958830841, 'cython')
(0.08323210051373409, 'advanced natural')
(0.10196330452820937, 'honnibal and ines')
(0.10360488015988403, 'software company explosion')
(0.10387285964895944, 'natural language')
(0.10387285964895944, 'programming languages')
(0.11267991494597461, 'matthew honnibal')
(0.11847350870685432, 'python')
(0.1184757331079562, 'open-source software library')
(0.13756552350526924, 'company explosion')
(0.16863560177278256, 'spacy')
(0.16863560177278256, 'processing')
(0.16863560177278256, 'written')
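
In the output above, a lower YAKE score marks a more relevant keyword, so the list is already ordered best first. A minimal sketch of post-processing such results follows. It does not call YAKE itself: `sample` is a hand-copied subset of the tuples printed above, and `normalize` and `top_keywords` are hypothetical helper names, not part of the library. The sketch also assumes that the tuple order may differ between YAKE releases (`(score, keyword)` as shown here, or `(keyword, score)`), so it accepts either order.

```python
def normalize(pairs):
    """Return (keyword, score) tuples regardless of the input tuple order."""
    result = []
    for first, second in pairs:
        if isinstance(first, str):
            # Already (keyword, score).
            result.append((first, float(second)))
        else:
            # (score, keyword), as in the session output above.
            result.append((second, float(first)))
    return result


def top_keywords(pairs, limit=5):
    """Return the `limit` most relevant keywords; lower score ranks first."""
    return sorted(normalize(pairs), key=lambda item: item[1])[:limit]


# Hand-copied subset of the output above, deliberately out of order.
sample = [
    (0.16863560177278256, 'spacy'),
    (0.009510576306853048, 'python and cython'),
    (0.01903487908092154, 'natural language processing'),
    (0.01303668666027702, 'programming languages python'),
]

best = top_keywords(sample, limit=3)
for keyword, score in best:
    print(f"{score:.4f}  {keyword}")
```

With real results, the list returned by `custom_kw_extractor.extract_keywords(text)` could be passed in place of `sample`.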