@kirankulkarni
Created August 18, 2020 13:41
;;; Implementation of the spaCy example in Clojure using libpython-clj
;;; Ensure that clj-python/libpython-clj "1.45" is added to deps.edn or project.clj
;;; Using Conda as your Python environment is recommended
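;;; A minimal sketch of the dependency entry for the coordinate above.
;;; Merge it into your existing file rather than copying it verbatim.
;;; deps.edn:
;;;   {:deps {clj-python/libpython-clj {:mvn/version "1.45"}}}
;;; project.clj (Leiningen):
;;;   :dependencies [[clj-python/libpython-clj "1.45"]]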
(require '[libpython-clj.require :refer [require-python]])
(require '[libpython-clj.python :refer [py. py.. py.-] :as py])
;;; Import the spaCy Python package
;;; Install spaCy using `pip install spacy` and the model using `python -m spacy download en_core_web_sm`
(require-python 'spacy)
(def text "When Sebastian Thrun started working on self-driving cars at Google
in 2007, few people outside of the company took him seriously.
“I can tell you very senior CEOs of major American car companies would
shake my hand and turn away because I wasn’t worth talking to,” said Thrun,
in an interview with Recode earlier this week.")
;;; Load the pretrained statistical model for English
;;; Note that a Python package can be used just like a Clojure namespace
(def nlp (spacy/load "en_core_web_sm"))
;;; Let's see what it is initialized as
nlp
;;; Let's see what the type of `nlp` is
(type nlp)
;;; `nlp` is a standard Python object. Let's see what it contains
(py. nlp __dir__)
;;; Or use the `dir` function directly
(py/dir nlp)
;;; Okay, you can also access the attributes of a Python object
(py.- nlp path)
;;; So now I can submit the text using the `__call__` method
(def nlp-doc (py. nlp __call__ text))
;;; The library takes care of this by attaching the IFn interface to this object
(fn? nlp)
;;; This allows you to call the `nlp` object
(def nlp-doc (nlp text))
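;;; A minimal pure-Clojure sketch of why IFn matters: any object that implements
;;; clojure.lang.IFn can sit in function position. `shout` is a hypothetical
;;; stand-in for illustration and is not part of libpython-clj.
(def shout
  (reify clojure.lang.IFn
    ;; the one-argument invoke backs the call (shout "hello")
    (invoke [_ s] (str s "!"))))
(ifn? shout)
(shout "hello")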
;;; Let's get noun_chunks. It returns a generator wrapped as a JVM object
(py.- nlp-doc noun_chunks)
(class (py.- nlp-doc noun_chunks))
;;; Python returns an iterable, which the library exposes as a java.lang.Iterable
(py/dir (py.- nlp-doc noun_chunks))
(instance? Iterable (py.- nlp-doc noun_chunks))
;;; Java Iterables can be used as sequences in Clojure.
;;; `seqable?` confirms this
(seqable? (py.- nlp-doc noun_chunks))
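;;; The same check on a plain java.util.ArrayList, as a sketch that needs no
;;; Python at all: it is a java.lang.Iterable, so Clojure treats it as seqable.
;;; `java-list` is a name made up for this illustration.
(def java-list (java.util.ArrayList. [1 2 3]))
(instance? Iterable java-list)
(seqable? java-list)
(map inc java-list)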
;;; So we can take the first element and check its text
(py.- (first (py.- nlp-doc noun_chunks))
      text)
;;; Let's get those noun chunks
(map #(py.- % text)
     (py.- nlp-doc noun_chunks))
;;; For verbs we need the tokens from the doc itself. Is the doc seqable?
;;; If it implements __iter__, the library should expose it as seqable. Let's check
(seqable? nlp-doc)
;;; Let's try to see what it returns
(type (first nlp-doc))
;;; Awesome! We know how to work with a pyobject. Now let's get those verbs
(map #(py.- % lemma_)
     (filter #(= "VERB" (py.- % pos_))
             nlp-doc))
;;; I am sure we can all write the named entity extraction
(map (fn [ent]
       [(py.- ent text)
        (py.- ent label_)])
     (py.- nlp-doc ents))
;;; The same extraction, written with juxt
(map (juxt #(py.- % text)
           #(py.- % label_))
     (py.- nlp-doc ents))
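;;; How juxt builds [text label] pairs, sketched on plain Clojure maps. The two
;;; maps in `sample-ents` are made-up stand-ins for entities, not actual spaCy output.
(def sample-ents [{:text "Example Corp" :label "ORG"}
                  {:text "2007" :label "DATE"}])
;; juxt applies each function to the same item and collects the results in a vector
(map (juxt :text :label) sample-ents)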