@ericnormand
Last active October 7, 2021 01:47
434 PurelyFunctional.tv Newsletter

Sentence searcher

Sometimes I want to find a word in a document, but I want the context for the word. Write a function that takes a document and a word and returns the sentences that contain that word. The sentences should be returned in the order they appear in the document.

Examples

(search "This is my document." "Hello") ;=> nil
(search "This is my document. It has two sentences." "sentences") ;=> ["It has two sentences."]
(search "I like to write. Do you like to write?" "Write") ;=> ["I like to write." "Do you like to write?"]

Sentences end with \., \!, or \?.

The search should be case insensitive.

Return nil if the word is not found.

Thanks to this site for the problem idea, where it is rated Hard in Python. The problem has been modified.

Please submit your solutions as comments on this gist.

To subscribe: https://purelyfunctional.tv/newsletter/

@dfuenzalida

I had to work a little to return nil instead of an empty vector for the base case:

(def same-and-uppercase
  (juxt identity clojure.string/upper-case))

(defn search [sentences word]
  (let [WORD (clojure.string/upper-case word)]
    (when-let [result (->> (re-seq #".*?[.?!]" sentences)
                           (map (comp same-and-uppercase clojure.string/trim))
                           ;; Each item is [original upper-cased]; test the upper-cased copy.
                           (filter #(clojure.string/includes? (second %) WORD))
                           (map first)
                           seq)]
      (vec result))))

;; (search "This is my document." "Hello")
;; => nil

;; (search "This is my document. It has two sentences." "sentences")
;; => ["It has two sentences."]

;; (search "I like to write. Do you like to write?" "Write")
;; => ["I like to write." "Do you like to write?"]
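
The nil-versus-empty-vector point above can also be handled with not-empty, which returns its argument unchanged unless it is empty, in which case it returns nil. A minimal sketch of that idiom on its own; `matching` is a hypothetical helper name, not part of the solution above:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: keep the strings in coll that contain substr,
;; returning nil rather than [] when nothing matches.
(defn matching [coll substr]
  (not-empty (filterv #(str/includes? % substr) coll)))

(matching ["abc" "bcd"] "x")  ;=> nil
(matching ["abc" "bcd"] "ab") ;=> ["abc"]
```

Because filterv already yields a vector, no separate vec call is needed on the non-empty path.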

@vpetruchok

(defn search [text word]
  (let [sentence-endings #"(?<=[.!?])" ; lookbehind: split after the terminator so it stays with its sentence
        word             (clojure.string/lower-case word)]
    (->> (clojure.string/split text sentence-endings)
         (filter (fn [s] (-> s clojure.string/lower-case (.contains word))))
         (map clojure.string/trim)
         ((fn [result]
            (if (empty? result)
              nil
              (vec result)))))))

@heyarne

heyarne commented Jul 14, 2021

This version respects word boundaries and returns the full sentences, including punctuation:

(defn search [doc phrase]
  (->>
    (re-seq #"[^\s].*?[.!?]" doc)
    (filter #(re-find (re-pattern (str "(?i)\\b" phrase "\\b")) %))
    (seq)))
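
One caveat when building a pattern from the search term: characters such as . or ? inside the term are read as regex syntax. A minimal sketch of quoting the term first, assuming the JVM (java.util.regex.Pattern/quote is not available in ClojureScript); `word-pattern` is a hypothetical name:

```clojure
(import 'java.util.regex.Pattern)

;; Hypothetical helper: case-insensitive, whole-word pattern in which
;; the phrase is matched literally rather than as regex syntax.
(defn word-pattern [phrase]
  (re-pattern (str "(?i)\\b" (Pattern/quote phrase) "\\b")))

(re-find (word-pattern "a.b") "See a.b here.")      ;=> "a.b"
(re-find (word-pattern "a.b") "See axb here.")      ;=> nil
(re-find (word-pattern "write") "I like to WRITE.") ;=> "WRITE"
```

Without the quoting, the second call would match, because an unquoted . matches any character.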

@alex-gerdom

alex-gerdom commented Jul 14, 2021

(defn escape-re [s]
  #?(:clj (java.util.regex.Pattern/quote s)
     :cljs (.replace s (js/RegExp. "[.*+?^${}()|[\\]\\\\]" "g") "\\$&")))

(defn split-sentences [s]
  (if (empty? s) (list s)
      (re-seq #"[^.!?]+[.!?]?" s)))

(defn search [s substr]
  (let [pattern (re-pattern (str "(?i)" (escape-re substr)))
        contains-substr? #(some? (re-find pattern %))
        matching-sentences (->> s
                                split-sentences
                                (filter contains-substr?)
                                (map clojure.string/trim))]
    (if (empty? matching-sentences) nil
        matching-sentences)))

@jumarko

jumarko commented Jul 15, 2021

https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/purely_functional/puzzles/0434_sentence_searcher.clj#L1

(require '[clojure.string :as str])

(defn sentences [document]
  (mapv str/trim
        (str/split document #"(?<=[.?!])")))

(defn contains-word? [sentence word]
  ;; Split on non-word characters so trailing punctuation ("write.") does not block a match.
  (let [sentence-words (str/split (str/lower-case sentence) #"\W+")]
    ((set sentence-words) (str/lower-case word))))

(defn search [document word]
  (not-empty (filterv #(contains-word? % word)
                      (sentences document))))

(search "This is my document." "Hello")
;; => nil

(search "I like to write. Do you like to write?" "like")
;; => ["I like to write." "Do you like to write?"]

(search "This is not my document. It has No two sentences." "no")
;; => ["It has No two sentences."]

@sztamas

sztamas commented Jul 15, 2021

(defn search [sentences word]
  (let [ci-word       (re-pattern (str "(?i)\\b" word "\\b"))
        matches-word? (partial re-find ci-word)]
    (->> sentences
         (re-seq #"[^.!?]+[.!?]+")
         (filter matches-word?)
         (map clojure.string/trim) ; drop the leading space left after each terminator
         seq)))

@javierrweiss

javierrweiss commented Jul 16, 2021

(require '[clojure.string :as st])

(defn process-str
  [text]
  (as-> text t
    (st/split-lines t)
    (remove st/blank? t)
    ;; Split after each terminator; the lookbehind keeps the punctuation.
    (map #(st/split % #"(?<=[.?!])") t)
    (flatten t)
    (map st/trim t)))

(defn matching-indexes
  [xs word]
  (let [coll (map-indexed
              (fn [idx itm]
                (if-not (nil? (re-find (re-pattern (str "(?i)" word)) itm))
                  idx))
              xs)]
    (remove nil? coll)))

(defn search [text word]
  (let [data    (process-str text)
        indexes (matching-indexes data word)]
    (if (empty? indexes)
      nil
      (vec (for [x indexes] (nth data x))))))

@KingCode

KingCode commented Oct 7, 2021

(require '[clojure.string :as str])

(defn parse-ends [txt]
  (for [m (repeat (re-matcher #"([^.^!^?]+[.!?])" txt))
        :let [finds (re-find m)]
        :while finds]
    (->> finds rest (filter identity) first last str)))

(defn search [txt word]
  (let [word (str/lower-case word) 
        ends (parse-ends txt)]
    (->> (str/split txt #"[.!?]")
         (map vector ends)
         (sequence (comp
                    (filter (fn [[end sent]]
                              (->> (str/split (str/lower-case sent) #"\s+")
                                   (some #{word}))))
                    (map (fn [[end sent]]
                           ;; Trim the space left after the previous terminator, then reattach this one.
                           (str (str/trim sent) end)))))
         seq)))
