434 PurelyFunctional.tv Newsletter

Sentence searcher

Sometimes I want to find a word in a document, but I want the context for the word. Write a function that takes a document and a word and returns the sentences that contain that word. The sentences should be returned in the order they appear in the document.

Examples

(search "This is my document." "Hello") ;=> nil
(search "This is my document. It has two sentences." "sentences") ;=> ["It has two sentences."]
(search "I like to write. Do you like to write?" "Write") ;=> ["I like to write." "Do you like to write?"]

Sentences end with ., !, or ?.

The search should be case insensitive.

Return nil if the word is not found.
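
The three rules above can be sketched roughly as follows. This is only one possible reading of the problem, not a reference solution: the names sentences, contains-word?, and search-sketch are hypothetical, and the sketch assumes that text without a closing ., !, or ? is not a sentence and that "word" means a whole-word match.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: split a document into trimmed sentences,
;; keeping the terminating punctuation with each sentence.
;; Assumption: trailing text with no terminator is ignored.
(defn sentences [document]
  (->> (re-seq #"[^.!?]+[.!?]+" document)
       (map str/trim)))

;; Hypothetical helper: case-insensitive, whole-word test.
;; Pattern/quote makes regex metacharacters in the word match literally.
(defn contains-word? [sentence word]
  (let [pattern (re-pattern (str "(?i)\\b"
                                 (java.util.regex.Pattern/quote word)
                                 "\\b"))]
    (boolean (re-find pattern sentence))))

;; Sentences containing the word, in document order, or nil if none match.
(defn search-sketch [document word]
  (->> (sentences document)
       (filter #(contains-word? % word))
       vec
       not-empty))
```

Here not-empty is what turns an empty result into nil, and filter preserves the order in which the sentences appear in the document.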

Thanks to this site for the problem idea, where it is rated Hard in Python. The problem has been modified.

Please submit your solutions as comments on this gist.

To subscribe: https://purelyfunctional.tv/newsletter/

sztamas commented Jul 15, 2021

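(require '[clojure.string :as str])

(defn search [document word]
  ;; Quote the word so regex metacharacters in it are matched literally.
  (let [ci-word       (re-pattern (str "(?i)\\b" (java.util.regex.Pattern/quote word) "\\b"))
        matches-word? (partial re-find ci-word)]
    (->> document
         (re-seq #"[^.!?]+[.!?]+")
         (map str/trim) ; drop the whitespace left between sentences
         (filter matches-word?)
         seq))) ; seq turns an empty result into nil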

javierrweiss commented Jul 16, 2021

(require '[clojure.string :as st])

;; Split the text into trimmed sentences, keeping the end punctuation.
(defn process-str
  [text]
  (as-> text t
    (st/split-lines t)
    (remove st/blank? t)
    (map #(st/split % #"(?<=[.?!])") t)
    (flatten t)
    (map #(st/trim %) t)))

;; Indexes of the sentences that contain word (case-insensitive substring match).
(defn matching-indexes
  [xs word]
  (let [coll (map-indexed
              (fn [idx itm]
                (if-not (nil? (re-find (re-pattern (str "(?i)" word)) itm))
                  idx))
              xs)]
    (remove nil? coll)))

(defn search [text word]
  (let [data    (process-str text)
        indexes (matching-indexes data word)]
    (if (empty? indexes)
      nil
      (vec (for [x indexes] (nth data x))))))

KingCode commented Oct 7, 2021

(require '[clojure.string :as str])

(defn parse-ends [txt]
  (for [m (repeat (re-matcher #"([^.^!^?]+[.!?])" txt))
        :let [finds (re-find m)]
        :while finds]
    (->> finds rest (filter identity) first last str)))

(defn search [txt word]
  (let [word (str/lower-case word) 
        ends (parse-ends txt)]
    (->> (str/split txt #"[.!?]")
         (map vector ends)
         (sequence (comp
                    (filter (fn [[end sent]]
                              (->> (str/split (str/lower-case sent) #"\s+")
                                   (some #{word}))))
                    (map (fn [[end sent]]
                           ;; trim the whitespace left over from splitting
                           (str (str/trim sent) end)))))
         seq)))
