Skip to content

Instantly share code, notes, and snippets.

@ericnormand
Created September 27, 2021 15:19
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ericnormand/724c98a2ff399c82469ce2dd1ea3e23c to your computer and use it in GitHub Desktop.
Save ericnormand/724c98a2ff399c82469ce2dd1ea3e23c to your computer and use it in GitHub Desktop.
444 PurelyFunctional.tv Newsletter

Bigram checker

A bigram is a sequence of two letters. Common bigrams in English include st and tr. Write a function to check if all bigrams in a collection are present in the words in a string.

Examples

(all-present? ["st" "tr"] "street") ;=> true
(all-present? ["ea" "ng" "kt"] "eating at a restaurant") ;=> false
(all-present? [] "hello!") ;=> true

Thanks to this site for the problem idea, where it is rated Hard in Ruby. The problem has been modified.

Please submit your solutions as comments on this gist.

To subscribe: https://purelyfunctional.tv/newsletter/

@mchampine
Copy link

mchampine commented Sep 27, 2021

(defn all-present? [bigrams astring]
  (every? (partial clojure.string/includes? astring) bigrams))

@nwjsmith
Copy link

(defn all-present?
  "Returns true if all bigrams are present in the words in a string."
  [bigrams s]
  (every? #(string/includes? s %) bigrams))

@safehammad
Copy link

safehammad commented Sep 27, 2021

@mchampine @nwjsmith Neat solutions :)

For the sake of another approach, rather than searching for bigrams in the string, this solution turns that on its head and extracts the set of bigrams from the string then checks that the given bigrams are part of that set:

(defn all-present? [bigrams astring]
  (clojure.set/subset? bigrams (set (map (partial apply str) (partition 2 1 astring)))))

[Update] Or even more succinctly:

(defn all-present? [bigrams astring]
  (clojure.set/subset? bigrams (set (map str astring (subs astring 1)))))

@KingCode
Copy link

KingCode commented Sep 28, 2021

Definitely not succinct, I tried to emulate one-pass which terminates as soon as all bigrams are found - reducing over a large or even infinite-lazy collection is made practical with the use of reduced, at least in best case scenarios.

This should be efficient enough for large strings, assuming the cost of String.substring is small:

(defn all-present? [bgrams txt]
  (->> (.substring txt 1) 
       (map str txt)
       (reduce (fn [not-found bg]
                 (cond
                   (empty? not-found)
                   (reduced not-found)
                   (contains? not-found bg)
                   (disj not-found bg)
                   :else
                   not-found))
               (set bgrams))
       empty?))

@SignSpice
Copy link

I'm relatively new to Clojure and I really enjoy seeing all the different ways to solve a problem.
This was my take on it:

(defn all-present?
  "([bigrams phrase])
  Checks if phrase contains every bigram. Currently, does not enforce bigrams being bigrams. 
  Would be more accurate to name it subphrases. Returns true if bigrams is an empty seq."
  [bigrams phrase]
  (every? true? (for [bigram bigrams]
     (clojure.string/includes? phrase bigram))))

@KubqoA
Copy link

KubqoA commented Sep 29, 2021

(require '[clojure.set :as set])

(defn all-present?
  [bigrams sentence]
  (let [bigrams-in-sentence (set (partition 2 1 sentence))
        bigrams (set (map seq bigrams))]
       (set/subset? bigrams bigrams-in-sentence)))

@genmeblog
Copy link

(defn all-present?
  [bigrams s]
  (let [bs (set bigrams)
        ss (->> s (partition 2 1) (map (partial apply str)) set)]
    (= bs (clojure.set/intersection bs ss))))

@alex-gerdom
Copy link

Ended up over engineering this one to work for any list of substrings.

(defn all-present?
  "Test whether all substrings occur in a string"
  [substrs s]
  (let [test-stream
        (fn [unfound sample-stream]
          (cond (empty? unfound) true
                (empty? sample-stream) false
                :else (recur
                       (clojure.set/difference unfound #{(first sample-stream)})
                       (rest sample-stream))))]
    (->> substrs
         (map (partial apply list))    ;; Convert to lists of characters
         (group-by count)              ;; {length-to-match [chars-to-match]}
         ;; Test each subsequence of a given length
         (map (fn [[length to-match]]
                (test-stream (set to-match)
                             (partition length 1 s))))
         ;; Fail if any subsequence of a given length fails to match
         (every? true?))))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment