Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
444 PurelyFunctional.tv Newsletter

Bigram checker

A bigram is a sequence of two letters. Common bigrams in English include st and tr. Write a function to check if all bigrams in a collection are present in the words in a string.

Examples

(all-present? ["st" "tr"] "street") ;=> true
(all-present? ["ea" "ng" "kt"] "eating at a restaurant") ;=> false
(all-present? [] "hello!") ;=> true

Thanks to this site for the problem idea, where it is rated Hard in Ruby. The problem has been modified.

Please submit your solutions as comments on this gist.

To subscribe: https://purelyfunctional.tv/newsletter/

@mchampine

This comment has been minimized.

Copy link

@mchampine mchampine commented Sep 27, 2021

(defn all-present? [bigrams astring]
  (every? (partial clojure.string/includes? astring) bigrams))
@nwjsmith

This comment has been minimized.

Copy link

@nwjsmith nwjsmith commented Sep 27, 2021

(defn all-present?
  "Returns true if all bigrams are present in the words in a string."
  [bigrams s]
  (every? #(string/includes? s %) bigrams))
@safehammad

This comment has been minimized.

Copy link

@safehammad safehammad commented Sep 27, 2021

@mchampine @nwjsmith Neat solutions :)

For the sake of another approach, rather than searching for bigrams in the string, this solution turns that on its head and extracts the set of bigrams from the string then checks that the given bigrams are part of that set:

(defn all-present? [bigrams astring]
  (clojure.set/subset? bigrams (set (map (partial apply str) (partition 2 1 astring)))))

[Update] Or even more succinctly:

(defn all-present? [bigrams astring]
  (clojure.set/subset? bigrams (set (map str astring (subs astring 1)))))
@KingCode

This comment has been minimized.

Copy link

@KingCode KingCode commented Sep 28, 2021

Definitely not succinct, I tried to emulate one-pass which terminates as soon as all bigrams are found - reducing over a large or even infinite-lazy collection is made practical with the use of reduced, at least in best case scenarios.

This should be efficient enough for large strings, assuming the cost of String.substring is small:

(defn all-present? [bgrams txt]
  (->> (.substring txt 1) 
       (map str txt)
       (reduce (fn [not-found bg]
                 (cond
                   (empty? not-found)
                   (reduced not-found)
                   (contains? not-found bg)
                   (disj not-found bg)
                   :else
                   not-found))
               (set bgrams))
       empty?))
@SignSpice

This comment has been minimized.

Copy link

@SignSpice SignSpice commented Sep 28, 2021

I'm relatively new to Clojure and I really enjoy seeing all the different ways to solve a problem.
This was my take on it:

(defn all-present?
  "([bigrams phrase])
  Checks if phrase contains every bigram. Currently, does not enforce bigrams being bigrams. 
  Would be more accurate to name it subphrases. Returns true if bigrams is an empty seq."
  [bigrams phrase]
  (every? true? (for [bigram bigrams]
     (clojure.string/includes? phrase bigram))))
@KubqoA

This comment has been minimized.

Copy link

@KubqoA KubqoA commented Sep 29, 2021

(require '[clojure.set :as set])

(defn all-present?
  [bigrams sentence]
  (let [bigrams-in-sentence (set (partition 2 1 sentence))
        bigrams (set (map seq bigrams))]
       (set/subset? bigrams bigrams-in-sentence)))
@genmeblog

This comment has been minimized.

Copy link

@genmeblog genmeblog commented Sep 29, 2021

(defn all-present?
  [bigrams s]
  (let [bs (set bigrams)
        ss (->> s (partition 2 1) (map (partial apply str)) set)]
    (= bs (clojure.set/intersection bs ss))))
@alex-gerdom

This comment has been minimized.

Copy link

@alex-gerdom alex-gerdom commented Sep 30, 2021

Ended up over engineering this one to work for any list of substrings.

(defn all-present?
  "Test whether all substrings occur in a string"
  [substrs s]
  (let [test-stream
        (fn [unfound sample-stream]
          (cond (empty? unfound) true
                (empty? sample-stream) false
                :else (recur
                       (clojure.set/difference unfound #{(first sample-stream)})
                       (rest sample-stream))))]
    (->> substrs
         (map (partial apply list))    ;; Convert to lists of characters
         (group-by count)              ;; {length-to-match [chars-to-match]}
         ;; Test each subsequence of a given length
         (map (fn [[length to-match]]
                (test-stream (set to-match)
                             (partition length 1 s))))
         ;; Fail if any subsequence of a given length fails to match
         (every? true?))))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment