Skip to content

Instantly share code, notes, and snippets.

Last active August 16, 2022 10:53
Show Gist options
  • Save ericnormand/33d040d1568d1f0253707e42a1cdba85 to your computer and use it in GitHub Desktop.
Save ericnormand/33d040d1568d1f0253707e42a1cdba85 to your computer and use it in GitHub Desktop.
433 Newsletter

Vowel families

Given two words, we can determine if they have the same vowels, ignoring order and repetition. For instance, "Hello" and "Vowel" both have \e and \o, so they have the same vowels. Write a function that groups words into "families" that all have the same vowels. "Tree" and "tent" also belong to the same family because we ignore the repetition of \e.


(vowel-families ["hello" "vowel" "fox" "cot" "hat" "cat"]) ;=> [["hello" "vowel"]
                                                           ;    ["fox" "cot"]
                                                           ;    ["hat" "cat"]]
(vowel-families []) ;=> []
(vowel-families ["tree" "tent" "blanket"] ;=> [["tree" "tent"]
                                          ;    ["blanket"]]


For this exercise, the vowels are \a, \e, \i, \o, and \u. If you wish, you may include vowels from other languages and scripts. Your algorithm should not be case-sensitive.

Please maintain capitalization and non-alphabetic characters.

Thanks to this site for the problem idea, where it is rated Very Hard in Python. The problem has been modified.

Please submit your solutions as comments on this gist.

To subscribe:

Copy link

steffan-westcott commented Jul 5, 2021

(defn vowel-families [words]
  (->> words
       (group-by #(set (filter (set "aeiou") (clojure.string/lower-case %))))
       (into [])))

Copy link

grierson commented Jul 5, 2021

(def vowels #{\a \e \i \o \u})

(defn get-vowels [word]
  (set (filter vowels word)))

(defn vowel-families [words]
  (if (empty? coll)
    (vals (group-by get-vowels words))))

Copy link

sztamas commented Jul 5, 2021

(defn vowels [s]
  (->> s
       (clojure.set/intersection #{\a \e \i \o \u})))

(defn vowel-families [words]
  (let [groups (group-by vowels words)]
    (if (empty? groups) [] (vals groups))))

Copy link

natec425 commented Jul 5, 2021

(def VOWELS #{\a \e \i \o \u})

(defn vowels-in-word [word]
    (clojure.set/intersection VOWELS
        (set (clojure.string/lower-case word))))

(defn vowel-families [words]
    (vals (group-by vowels-in-word words)))

Edit: I don't usually write clojure, but I noticed the other submissions have the branch to check for an empty vector. Apparently vals returns nil for an empty map instead of returning an empty sequence. Is there a reason for this? I'm just curious.

My fix for my code would be to wrap it with (or ... []):

(defn vowel-families [words]
    (or (vals (group-by vowels-in-word words)) []))

Copy link

kthu commented Jul 5, 2021

(defn vowel-family
  (->> word
       (filter #{\a \e \u \i \o})

(defn vowel-families
  (->> words
       (group-by vowel-family)
       (into [])))

Copy link

(defn word->vowels [w]
  (clojure.set/intersection (set "aeiou") (set w)))

(defn vowel-families [wc]
  (if-let [vf (vals (group-by word->vowels wc))] vf []))

Copy link

mcuervoe commented Jul 5, 2021

(defn vowel-families [strings]
  (->> strings
      (map (fn [string] (vector (into #{} (filter #{\a \e \i \o \u} string))
      (reduce (fn [acc [key, value]] (update acc key conj value)) {})
      (map vec)

Copy link

samboozle commented Jul 5, 2021

Silly point-free version

(def vowel-families
  (let [vowels (comp set
                     (partial filter (partial contains? (set "aeiou")))
    (comp vec vals (partial group-by vowels))))

Copy link

jonasseglare commented Jul 5, 2021

Mapping every word to a vowel code between 0 and 31:

(defn vowel-code [s]
   (map #(case % \a 1 \o 2 \u 4 \e 8 \i 16 0))
   (completing bit-or)
   (clojure.string/lower-case s)))

(defn vowel-families [words]
  (->> words
       (sort-by vowel-code)
       (partition-by vowel-code)))

Copy link

(defn- get-vowels [word]
  (clojure.set/intersection (into #{} (clojure.string/lower-case word)) #{\a \e \i \o \u}))

(defn vowel-families [xs]
  (if (empty? xs)
    (-> (group-by get-vowels xs)

Copy link

(def vowels #{\a \e \i \o \u})

(defn vowels-sorted [string]
        (filter #(contains? vowels %))
(defn vowel-families [strings]
        (group-by vowels-sorted)

Copy link

Sinha-Ujjawal commented Jul 6, 2021

More generic solution

(defn find-key [key-values string]
        (filter #(contains? key-values %))
(defn families [key-values strings]
        (group-by #(find-key key-values %))
(def vowels #{\a \e \i \o \u})

(defn vowel-families [strings]
    (families vowels strings))

Copy link

ZaymonFC commented Jul 6, 2021

(defn vowel-set [word]
  (->> word
       (into #{})
       (clojure.set/intersection #{\a \e \i \o \u})))

(defn vowel-families [words] (->> words (group-by vowel-set) vals vec))

@Sinha-Ujjawal it's worth noting that sorting a set is unnecessary, normal sets are order agnostic.

#{\e \a} and #{\a \e} would be the same key for group-by

(assoc {#{\a \e} \x} #{\e \a} \y) ; => {#{a e} y}

Copy link

Did it a few different ways, to see if I can make it faster:

;; First cut
(require '[clojure.string :as str])

(def ^:const ^:private vowel? #{\a \e \i \o \u})

(defn- vowel-family [s]
  (into #{} (filter vowel?) (str/lower-case s)))

(defn vowel-families [words]
  (vec (vals (group-by vowel-family words))))

;; Make grouping fn faster
(defn- vowel-family2 [^String s]
  (loop [i 0, x 0]
    (if (= i (.length s))
        (inc i)
        (bit-or x
          (case (.charAt s i)
            \a 1, \A 1
            \o 2, \O 2
            \u 4, \U 4
            \e 8, \E 8
            \i 16, \I 16

(defn vowel-families2 [words]
  (vec (vals (group-by vowel-family2 words))))

;; custom group-by to use transients for map-vals
(import java.util.HashSet)

(defn group-by-transient [f coll]
  (let [ks (HashSet.)
        grouped (reduce
                  (fn [ret x]
                    (let [k (f x)]
                      (.add ks k)
                      (assoc! ret k (conj! (or (get ret k) (transient [])) x))))
                  (transient {}) coll)]
    (doseq [k ks] (assoc! grouped k (persistent! (get grouped k))))
    (persistent! grouped)))

(defn vowel-families3 [words]
  (vec (vals (group-by-transient vowel-family2 words))))

;; mostly-mutating impl to go even faster
(import java.util.ArrayList)

(defn vowel-families4 [words]
  (let [grouped (ArrayList. 32)
        _ (dotimes [_ 32] (.add grouped (ArrayList.)))
        _ (doseq [s words] (.add ^ArrayList (.get grouped (vowel-family2 s)) s))]
    (into [] (keep #(when-not (.isEmpty ^ArrayList %) (vec %))) grouped)))

;; return vector of seqs instead of vector of vectors (changes the return value)
(defn vowel-families5 [words]
  (let [grouped (ArrayList. 32)
        _ (dotimes [_ 32] (.add grouped (ArrayList.)))
        _ (doseq [s words] (.add ^ArrayList (.get grouped (vowel-family2 s)) s))]
    (into [] (keep seq) grouped)))

;; measurements
(require '[criterium.core :as cc])

(defn measure [cases]
  #(cc/quick-bench (run! % cases)))

(def measure-given
  (measure [["hello" "vowel" "fox" "cot" "hat" "cat"]
            ["tree" "tent" "blanket"]]))

(defn rand-str [max-n]
  (apply str (repeatedly (rand-int max-n) #(rand-nth "abcdefghijklmnopqrstuvwxyz"))))

(def measure-big-input
    [(vec (repeatedly 10000 #(rand-str 10)))]))

"Measurements for given inputs:
name            mean-exec-time   std-deviation
vowel-families     5.15 µs          608.37 ns
vowel-families2    1.59 µs          252.64 ns
vowel-families3    2.55 µs           70.23 ns
vowel-families4    2.17 µs          226.43 ns
vowel-families5    4.69 µs          177.06 ns

Measurements for randomly generated large input
name            mean-exec-time   std-deviation
vowel-families    7194.89 µs        642.74 µs
vowel-families2   1946.96 µs        190.33 µs
vowel-families3   1837.29 µs         37.65 µs
vowel-families4    644.46 µs         20.93 µs
vowel-families5    517.92 µs         10.93 µs")

Copy link

miner commented Jul 6, 2021

(defn vowel-families [words]
  (let [vowel-mask (fn [ch]
                     (case ch
                       (\A \a) 1
                       (\E \e) 2
                       (\I \i) 4
                       (\O \o) 8
                       (\U \u) 16
        vowel-bits (fn [word] (reduce (fn [score ch] (bit-or score (vowel-mask ch))) 0 word))]
    (->> words
         (group-by vowel-bits)
         (mapv val))))

Copy link

(def all-vowels #{\a \e \i \o \u})

(defn vowels [word]
  (into #{} 
        (filter all-vowels word)))

(defn vowel-families [words]
  (->> words
       (group-by (comp vowels (memfn toLowerCase)))
       (into [])))

Copy link

(defn vowel-families [words]
   (let [vowel-family (fn [w]
                        (->> (re-seq #"(?i)[aeiou]" w)
                             (map clojure.string/lower-case)
                             (into #{})))]
     (->> words
          (group-by vowel-family)

Copy link

(defn vowel-families [coll]
  (let [lc clojure.string/lower-case]
    (->> coll
      (group-by #(set (filter (set "aeiou") (lc %))))

Copy link

dfuenzalida commented Jul 9, 2021

I'm late to this one, but here's my version:

(defn vowels-of [s]
  (->> (filter #{\a \e \i \o \u} (clojure.string/lower-case s))
       (reduce conj #{})))

(defn vowel-families [xs]
  (->> (group-by vowels-of xs) vals vec))

;; (vowel-families [])
;; => []

;; (vowel-families ["hello" "vowel" "fox" "cot" "hat" "cat"])
;; => [["hello" "vowel"] ["fox" "cot"] ["hat" "cat"]]

;; (vowel-families ["TREE" "tent" "blanket"])
;; => [["TREE" "tent"] ["blanket"]]

Copy link

KingCode commented Oct 7, 2021

(def vowels #{\a \e \i \o \u})

(defn vowel-families [words]
  (->> words 
       (sequence (comp
                  (map (fn [w]
                          (->> w clojure.string/lower-case seq
                               (reduce (fn [acc c]
                                         (if (vowels c)
                                           (conj acc c)
                  (partition-by second)
                  (map #(mapv first %))))))

Copy link

import std/[tables, sequtils]

# --- helpers
  Vowel = enum
    o, a, i, u, e

  SeqTable[K, V] = Table[K, seq[V]]

func incl[K, V](t: var SeqTable[K, V], key: K, value: V) =
  t.withValue key, wrapper:
    wrapper[].add value
    t[key] = @[value]

func vowels(word: string): set[Vowel] =
  for ch in word:
      case ch:
      of 'o', 'O': o
      of 'a', 'A': a
      of 'u', 'U': u
      of 'e', 'E': e
      of 'i', 'I': i
      else: continue

# --- main
func vowelFamilies(words: seq[string]): seq[seq[string]] =
  var vowelsLookup: SeqTable[set[Vowel], string]

  for w in words:
    vowelsLookup.incl vowels w, w

  toseq vowelsLookup.values

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment