Skip to content

Instantly share code, notes, and snippets.

@ericnormand
Last active March 28, 2020 22:48
Show Gist options
  • Save ericnormand/6218548273971aef0c31bf50f0bacb67 to your computer and use it in GitHub Desktop.
Save ericnormand/6218548273971aef0c31bf50f0bacb67 to your computer and use it in GitHub Desktop.

Headline -> Hashtags

Imagine you work for a newspaper and you're job is to automatically post news articles to Twitter. Of course, you need hash tags. Write a function that takes the headline (as a string) and returns a list of hashtags. The hashtags should be the three longest words in the headline, ordered longest to shortest, and of course, with a # in front. If there are fewer than three words, use as many words as there are. If two words are of the same length, prefer the one that occurs closest to the beginning.

Examples from The Onion, America's finest news source.

(->hashtags "Violently Bored Americans Begin Looting Puzzle Stores") 
;; => ("#violently" "#americans" "#looting")
(->hashtags "Trump Quietly Checks With Aides To Make Sure He’d Be Included In Receiving $1,000 Government Checks")
;; => ("#government" "#receiving" "#included")
(->hashtags "Nation Demands More Slow-Motion Footage Of Running Basset Hounds")
;; => ("#demands" "#footage" "#running")

Remember, hashtags are typically all lowercase.

Thanks to this site for the challenge idea.

(defn ->hashtags
[headline]
(letfn [(words [line]
(clojure.string/split headline #"(\s|,|\.|;|:)+"))
(hashtag [word]
(str "#" (clojure.string/lower-case word)))]
(->> (words headline)
(sort-by (comp - count))
(take 3)
(map hashtag)))
(defn ->hashtags [s]
(let [words (str/split s #" ")]
(->> words
(remove #(contains? (set %) \-))
(sort #(compare (count %2) (count %1)))
(take 3)
(map #(str "#" (.toLowerCase %))))))
(defn ->hashtags [headline]
(-> headline
(clojure.string/lower-case)
(clojure.string/split #"\W+")
(->> (map-indexed #(vector (- (count %2)) %1 %2))
(sort)
(take 3)
(map #(get % 2))
(map #(str "#" %)))))
(ns hashtags
(:require [clojure.string :as str]
[clojure.test :refer :all]))
(defn ->indexed-words
[str]
(if (seq str)
(->> (str/split str #"[ -]")
(map str/lower-case)
(map-indexed (fn [idx word] [idx word])))))
(defn score-words
"Sorts by index in sentence then longest word."
[words]
(->> words
(sort-by first)
(reverse)
(sort-by #(count (second %)))
(reverse)))
(defn invalid-chars
[word]
(if-let [invalid-char (and (seq word)
(re-seq #"[^A-Za-z0-9_]" word))]
(str "Hashtag contains invalid chars: " (into #{} invalid-char))))
(defn invalid-starting-character
[word]
(if-let [first-char (and (seq word)
-(re-matches #"[^A-Za-z]" (str (first word))))]
(str "Hashtag begins with an invalid character: " first-char)))
(def hashtag-rules
[invalid-chars
invalid-starting-character])
(defn ->hashtag
"Hashtags are prefixed by an octothorpe and follow these conditions:
- Has only letters, numbers, and underscores
- Begins with a letter"
[word]
(if-let [rules (->> hashtag-rules
(map #(% word))
(remove nil?)
seq)]
(do (println (str "Rule violations for '" word "':\n\t"
(str/join "\n\t" rules))))
(str "#" word)))
(defn ->hashtags
[message]
(->> (->indexed-words message)
(score-words)
(map #(->hashtag (second %)))
(filter seq)
(take 3)))
(deftest generate-hashtags-test
(is (= '("#violently" "#americans" "#looting")
(->hashtags "Violently Bored Americans Begin Looting Puzzle Stores")))
(is (= '("#demands" "#footage" "#running")
(->hashtags "Nation Demands More Slow-Motion Footage Of Running Basset Hounds")))
(is (= '("#brief")
(->hashtags "Brief")))
(is (= '("#worth" "#man" "#now")
(->hashtags "Man now worth $1,000,000,000")))
(is (empty? (->hashtags "@!#$!@#")))
(is (empty? (->hashtags nil))))
(ns functional-tv-puzzles.-2020.headline2hashtag-370
(:require [clojure.string :as s]))
(defn ->posmap [words]
(->> words distinct
(map-indexed (fn [i w] [w i]))
(into {})))
(defn ->hashtags [headline]
(let [words (-> headline s/lower-case
(s/split #"(\p{Punct}| )+"))
pos (->posmap words)]
(->> words
(sort-by count) reverse
(partition-by count) (map set)
(map #(sort-by pos %)) flatten
(take 3)
(map #(s/join ["#" %])))))
(defn ->hashtags [s]
(letfn [(mktag [s] (str \# (clojure.string/lower-case s)))]
(->> (clojure.string/split s #" |-")
(group-by count)
(sort-by key)
reverse
vals
flatten
(take 3)
(map mktag))))
(ns purelyfunctional-newsletters.issue-370
(:require [clojure.test :refer :all]
[clojure.string :as str]))
(defn ->hashtags [s]
(if (nil? s)
'()
(->> (str/split s #"\W")
(sort-by count #(compare %2 %1))
(filter (comp not empty?))
(take 3)
(map str/lower-case)
(map (partial str "#")))))
(deftest ->hashtags-tests
(is (= '() (->hashtags nil)))
(is (= '() (->hashtags "")))
(is (= '("#violently" "#americans" "#looting")
(->hashtags "Violently Bored Americans Begin Looting Puzzle Stores") ))
(is (= '("#government" "#receiving" "#included")
(->hashtags "Trump Quietly Checks With Aides To Make Sure He’d Be Included In
Receiving $1,000 Government Checks")))
(is (= '("#demands" "#footage" "#running")
(->hashtags "Nation Demands More Slow-Motion Footage Of Running Basset
Hounds"))))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment