-
-
Save realgenekim/3fb9d2dccde9234743ec392b600055be to your computer and use it in GitHub Desktop.
(ns db.db2 | |
(:require | |
[next.jdbc :as j] | |
[next.jdbc.sql :as jsql] | |
[next.jdbc.sql.builder :as jb] | |
[clojure.spec.alpha :as s])) | |
(def mysql-db {:dbtype "mysql" | |
:dbname "twitter" | |
:host "127.0.0.1" | |
:user "xxx" | |
:password "xxx" | |
:useUnicode "yes" | |
:characterEncoding "UTF-8"}) | |
(def ds (j/get-datasource mysql-db)) | |
(defn insert-tweets-with-conn! | |
[conn tweets] | |
(let [ks [:id :tweet :hash256] | |
rows (map #((apply juxt ks) %) tweets) | |
;_ (println rows) | |
q (jb/for-insert-multi :tweets ks rows {}) | |
;_ (println q) | |
qr (clojure.string/replace (first q) #"^INSERT" "REPLACE") | |
;_ (println qr) | |
qnew (assoc-in q [0] qr) | |
_ (println qnew)] | |
(j/execute! conn qnew))) | |
(def insert-tweets! (partial insert-tweets-with-conn! ds)) | |
(commment | |
(insert-tweets! [{:id 1235 :tweet "abc🎉" :hash256 "abcxxx"}]) | |
) |
Didn't know the typo in my Slack message until now -- I meant "I think you need the MODIFY COLUMN before the CONVERT...".
I'm basing this on our experience at work where we originally built our MySQL database with utf8
columns where we planned to store customer-provided text but latin1
for columns where we control the data and know there will be no Unicode. We ran into this emoji issue because of the "three-byte utf8" and we only needed ALTER TABLE ... MODIFY COLUMN ... CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci
-- all the data stored in the utf8
columns remained valid, without needing a CONVERT
on the table.
For new tables we create, if all columns need utf8mb4
then we'll declare that as the default for the whole table, otherwise, we leave the tables as latin1
and only declare specific columns as utf8mb4
(our DBA preferred this approach so we weren't "wasting space" by using four-byte Unicode for strings that we knew only needed one byte per character.
@seancorfield also wrote — I think this is what every dev (and DBA) should know about modern realities of strings and their storage in databases
(we have a couple of columns still on utf8 in our biggest tables because running the ALTER TABLE will lock the table long enough to cause disruption to our users -- at some point we'll probably tackle that during a scheduled maintenance window... but we very rarely have scheduled maintenance any more now that our processes are nearly all Clojure apps running in a (small) cluster!)
Thanks @seancorfield, who responded with this in Clojurians Slack.