Skip to content

Instantly share code, notes, and snippets.

@realgenekim
Last active February 16, 2021 17:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save realgenekim/3fb9d2dccde9234743ec392b600055be to your computer and use it in GitHub Desktop.
Save realgenekim/3fb9d2dccde9234743ec392b600055be to your computer and use it in GitHub Desktop.
Example of getting UTF-8 working with next.jdbc with MySQL, to enable Unicode and emojis, along with using using "REPLACE" instead of "INSERT" to enable upserts.
(ns db.db2
(:require
[next.jdbc :as j]
[next.jdbc.sql :as jsql]
[next.jdbc.sql.builder :as jb]
[clojure.spec.alpha :as s]))
(def mysql-db {:dbtype "mysql"
:dbname "twitter"
:host "127.0.0.1"
:user "xxx"
:password "xxx"
:useUnicode "yes"
:characterEncoding "UTF-8"})
(def ds (j/get-datasource mysql-db))
(defn insert-tweets-with-conn!
[conn tweets]
(let [ks [:id :tweet :hash256]
rows (map #((apply juxt ks) %) tweets)
;_ (println rows)
q (jb/for-insert-multi :tweets ks rows {})
;_ (println q)
qr (clojure.string/replace (first q) #"^INSERT" "REPLACE")
;_ (println qr)
qnew (assoc-in q [0] qr)
_ (println qnew)]
(j/execute! conn qnew)))
(def insert-tweets! (partial insert-tweets-with-conn! ds))
(commment
(insert-tweets! [{:id 1235 :tweet "abc🎉" :hash256 "abcxxx"}])
)
@realgenekim
Copy link
Author

...and this is what's required to modify the MySQL database, tables, and potentially, columns to make this work:

; https://stackoverflow.com/questions/39463134/how-to-store-emoji-character-in-mysql-database

ALTER DATABASE twitter CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE tweets CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tweets MODIFY COLUMN tweet TEXT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;

@realgenekim
Copy link
Author

Thanks @seancorfield, who responded with this in Clojurians Slack.

I think you need the MODIFY COLUMN between the CONVERT though otherwise you might lose data?

I have a feeling you don't need to CONVERT the table at all -- it can't contain any four byte Unicode characters and anything it already contains is correct as-is.

If you have other character columns in that table, you might need to consider what encodings they have -- not all char, varchar, text columns need to be the same charset: it depends on what they are going to contain. We have tables where the charset default is latin-1 but a few specific columns are utf8mb4, for example.

@seancorfield
Copy link

Didn't know the typo in my Slack message until now -- I meant "I think you need the MODIFY COLUMN before the CONVERT...".

I'm basing this on our experience at work where we originally built our MySQL database with utf8 columns where we planned to store customer-provided text but latin1 for columns where we control the data and know there will be no Unicode. We ran into this emoji issue because of the "three-byte utf8" and we only needed ALTER TABLE ... MODIFY COLUMN ... CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci -- all the data stored in the utf8 columns remained valid, without needing a CONVERT on the table.

For new tables we create, if all columns need utf8mb4 then we'll declare that as the default for the whole table, otherwise, we leave the tables as latin1 and only declare specific columns as utf8mb4 (our DBA preferred this approach so we weren't "wasting space" by using four-byte Unicode for strings that we knew only needed one byte per character.

@realgenekim
Copy link
Author

@seancorfield also wrote — I think this is what every dev (and DBA) should know about modern realities of strings and their storage in databases

(we have a couple of columns still on utf8 in our biggest tables because running the ALTER TABLE will lock the table long enough to cause disruption to our users -- at some point we'll probably tackle that during a scheduled maintenance window... but we very rarely have scheduled maintenance any more now that our processes are nearly all Clojure apps running in a (small) cluster!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment