@cgrand
Created February 13, 2017 14:20
From clojure.spec to Spark schema [PoC]
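
;; Written against the Clojure 1.9 alphas, where spec lived in `clojure.spec`.
;; From the 1.9 release onward the namespace is `clojure.spec.alpha`.
(ns powderkeg.spec
  (:require [clojure.spec :as s])
  (:import [org.apache.spark.sql.types DataType DataTypes]))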
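
;; Maps a spec form such as (clojure.spec/keys ...) to a spec that conforms
;; that form to a Spark DataType. Dispatches on the head symbol of the form.
(defmulti expr first)

;; Shared by s/* and s/+: (op element-spec) conforms to an ArrayType.
(def ^:private regexp-repeat
  (s/and
    (s/cat :tag any? :type ::datatype)
    (s/conformer
      (fn [{t :type}]
        (DataTypes/createArrayType t)))))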
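
;; (s/every element-spec & options) also conforms to an ArrayType. A bare s/cat
;; would yield a {:tag .. :type ..} map rather than a DataType, so the result
;; goes through the same kind of conformer as regexp-repeat.
(defmethod expr `s/every [_]
  (s/and
    (s/cat :tag any? :type ::datatype :options (s/* any?))
    (s/conformer
      (fn [{t :type}]
        (DataTypes/createArrayType t)))))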
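
;; (s/keys :req [...] :req-un [...]) conforms to a StructType of nullable fields.
;; Qualified keys are named "ns/name", unqualified ones "name".
;; Note: fields are collected in a map, so field order is not guaranteed to
;; follow declaration order for larger key sets.
(defmethod expr `s/keys [_]
  (s/conformer
    (fn [[_ & {:keys [req req-un]}]]
      (let [fields
            (-> {}
              (into (map #(vector (str (namespace %) "/" (name %)) (s/conform ::datatype %))) req)
              (into (map #(vector (name %) (s/conform ::datatype %))) req-un))]
        ;; Any field that fails to conform invalidates the whole struct.
        (or (some #{::s/invalid} (vals fields))
          (DataTypes/createStructType
            ^java.util.List (map (fn [[k v]] (DataTypes/createStructField k v true)) fields)))))))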

;; (s/tuple a b ...) conforms to a StructType whose fields are named "0", "1", ...
(defmethod expr `s/tuple [_]
  (s/conformer
    (fn [[_ & specs]]
      (let [types (map #(s/conform ::datatype %) specs)]
        (or (some #{::s/invalid} types)
          (DataTypes/createStructType
            ^java.util.List (map-indexed (fn [i v] (DataTypes/createStructField (str i) v true)) types)))))))

(defmethod expr `s/* [_] regexp-repeat)
(defmethod expr `s/+ [_] regexp-repeat)

(def preds-registry
  {`string?  DataTypes/StringType
   `boolean? DataTypes/BooleanType
   `double?  DataTypes/DoubleType
   `int?     DataTypes/LongType
   `nil?     DataTypes/NullType
   `inst?    DataTypes/TimestampType})
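
;; Conforms a spec name, a spec form, or a predicate symbol to a Spark DataType.
(s/def ::datatype
  (s/and
    (s/or
      ;; A registered spec name: look up its form and conform that instead.
      :named (s/and qualified-keyword?
               (s/conformer
                 (fn [k]
                   (if-some [form (some-> k s/get-spec s/form)]
                     (s/conform ::datatype form)
                     ::s/invalid))))
      ;; A spec form: dispatch on its head symbol through the expr multimethod.
      :expr (s/and seq? (s/multi-spec expr (fn [form tag] (cons tag (next form)))))
      ;; A predicate symbol: look it up in preds-registry.
      :pred (s/and qualified-symbol?
              (s/conformer #(preds-registry % ::s/invalid))))
    ;; s/or conforms to a [tag value] entry; keep only the value.
    (s/conformer val)))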
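
;; Usage sketch (not part of the original gist). It assumes Spark SQL is on the
;; classpath and that the forms are evaluated in this namespace. ::name, ::age,
;; ::tags and ::person are hypothetical specs introduced only for illustration.
(comment
  (s/def ::name string?)
  (s/def ::age int?)
  (s/def ::tags (s/* string?))
  (s/def ::person (s/keys :req-un [::name ::age ::tags]))

  ;; A predicate symbol is looked up in preds-registry.
  (s/conform ::datatype `string?)

  ;; A registered spec is resolved through its form; this should conform to a
  ;; StructType with nullable fields "name", "age" and "tags".
  (s/conform ::datatype ::person)

  ;; Symbols missing from preds-registry, and forms with no expr method,
  ;; are expected to conform to ::s/invalid.
  (s/conform ::datatype `keyword?))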