@prodeezy
Last active November 22, 2019 02:07
Vectorization Test Without Maps
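// Run in spark-shell with the Iceberg Spark runtime jar on the classpath (`spark` is the shell's SparkSession).
import org.apache.spark.sql.types._
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.Schema
import org.apache.iceberg.spark.SparkSchemaUtil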
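// Explicit Spark schema: primitive columns plus a nested struct, with no map columns.
val schema = new StructType().add("age", IntegerType).add("name", StringType).add("location", new StructType().add("lat", DoubleType).add("lon", DoubleType))
// Read the JSON input with this schema instead of relying on schema inference.
val json = spark.read.schema(schema).json("people_no_maps.json")
json.printSchema()
json.show()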
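// Create an unpartitioned Iceberg table at a filesystem path, converting the Spark schema to an Iceberg schema.
val tables = new HadoopTables(spark.sparkContext.hadoopConfiguration)
val iceSchema = SparkSchemaUtil.convert(json.schema)
val iceTable = tables.create(iceSchema, "iceberg-people-no-maps-complex")
iceTable.schema()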
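// Append the JSON rows to the Iceberg table, then read the table back through the Iceberg source.
json.write.format("iceberg").mode("append").save("iceberg-people-no-maps-complex")
val iceDf = spark.read.format("iceberg").load("iceberg-people-no-maps-complex")
iceDf.createOrReplaceTempView("people_iceberg_no_maps_complex")
// Project only the primitive `age` column from the Iceberg-backed view.
spark.sql("select age from people_iceberg_no_maps_complex").show()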
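The script reads `people_no_maps.json`, which the gist does not include. Spark's JSON reader expects JSON Lines by default (one complete object per line), so the input must have one record per line that matches the schema above. This is a minimal sketch for producing such a file. The `Person` case class, the `WriteSamplePeople` object, and every record value are hypothetical and illustrative, not the author's data. It also assumes names contain no quotes or backslashes, because it does no JSON escaping.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical record type mirroring the schema: age INT, name STRING, location STRUCT<lat DOUBLE, lon DOUBLE>.
final case class Person(age: Int, name: String, lat: Double, lon: Double)

object WriteSamplePeople {
  // Render one record as a single-line JSON object (JSON Lines format).
  // Assumption: `name` contains no characters that need JSON escaping.
  def toJsonLine(p: Person): String =
    s"""{"age":${p.age},"name":"${p.name}","location":{"lat":${p.lat},"lon":${p.lon}}}"""

  def main(args: Array[String]): Unit = {
    // Illustrative sample records only.
    val people = Seq(
      Person(30, "alice", 37.77, -122.42),
      Person(25, "bob", 40.71, -74.01)
    )
    val lines = people.map(toJsonLine)
    // One JSON object per line, newline-terminated.
    Files.write(
      Paths.get("people_no_maps.json"),
      lines.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8)
    )
    lines.foreach(println)
  }
}
```

Assuming a local filesystem, run it from the directory where spark-shell is started so that the relative path `people_no_maps.json` resolves to the same file.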