@ssimeonov
Created January 8, 2016 21:23
Some improvements to Databricks' Scala notebook capabilities.
import org.apache.spark.sql.DataFrame

val ctx = sqlContext
import ctx.implicits._
// With nested structs, sometimes JSON is a much more readable form than display()
def showall(df: DataFrame, num: Int): Unit = df.limit(num).toJSON.collect.foreach(println)
def showall(sql: String, num: Int = 100): Unit = showall(ctx.sql(sql), num)
def hivePath(name: String) = s"/user/hive/warehouse/$name"
// Bug workaround: any table whose serialized schema is > 4K is not accessible through Spark SQL
def parquetTable(name: String, fromHive: Boolean = true) =
  if (fromHive) ctx.table(name)
  else {
    val df = ctx.read.parquet(hivePath(name))
    df.registerTempTable(name)
    df
  }
// Simple directory tree; use with sext's treeString()
def dirTree(path: String): Map[String, Any] =
  dbutils.fs.ls(path).map { file =>
    // Work around double encoding bug
    val decodedPath = file.path.replace("%25", "%").replace("%25", "%")
    decodedPath -> (
      if (file.isDir) dirTree(decodedPath)
      else file.size
    )
  }.toMap
// Recursive file listing; a must-have when working with partitioned table output.
def allFiles(path: String): Map[String, Long] =
  dbutils.fs.ls(path).map { file =>
    // Work around double encoding bug
    val decodedPath = file.path.replace("%25", "%").replace("%25", "%")
    if (file.isDir) allFiles(decodedPath)
    else Map[String, Long](decodedPath -> file.size)
  }.foldLeft(Map.empty[String, Long])(_ ++ _)
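
A minimal usage sketch from a notebook cell. The table name events, the payload column, and the paths below are hypothetical, not part of the gist:

// Hypothetical example: a Hive table named "events" with a "payload" column
showall("SELECT * FROM events")                        // pretty-print rows as JSON
val events = parquetTable("events", fromHive = false)  // register directly from Parquet when the Hive schema is too large
showall(events.select($"payload"), 10)
println(dirTree(hivePath("events")))                   // nested Map of paths; feed to sext's treeString() for nicer output
allFiles(hivePath("events")).foreach { case (p, size) => println(s"$p: $size") }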
sslyle commented Nov 28, 2018

  • import org.apache.spark.sql.DataFrame
  • registerTempTable => createOrReplaceTempView

How do you actually use your defs?
