@ndolgov
Created July 1, 2015 03:04
Load a local Parquet file into the SparkSQL cache
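import java.io.File;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// `name` (the table name) was undefined in the original; taking it as a parameter is an assumption.
public void readAndCache(SQLContext sqlCtx, File file, String name) {
    // Read the local Parquet file via the file:// scheme
    final DataFrame df = sqlCtx.read().parquet("file://" + file.getAbsolutePath());
    sqlCtx.registerDataFrameAsTable(df, name);
    sqlCtx.cacheTable(name); // != df.persist(StorageLevel.MEMORY_ONLY_SER()) when reading from a Parquet file
    final long rowCount = df.count(); // warm-up: cacheTable is lazy, so a full scan is needed to populate the cache
}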