Aravinth aravinthsci

🎯
Focusing
View GitHub Profile
val more_data = Seq(
("4","345","1970-01-01 00:02:50","6"),
("5","345","1970-01-01 00:03:50","8")).toDF("id","product_id","created_at","units")
more_data.show()
/*
+---+----------+-------------------+-----+
| id|product_id|         created_at|units|
+---+----------+-------------------+-----+
|  4|       345|1970-01-01 00:02:50|    6|
|  5|       345|1970-01-01 00:03:50|    8|
+---+----------+-------------------+-----+
*/
@aravinthsci
aravinthsci / time.ipynb
Created November 12, 2019 09:39
Notebook for creating a table using the Delta Lake library in Apache Spark
val new_df = readTable("sales").withColumn("new_col",lit("abc"))
new_df.show()
/*
+---+----------+-------------------+-----+----------+--------+
| id|product_id|         created_at|units|      date| new_col|
+---+----------+-------------------+-----+----------+--------+
| 21|       527|2012-12-21 06:18:10|    2|2012-12-21|     abc|
| 22|        54|2012-12-21 06:18:50|    5|2012-12-21|     abc|
+---+----------+-------------------+-----+----------+--------+
*/
// Writes a DataFrame (with an added column) over an existing Delta table.
// With mergeSchema enabled, Delta Lake evolves the table schema to
// include the new column instead of failing on the mismatch.
def addColumn(data: DataFrame, tableName: String): Unit = {
  data
    .write
    .format("delta")
    .mode("overwrite")
    .option("mergeSchema", "true")
    .save("/data/deltalake/" + tableName)
}
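As a sketch of how these pieces fit together (assuming the `sales` table created below, the `new_df` DataFrame with the extra `new_col` column, and the `readTable` helper from these gists), the evolved schema can be written back and verified:

```scala
// Hypothetical usage: persist the DataFrame carrying the extra column.
// mergeSchema in addColumn lets Delta accept the widened schema.
addColumn(new_df, "sales")

// Re-reading the table should now show new_col in the schema.
readTable("sales").printSchema()
```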
val data = spark.range(0, 5).toDF("no")
createTable(data, "numbers")
val moreData = spark.range(20, 25).toDF("no")
updateDeltaTable(moreData, "numbers", "overwrite")
val moreMoreData = spark.range(26, 30).toDF("no")
updateDeltaTable(moreMoreData, "numbers", "append")
val no_df = readTable("numbers")
no_df.show()
/*
+---+
| no|
+---+
| 20|
| 21|
| 22|
| 23|
| 24|
| 26|
| 27|
| 28|
| 29|
+---+
(row order may vary)
*/
// Writes a DataFrame to a Delta table with the given save mode
// ("overwrite" replaces the data, "append" adds to it).
def updateDeltaTable(data: DataFrame, tableName: String, saveMode: String): Unit = {
  data
    .write
    .format("delta")
    .mode(saveMode)
    .save("/data/deltalake/" + tableName)
}
val df = spark.read.option("header",true).csv("Sale_test.csv")
createTable(df, "sales")
val sales_df = readTable("sales")
sales_df.show(2)
/*
+---+----------+-------------------+-----+----------+
| id|product_id|         created_at|units|      date|
+---+----------+-------------------+-----+----------+
| 21|       527|2012-12-21 06:18:10|    2|2012-12-21|
| 22|        54|2012-12-21 06:18:50|    5|2012-12-21|
+---+----------+-------------------+-----+----------+
*/
@aravinthsci
aravinthsci / read_table.scala
Last active January 31, 2020 06:26
Reading table from delta lake
// Reads a Delta table from /data/deltalake/<tableName> into a DataFrame.
def readTable(tableName: String): DataFrame = {
  spark
    .read
    .format("delta")
    .load("/data/deltalake/" + tableName)
}
@aravinthsci
aravinthsci / create_table.scala
Created November 6, 2019 11:56
Creating table in Delta lake
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{SaveMode, SparkSession, DataFrame}
// Creates (or replaces) a Delta table at /data/deltalake/<tableName>.
def createTable(data: DataFrame, tableName: String): Unit = {
  data
    .write
    .format("delta")
    .mode(SaveMode.Overwrite)
    .save("/data/deltalake/" + tableName)
}
import os
from io import StringIO

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

def pdfextract(fname, pages=None):
    # Extract text from the given page numbers (all pages when None).
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)
    with open(fname, 'rb') as fp:
        for page in PDFPage.get_pages(fp, pagenums):
            interpreter.process_page(page)
    converter.close()
    text = output.getvalue()
    output.close()
    return text