
boolean updateValue(Event event) {
    long userInitiatedTimestamp = event.getUserInitiatedTimestamp();
    String userId = event.getUserId();
    User user = database.query(userId);
    long lastUserInitiatedTimestamp = user.getLastUserInitiatedTimestamp();
    // event arrives in correct order
    if (userInitiatedTimestamp > lastUserInitiatedTimestamp) {
        // register with latest timestamp
        user.setLastUserInitiatedTimestamp(userInitiatedTimestamp);
        return true;
    }
    // stale or out-of-order event: keep the existing value
    return false;
}
mingwei-li / more.scala
Created August 4, 2020 18:36
hyperspace - more
// rebuild "index1" so it reflects data added to or removed from the underlying source
hs.refreshIndex("index1")
// soft-delete "index1": it is excluded from query optimization but its files are kept
hs.deleteIndex("index1")
// bring the soft-deleted "index1" back into use
hs.restoreIndex("index1")
// soft-delete "index2", then permanently remove its files
hs.deleteIndex("index2")
hs.vacuumIndex("index2")
mingwei-li / explain2.log
Created August 4, 2020 18:35
hyperspace - explain2
== Physical Plan ==
*(1) Project [name#11]
+- *(1) Filter (isnotnull(id#10) && (id#10 = 1))
+- *(1) FileScan parquet [id#10,name#11] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/mingwli/Dev/lib/hadoop-2.9.2/bin/spark-warehouse/indexes/index/v__=0], PartitionFilters: [], PushedFilters: [IsNotNull(id), EqualTo(id,1)], ReadSchema: struct<id:int,name:string>
mingwei-li / enable.scala
Created August 4, 2020 18:35
hyperspace - enable
// turn on Hyperspace's optimizer rules so eligible queries are rewritten to use indexes
spark.enableHyperspace
// run the query; with Hyperspace enabled it can now be served from the index
query.show()
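query is not defined in this gist. Judging from the Filter and Project in the physical plan above, it presumably looks something like the following sketch (the column names are taken from that plan, and df from the load-data gist further down):
// hypothetical reconstruction of query, inferred from the explain output above
val query = df.filter(df("id") === 1).select("name")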
mingwei-li / explain.log
Created August 4, 2020 18:34
hyperspace - explain
=============================================================
Plan with indexes:
=============================================================
Project [name#11]
+- Filter (isnotnull(id#10) && (id#10 = 1))
<----+- FileScan parquet [id#10,name#11] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/mingwli/Dev/lib/hadoop-2.9.2/bin/spark-warehouse/indexes/index/v__=0], PartitionFilters: [], PushedFilters: [IsNotNull(id), EqualTo(id,1)], ReadSchema: struct<id:int,name:string>---->
=============================================================
Plan without indexes:
=============================================================
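The side-by-side "Plan with indexes" / "Plan without indexes" comparison above is produced by Hyperspace's explain API. Assuming the hs and query values from the other gists, a minimal sketch of the call:
// ask Hyperspace to compare the query plan with and without its indexes
hs.explain(query, verbose = true)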
mingwei-li / show.log
Created August 4, 2020 18:32
hyperspace - show
+-----+--------------+---------------+----------+--------------------+--------------------+--------------------+------+
| name|indexedColumns|includedColumns|numBuckets| schema| indexLocation| queryPlan| state|
+-----+--------------+---------------+----------+--------------------+--------------------+--------------------+------+
|index| [id]| [name]| 200|{"type":"struct",...|file:/Users/mingw...|Relation[id#10,na...|ACTIVE|
+-----+--------------+---------------+----------+--------------------+--------------------+--------------------+------+
mingwei-li / create.scala
Created August 4, 2020 18:32
hyperspace - create
// entry point to the Hyperspace index management APIs
val hs = new Hyperspace(spark)
// build a covering index: bucketed on the indexed column "id", with "name" stored as an included column
hs.createIndex(df, IndexConfig("index", indexedColumns = Seq("id"), includedColumns = Seq("name")))
// list all Hyperspace indexes known to this session
hs.indexes.show()
mingwei-li / import.scala
Created August 4, 2020 18:31
hyperspace - import
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._
mingwei-li / load-data.scala
Created August 4, 2020 18:30
hyperspace - load-data
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs://localhost:9000/hyperspace_test/customers.csv")
df.show()
mingwei-li / file.csv
Last active August 4, 2020 18:24
hyperspace - csv
id,name,zip
1,john smith,78750
2,john doe,78758
3,mike tyson,91731
4,mingwei li,78750