
Keybase proof

I hereby claim:

  • I am feynmanliang on github.
  • I am feynman (https://keybase.io/feynman) on keybase.
  • I have a public key ASDISK0rUZWGgEO6xCKFhIoLWMFM9iReGIYlgD0YZukZBAo

To claim this, I am signing this object:

[
  {
    "action": "talk",
    "text": "Please wait while we connect you."
  },
  {
    "action": "connect",
    "timeout": 20,
    "from": "12015100650",
    "endpoint": [

Do you even read these?

Keybase proof

I hereby claim:

  • I am feynmanliang on github.
  • I am feynman (https://keybase.io/feynman) on keybase.
  • I have a public key ASCNE3Hd3sTW9NIXdtB7kYN566UbsrJ3BEe9GVNH-BoCgQo

To claim this, I am signing this object:

[SPARK-10478] SoftmaxFunction.eval Benchmarks Before/After PR 8648

Num Examples  Num Classes  Before Avg Runtime (ms)  After Avg Runtime (ms)
1000          5              5.3315                   0.4097
1000          50             6.7309                   4.1290
1000          250           21.5867                  21.2516
10000         5              4.2153                   4.1454
10000         50            42.2424                  41.0198
10000         250          215.2192                 208.6186
100000        5             43.5172                  41.2843
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.{BasicResponseHandler, HttpClientBuilder}
import org.apache.spark.mllib.fpm.PrefixSpan

// Fetch the SIGN sequence database (SPMF format) over HTTP
val sequenceDatabase = {
  val url = "http://www.philippe-fournier-viger.com/spmf/datasets/SIGN.txt"
  val client = HttpClientBuilder.create().build()
  val request = new HttpGet(url)
  val response = client.execute(request)
  new BasicResponseHandler().handleResponse(response)
}
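The fetched text must be parsed into sequences of itemsets before PrefixSpan can consume it. A minimal sketch of the parser, assuming the standard SPMF encoding (integer items, `-1` closes an itemset, `-2` closes a sequence):

```scala
// Parse one SPMF-encoded line into a sequence of itemsets.
// "-1" closes the current itemset, "-2" ends the sequence.
def parseSpmfLine(line: String): Array[Array[Int]] =
  line.trim.split("\\s+").map(_.toInt)
    .takeWhile(_ != -2)
    .foldLeft(Vector(Vector.empty[Int])) {
      case (acc, -1)   => acc :+ Vector.empty[Int]       // start a new itemset
      case (acc, item) => acc.init :+ (acc.last :+ item) // append to current itemset
    }
    .filter(_.nonEmpty)
    .map(_.toArray).toArray
```

Mapping this over the downloaded lines yields the `RDD[Array[Array[Int]]]` shape that `new PrefixSpan().setMinSupport(...).run(...)` expects.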
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._

// Topic-model hyperparameters
val numTopics: Int = 100     // number of latent topics to learn
val maxIterations: Int = 100 // optimizer iterations
val vocabSize: Int = 10000   // cap on the CountVectorizer vocabulary
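Before LDA runs, documents are tokenized, stop words are removed, and counts are vectorized against a vocabulary capped at vocabSize. A pure-Scala sketch of that preprocessing (the real job uses RegexTokenizer, StopWordsRemover, and CountVectorizer on a DataFrame; the names and stop-word list below are illustrative):

```scala
// Illustrative stand-in for the RegexTokenizer -> StopWordsRemover ->
// CountVectorizer stages: tokenize, drop stop words, keep the top-K terms.
val stopWords = Set("the", "a", "of", "and")

def tokenize(doc: String): Seq[String] =
  doc.toLowerCase.split("\\W+").toSeq.filter(t => t.nonEmpty && !stopWords(t))

// Vocabulary = the `vocabSize` most frequent terms across the corpus,
// ties broken alphabetically.
def buildVocab(docs: Seq[String], vocabSize: Int): Seq[String] =
  docs.flatMap(tokenize)
    .groupBy(identity).view.mapValues(_.size).toSeq
    .sortBy { case (term, count) => (-count, term) }
    .take(vocabSize).map(_._1)
```

Each document is then represented as a count vector over this vocabulary, which is the input the LDA optimizer consumes.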
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
index 6c02004..83d47c7 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
@@ -577,7 +577,9 @@ public String toString() {
     StringBuilder build = new StringBuilder("[");
     for (int i = 0; i < sizeInBytes; i += 8) {
       build.append(java.lang.Long.toHexString(Platform.getLong(baseObject, baseOffset + i)));
-      build.append(',');
+      if (i < sizeInBytes - 8) {
+        build.append(',');
+      }
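The intent of the change above is to stop emitting a comma after the last 8-byte word. The same join logic can be sketched in isolation (a hypothetical helper, not Spark's actual code):

```scala
// Join hex words with commas, guarding against a trailing separator,
// mirroring the UnsafeRow.toString fix above.
def hexDump(words: Seq[Long]): String = {
  val build = new StringBuilder("[")
  words.zipWithIndex.foreach { case (w, i) =>
    build.append(java.lang.Long.toHexString(w))
    if (i < words.length - 1) build.append(',') // no comma after the last word
  }
  build.append(']').toString
}
```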

Cluster

  • Spark 1.4 + 1da3c7f
  • Databricks Cloud
  • 8 Workers, EC2 Spot instances
  • Workers: 240 GB Memory, 32 Cores
  • Driver: 30 GB Memory, 4 Cores

Data