
Simeon Simeonov ssimeonov

@ssimeonov
ssimeonov / new_relic_503.txt
Created January 31, 2014 08:37
New Relic can't handle ingestion again
2014-01-31T08:25:13.684816+00:00 app[analytics.1]: ** [NewRelic][01/31/14 08:25:13 +0000 213b2cc6-fb96-4c1a-b031-9edexxxxxxxx (2)] WARN : Error during check_for_and_handle_agent_commands, will retry later:
2014-01-31T08:25:13.684816+00:00 app[analytics.1]: ** [NewRelic][01/31/14 08:25:13 +0000 213b2cc6-fb96-4c1a-b031-9edexxxxxxxx (2)] WARN : NewRelic::Agent::ServerConnectionException: Service unavailable (503): Service Unavailable
function replaceDoc(html) {
  // Open the document in "replace" mode so the new HTML replaces the
  // current page's session-history entry instead of adding a new one
  var newDoc = document.open("text/html", "replace");
  newDoc.write(html);
  newDoc.close();
}
<html>
<head>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.js"></script>
</head>
<body>
<div id="main_wrapper">
<div id="main_wrapper_top"></div>

Ad hoc setup for a Swoop ML experimentation machine

Run the following:

curl 'https://gist.githubusercontent.com/ssimeonov/2319ecb00d825d6f5c78/raw/2bf43b3c5b766b9ce16f647fadbd7b423234f210/aws_ml_setup.sh' | bash -v

If the script exits without an error right after installing some packages, run it again.

➜ jq git:(master) ✗ make clean
rm -f jq
test -z "libjq.la " || rm -f libjq.la
rm -f ./so_locations
rm -rf .libs _libs
rm -f version.h .remake-version-h
rm -f *.o
test -z "tests/all.log" || rm -f tests/all.log
test -z "tests/all.trs" || rm -f tests/all.trs
test -z "test-suite.log" || rm -f test-suite.log

Spark 1.4.0 regression: out-of-memory conditions on small data

A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and fails with a series of out-of-memory errors in 1.4.0.

The data in question is a single file of 88,283 JSON objects with at most 109 fields per object. Size on disk is 181 MB.

This gist includes the code and the full output from the 1.3.1 and 1.4.0 runs, including the command line showing how spark-shell is started.
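A minimal spark-shell repro along these lines might look like the sketch below. The file path and table name are placeholders, not the actual dataset; the real code and spark-shell command line are in the gist output.

```scala
// Pasted into spark-shell (Spark 1.3.x/1.4.x API); the path is a placeholder
val df = sqlContext.jsonFile("file:///path/to/objects.jsonlines")
df.registerTempTable("objects")
// Per the report: succeeds on 1.3.1, hits out-of-memory errors on 1.4.0
sqlContext.sql("SELECT COUNT(*) FROM objects").show()
```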

@ssimeonov
ssimeonov / code.scala
Last active August 29, 2015 14:25
I/O error in saveAsTable
// This code is pasted into spark-shell
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SaveMode
val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._
val devRoot = "/home/ubuntu/spx"
ctx.
jsonFile("file://" + devRoot + "/data/swoop-ml-nlp/dimensions/component_variations.jsonlines").
@ssimeonov
ssimeonov / code.scala
Last active August 29, 2015 14:25
SPARK-9342 Spark SQL problems dealing with views
// This code is designed to be pasted in spark-shell in a *nix environment
// On Windows, replace sys.env("HOME") with a directory of your choice
import java.io.File
import java.io.PrintWriter
import org.apache.spark.sql.hive.HiveContext
val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._

Spark exceptions later on cause persistent I/O problems

When using spark-shell in local mode, I've observed the following behavior on a number of nodes:

  1. Some operation generates an exception related to Spark SQL processing via HiveContext.
  2. From that point on, nothing can be written to Hive with saveAsTable.
  3. Another identically-configured version of Spark on the same machine may not exhibit the problem.
  4. A new identically-configured installation of the same version on the same machine would exhibit the problem.

The behavior is difficult to reproduce reliably, but it shows up consistently during extended Spark SQL experimentation.
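The sequence in steps 1 and 2 can be sketched as follows; this is an illustration of the reported pattern, not a guaranteed repro, and the failing query and table name are hypothetical.

```scala
// Pasted into spark-shell; illustrates the reported failure sequence
import org.apache.spark.sql.hive.HiveContext
val ctx = sqlContext.asInstanceOf[HiveContext]

// Step 1: some Spark SQL operation via HiveContext throws an exception
try {
  ctx.sql("SELECT FROM")  // deliberately malformed query
} catch {
  case e: Exception => println("Expected failure: " + e.getMessage)
}

// Step 2: on affected installations, subsequent writes to Hive
// via saveAsTable then fail with I/O errors
val df = ctx.sql("SELECT 1 AS id")
df.saveAsTable("hypothetical_test_table")
```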

15/08/05 02:48:16 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 02:48:16 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 02:48:16 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 02:48:16 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 02:48:16 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 02:48:16 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 02:48:16 INFO SparkILoop: Created sql context (with Hive support)..