Simeon Simeonov ssimeonov

## wordle.scala
package wordle

/** Wordle solver, game runner & simulator
  *
  * Optimizes based on a combination of an allowed word list (from the Wordle source code or any
  * other source), word frequency data and the move in the game.
  *
  * @note
  *   [[Wordle.Game]] is mutable to allow for play in an environment without easy STDIN input. Use
  *   [[Wordle.Game.nextMove()]]. All words are in lowercase. Patterns are entered as strings of
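The doc comment is cut off before it describes the pattern format. The sketch below therefore uses a hypothetical encoding (`g` = right letter in the right spot, `y` = letter present elsewhere, `-` = absent) to show how a solver like this could score a guess against a candidate answer. The object and method names are illustrative and are not the gist's API. The two-pass structure handles repeated letters: exact matches are claimed first, and only the leftover answer letters can produce yellows.

```scala
import scala.collection.mutable

// Hypothetical helper, not the gist's API.
object WordleFeedback {
  // Hypothetical pattern encoding: 'g' = right letter, right spot;
  // 'y' = letter occurs elsewhere in the answer; '-' = not (or no longer) available.
  def pattern(guess: String, answer: String): String = {
    require(guess.length == answer.length, "guess and answer must have the same length")
    val result = Array.fill(guess.length)('-')
    // First pass: mark exact matches and count the answer letters they did not use.
    val remaining = mutable.Map.empty[Char, Int].withDefaultValue(0)
    for (i <- guess.indices) {
      if (guess(i) == answer(i)) result(i) = 'g'
      else remaining(answer(i)) += 1
    }
    // Second pass: assign yellows left to right, consuming the leftover counts
    // so a letter repeated in the guess is not over-reported.
    for (i <- guess.indices if result(i) != 'g') {
      val c = guess(i)
      if (remaining(c) > 0) {
        result(i) = 'y'
        remaining(c) -= 1
      }
    }
    new String(result)
  }
}
```

For example, `WordleFeedback.pattern("speed", "abide")` marks only the first `e` yellow, because `abide` contains a single `e`.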

## 0 mvn_output.txt
➜  jvm-packages git:(master) ✗ mvn -Dspark.version=2.1.0 package
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for ml.dmlc:xgboost4j:jar:0.7
[WARNING] 'build.plugins.plugin.version' for org.codehaus.mojo:exec-maven-plugin is missing. @ line 40, column 29
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.

## distributedFileListing.scala
case class FInfo(
  path: String,
  parent: String,
  isDir: Boolean,
  size: Long,
  modificationTime: Long,
  partitions: Map[String, String]) {

  // @todo encoding issues
  def hasExt(ext: String): Boolean = path.endsWith(ext)
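The preview does not show how `FInfo.partitions` is populated. One plausible approach, assuming Hive-style `key=value` partition directories, is to parse the parent directory segments of `path`. The `partitionsOf` helper below is hypothetical. It does not unescape encoded values, which may relate to the `@todo encoding issues` note.

```scala
// Hypothetical helper: assumes Hive-style partition directories such as
// "/data/events/date=2015-09-12/hour=04/part-0000.parquet".
object PartitionParsing {
  def partitionsOf(path: String): Map[String, String] =
    path.split('/')
      .dropRight(1) // the last segment is the file name, not a partition directory
      .iterator
      .filter(seg => seg.contains('=') && !seg.startsWith("=")) // skip segments with an empty key
      .map { seg =>
        val i = seg.indexOf('=')
        seg.substring(0, i) -> seg.substring(i + 1) // split on the first '=' only
      }
      .toMap
}
```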

## DataFrameFunctions.scala
object DataFrameFunctions {

  final val TEMP_TABLE_PLACEHOLDER = "~tbl~"

  /** Executes a SQL statement on the dataframe.
    * Behind the scenes, it registers and cleans up a temporary table.
    *
    * @param df input dataframe
    * @param stmtTemplate SQL statement template that uses the value of
    *                     `TEMP_TABLE_PLACEHOLDER` for the table name.
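The doc comment describes a register/execute/clean-up cycle around a placeholder table name. The sketch below shows that cycle with a fresh table name per call and a `try`/`finally` so the table is dropped even if the statement fails. `executeWith` and its `register`, `run` and `drop` hooks are hypothetical stand-ins, so the example runs without Spark. It is not the gist's implementation.

```scala
import java.util.UUID

// Hypothetical sketch; the hooks stand in for the real Spark calls.
object TempTableSketch {
  final val TEMP_TABLE_PLACEHOLDER = "~tbl~"

  // Replace every occurrence of the placeholder with a concrete table name.
  def render(stmtTemplate: String, tableName: String): String =
    stmtTemplate.replace(TEMP_TABLE_PLACEHOLDER, tableName)

  // Registers a uniquely named temporary table, runs the rendered statement,
  // and always drops the table, even if `run` throws.
  def executeWith[A](stmtTemplate: String)(
      register: String => Unit,
      run: String => A,
      drop: String => Unit): A = {
    // Hyphens are removed so the name is a plain SQL identifier.
    val name = "tmp_" + UUID.randomUUID().toString.replace("-", "")
    register(name)
    try run(render(stmtTemplate, name))
    finally drop(name)
  }
}
```

With Spark 2.x, the hooks could plausibly call `df.createOrReplaceTempView`, `df.sparkSession.sql` and `df.sparkSession.catalog.dropTempView`. A fresh name per call keeps concurrent statements from overwriting each other's tables.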

## spark_sql_test_failures.txt
➜  spark git:(master) ✗ build/sbt sql/test
Using /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
[info] Loading global plugins from /Users/sim/.sbt/0.13/plugins
[info] Loading project definition from /Users/sim/dev/spx/spark/project/project
[info] Loading project definition from /Users/sim/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
[warn] Multiple resolvers having different access mechanism configured with same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
[info] Loading project definition from /Users/sim/dev/spx/spark/project
[info] Set current project to spark-parent (in build file:/Users/sim/dev/spx/spark/)

## spark_test_failures.txt
[info] spark-streaming: found 30 potential binary incompatibilities (filtered 8)
[error]  * method delaySeconds()Int in class org.apache.spark.streaming.Checkpoint does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.streaming.Checkpoint.delaySeconds")
[error]  * class org.apache.spark.streaming.receiver.ActorSupervisorStrategy does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.streaming.receiver.ActorSupervisorStrategy")
[error]  * object org.apache.spark.streaming.receiver.IteratorData does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.streaming.receiver.IteratorData$")
[error]  * class org.apache.spark.streaming.receiver.ByteBufferData does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.streami

## ContrivedAdd.scala
object ContrivedAdd {

  import shapeless._
  import record._
  import syntax.singleton._
  import shapeless.ops.record.Updater
  import scalaz._
  import Scalaz._

  case class S[L <: HList](total: Int, scratch: L)

## shapeless-transformations.scala
package ss

object ContrivedAdd {

  import shapeless._
  import record._
  import syntax.singleton._

  import scalaz._
  import Scalaz._

## databricks.scala
import org.apache.spark.sql.DataFrame

val ctx = sqlContext
import ctx.implicits._

// With nested structs, sometimes JSON is a much more readable form than display()
def showall(df: DataFrame, num: Int): Unit = df.limit(num).toJSON.collect().foreach(println)
def showall(sql: String, num: Int = 100): Unit = showall(ctx.sql(sql), num)

def hivePath(name: String) = s"/user/hive/warehouse/$name"

// Bug workaround

## 00_README.md
Created September 12, 2015 04:42

# Scala `.hashCode` vs. MurmurHash3 for Spark's MLlib

This is a simple test of two hashing functions:

- Scala's native implementation (`obj.##`), used in `HashingTF`
- MurmurHash3, included in Scala and used by Vowpal Wabbit and many others

The test uses the aspell dictionary generated with the "insane" setting, which produces 676,547 entries, and explores the following grid:

- Feature vector sizes: 2^18 to 2^22
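The comparison can be sketched in plain Scala. Each word is bucketed as a non-negative modulo of its hash, mirroring how `HashingTF` indexes `term.##` in Spark 1.x MLlib. The sketch then counts how many distinct words land in an already-occupied bucket. The generated word list in `main` is a hypothetical stand-in for the aspell dictionary, so the code does not reproduce the original results.

```scala
import scala.util.hashing.MurmurHash3

object HashCollisions {
  // Bucket index in [0, mod); a plain % would return a negative value for a negative hash.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Number of distinct words that share a bucket with an earlier word.
  def collisions(words: Seq[String], numFeatures: Int, hash: String => Int): Int = {
    val distinct = words.distinct
    distinct.size - distinct.map(w => nonNegativeMod(hash(w), numFeatures)).distinct.size
  }

  val scalaHash: String => Int = _.##                       // Scala's native hash
  val murmurHash: String => Int = MurmurHash3.stringHash(_) // scala.util.hashing.MurmurHash3

  def main(args: Array[String]): Unit = {
    // Hypothetical word list standing in for the 676,547-entry aspell dictionary.
    val words = (0 until 100000).map(i => s"word$i")
    for (exp <- 18 to 22) {
      val n = 1 << exp
      println(s"2^$exp: ## -> ${collisions(words, n, scalaHash)}, murmur3 -> ${collisions(words, n, murmurHash)}")
    }
  }
}
```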