Grega Kespret (gregakespret)
@gregakespret
gregakespret / stream.R
Created May 24, 2012 16:37
Stream prediction
# For each new observation: predict it with the current model, then update the model with it
for (i in 2:nrow(dataset)) {
  predictions[i] <- predict.model(model.name, my.model, dataset[i, ])
  my.model <- update.model(model.name, Real ~ ., dataset[i, ], my.model)
}
Load Minus.24 Minus.168 Minus.192 Minus.312 Minus.336
2007-01-21 00:00:00 609.1750 584.2417 546.3750 551.7769 504.7167 498.0333
2007-01-21 01:00:00 576.5667 550.8167 504.8167 524.0333 477.0667 465.2167
2007-01-21 02:00:00 550.8917 526.3417 489.6417 484.7750 458.1833 436.8417
2007-01-21 03:00:00 534.4833 522.8500 475.0769 474.7333 451.3500 421.8167
2007-01-21 04:00:00 541.6750 517.6917 471.4667 470.1833 453.6000 421.4917
2007-01-21 05:00:00 546.3750 534.4667 474.8667 479.3750 478.6917 423.0000
2007-01-21 06:00:00 560.6083 554.2667 502.2250 511.1167 557.3636 442.2083
2007-01-21 07:00:00 577.1417 582.3167 521.1333 556.6583 664.6583 467.8250
2007-01-21 08:00:00 600.3500 636.8583 542.4000 604.2083 695.1000 486.0083
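The loop above is a prequential (test-then-train) evaluation: each observation is first predicted by the model fitted on everything before it, and only then folded into the model. A minimal self-contained sketch of the same pattern, with a hypothetical running-mean "model" standing in for the gist's `predict.model`/`update.model` helpers:

```scala
object Prequential {
  // Toy model state: count and mean of the observations seen so far.
  final case class MeanModel(n: Int, mean: Double) {
    def predict: Double = mean                          // forecast = running mean
    def update(x: Double): MeanModel =                  // fold in one observation
      MeanModel(n + 1, mean + (x - mean) / (n + 1))
  }

  // Returns one prediction per observation after the first,
  // each made strictly before the model has seen that value.
  def run(data: Vector[Double]): Vector[Double] = {
    var model = MeanModel(1, data.head)
    data.tail.map { x =>
      val p = model.predict    // test: predict before seeing x
      model = model.update(x)  // train: then update with the true value
      p
    }
  }

  def main(args: Array[String]): Unit =
    println(run(Vector(609.2, 576.6, 550.9, 534.5, 541.7, 546.4)))
}
```

The same shape applies regardless of the model: the only requirement is that `predict` for observation i never sees observation i itself.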
gregakespret / spark-fetch-failure
Created November 19, 2013 07:53
Spark fetch failure, resubmitting failed stages
Connected to jdbc:vertica://vertica.celtra.com:5433/celtra (DirectBatchInsert: false)
13/11/18 08:56:24,710 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/11/18 08:56:24,919 INFO spark.SparkEnv: Registering BlockManagerMaster
13/11/18 08:56:24,968 INFO storage.MemoryStore: MemoryStore started with capacity 2.2 GB.
13/11/18 08:56:24,975 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20131118085624-199a
13/11/18 08:56:25,009 INFO network.ConnectionManager: Bound socket to port 39620 with id = ConnectionManagerId(ip-10-170-8-11.ec2.internal,39620)
13/11/18 08:56:25,017 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/11/18 08:56:25,019 INFO storage.BlockManagerMaster: Registered BlockManager
13/11/18 08:56:25,108 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/11/18 08:56:25,130 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57821
gregakespret / gist:7641293
Created November 25, 2013 13:37
HASH JOIN problem when inserting into a table backed by a pre-join projection
drop table test_1 cascade;
drop table test_2 cascade;

CREATE TABLE test_1
(
    a date NOT NULL,
    b char(8) NOT NULL,
    PRIMARY KEY (a),
    UNIQUE (b)
gregakespret / gist:7874908
Created December 9, 2013 16:14
Resubmission due to a fetch failure
Connected to jdbc:vertica://vertica7.aws.celtra-test.com:5433/aws7 (DirectBatchInsert: false)
13/12/09 15:45:36,704 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/12/09 15:45:36,888 INFO spark.SparkEnv: Registering BlockManagerMaster
13/12/09 15:45:36,925 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20131209154536-5135
13/12/09 15:45:36,933 INFO storage.MemoryStore: MemoryStore started with capacity 2.2 GB.
13/12/09 15:45:36,969 INFO network.ConnectionManager: Bound socket to port 45383 with id = ConnectionManagerId(ip-10-170-8-11.ec2.internal,45383)
13/12/09 15:45:36,977 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/12/09 15:45:36,988 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager ip-10-170-8-11.ec2.internal:45383 with 2.2 GB RAM
13/12/09 15:45:36,989 INFO storage.BlockManagerMaster: Registered BlockManager
13/12/09 15:45:37,078 INFO server.Server: jetty-7.x.y-SNAPSHOT
package org.example

abstract class Person(val name: String)

// Cannot be a case class: case classes make every parameter a val, so the
// partner could not be passed by name and instantiated lazily.
class Girl(val name2: String, _boyfriend: => Boy) extends Person(name2) {
  lazy val boyfriend: Boy = _boyfriend
}
class Boy(val name2: String, _girlfriend: => Girl) extends Person(name2) {
  lazy val girlfriend: Girl = _girlfriend
}
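A self-contained sketch of how these classes resolve the circular reference (Boy is assumed to mirror Girl with a lazy `girlfriend` field, since the preview cuts off; the class definitions are repeated so the snippet runs on its own):

```scala
abstract class Person(val name: String)

// Case classes make every parameter a val, which would evaluate the partner
// eagerly; a plain class can take it by name instead.
class Girl(val name2: String, _boyfriend: => Boy) extends Person(name2) {
  lazy val boyfriend: Boy = _boyfriend
}
// Assumed symmetric completion of Boy.
class Boy(val name2: String, _girlfriend: => Girl) extends Person(name2) {
  lazy val girlfriend: Girl = _girlfriend
}

object Demo {
  // By-name arguments defer evaluation, so each side can mention the other
  // before it is constructed; the lazy vals tie the knot on first access.
  lazy val alice: Girl = new Girl("Alice", bob)
  lazy val bob: Boy = new Boy("Bob", alice)
}
```

Accessing `Demo.alice.boyfriend` forces `bob` only at that point, so neither constructor ever sees an uninitialized partner.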
gregakespret / gist:813b540faca678413ad4
Created May 21, 2014 21:53
java.io.IOException: Failed to save output of task
14/05/21 21:44:45 ERROR SparkHadoopWriter: Error committing the output of task: attempt_201405212144_0000_m_000000_3432
java.io.IOException: Failed to save output of task: attempt_201405212144_0000_m_000000_3432
at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160)
at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
at org.apache.hadoop.mapred.SparkHadoopWriter.commit(SparkHadoopWriter.scala:110)
at org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$writeToFile$1(PairRDDFunctions.scala:731)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:734)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:734)
at org.apache.spark.scheduler.ResultTask.runTask(Result
gregakespret / gist:95e74f28551edc8a90c6
Created November 5, 2014 19:27
Polymorphic Deserialization Jackson using custom subtypes with default class fallback
import com.fasterxml.jackson.annotation.JsonSubTypes.Type
import com.fasterxml.jackson.annotation.{JsonTypeName, JsonSubTypes, JsonTypeInfo, JsonProperty}
import com.fasterxml.jackson.databind.{ObjectMapper, DeserializationFeature}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
@JsonTypeInfo(
use = JsonTypeInfo.Id.NAME,
include = JsonTypeInfo.As.PROPERTY,
property = "clazz",
gregakespret / gist:570998fccd6ca6e24ad4
Created March 15, 2015 19:26
Histogram using TDigest
import java.io.File
import java.nio.charset.Charset
import com.tdunning.math.stats.{ArrayDigest, TDigest}
import scala.collection.JavaConversions._ // needed for java Collection -> Scala Seq
import scala.io.Source
import com.google.common.io.Files
object Histogram extends App {
val distribution: TDigest = TDigest.createArrayDigest(35, 1000)
gregakespret / gist:e2bfd4eccaf60c1d9c3d
Last active June 27, 2018 10:09
Data Scientist Assignment

Celtra Data Scientist Assignment

First of all, thank you for taking the time to do this assignment.

There are many possible ways to solve this data problem. Your solution will help us gain insight into how you think, which tools and technologies you like to use, and how you use them. Hopefully, we can learn something from you as well :)

As you will notice, not every detail is clearly defined. You have the freedom to make your own choices where you see fit. But you can also ask questions, of course.

Please e-mail your solution to the person who gave it to you within the agreed time.