Russell Jurney (rjurney)

rjurney / beaconingFeature.scala
Created Apr 13, 2014
How to return an eventTime-sorted array of values for each unique grouping of (periodSeconds, cIp, csHost, requestMethod, userAgent)
package com.securityx.modelfeature.resources
import javax.ws.rs.{QueryParam, GET, Produces, Path}
import scala.Array
import javax.ws.rs.core.{Response, MediaType}
import org.slf4j.{LoggerFactory, Logger}
import org.joda.time.format.{ISODateTimeFormat, DateTimeFormatter}
import org.joda.time.DateTimeZone
import com.securityx.modelfeature.dao.{FeatureDao, BeaconActivityDao}
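The preview above shows only the resource's imports. As a minimal sketch of the grouping itself, assuming a hypothetical Row case class standing in for the DAO's result rows (ISO-8601 eventTime strings, per the ISODateTimeFormat import, sort correctly as plain strings):

case class Row(periodSeconds: Int, cIp: String, csHost: String,
               requestMethod: String, userAgent: String,
               eventTime: String, value: Double)

// Group on the five-field key, then sort each group's rows by eventTime
// and keep only the values.
def sortedValuesByGroup(rows: Seq[Row]): Map[(Int, String, String, String, String), Seq[Double]] =
  rows
    .groupBy(r => (r.periodSeconds, r.cIp, r.csHost, r.requestMethod, r.userAgent))
    .mapValues(_.sortBy(_.eventTime).map(_.value))
    .toMap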
rjurney / BeaconActivityFeature.scala
Created Apr 15, 2014
Scala for a time-series controller: how can I improve or speed this up?
def fillZeros(startDateStr : String, endDateStr : String, periodSeconds : Int, jsonKey : scala.collection.immutable.Map[String,Any], group : ListBuffer[collection.mutable.Map[String, Any]]): scala.collection.mutable.ListBuffer[scala.collection.mutable.Map[String,Any]] = {
  var finalTimeSeries = ListBuffer[collection.mutable.Map[String, Any]]()
  var startDate = MutableDateTime.parse(startDateStr)
  val endDate = MutableDateTime.parse(endDateStr)
  while (startDate.isBefore(endDate)) {
    println(startDate.toString + " is before " + endDate.toString)
    // If our results have an entry for this timestamp, append it to the final array
    val searchResult = group.filter(x => x.get("eventTime") == Some(startDate.toString))
    if (searchResult.length > 0) {
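The preview cuts off here, but the visible shape already suggests the main cost: the while loop runs filter over the whole group at every time step, which is O(steps × group size). A sketch of one likely speed-up, indexing the group by eventTime once so each step is a hash lookup; the zero-fill fields ("count" in particular) are assumptions about the truncated body:

import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import org.joda.time.MutableDateTime

def fillZerosFast(startDateStr: String, endDateStr: String, periodSeconds: Int,
                  jsonKey: Map[String, Any],
                  group: ListBuffer[mutable.Map[String, Any]]): ListBuffer[mutable.Map[String, Any]] = {
  // Build the index once: eventTime string -> entry
  val byTime = group.map(m => m("eventTime").toString -> m).toMap
  val finalTimeSeries = ListBuffer[mutable.Map[String, Any]]()
  val startDate = MutableDateTime.parse(startDateStr)
  val endDate = MutableDateTime.parse(endDateStr)
  while (startDate.isBefore(endDate)) {
    byTime.get(startDate.toString) match {
      case Some(entry) => finalTimeSeries += entry
      case None =>
        // Assumed zero-fill shape: the jsonKey fields plus a zeroed count
        val zero = mutable.Map[String, Any](jsonKey.toSeq: _*)
        zero("eventTime") = startDate.toString
        zero("count") = 0
        finalTimeSeries += zero
    }
    startDate.addSeconds(periodSeconds)
  }
  finalTimeSeries
}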
rjurney / test.scala
Created May 28, 2014
Failure to read Avro RDD
scala> val avroRdd = sc.newAPIHadoopFile("hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/*", classOf[AvroKeyInputFormat[GenericRecord]], classOf[AvroKey[GenericRecord]], classOf[NullWritable])
14/05/27 17:02:49 INFO storage.MemoryStore: ensureFreeSpace(167954) called with curMem=0, maxMem=308713881
14/05/27 17:02:49 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 164.0 KB, free 294.3 MB)
avroRdd: org.apache.spark.rdd.RDD[(org.apache.avro.mapred.AvroKey[org.apache.avro.generic.GenericRecord], org.apache.hadoop.io.NullWritable)] = NewHadoopRDD[0] at newAPIHadoopFile at <console>:23
scala> avroRdd.take(1)
14/05/27 17:03:05 INFO input.FileInputFormat: Total input paths to process : 21
14/05/27 17:03:05 INFO spark.SparkContext: Starting job: take at <console>:26
14/05/27 17:03:05 INFO scheduler.DAGScheduler: Got job 0 (take at <console>:26) with 1 output partitions (allowLocal=true)
14/05/27 17:03:05 INFO scheduler.DAGScheduler: Final stage: Stage 0 (take at <cons
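The log is truncated before the actual failure, but a common cause with this setup is that AvroKey and GenericRecord do not implement java.io.Serializable, so an action like take(1), which ships records back to the driver, fails. A sketch of the usual workaround, converting each record to a serializable form before collecting:

// GenericRecord.toString renders the record as JSON, a plain (serializable)
// String by the time take() moves data to the driver.
val firstAsJson = avroRdd.map { case (ak, _) => ak.datum().toString }.take(1)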
rjurney / results.scala
Created May 28, 2014
Line 15 for me...
scala> val avroRdd = sc.newAPIHadoopFile("hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/part-m-00019.avro",
| classOf[AvroKeyInputFormat[GenericRecord]],
| classOf[AvroKey[GenericRecord]],
| classOf[NullWritable])
14/05/27 19:44:58 INFO storage.MemoryStore: ensureFreeSpace(167562) called with curMem=369637, maxMem=308713881
14/05/27 19:44:58 INFO storage.MemoryStore: Block broadcast_3 stored as values to memory (estimated size 163.6 KB, free 293.9 MB)
avroRdd: org.apache.spark.rdd.RDD[(org.apache.avro.mapred.AvroKey[org.apache.avro.generic.GenericRecord], org.apache.hadoop.io.NullWritable)] = NewHadoopRDD[7] at newAPIHadoopFile at <console>:41
scala> val genericRecords = avroRdd.map{case (ak, _) => ak.datum()}
genericRecords: org.apache.spark.rdd.RDD[org.apache.avro.generic.GenericRecord] = MappedRDD[8] at map at <console>:43
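If the failure here is again serialization, mapping to datum() is not enough on its own: the RDD still holds non-serializable GenericRecords. A sketch that extracts plain values field by field before any action (csHost is a hypothetical field name for this schema):

val hosts = genericRecords.map { r =>
  // Pull the field out as a plain String; Option guards a missing field
  Option(r.get("csHost")).map(_.toString).getOrElse("")
}
hosts.take(1)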
rjurney / avro.scala
Last active Aug 29, 2015
Unable to read an Avro in Scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
val file = sc.textFile("hdfs://hivecluster2/securityx/beaconing_activity.txt/2014/05/12/14/hour")
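The preview stops at a textFile call, but the imports point at the Avro path. A sketch of reading the Avros themselves with the new-API input format those imports set up (the path is reused from the gists above as an example):

val avroRdd = sc.newAPIHadoopFile(
  "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/*.avro",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable])
// Unwrap the AvroKeys to get at the records
val records = avroRdd.map { case (ak, _) => ak.datum() }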
rjurney / gist:966acf3071224d4cf768
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
import org.apache.avro.file.DataFileStream
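Given the DataFileStream import, a sketch of a sanity check that bypasses Spark entirely: open one Avro file straight off HDFS and look at its schema and first record (the path is hypothetical):

import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.file.DataFileStream
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

val fs = FileSystem.get(new Configuration())
val in = fs.open(new Path("hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/part-m-00019.avro"))
val reader = new DataFileStream[GenericRecord](in, new GenericDatumReader[GenericRecord]())
println(reader.getSchema)                 // the writer's schema, read from the file header
if (reader.hasNext) println(reader.next()) // first record, rendered as JSON
reader.close()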
rjurney / avro.scala
Created Jun 2, 2014
Results when loading a directory full of Avros
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapred.AvroInputFormat
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
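This gist also imports the old-API AvroInputFormat, which suggests the other route for a whole directory: hadoopFile accepts a directory or glob, so every Avro under it loads as one RDD. A sketch (the directory path is hypothetical):

import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}

val dirRdd = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](
  "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/")
// The old API wraps records in AvroWrapper rather than AvroKey
val records = dirRdd.map { case (aw, _) => aw.datum() }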
rjurney / avro.scala
Created Jun 3, 2014
Loading Avros in Spark Shell...
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapred.AvroInputFormat
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
stdout:
<console>:43: error: type mismatch;
found : Class[T]
required: org.apache.avro.Schema
val reader = new GenericDatumReader[T](classManifest[T].erasure.asInstanceOf[Class[T]])
^
<console>:44: error: type mismatch;
found : Class[T]
required: org.apache.avro.Schema
val writer = new GenericDatumWriter[T](classManifest[T].erasure.asInstanceOf[Class[T]])
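Both errors say the same thing: GenericDatumReader and GenericDatumWriter take a Schema, not a Class. A sketch of the two usual fixes (MyRecord is a hypothetical generated record type):

import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.specific.SpecificDatumReader

// 1. Generic API: construct with no arguments (the reader picks up the
//    writer's schema from the data file) or pass a Schema explicitly.
val reader = new GenericDatumReader[GenericRecord]()

// 2. Specific API: SpecificDatumReader is the constructor that actually
//    accepts a Class, for code-generated record types.
// val specificReader = new SpecificDatumReader[MyRecord](classOf[MyRecord])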
rjurney / test.scala
Created Jun 17, 2014
Trying to build a Map inside a map operation
import javax.ws.rs.{QueryParam, GET, Produces, Path}
import scala.Array
import javax.ws.rs.core.{Response, MediaType}
import org.slf4j.{LoggerFactory, Logger}
import org.joda.time.format.{ISODateTimeFormat, DateTimeFormatter}
import org.joda.time.DateTimeZone
import com.securityx.modelfeature.dao.BeaconsDao
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
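The imports hint at the intent: build a Map per element inside .map and render it with Jackson's Scala module. A minimal sketch with hypothetical field names and data:

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

val rows = Seq(("10.0.0.1", 42), ("10.0.0.2", 7))
val jsonRows = rows.map { case (cIp, count) =>
  // An immutable Map built per element, serialized to a JSON string
  mapper.writeValueAsString(Map("cIp" -> cIp, "count" -> count))
}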