Russell Jurney (rjurney)

@rjurney
rjurney / beaconingFeature.scala
Created April 13, 2014 21:19
How to return an eventTime-sorted array of values for each unique grouping of (periodSeconds, cIp, csHost, requestMethod, userAgent)
package com.securityx.modelfeature.resources
import javax.ws.rs.{QueryParam, GET, Produces, Path}
import scala.Array
import javax.ws.rs.core.{Response, MediaType}
import org.slf4j.{LoggerFactory, Logger}
import org.joda.time.format.{ISODateTimeFormat, DateTimeFormatter}
import org.joda.time.DateTimeZone
import com.securityx.modelfeature.dao.{FeatureDao, BeaconActivityDao}
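Only the controller's imports survive in this preview. A minimal sketch of the grouping the description asks about, assuming the rows arrive as a small case class whose field names (including eventTime and value) are guesses rather than the gist's actual types:

case class Row(periodSeconds: Int, cIp: String, csHost: String,
               requestMethod: String, userAgent: String,
               eventTime: String, value: Double)

// Group by the composite key from the description, then sort each group's
// values by eventTime (assumed ISO-8601, so string order matches time order).
def sortedValuesByGroup(rows: Seq[Row]): Map[(Int, String, String, String, String), Seq[Double]] =
  rows
    .groupBy(r => (r.periodSeconds, r.cIp, r.csHost, r.requestMethod, r.userAgent))
    .map { case (key, rs) => key -> rs.sortBy(_.eventTime).map(_.value) }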
@rjurney
rjurney / BeaconActivityFeature.scala
Created April 15, 2014 19:04
Scala for a time-series controller: how can I improve/speed this up?
def fillZeros(startDateStr : String, endDateStr : String, periodSeconds : Int,
              jsonKey : scala.collection.immutable.Map[String,Any],
              group : ListBuffer[collection.mutable.Map[String, Any]]): scala.collection.mutable.ListBuffer[scala.collection.mutable.Map[String,Any]] = {
  var finalTimeSeries = ListBuffer[collection.mutable.Map[String, Any]]()
  var startDate = MutableDateTime.parse(startDateStr)
  val endDate = MutableDateTime.parse(endDateStr)
  while(startDate.isBefore(endDate)) {
    println(startDate.toString + " is before " + endDate.toString)
    // If our results have an entry for this timestamp, append it to the final array
    val searchResult = group.filter(x => x.get("eventTime") == Some(startDate.toString))
    if (searchResult.length > 0) {
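The preview cuts off above, but the per-iteration group.filter is the obvious cost: it rescans the whole group on every time step. A hedged sketch of one way to speed it up, indexing the group by eventTime once so each step becomes a constant-time lookup; the "total" field, the zero-fill shape, and the assumption that the loop advances by periodSeconds are guesses, not the gist's actual code:

import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import org.joda.time.MutableDateTime

def fillZerosFast(startDateStr: String, endDateStr: String, periodSeconds: Int,
                  jsonKey: Map[String, Any],
                  group: ListBuffer[mutable.Map[String, Any]]): ListBuffer[mutable.Map[String, Any]] = {
  // One linear pass to index the results by their eventTime string
  val byTime = group.map(m => m.getOrElse("eventTime", "").toString -> m).toMap
  val finalTimeSeries = ListBuffer[mutable.Map[String, Any]]()
  val cursor = MutableDateTime.parse(startDateStr)
  val endDate = MutableDateTime.parse(endDateStr)
  while (cursor.isBefore(endDate)) {
    // getOrElse's default is by-name, so the zeroed row is only built when the bucket is missing
    finalTimeSeries += byTime.getOrElse(cursor.toString, {
      val zeroed = mutable.Map[String, Any]() ++= jsonKey  // copy the grouping key fields
      zeroed("eventTime") = cursor.toString
      zeroed("total") = 0                                  // assumed metric field
      zeroed
    })
    cursor.addSeconds(periodSeconds)                       // advance one bucket
  }
  finalTimeSeries
}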
@rjurney
rjurney / test.scala
Created May 28, 2014 00:06
Failure to read Avro RDD
scala> val avroRdd = sc.newAPIHadoopFile("hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/*", classOf[AvroKeyInputFormat[GenericRecord]], classOf[AvroKey[GenericRecord]], classOf[NullWritable])
14/05/27 17:02:49 INFO storage.MemoryStore: ensureFreeSpace(167954) called with curMem=0, maxMem=308713881
14/05/27 17:02:49 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 164.0 KB, free 294.3 MB)
avroRdd: org.apache.spark.rdd.RDD[(org.apache.avro.mapred.AvroKey[org.apache.avro.generic.GenericRecord], org.apache.hadoop.io.NullWritable)] = NewHadoopRDD[0] at newAPIHadoopFile at <console>:23
scala> avroRdd.take(1)
14/05/27 17:03:05 INFO input.FileInputFormat: Total input paths to process : 21
14/05/27 17:03:05 INFO spark.SparkContext: Starting job: take at <console>:26
14/05/27 17:03:05 INFO scheduler.DAGScheduler: Got job 0 (take at <console>:26) with 1 output partitions (allowLocal=true)
14/05/27 17:03:05 INFO scheduler.DAGScheduler: Final stage: Stage 0 (take at <cons
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
import org.apache.avro.file.DataFileStream
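For context, a hedged sketch (for the Spark shell, where sc already exists) of how take() can be made to work here: AvroKey and GenericRecord are not Java-serializable, so converting each record to a plain String on the executors avoids shipping Avro objects back to the driver.

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

val avroRdd = sc.newAPIHadoopFile(
  "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/*",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable])

// GenericRecord.toString renders the record as JSON, which is a plain serializable String
val firstJson = avroRdd.map { case (key, _) => key.datum().toString }.take(1)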
@rjurney
rjurney / avro.scala
Last active August 29, 2015 14:01
Unable to read an Avro in Scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
val file = sc.textFile("hdfs://hivecluster2/securityx/beaconing_activity.txt/2014/05/12/14/hour")
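sc.textFile only makes sense for plain text. For an actual Avro container file, a hedged alternative is to read it directly with DataFileStream (imported in the gist above); the path here is borrowed from the gist below purely as an example:

import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val in = fs.open(new Path("hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/part-m-00019.avro"))
val records = new DataFileStream[GenericRecord](in, new GenericDatumReader[GenericRecord]())
while (records.hasNext) println(records.next())  // each next() is a GenericRecord
records.close()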
@rjurney
rjurney / results.scala
Created May 28, 2014 02:45
Line 15 for me...
scala> val avroRdd = sc.newAPIHadoopFile("hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/19/part-m-00019.avro",
| classOf[AvroKeyInputFormat[GenericRecord]],
| classOf[AvroKey[GenericRecord]],
| classOf[NullWritable])
14/05/27 19:44:58 INFO storage.MemoryStore: ensureFreeSpace(167562) called with curMem=369637, maxMem=308713881
14/05/27 19:44:58 INFO storage.MemoryStore: Block broadcast_3 stored as values to memory (estimated size 163.6 KB, free 293.9 MB)
avroRdd: org.apache.spark.rdd.RDD[(org.apache.avro.mapred.AvroKey[org.apache.avro.generic.GenericRecord], org.apache.hadoop.io.NullWritable)] = NewHadoopRDD[7] at newAPIHadoopFile at <console>:41
scala> val genericRecords = avroRdd.map{case (ak, _) => ak.datum()}
genericRecords: org.apache.spark.rdd.RDD[org.apache.avro.generic.GenericRecord] = MappedRDD[8] at map at <console>:43
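Continuing the session above: genericRecords still holds GenericRecord objects, which are not Java-serializable, so a hedged next step is to project just the needed fields into plain Strings before any take or collect ("cIp" and "csHost" are assumed field names):

val hosts = avroRdd.map { case (ak, _) =>
  val r = ak.datum()
  // Pull out plain Strings so nothing Avro-specific has to be serialized
  (Option(r.get("cIp")).map(_.toString).orNull,
   Option(r.get("csHost")).map(_.toString).orNull)
}
hosts.take(1)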
@rjurney
rjurney / avro.scala
Created June 2, 2014 01:37
Results when loading a directory full of Avros
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapred.AvroInputFormat
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
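Only the imports survive in this preview. A hedged sketch of loading every Avro under a directory with the older mapred API this gist imports (AvroInputFormat); AvroWrapper and the glob path are additions for illustration, again assuming the Spark shell's sc:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable

val dir = "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/27/*/*.avro"
val wrapped = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
                            AvroInputFormat[GenericRecord]](dir)
// Convert to Strings on the executors before bringing anything back to the driver
val asJson = wrapped.map { case (w, _) => w.datum().toString }
asJson.count()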
@rjurney
rjurney / avro.scala
Created June 3, 2014 03:13
Loading Avros in Spark Shell...
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapred.AvroInputFormat
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
<console>:43: error: type mismatch;
found : Class[T]
required: org.apache.avro.Schema
val reader = new GenericDatumReader[T](classManifest[T].erasure.asInstanceOf[Class[T]])
^
<console>:44: error: type mismatch;
found : Class[T]
required: org.apache.avro.Schema
val writer = new GenericDatumWriter[T](classManifest[T].erasure.asInstanceOf[Class[T]])
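The mismatch above is in the constructor: GenericDatumReader and GenericDatumWriter take an org.apache.avro.Schema, not a Class. A hedged sketch of the two usual ways around it (the inline schema is a stand-in to keep the example self-contained, and MyRecord is hypothetical):

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.specific.SpecificDatumReader

// Generic API: hand the constructors a Schema
val schemaJson =
  """{"type":"record","name":"Example","fields":[{"name":"cIp","type":"string"}]}"""
val schema: Schema = new Schema.Parser().parse(schemaJson)
val reader = new GenericDatumReader[GenericRecord](schema)
val writer = new GenericDatumWriter[GenericRecord](schema)

// Specific API: a generated Avro class can be passed as a Class instead
// val specificReader = new SpecificDatumReader[MyRecord](classOf[MyRecord])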
@rjurney
rjurney / git.txt
Last active August 29, 2015 14:02
Unable to push local commits to remote?
Russells-MacBook-Pro:azkaban2 rjurney$ git remote -v
origin https://github.com/azkaban/azkaban2.git (fetch)
origin https://github.com/azkaban/azkaban2.git (push)
rjurney git@github.com:rjurney/azkaban2.git (fetch)
rjurney git@github.com:rjurney/azkaban2.git (push)
Russells-MacBook-Pro:azkaban2 rjurney$ git branch -v
  master                      cb88bf7 [ahead 60] Merge branch 'master' of https://github.com/azkaban/azkaban2
* remotes/origin/release-2.5  963a85f All my hacks to Azkaban.
  v                           cb88bf7 Merge branch 'master' of https://github.com/azkaban/azkaban2