==================
==================
WARNING: DATA RACE
Read at 0x00c4200259b8 by goroutine 14:
runtime.slicecopy()
/usr/local/Cellar/go/1.10.2/libexec/src/runtime/slice.go:192 +0x0
github.com/ABC/byoa-price-engine/vendor/github.com/valyala/fasthttp.copyArgs()
/Users/geri/intelligence/src/github.com/ABC/byoa-price-engine/vendor/github.com/valyala/fasthttp/args.go:320 +0x27a
github.com/ABC/byoa-price-engine/vendor/github.com/valyala/fasthttp.(*RequestHeader).CopyTo()
/Users/geri/intelligence/src/github.com/ABC/byoa-price-engine/vendor/github.com/valyala/fasthttp/header.go:703 +0x7e0
https://stackoverflow.com/questions/33878433/spark-write-avro-file
http://www.bigdatatidbits.cc/2015/01/how-to-load-some-avro-data-into-spark.html
https://stackoverflow.com/questions/33899417/avro-schema-to-spark-structtype/
https://stackoverflow.com/questions/36078420/spark-avro-to-parquet
https://github.com/tomwhite/hadoop-book/blob/master/ch19-spark/src/test/scala/RDDCreationTest.scala
https://gist.github.com/MLnick/5864741781b9340cb211
http://alvincjin.blogspot.com/2015/11/append-spark-dataframe-with-new-column.html
https://stackoverflow.com/questions/27033823/how-to-overwrite-the-output-directory-in-spark
https://gist.github.com/yzhong52/f81e929e5810271292bd08856e2f4512
https://stackoverflow.com/questions/41567859/extract-a-column-value-from-a-spark-dataframe-and-add-it-to-another-dataframe
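Taken together, these links cover one workflow: read Avro into a Spark DataFrame, append a column, and write the result as Parquet while overwriting the output directory. A minimal sketch of that flow follows; the paths, the column name, and the choice of the spark-avro data source ("com.databricks.spark.avro" on older Spark, the built-in "avro" format from Spark 2.4) are assumptions for illustration, not code from any of the linked posts.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object AvroToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AvroToParquetSketch").getOrCreate()

    // Load Avro into a DataFrame (hypothetical input path)
    val df = spark.read.format("com.databricks.spark.avro").load("hdfs:///data/input.avro")

    // Append a new column; a value extracted from another DataFrame can be added the same way via lit(...)
    val withDate = df.withColumn("load_date", lit("2018-01-22"))

    // Overwrite the output directory and write Parquet (hypothetical output path)
    withDate.write.mode("overwrite").parquet("hdfs:///data/output_parquet")

    spark.stop()
  }
}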
@Arnold1
Arnold1 / main.sql
Created January 23, 2018 17:20
query
select mm_date, COUNT(DISTINCT mm_id)
from data1 join data2 on (data1.mm_id = data2.mm_id)
where mm_date between '2018-01-22' and '2018-01-22'
and data2.type <> 'test1' and data2.type <> 'test2'
FAILED: SemanticException Column mm_id Found in more than One Tables/Subqueries
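-- Note: Hive raises this SemanticException because mm_id exists in both data1 and data2;
-- qualifying the column (e.g. COUNT(DISTINCT data1.mm_id)) resolves the ambiguity.
-- Once fixed, the query will also need GROUP BY mm_date, since mm_date is selected alongside an aggregate.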
select im_date, im_id, Count(*)
from data
where im_date between '2018-01-22' and '2018-01-22'
and im_id=12345
group by im_date, im_id
@Arnold1
Arnold1 / KafkaSparkPopularHashTags.scala
Created November 11, 2017 03:26 — forked from stdatalabs/KafkaSparkPopularHashTags.scala
A Spark Streaming - Kafka integration that receives Twitter data from a Kafka topic and finds the popular hashtags. More @ stdatalabs.blogspot.com
import java.util.HashMap
import org.apache.kafka.clients.producer.{ KafkaProducer, ProducerConfig, ProducerRecord }
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.streaming.{ Seconds, StreamingContext }
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf
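The listing truncates this gist after its imports. As a rough reconstruction of what the description promises (consuming tweet text from a Kafka topic and counting hashtags over a window), a sketch follows, relying on the imports shown above; the ZooKeeper quorum, consumer group, topic name, and window length are placeholders, not values from the original gist.

// Sketch only: connection and window settings below are assumed placeholders.
val sparkConf = new SparkConf().setAppName("KafkaSparkPopularHashTags")
val ssc = new StreamingContext(sparkConf, Seconds(2))

// Receiver-based Kafka stream, matching the org.apache.spark.streaming.kafka._ import above;
// each record is a (key, message) pair, so keep only the tweet text.
val tweets = KafkaUtils.createStream(ssc, "localhost:2181", "spark-consumer", Map("tweets" -> 1))
  .map(_._2)

// Pull out hashtags and count them over a 60-second window.
val hashTagCounts = tweets.flatMap(_.split(" "))
  .filter(_.startsWith("#"))
  .map((_, 1))
  .reduceByKeyAndWindow(_ + _, Seconds(60))

// Print the ten most frequent hashtags of each window.
hashTagCounts.foreachRDD { rdd =>
  rdd.sortBy(_._2, ascending = false).take(10).foreach(println)
}

ssc.start()
ssc.awaitTermination()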
@Arnold1
Arnold1 / SparkPopularHashTags.scala
Created November 11, 2017 03:23 — forked from stdatalabs/SparkPopularHashTags.scala
TwitterPopularHashTags using Spark Streaming. More @ stdatalabs.blogspot.com
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming._
import org.apache.spark.{ SparkContext, SparkConf }
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
/**
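This fork reads from Twitter directly rather than from Kafka; the preview is cut off at the header comment. A short sketch of the source side follows (the windowed counting would look like the Kafka sketch above); the master URL and batch duration are placeholders, and Twitter OAuth credentials are assumed to be supplied via the usual twitter4j.oauth.* system properties.

// Sketch only: master and batch duration are placeholders.
val conf = new SparkConf().setAppName("SparkPopularHashTags").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

// None = build the twitter4j authorization from the twitter4j.oauth.* system properties.
val stream = TwitterUtils.createStream(ssc, None)

// Each element is a twitter4j.Status; extract hashtags from the tweet text.
val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

// From here the counting is the same windowed reduceByKeyAndWindow as in the Kafka sketch above.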
@Arnold1
Arnold1 / TweetStreams.scala
Created November 11, 2017 03:21 — forked from samklr/TweetStreams.scala
TweetStream with Spark
import Utils
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.{SparkConf, SparkContext}
@Arnold1
Arnold1 / 00-OozieWorkflowHdfsAndEmailActions
Created November 8, 2017 20:23 — forked from airawat/00-OozieWorkflowHdfsAndEmailActions
Oozie workflow application with FS and email actions; Includes sample data, workflow components, commands.
This gist includes components of a simple workflow application that creates a directory and moves files
within HDFS to this directory.
Emails are sent out to notify designated users of the success or failure of the workflow. There is a prepare
section to allow re-running the action; the prepare essentially negates the move done by a potential prior
run of the action. Sample data is also included.
The sample application includes:
--------------------------------
1. Oozie actions: hdfs action and email action
2. Oozie workflow controls: start, end, and kill.
@Arnold1
Arnold1 / cv2ff.cpp
Created November 5, 2017 02:48 — forked from yohhoy/cv2ff.cpp
Convert from OpenCV image and write movie with FFmpeg
/*
* Convert from OpenCV image and write movie with FFmpeg
*
* Copyright (c) 2016 yohhoy
*/
#include <iostream>
#include <vector>
// FFmpeg
extern "C" {
#include <libavformat/avformat.h>