Ian Downard (iandow), GitHub gists

{
  "handler": "Microsoft.Compute.MultiVm",
  "version": "0.0.1-preview",
  "parameters": {
    "basics": [
      {
        "name": "environmentName",
        "type": "Microsoft.Common.TextBox",

/* Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved */
package com.mapr.demo.finserv;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.mapr.db.MapRDB;
import com.mapr.db.Table;
import org.ojai.Document;

package com.mapr.demo.finserv;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Charsets;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;

@iandow
iandow / kafka
Last active November 17, 2017 15:21
Simple Kafka Ubuntu init.d Startup Script
DAEMON_PATH=/opt/kafka
PATH=$PATH:$DAEMON_PATH/bin
# See how we were called.
case "$1" in
  start)
    # Start daemons: Zookeeper first, then the Kafka broker.
    echo "Starting Zookeeper";
    nohup $DAEMON_PATH/bin/zookeeper-server-start.sh -daemon $DAEMON_PATH/config/zookeeper.properties 2> /dev/null && \
    echo "Starting Kafka";
    nohup $DAEMON_PATH/bin/kafka-server-start.sh -daemon $DAEMON_PATH/config/server.properties 2> /dev/null
    ;;

package com.mapr.test;
/******************************************************************************
* PURPOSE:
* This Kafka consumer is designed to measure how fast we can consume
* messages from a topic and persist them to MapR-DB. It outputs throughput
* stats to stdout.
*
* This Kafka consumer reads NYSE Tick data from a MapR Stream topic and
* persists each message in a MapR-DB table as a JSON Document, which can
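
A minimal sketch of the consume-and-persist loop described above, assuming a hypothetical stream topic /user/mapr/taq:trades and a hypothetical MapR-DB table /user/mapr/ticks (neither path comes from the project), built from the Kafka consumer API and the MapR-DB OJAI classes in the imports shown earlier:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.ojai.Document;
import com.mapr.db.MapRDB;
import com.mapr.db.Table;

public class PersistingConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // MapR Streams resolves the /stream:topic path directly, so no
        // bootstrap.servers entry is needed; vanilla Kafka would require one.
        props.put("group.id", "persist-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("/user/mapr/taq:trades")); // hypothetical topic
        Table table = MapRDB.getTable("/user/mapr/ticks");                      // hypothetical table

        long count = 0;
        long start = System.nanoTime();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // Each message value is assumed to already be a JSON tick record.
                Document doc = MapRDB.newDocument(record.value());
                doc.set("_id", record.partition() + "_" + record.offset());
                table.insertOrReplace(doc);
                count++;
            }
            table.flush();
            // Report running throughput to stdout, as the PURPOSE comment describes.
            double elapsed = (System.nanoTime() - start) / 1e9;
            System.out.printf("consumed %d msgs, %.0f msgs/sec%n", count, count / elapsed);
        }
    }
}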

BY USING THIS SOFTWARE, YOU EXPRESSLY ACCEPT AND AGREE TO THE TERMS OF THE AGREEMENT CONTAINED IN THIS GITHUB REPOSITORY. See the file EULA.md for details.

An Example Application for Processing Stock Market Trade Data on the MapR Converged Data Platform

This project provides a processing engine for ingesting real-time streams of trades, bids, and asks into MapR Streams at a high rate. The application consists of the following components:

  • A Producer microservice that streams trades, bids and asks using the NYSE TAQ format. The data source is the Daily Trades dataset described here. The schema for our data is detailed in Table 6, "Daily Trades File Data Fields", on page 26 of Daily TAQ Client Specification (from December 1st, 2013).
  • A multi-threaded Consumer microservice that indexes the trades by receiver and sender.
  • Example Spark code for querying the indexed streams at interactive speeds, enabling Spark SQL queries over the tick data. A sketch of the producer side appears below.
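
To make the Producer component concrete, here is a minimal sketch, assuming TAQ-format lines arrive on stdin and go to a hypothetical MapR Streams topic /user/mapr/taq:trades; MapR Streams exposes the standard Kafka producer API, with topics addressed as /stream/path:topicname.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TaqProducerSketch {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // As in the consumer sketch, MapR Streams needs no bootstrap.servers.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Send each raw TAQ record; parsing is deferred to the consumers,
                // which keeps the producer's critical path minimal at high ingest rates.
                producer.send(new ProducerRecord<>("/user/mapr/taq:trades", line)); // hypothetical topic
            }
            producer.flush();
        }
    }
}
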
@iandow
iandow / MaprThroughputParameterizedByMessageSize.sh
Last active January 28, 2023 07:42
MapR Streams Throughput Parameterized By Message Size
# DESCRIPTION: Perf test for MapR Streams: one producer sends 1 million messages
# with record sizes of 10, 100, 500, 1000, 2000, 5000, and 10000 bytes to 1 topic
# with 1 partition.
maprcli stream delete -path /user/mapr/iantest
maprcli stream create -path /user/mapr/iantest -produceperm p -consumeperm p -topicperm p -compression off
THREADCOUNT=1
TOPICS=1
PARTITIONS=1
MESSAGES=1000000
# print csv header

# Run these commands to measure consumer throughput in Kafka
MESSAGES=1000000
echo "numMsgs, sizeMsg, start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec" >> consumer.csv
for MSGSIZE in 10 100 1000 2000 4000 6000 8000 10000; do
for i in `seq 1 20`; do
TOPIC='iantest-'$MSGSIZE
echo $TOPIC
/opt/kafka_2.11-0.10.0.1/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic $TOPIC --config compression.type=uncompressed
/opt/kafka_2.11-0.10.0.1/bin/kafka-producer-perf-test.sh --topic $TOPIC --num-records $MESSAGES --record-size $MSGSIZE --throughput -1 --producer-props bootstrap.servers=localhost:9092 buffer.memory=67108864 batch.size=8196 acks=all >& /dev/null
echo -n "$MESSAGES, $MSGSIZE, " >> consumer.csv
# Append the consumer perf-test CSV row (start/end time, MB/s, msgs/s) to the same line.
/opt/kafka_2.11-0.10.0.1/bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic $TOPIC --messages $MESSAGES --hide-header >> consumer.csv
done
done

####################################################################################
# DESCRIPTION: Kafka performance test. One producer sends 1 million messages with
# record sizes of 10, 100, 500, 1000, 2000, 5000, and 10000 bytes to 1 topic with
# 1 partition.
# PRECONDITIONS: Kafka and Zookeeper services must be running.
####################################################################################
THREADCOUNT=1
TOPICS=1
PARTITIONS=1
# init csv file
echo "numPartitions, sizeMsg, topicCount, numMessages, Average nMsgs/sec, Average nKBs/sec, Average latency (ms), eth0 TX kB/s, eth0 RX kB/s" >> partitions-kafka.csv
for NUMPARTITIONS in 1 2 3 4 5 6 7 8 9 10 15 20 25 30 35 40 45 50 60 70 80 90 100 150 200 250 300 350 400; do
~/cleanup.sh >& /dev/null
sleep 5
TOPIC='iantest-'$NUMPARTITIONS
MESSAGES=1000000