Luis Casillas ldcasillas-progreso

ldcasillas-progreso / RedshiftDelimitedParser.java
Created April 16, 2015 18:33
An OpenCSV-based DelimitedParser for Cascading that we've used successfully to interface with Redshift. I do not vouch for correctness under all circumstances...
package com.progressfin.cascading.util.scheme;
import au.com.bytecode.opencsv.CSVParser;
import cascading.scheme.util.DelimitedParser;
import cascading.scheme.util.FieldTypeResolver;
import cascading.tap.TapException;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.util.Util;
import org.slf4j.Logger;
ldcasillas-progreso / AvroSerde.java
Created November 19, 2015 22:11
Samza Avro Serde. Has no schema management support.
package com.oportun.fraud.demo.serde.avro;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.IndexedRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
ldcasillas-progreso / AvroJsonSerde.java
Created November 19, 2015 22:21
SpecificAvroJsonSerdeFactory, for Samza
package com.oportun.fraud.demo.serde.avro;
import com.google.common.base.Charsets;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.IndexedRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.DecoderFactory;
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
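The positional arguments in the command above appear to follow the legacy ProducerPerformance usage: topic, number of records, record size in bytes, target throughput in records per second (-1 meaning unthrottled), then any number of key=value producer properties. The sketch below unpacks such an argument list and computes the total payload the run would send. The class `PerfArgs` and its fields are hypothetical names for illustration and are not part of Kafka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Kafka class): unpacks the argument list
// passed to ProducerPerformance, assuming the positional order
// topic, numRecords, recordSize, throughput, then key=value properties.
final class PerfArgs {
    final String topic;
    final long numRecords;
    final int recordSize;      // bytes per record
    final long throughput;     // records/sec; -1 is assumed to mean unthrottled
    final Map<String, String> props = new LinkedHashMap<>();

    PerfArgs(String[] args) {
        if (args.length < 4) {
            throw new IllegalArgumentException(
                "expected: topic numRecords recordSize throughput [key=value]*");
        }
        topic = args[0];
        numRecords = Long.parseLong(args[1]);
        recordSize = Integer.parseInt(args[2]);
        throughput = Long.parseLong(args[3]);
        for (int i = 4; i < args.length; i++) {
            // Split on the first '=' only, so values may themselves contain '='.
            String[] kv = args[i].split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("not key=value: " + args[i]);
            }
            props.put(kv[0], kv[1]);
        }
    }

    // Total bytes of record payload for the whole run (excludes protocol overhead).
    long totalPayloadBytes() {
        return numRecords * (long) recordSize;
    }

    public static void main(String[] argv) {
        String cmd = "test7 50000000 100 -1 acks=1 "
            + "bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 "
            + "buffer.memory=67108864 batch.size=8196";
        PerfArgs a = new PerfArgs(cmd.split(" "));
        System.out.println("topic            = " + a.topic);
        System.out.println("payload bytes    = " + a.totalPayloadBytes());
        System.out.println("buffer.memory MiB = "
            + Long.parseLong(a.props.get("buffer.memory")) / (1024 * 1024));
    }
}
```

Read this way, the run pushes 50,000,000 records of 100 bytes each, about 5 GB of payload, through a 64 MiB producer buffer.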