Avro append-to-existing file example with the DataFileWriter.appendTo(…) API.
package com.cloudera.example;

import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.mapred.FsInput;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DFWAppendTest {

  /** Record type; its Avro schema is derived by reflection. */
  public static class Sample {
    CharSequence foo;

    public Sample(CharSequence bar) {
      this.foo = bar;
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost");
    conf.setInt("dfs.replication", 1);
    FileSystem fs = FileSystem.get(conf);

    Schema sample = ReflectData.get().getSchema(Sample.class);
    ReflectDatumWriter<Sample> rdw = new ReflectDatumWriter<DFWAppendTest.Sample>(sample);
    DataFileWriter<Sample> dfwo = new DataFileWriter<DFWAppendTest.Sample>(rdw);

    // Create the file and write the first two records.
    Path filePath = new Path("/sample.avro");
    OutputStream out = fs.create(filePath);
    DataFileWriter<Sample> dfw = dfwo.create(sample, out);
    dfw.append(new Sample("Eggs"));
    dfw.append(new Sample("Spam"));
    // Close before reopening: appendTo(...) requires a writer that is not open.
    dfw.close();

    // Reopen for append; appendTo(...) reads the schema and sync marker
    // from the existing file's header.
    OutputStream aout = fs.append(filePath);
    dfw = dfwo.appendTo(new FsInput(filePath, conf), aout);
    dfw.append(new Sample("Monty"));
    dfw.append(new Sample("Python"));
    dfw.close();
  }
}

Do you know if some property has to be set to true in the Hadoop site configuration file for the append operation to work?
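
On the configuration question above, a minimal sketch of how one might check a site file for such a setting. It assumes the property meant is `dfs.support.append`, the name older Hadoop releases used to gate HDFS append. The comment does not name the property, so treat that name as an assumption and confirm it against the documentation for your Hadoop version. The class `SiteConfigCheck` and its `lookup` helper are hypothetical and use only the JDK's XML parser, not Hadoop's own `Configuration` class.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteConfigCheck {

  /**
   * Returns the value of the first <property> whose <name> matches,
   * or null if the property is absent. Hypothetical helper, not a Hadoop API.
   */
  public static String lookup(InputStream siteXml, String name) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
    NodeList props = doc.getElementsByTagName("property");
    for (int i = 0; i < props.getLength(); i++) {
      Element prop = (Element) props.item(i);
      NodeList names = prop.getElementsByTagName("name");
      NodeList values = prop.getElementsByTagName("value");
      if (names.getLength() == 0 || values.getLength() == 0) {
        continue; // skip malformed entries
      }
      if (name.equals(names.item(0).getTextContent().trim())) {
        return values.item(0).getTextContent().trim();
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    // Assumed property name; verify it for your Hadoop version.
    String property = "dfs.support.append";
    String sample =
        "<configuration><property><name>" + property + "</name>"
            + "<value>true</value></property></configuration>";
    // Pass the path of a real site file as the first argument,
    // or fall back to the inline sample document.
    try (InputStream in = args.length > 0
        ? Files.newInputStream(Paths.get(args[0]))
        : new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))) {
      System.out.println(property + " = " + lookup(in, property));
    }
  }
}
```

A `null` result only means the file does not set the property explicitly, in which case the cluster's built-in default applies. The effective value on a running client is better read through Hadoop's own `Configuration` object.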