This code shows how to use reflection to write arbitrary Java beans to Parquet files with Apache Avro.
Example:
import com.google.common.collect.Iterables;

ParquetWriterHelper<BeanClass> writer = new ParquetWriterHelper<>(BeanClass.class);
Iterable<List<BeanClass>> batches = Iterables.partition(beans, 300_000);
int cnt = 0;
for (List<BeanClass> batch : batches) {
    String name = String.format("part-%05d.snappy.parquet", cnt);
    writer.write(batch, name);
    cnt++;
}
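For reference, a minimal sketch of what the helper class does (ParquetWriterHelper is my own class, not a library one; the implementation below is a simplified version, assuming parquet-avro's AvroParquetWriter builder and Avro's ReflectData):

```java
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetWriterHelper<T> {
    private final Schema schema;

    public ParquetWriterHelper(Class<T> beanClass) {
        // ReflectData induces an Avro schema from the bean's fields
        this.schema = ReflectData.get().getSchema(beanClass);
    }

    public void write(List<T> batch, String fileName) throws IOException {
        // withDataModel(ReflectData.get()) tells the writer to treat
        // plain beans via reflection rather than as IndexedRecords
        try (ParquetWriter<T> writer = AvroParquetWriter.<T>builder(new Path(fileName))
                .withSchema(schema)
                .withDataModel(ReflectData.get())
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            for (T record : batch) {
                writer.write(record);
            }
        }
    }
}
```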
Dependencies to add to pom.xml:
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.1.0</version>
</dependency>
Reading the file back fails with:

org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/lrao/scanwork/a423448b-fc42-46b6-a0f6-88f10fcdb653/a423448b-fc42-46b6-a0f6-88f10fcdb653.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:129)

java.lang.ClassCastException: com.sherlock.dao.ScanResultsRow cannot be cast to org.apache.avro.generic.IndexedRecord
at org.apache.avro.generic.GenericData.setField(GenericData.java:569)
at org.apache.parquet.avro.AvroRecordConverter.set(AvroRecordConverter.java:295)
at org.apache.parquet.avro.AvroRecordConverter$1.add(AvroRecordConverter.java:109)
at org.apache.parquet.avro.AvroConverters$BinaryConverter.addBinary(AvroConverters.java:62)
at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:323)
at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:371)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:218)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:129)
I get the above errors. Any idea what I'm missing?