This code shows how to use reflection (via Apache Avro) to write arbitrary Java beans to Parquet files.
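Example:
import com.google.common.collect.Iterables;
import java.util.List;

// beans: any Iterable<BeanClass> holding the objects to write
ParquetWriterHelper<BeanClass> writer = new ParquetWriterHelper<>(BeanClass.class);
// Split the input into batches of 300,000 beans; each batch becomes one file
Iterable<List<BeanClass>> batches = Iterables.partition(beans, 300_000);
int cnt = 0;
for (List<BeanClass> batch : batches) {
    // Zero-padded part names: part-00000.snappy.parquet, part-00001.snappy.parquet, ...
    String name = String.format("part-%05d.snappy.parquet", cnt);
    writer.write(batch, name);
    cnt++;
}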
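If you would rather not pull in Guava just for the batching step, the partition-and-name logic of the loop above can be reproduced with the JDK alone. This is a minimal sketch, assuming the beans are already held in a `List`; the class `BatchNaming` and its methods are hypothetical names for illustration, not part of the gist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: JDK-only replacement for the Guava batching in the example.
public class BatchNaming {

    // Splits items into consecutive chunks of at most `size` elements
    // (the last chunk may be smaller), like Iterables.partition on a List.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the subList view so later changes to `items` don't leak into the batch.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Same zero-padded naming pattern as the example loop.
    static String partName(int index) {
        return String.format("part-%05d.snappy.parquet", index);
    }

    public static void main(String[] args) {
        List<Integer> beans = IntStream.range(0, 7).boxed().collect(Collectors.toList());
        List<List<Integer>> batches = partition(beans, 3);
        for (int cnt = 0; cnt < batches.size(); cnt++) {
            // prints part-00000.snappy.parquet <- [0, 1, 2], then [3, 4, 5], then [6]
            System.out.println(partName(cnt) + " <- " + batches.get(cnt));
        }
    }
}
```

Unlike `Iterables.partition`, which works lazily on any `Iterable`, this version needs the whole input in a `List` up front, so it suits cases where the beans already fit in memory.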
Dependencies to add to the <dependencies> section of pom.xml:
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.1.0</version>
</dependency>
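With the dependencies above on the classpath, a helper like `ParquetWriterHelper` could plausibly be built on Avro's `ReflectData` plus parquet-avro's `AvroParquetWriter` builder. The following is only a sketch under that assumption, not the gist's actual implementation: the class name `ReflectParquetWriterSketch` is hypothetical, and the choice of `ReflectData.AllowNull` and Snappy compression is an illustrative assumption.

```java
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

/**
 * Hypothetical sketch (not the gist's ParquetWriterHelper): derives an Avro
 * schema from a bean class by reflection and writes each batch of beans to
 * one Snappy-compressed Parquet file.
 */
public class ReflectParquetWriterSketch<T> {

    // Reflection-based data model; the AllowNull variant makes derived fields nullable.
    private static final ReflectData MODEL = ReflectData.AllowNull.get();

    private final Schema schema;

    public ReflectParquetWriterSketch(Class<T> beanClass) {
        // Avro reflection builds a record schema from the class's fields.
        this.schema = MODEL.getSchema(beanClass);
    }

    public Schema getSchema() {
        return schema;
    }

    public void write(List<T> batch, String fileName) throws IOException {
        ParquetWriter<T> writer = AvroParquetWriter.<T>builder(new Path(fileName))
                .withSchema(schema)
                .withDataModel(MODEL) // read bean fields via reflection
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build();
        try {
            for (T bean : batch) {
                writer.write(bean);
            }
        } finally {
            // Closing flushes buffered rows and writes the Parquet footer.
            writer.close();
        }
    }
}
```

In the example loop, `new ReflectParquetWriterSketch<>(BeanClass.class)` would take the place of `ParquetWriterHelper`. As far as I know, the Parquet writer does not overwrite existing files by default, so each batch needs a distinct file name, which the `cnt` counter provides.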
Hi, this is very handy; it's basically what I was about to write myself.
You don't seem to have specified a license for your gists anywhere. Could you add one?