@ibalashov
Created November 12, 2016 19:30
Reading a broken (incomplete) Avro file with Avro's Java DataFileReader does not produce any exception, while fastavro duly complains about the same file.
% cat twitter.snappy.incomplete.avro
Objavro.codec
snappyavro.schema[binary data]{"type":"record","name":"twitter_schema","namespace":"com.miguno.avro","fields":[{"name":"username","type":"string","doc":"Name of the user account on Twitter.com"},{"name":"tweet","type":"string","doc":"The content of the user's Twitter message"},{"name":"timestamp","type":"long","doc":"Unix epoch time in milliseconds"}],"doc:":"A basic schema for storing Twitter messages"}[binary data]
migunoFRock: Nerf paper, scissors is fine.[binary data]
BlizzardCSFWor%
% fastavro twitter.snappy.incomplete.avro
Traceback (most recent call last):
  File "/usr/local/bin/fastavro", line 9, in <module>
    load_entry_point('fastavro==0.9.9', 'console_scripts', 'fastavro')()
  File "/usr/local/lib/python2.7/site-packages/fastavro/__main__.py", line 54, in main
    for record in reader:
  File "fastavro/_reader.py", line 469, in _iter_avro (fastavro/_reader.c:9127)
  File "fastavro/_reader.py", line 422, in fastavro._reader.snappy_read_block (fastavro/_reader.c:8077)
  File "fastavro/_reader.py", line 426, in fastavro._reader.snappy_read_block (fastavro/_reader.c:7985)
snappy.UncompressError: Error while decompressing: invalid input
==== AvroReadTest.java ==
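import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.testng.annotations.Test;
public class AvroReadTest {
    @Test
    public void testRead() throws Exception {
        final File file = new File(getClass().getResource("/twitter.snappy.incomplete.avro").getFile());
        final GenericDatumReader<GenericRecord> genericDatumReader = new GenericDatumReader<GenericRecord>();
        // Prints whatever records can be read; on this incomplete file the loop ends without any exception.
        try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, genericDatumReader)) {
            while (dataFileReader.hasNext()) {
                System.out.println(dataFileReader.next());
            }
        }
    }
}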
===============================================
Default Suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================
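The cat output above stops partway through a record ("Wor"), which suggests the file was cut off inside its snappy-compressed data block. One way to check that independently of either reader is to walk the block framing described in the Avro object container file specification: the magic bytes, a metadata map, a 16-byte sync marker, then blocks of (record count, byte size, data, sync marker). The class below is a minimal sketch under that assumption; `AvroBlockChecker` and `check` are hypothetical names written for this note, not part of the Avro library, and the sketch never decompresses or decodes records, it only verifies that every block is complete.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of the Avro library): walks the block framing of an
 * Avro object container file and reports whether every data block is complete.
 * It never decompresses or decodes records.
 */
public class AvroBlockChecker {

    /** Signals malformed or truncated input; the message says where. */
    private static final class Broken extends RuntimeException {
        Broken(String message) {
            super(message);
        }
    }

    private final byte[] data;
    private int pos = 0;

    private AvroBlockChecker(byte[] data) {
        this.data = data;
    }

    /** Reads n raw bytes, failing if the input ends first. */
    private byte[] readFixed(long n, String what) {
        if (n < 0) throw new Broken("negative length for " + what);
        if (n > data.length - pos) throw new Broken("file ends inside " + what);
        byte[] out = Arrays.copyOfRange(data, pos, pos + (int) n);
        pos += (int) n;
        return out;
    }

    /** Reads a zig-zag encoded variable-length long (Avro binary encoding). */
    private long readLong(String what) {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= data.length) throw new Broken("file ends inside " + what);
            int b = data[pos++] & 0xff;
            raw |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) return (raw >>> 1) ^ -(raw & 1);
        }
        throw new Broken("varint too long in " + what);
    }

    /** Returns "OK: ..." or "BROKEN: ..." describing the first problem found. */
    public static String check(byte[] file) {
        AvroBlockChecker r = new AvroBlockChecker(file);
        int blocks = 0;
        try {
            if (!Arrays.equals(r.readFixed(4, "magic"), new byte[] {'O', 'b', 'j', 1})) {
                return "BROKEN: not an Avro object container file";
            }
            // Header metadata: map<string, bytes> written in blocks, ended by a zero count.
            long count;
            while ((count = r.readLong("metadata count")) != 0) {
                if (count < 0) { // negative count: a block byte size follows
                    count = -count;
                    r.readLong("metadata block size");
                }
                for (long i = 0; i < count; i++) {
                    r.readFixed(r.readLong("metadata key length"), "metadata key");
                    r.readFixed(r.readLong("metadata value length"), "metadata value");
                }
            }
            byte[] sync = r.readFixed(16, "header sync marker");
            // Data blocks: record count, byte size, (possibly compressed) bytes, sync marker.
            while (r.pos < file.length) {
                r.readLong("block " + blocks + " record count"); // value not needed here
                long size = r.readLong("block " + blocks + " size");
                r.readFixed(size, "block " + blocks + " data (" + size + " bytes)");
                if (!Arrays.equals(r.readFixed(16, "block " + blocks + " sync marker"), sync)) {
                    return "BROKEN: sync marker mismatch after block " + blocks;
                }
                blocks++;
            }
            return "OK: " + blocks + " complete block(s)";
        } catch (Broken e) {
            return "BROKEN: " + e.getMessage() + " after " + blocks + " complete block(s)";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Run it as `java AvroBlockChecker twitter.snappy.incomplete.avro`. Because it stops at the first framing problem and names the block, it should distinguish a file that simply ends after its last complete block from one that ends in the middle of a block, which is the case a reader that treats end-of-input as "no more records" would not report.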

sb2nov commented Feb 11, 2020

I'm seeing this error. Could you explain what is breaking the Avro file here?
