
Pig-AvroStorage

Apache Pig version 0.11.0-SNAPSHOT (r1304979) compiled Mar 24 2012, 21:48:44
 
I run my Pig script to get my bag of tuples:
....
....
....
 
grunt> describe c;
c: {franchise_id: int,cast_and_crew: {(full_name: chararray)}}
grunt> illustrate c;
...
...
---------------------------------------------------------------------------------------------
| c | franchise_id:int | cast_and_crew:bag{:tuple(full_name:chararray)} |
---------------------------------------------------------------------------------------------
|   | 213939           | {(Wang Junzheng), (Li Ling)}                   |
---------------------------------------------------------------------------------------------
 
I try to store it using AvroStorage, with and without a schema:
 
With Schema:
grunt> STORE c INTO 'hdfs://127.0.0.1:9000/user/hadoop/indexer/avro/franchise_cast_and_crew' using org.apache.pig.piggybank.storage.avro.AvroStorage('{"index":1,"schema":{"type":"record","name":"franchise_cast_and_crew","fields":[{"name":"franchise_id","type":"int"},{"name":"cast_and_crew","type":{"type":"array","items":"string"}}]}');
2012-03-24 22:17:57,001 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"index":1,"schema":{"type":"record","name":"franchise_cast_and_crew","fields":[{"name":"franchise_id","type":"int"},{"name":"cast_and_crew","type":{"type":"array","items":"string"}}]}]'
Details at logfile: /Users/dan.young/pig_1332647783884.log
grunt>
 
Pig Stack Trace
---------------
ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"index":1,"schema":{"type":"record","name":"franchise_cast_and_crew","fields":[{"name":"franchise_id","type":"int"},{"name":"cast_and_crew","type":{"type":"array","items":"string"}}]}]'
 
Failed to parse: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"index":1,"schema":{"type":"record","name":"franchise_cast_and_crew","fields":[{"name":"franchise_id","type":"int"},{"name":"cast_and_crew","type":{"type":"array","items":"string"}}]}]'
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1539)
at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:535)
at org.apache.pig.Main.main(Main.java:153)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"index":1,"schema":{"type":"record","name":"franchise_cast_and_crew","fields":[{"name":"franchise_id","type":"int"},{"name":"cast_and_crew","type":{"type":"array","items":"string"}}]}]'
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:546)
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:791)
at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:780)
at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4670)
at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:6312)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1337)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
... 15 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:536)
... 24 more
Caused by: Unexpected token END OF FILE at position 184.
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.apache.pig.piggybank.storage.avro.AvroStorage.parseJsonString(AvroStorage.java:335)
at org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:118)
... 29 more
================================================================================
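
A possible reading of the trace above: the JSON argument passed to AvroStorage is 184 characters long, and the parser reports END OF FILE at position 184, which is consistent with the string ending before its outermost object is closed. The sketch below is only an illustration of that check, not part of the original session; it uses Python's standard json module, and the helper name try_parse is made up for this example.

```python
import json

# The argument exactly as it appears in the STORE statement above.
arg = ('{"index":1,"schema":{"type":"record","name":"franchise_cast_and_crew",'
       '"fields":[{"name":"franchise_id","type":"int"},'
       '{"name":"cast_and_crew","type":{"type":"array","items":"string"}}]}')


def try_parse(text):
    """Return (parsed, None) on success or (None, error) on a JSON syntax error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, err


# Hypothetical helper output: the original string should fail to parse.
parsed, err = try_parse(arg)
print("length:", len(arg))
print("original parses:", err is None)

# Assumption: the only problem is one missing closing brace at the end.
fixed = arg + "}"
parsed_fixed, err_fixed = try_parse(fixed)
print("with trailing brace parses:", err_fixed is None)
```

If that reading is right, adding the missing } before the closing quote should get past the JSON parsing in the AvroStorage constructor. Whether the STORE then succeeds is not shown by this check.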
 
 
Without a schema:
 
grunt> STORE c INTO 'hdfs://127.0.0.1:9000/user/hadoop/indexer/avro/franchise_cast_and_crew' using org.apache.pig.piggybank.storage.avro.AvroStorage();
2012-03-24 22:19:15,012 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-03-24 22:19:15,013 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias c
Details at logfile: /Users/dan.young/pig_1332647783884.log
grunt>
 
Pig Stack Trace
---------------
ERROR 1002: Unable to store alias c
 
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias c
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:535)
at org.apache.pig.Main.main(Main.java:153)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NullPointerException
at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.isTupleWrapper(AvroStorageUtils.java:327)
at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:82)
at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:105)
at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convertRecord(PigSchema2Avro.java:151)
at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:62)
at org.apache.pig.piggybank.storage.avro.AvroStorage.checkSchema(AvroStorage.java:533)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:298)
at org.apache.pig.PigServer.compilePp(PigServer.java:1317)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1254)
at org.apache.pig.PigServer.execute(PigServer.java:1246)
at org.apache.pig.PigServer.access$400(PigServer.java:127)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
... 13 more
================================================================================
 
 
 
My Avro Schema:
{
  "index": 1,
  "schema": {
    "type": "record",
    "name": "franchise_cast_and_crew",
    "fields": [
      {
        "name": "franchise_id",
        "type": "int"
      },
      {
        "name": "cast_and_crew",
        "type": {
          "type": "array",
          "items": "string"
        }
      }
    ]
  }
}
