@rjurney
Forked from anonymous/Example.pig
Created December 24, 2012 07:20
I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data ea…
characters = load 'example.xml' using XMLLoader('character');
describe characters;
-- Desired schema: {properties:map[], name:chararray, born:datetime, qualification:chararray}
<library>
<book id="b0836217462" available="true">
<isbn>
0836217462
</isbn>
<title lang="en">
Being a Dog Is a Full-Time Job
</title>
<author id="CMS">
<name>
Charles M Schulz
</name>
<born>
1922-11-26
</born>
<dead>
2000-02-12
</dead>
</author>
<character id="PP">
<name>
Peppermint Patty
</name>
<born>
1966-08-22
</born>
<qualification>
bold, brash and tomboyish
</qualification>
</character>
<character id="Snoopy">
<name>
Snoopy
</name>
<born>
1950-10-04
</born>
<qualification>
extroverted beagle
</qualification>
</character>
<character id="Schroeder">
<name>
Schroeder
</name>
<born>
1951-05-30
</born>
<qualification>
brought classical music to the Peanuts strip
</qualification>
</character>
<character id="Lucy">
<name>
Lucy
</name>
<born>
1952-03-03
</born>
<qualification>
bossy, crabby and selfish
</qualification>
</character>
</book>
</library>
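
The loader described above is a proposal, so the script in this gist will not run as written. As a rough sketch of how the same columns might be approximated today, the following assumes the stock piggybank XMLLoader (which, as far as I know, returns each matching element as one chararray, tags included) and pulls the fields out with regular expressions. The jar path, the relation names `raw` and `characters`, and the field name `xml` are assumptions for illustration, and `ToDate` assumes a Pig version with the datetime type (0.11 or later).

```pig
-- Assumption: piggybank.jar sits in the working directory; adjust the path.
REGISTER piggybank.jar;

-- The stock loader yields one chararray per <character>...</character> element;
-- it does not build a map or infer a schema.
raw = LOAD 'example.xml'
      USING org.apache.pig.piggybank.storage.XMLLoader('character')
      AS (xml:chararray);

-- (?s) lets '.' span newlines; \\s* trims the whitespace around each text node.
-- The leading and trailing '.*' keep each pattern valid whether the function
-- matches the whole string or only searches within it.
characters = FOREACH raw GENERATE
    REGEX_EXTRACT(xml, '(?s).*<character\\s+id="([^"]*)".*', 1) AS id,
    REGEX_EXTRACT(xml, '(?s).*<name>\\s*(.*?)\\s*</name>.*', 1) AS name,
    ToDate(REGEX_EXTRACT(xml, '(?s).*<born>\\s*(.*?)\\s*</born>.*', 1),
           'yyyy-MM-dd') AS born,
    REGEX_EXTRACT(xml, '(?s).*<qualification>\\s*(.*?)\\s*</qualification>.*', 1)
        AS qualification;

DESCRIBE characters;
DUMP characters;
```

This is a workaround that hard-codes the tag names, not the DOM-to-map behavior the gist proposes; a `properties` map of attributes would still need a custom loader or UDF.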
@syedaliammar

syedaliammar commented Jun 3, 2016

Hi @rjurney, how can one read such an XML file using Pig's XMLLoader and transform the tags into a tabular format? Any help would be appreciated.

@kullaireddy

Could you please send the code?
