@karpet
Last active December 10, 2017 04:41
example Dezi aggregator for a tab-separated file
package MyAggregator;
use Moose;
extends 'Dezi::Aggregator';
use Dezi::Doc;

# Index each data row of a tab-separated file as one document.
sub crawl {
    my ( $self, $inputfile ) = @_;

    # Three-argument open with a lexical filehandle.
    open( my $fh, '<', $inputfile )
        or die "Can't open < $inputfile: $!";
    my $header = <$fh>;    # read and discard the header line
    my $count  = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        $count++;
        my @array = split( /\t/, $line );

        # The row number doubles as the document URI.
        my $dezi_doc = Dezi::Doc->new( uri => $count );
        $dezi_doc->set_field( 'shopid'   => $array[0] );
        $dezi_doc->set_field( 'prodtype' => $array[1] );
        $dezi_doc->set_field( 'prodid'   => $array[2] );
        $dezi_doc->set_field( 'prodname' => $array[3] );

        # Serialize the fields and wrap them in the indexer's doc class.
        my $xml = $dezi_doc->as_string_ref;
        my $doc = $self->doc_class->new(
            content => $$xml,
            url     => $count,
            modtime => time(),
            parser  => 'XML*',
            type    => 'application/xml',
            size    => length $$xml,
        );
        $self->indexer->process($doc);
    }
    close($fh);
    return $count;    # number of rows indexed
}

1;
karpet commented Dec 10, 2017

You invoke the class from the command line like:

% deziapp -S MyAggregator -i items.tsv
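
The row-to-field mapping that crawl performs can be tried on its own, without Dezi installed. What follows is a minimal sketch that assumes the same four-column layout as the aggregator. The helper name parse_tsv_line and the sample row are hypothetical and not part of the gist.

```perl
use strict;
use warnings;

# Column names, in the order the aggregator reads them.
my @FIELDS = qw( shopid prodtype prodid prodname );

# Hypothetical helper: turn one tab-separated line into a
# hash reference of field name => value.
sub parse_tsv_line {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty columns instead of dropping them.
    my @values = split /\t/, $line, -1;

    my %row;
    @row{@FIELDS} = @values[ 0 .. $#FIELDS ];
    return \%row;
}

# Made-up sample row, for illustration only.
my $row = parse_tsv_line("42\tbook\t1001\tPerl Cookbook\n");
print "$_=$row->{$_}\n" for @FIELDS;
```

Naming the columns once in a list keeps the field order in a single place, so adding a column to the input only needs one change.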
