@sheimi
Created November 9, 2014 05:23
code in blog.sheimi.me: 2012-05-13-hadoop-source-code-01 (2)
linkDbTool.invert(linkDb, segments, true, true, false); // invert links

if (solrUrl != null) {
  // index, dedup & merge
  // List only the segment directories (plain files are filtered out).
  FileStatus[] fstats = fs.listStatus(segments, HadoopFSUtil.getPassDirectoriesFilter(fs));

  // Send the crawl db, link db and every segment to Solr for indexing.
  SolrIndexer indexer = new SolrIndexer(getConf());
  indexer.indexSolr(solrUrl, crawlDb, linkDb,
      Arrays.asList(HadoopFSUtil.getPaths(fstats)));

  // Remove duplicate documents from the Solr index.
  SolrDeleteDuplicates dedup = new SolrDeleteDuplicates();
  dedup.setConf(getConf());
  dedup.dedup(solrUrl);
}
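
The first line of the snippet inverts links: the crawl records each page's outlinks, and the link db stores the reverse mapping, each URL with the pages that link to it. The sketch below illustrates only that idea in plain Java, with no Hadoop or Nutch dependency. The class name `LinkInversionSketch`, the `invert` helper and the sample URLs are hypothetical and are not part of Nutch; the real job runs as MapReduce over segment data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LinkInversionSketch {

    /**
     * Turns a "page -> outlinks" map into a "page -> inlinks" map.
     * Hypothetical helper for illustration only; not the Nutch API.
     */
    public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        // TreeMap keeps the target URLs sorted, so the result is deterministic.
        Map<String, List<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                // Record that `source` links to `target`.
                inlinks.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("a.html", Arrays.asList("b.html", "c.html"));
        outlinks.put("b.html", Arrays.asList("c.html"));
        // prints {b.html=[a.html], c.html=[a.html, b.html]}
        System.out.println(invert(outlinks));
    }
}
```

In this toy version everything fits in memory; in a MapReduce setting the same grouping by target URL would happen in the shuffle between map and reduce.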