@kordless
kordless / README.md
Last active August 18, 2021 18:43
Indexing XKCD with Lucidworks Fusion and Google Image API

Overview

This Seed Streams guide shows how to use Lucidworks Fusion to crawl only those documents on a website whose URIs match a regular expression. A JavaScript parsing stage then extracts the src attribute of each img tag and inserts it into the index, where later indexing stages can use it. A vision network can optionally be used to extract additional fields from the images.
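
The two ideas above, limiting a crawl to URIs that match a regular expression and pulling img src values out of a page, can be sketched outside of Fusion in a few lines of Python. This is only an illustration, not the JavaScript stage or the datasource settings this guide uses: the pattern `COMIC_URI` assumes comic pages have the form https://xkcd.com/<number>/, and the names `is_comic_uri` and `extract_img_srcs` are hypothetical helpers.

```python
import re
from html.parser import HTMLParser
from typing import List

# Assumption: a comic page is the site root followed by a numeric ID,
# e.g. https://xkcd.com/353/. Adjust the pattern to the pages you want.
COMIC_URI = re.compile(r"https://xkcd\.com/\d+/?")


def is_comic_uri(uri: str) -> bool:
    """Return True only when the whole URI matches the comic pattern."""
    return COMIC_URI.fullmatch(uri) is not None


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every img tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_img_srcs(html: str) -> List[str]:
    """Return all img src values found in an HTML string, in document order."""
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.sources
```

In a real crawl, the URI filter would run before a page is fetched and the extractor would run on each fetched page; in Fusion both roles are played by the datasource configuration and the parsing stage rather than by standalone code like this.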

Start Fusion and Create a New Application

  1. Start a Fusion instance on Google. Click the link that the script outputs to open the Fusion instance page. Set a password, then log in as admin with the new password.
  2. Create a new application. Call it XKCD.
  3. Click on the new application.

Add a New Datasource and Limit the Documents

  1. Create a new datasource under Indexing..Datasources. Add a web source. Add https://xkcd.com a