@erichiller
Forked from sebz/grunt-hugo-lunrjs.md
Last active February 12, 2024 16:46
hugo + gruntjs + lunrjs = <3 search

Gist Description

How to implement a custom search for Hugo using Gruntjs and Lunrjs.
Updated from the original to allow for page URL changes and to give more detailed install instructions.
See more at www.hiller.pro
Thanks to sebz for the initial writeup!

Requirements

Requires Nodejs

Setup

Project organization

Here is my Hugo based website project structure

	MySite/
		|- src/ <= Hugo source (private) root folder
			|- content/ <= .md content folders & files
			|- layout/
			|- static/
				|- js/
					|- pindex.json <= Where we generate the lunr json index file
					|- lunrjs.min.js <= lunrjs library
					|- ...
			|- ...
			|- config.toml
		|- indexer/ <= lunr and associated node files
			|- node_modules/ <- node dependencies
			|- Gruntfile.js <= Where the magic happens (see below)
			|- package.json <= Dependencies declaration required to build the index
		|- ...

Install the Nodejs dependencies

  1. Navigate to the folder you wish to install the indexer into (MySite/indexer in the example above) and issue the commands:

    npm init -y
    npm --global install grunt-cli
    npm install --save-dev grunt string toml

  2. Create the Gruntfile.js (see below)

Note: I modified the Gruntfile.js to allow for changed URLs.

Time to work

The principle

We will work at both build time and run time. With Gruntjs (build time), we'll generate a JSON index file, and with a small js script (run time) we'll initialize and use lunrjs.

Build the Lunr index file

Lunrjs allows you to define fields to describe your pages (documents in lunrjs terms) that will be used to search and hopefully find stuff. The index file is basically a JSON file corresponding to an array of all the documents (pages) composing the website.

Here are the fields I chose to describe my pages:

  • title => Frontmatter title or file name
  • tags => Frontmatter tags or nothing
  • content => File content
  • href => Reworked file path used as absolute URL (this is the field lunr uses as its ref)

href can be drawn either from the file's position within the content folder or from the url field in the frontmatter.
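
As a rough sketch of that rule, detached from Grunt: the helper below is hypothetical (deriveHref is not part of the Gruntfile) and assumes the path has already been made relative to the content folder. A url in the frontmatter wins, index.md pages stop at their folder, and everything else just loses its .md extension.

```javascript
// Hypothetical helper, for illustration only.
// relPath: file path relative to the content folder, e.g. "/section/page1.md"
// frontMatter: the parsed frontmatter object (may be empty)
function deriveHref(relPath, frontMatter) {
    // A url set in the frontmatter always wins
    if (frontMatter && frontMatter.url) {
        return frontMatter.url;
    }
    // index.md pages are addressed by their folder
    if (/(^|\/)index\.md$/.test(relPath)) {
        return relPath.replace(/index\.md$/, "");
    }
    // Otherwise drop the .md extension
    return relPath.replace(/\.md$/, "");
}

console.log(deriveHref("/section/page1.md", {}));                  // /section/page1
console.log(deriveHref("/section/index.md", {}));                  // /section/
console.log(deriveHref("/section/page1.md", { url: "/custom/" })); // /custom/
```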

Workflow

  1. Recursively walk through all files of the content folder
  2. Two possibilities
    1. Markdown file
      1. Parse the Frontmatter to extract the title and the tags
      2. Parse and clean the content
    2. HTML file
      1. Parse and clean the content
      2. Use the file name as title
  3. Use the file path as href (link to the page)
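
The Markdown branch of this workflow can be sketched without any dependencies. Both function names below are hypothetical, and the cleaning step only approximates what the string library's stripTags and stripPunctuation do; the real Gruntfile uses the toml and string packages instead.

```javascript
// Hypothetical helper: split a Hugo Markdown file into its raw TOML
// frontmatter (delimited by +++) and its body.
function splitFrontMatter(text) {
    var parts = text.split("+++");
    if (parts.length < 3) {
        // No frontmatter block found: treat everything as body
        return { frontMatter: "", body: text };
    }
    return {
        frontMatter: parts[1].trim(),
        // Rejoin in case the body itself contains "+++"
        body: parts.slice(2).join("+++")
    };
}

// Rough approximation of the cleaning step
function cleanContent(body) {
    return body
        .replace(/<[^>]*>/g, "")    // drop HTML tags
        .replace(/[^\w\s]|_/g, "")  // drop anything that is not an ASCII letter, digit or whitespace
        .replace(/\s+/g, " ")       // collapse runs of whitespace
        .trim();
}

var page = splitFrontMatter('+++\ntitle = "Page1"\n+++\nHello, <b>world</b>!');
console.log(page.frontMatter);        // title = "Page1"
console.log(cleanContent(page.body)); // Hello world
```

Note that the punctuation filter is ASCII-only, so accented letters are removed as well; a site with non-English content would probably want a different cleaning rule.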

Show me the code!

Here is the Gruntfile.js file:

var toml = require("toml");
var S = require("string");

var CONTENT_PATH_PREFIX = "../src/content";
var SITE_IDX_DEST = "../src/static/js/pindex.json";

module.exports = function(grunt) {

    grunt.registerTask("lunr-index", function() {

        grunt.log.writeln("Build pages index");

        var indexPages = function() {
            var pagesIndex = [];
            grunt.file.recurse(CONTENT_PATH_PREFIX, function(abspath, rootdir, subdir, filename) {
                grunt.verbose.writeln("Parse file:",abspath);
                pagesIndex.push(processFile(abspath, filename));
            });

            return pagesIndex;
        };

        var processFile = function(abspath, filename) {
            var pageIndex;

            if (S(filename).endsWith(".html")) {
                pageIndex = processHTMLFile(abspath, filename);
            } else {
                pageIndex = processMDFile(abspath, filename);
            }

            return pageIndex;
        };

        var processHTMLFile = function(abspath, filename) {
            var content = grunt.file.read(abspath);
            var pageName = S(filename).chompRight(".html").s;
            var href = S(abspath)
                .chompLeft(CONTENT_PATH_PREFIX).s;
            return {
                title: pageName,
                href: href,
                content: S(content).trim().stripTags().stripPunctuation().s
            };
        };

        var processMDFile = function(abspath, filename) {
            var content = grunt.file.read(abspath);
            grunt.log.ok("READING FILE: " + abspath);

            // First separate the Front Matter (delimited by +++) from the content
            var parts = content.split("+++");
            var frontMatter = {};
            var body = content;
            if (parts.length >= 3) {
                // Rejoin in case the body itself contains "+++"
                body = parts.slice(2).join("+++");
                try {
                    frontMatter = toml.parse(parts[1].trim());
                } catch (e) {
                    // Keep going with empty Front Matter instead of crashing below
                    grunt.log.error("ERROR WHILST PROCESSING: " + abspath + ": " + e.message);
                }
            }

            // Prefer the url from the Front Matter, otherwise derive it from the path
            var href;
            if (frontMatter.url) {
                href = frontMatter.url;
            } else if (filename === "index.md") {
                // href for index.md files stops at the folder name
                href = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(filename).s;
            } else {
                href = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(".md").s;
            }

            // Build Lunr index for this page (the title falls back to the file name)
            return {
                title: frontMatter.title || S(filename).chompRight(".md").s,
                tags: frontMatter.tags,
                href: href,
                content: S(body).trim().stripTags().stripPunctuation().s
            };
        };

        grunt.file.write(SITE_IDX_DEST, JSON.stringify(indexPages()));
        grunt.log.ok("Index built");
    });
};

Example index file looks like:

[{
    "title": "Page1",
    "href": "/section/page1",
    "content": " This is the cleaned content of 'site/content/section/page1.md' "
}, {
    "title": "Page2",
    "tags": ["tag1", "tag2", "tag3"],
    "href": "/section/page2",
    "content": " This is the cleaned content of 'site/content/section/page2.md' "
}, {
    "title": "Page3",
    "href": "/section/page3",
    "content": " This is the cleaned content of 'site/content/section/page3.md' "
}]

Launch the task with grunt lunr-index from the indexer folder, or from any directory with grunt --gruntfile the/remote/directory/indexer/Gruntfile.js lunr-index
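
Once the task has run, it can be worth sanity-checking the generated file before shipping it. The function below is only a sketch and not part of the original setup: findIndexProblems is a hypothetical name, and the rules it applies (every entry needs a unique href and a title) are my assumption about what the client script relies on.

```javascript
// Hypothetical check: list entries that the client script could not use reliably.
function findIndexProblems(pages) {
    var problems = [];
    var seen = Object.create(null); // hrefs already encountered
    pages.forEach(function(page, i) {
        if (!page || typeof page.href !== "string" || page.href === "") {
            problems.push("entry " + i + ": missing href");
            return;
        }
        if (seen[page.href]) {
            problems.push("entry " + i + ": duplicate href " + page.href);
        }
        seen[page.href] = true;
        if (!page.title) {
            problems.push("entry " + i + ": missing title");
        }
    });
    return problems;
}

// Sample data shaped like the index file above
var samplePages = [
    { title: "Page1", href: "/section/page1", content: "..." },
    { title: "Page2", href: "/section/page1", content: "..." },
    { href: "/section/page3", content: "..." }
];
findIndexProblems(samplePages).forEach(function(problem) {
    console.log(problem);
});
```

In a real run the array would come from parsing pindex.json rather than from inline sample data.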

Use the index

On the client side here is a small usage example:

<!DOCTYPE html>
<html>

<head>
    <title>Hugo + Lunrjs = &lt;3 search </title>
</head>

<body>
    Search:
    <input id="search" type="text">
    <br> Results:
    <ul id="results">
    </ul>
    <script type="text/javascript" src="https://code.jquery.com/jquery-2.1.3.min.js"></script>
    <script type="text/javascript" src="js/lunrjs.min.js"></script>
    <script type="text/javascript">
    var lunrIndex,
        $results,
        pagesIndex;

    // Initialize lunrjs using our generated index file
    function initLunr() {
        // First retrieve the index file
        $.getJSON("js/pindex.json")
            .done(function(index) {
                pagesIndex = index;
                console.log("index:", pagesIndex);

                // Set up lunrjs by declaring the fields we use
                // Also provide their boost level for the ranking
                lunrIndex = lunr(function() {
                    this.field("title", {
                        boost: 10
                    });
                    this.field("tags", {
                        boost: 5
                    });
                    this.field("content");

                    // ref is the result item identifier (I chose the page URL)
                    this.ref("href");
                });

                // Feed lunr with each file and let lunr actually index them
                pagesIndex.forEach(function(page) {
                    lunrIndex.add(page);
                });
            })
            .fail(function(jqxhr, textStatus, error) {
                var err = textStatus + ", " + error;
                console.error("Error getting Hugo index file:", err);
            });
    }

    // Nothing crazy here, just hook up a listener on the input field
    function initUI() {
        $results = $("#results");
        $("#search").keyup(function() {
            $results.empty();

            // Only trigger a search once at least 2 characters have been entered
            var query = $(this).val();
            if (query.length < 2) {
                return;
            }

            var results = search(query);

            renderResults(results);
        });
    }

    /**
     * Trigger a search in lunr and transform the result
     *
     * @param  {String} query
     * @return {Array}  results
     */
    function search(query) {
        // Find the item in our index corresponding to the lunr one to have more info
        // Lunr result: 
        //  {ref: "/section/page1", score: 0.2725657778206127}
        // Our result:
        //  {title:"Page1", href:"/section/page1", ...}
        return lunrIndex.search(query).map(function(result) {
                return pagesIndex.filter(function(page) {
                    return page.href === result.ref;
                })[0];
            });
    }

    /**
     * Display the 10 first results
     *
     * @param  {Array} results to display
     */
    function renderResults(results) {
        if (!results.length) {
            return;
        }

        // Only show the ten first results
        results.slice(0, 10).forEach(function(result) {
            var $result = $("<li>");
            $result.append($("<a>", {
                href: result.href,
                text: "» " + result.title
            }));
            $results.append($result);
        });
    }

    // Let's get started
    initLunr();

    $(document).ready(function() {
        initUI();
    });
    </script>
</body>

</html>
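
The search function above scans the whole pagesIndex array once per lunr result. On a larger site, one possible alternative is to build a lookup keyed by href once and resolve results against it. This is only a sketch: buildPageLookup and resolveResults are hypothetical names, and the lunr results are simulated with the {ref, score} shape shown in the comments above.

```javascript
// Hypothetical helper: index the pages by href once, after loading the JSON
function buildPageLookup(pagesIndex) {
    var lookup = Object.create(null);
    pagesIndex.forEach(function(page) {
        lookup[page.href] = page;
    });
    return lookup;
}

// Hypothetical helper: turn lunr results ({ref, score}) into page entries,
// skipping any ref that has no matching page
function resolveResults(lunrResults, lookup) {
    return lunrResults
        .map(function(result) { return lookup[result.ref]; })
        .filter(function(page) { return page !== undefined; });
}

var pages = [
    { title: "Page1", href: "/section/page1" },
    { title: "Page2", href: "/section/page2" }
];
var lookup = buildPageLookup(pages);
// Simulated lunr output, best match first
var fakeResults = [
    { ref: "/section/page2", score: 0.27 },
    { ref: "/section/page1", score: 0.11 }
];
resolveResults(fakeResults, lookup).forEach(function(page) {
    console.log(page.title);
});
```

The order of the lunr results is preserved, so the ranking still decides what is rendered first.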