
@sebz
Last active June 28, 2024 18:41
hugo + gruntjs + lunrjs = <3 search

How to implement a custom search for Hugo using Gruntjs and Lunrjs.

Requirements

Install the following tools:

Setup

Project organization

Here is the structure of my Hugo-based website project:

  MyWebsite/
   |- site/ <= Hugo project root folder
         |- content/
         |- layouts/
         |- static/
            |- js/
               |- lunr/ <= Where we generate the lunr json index file
               |- vendor/
                  |- lunr.min.js <= lunrjs library
            |- ...
         |- config.yaml
         |- ...
   |- Gruntfile.js <= Where the magic happens
   |- package.json <= Dependencies declaration required to build the index
   |- ...

Install the Nodejs dependencies

From the project root folder, run npm install --save-dev grunt string toml

  • string <= does almost all the work
  • toml
    • Used to parse the front matter; mine is in TOML... obviously
    • If yours is in YAML, you can install yamljs instead

Time to work

The principle

We will work at both build time and runtime. At build time, Grunt generates a JSON index file; at runtime, a small JS script initializes and uses lunrjs.

Build the Lunr index file

Lunrjs lets you define fields describing your pages (documents, in lunrjs terms); these fields are what gets searched and, hopefully, what finds stuff. The index file is simply a JSON array of all the documents (pages) that make up the website.

Here are the fields I chose to describe my pages:

  • title <=> Frontmatter title or file name
  • tags <=> Frontmatter tags or nothing
  • content <=> File content
  • href <=> Reworked file path used as absolute URL (declared as the Lunr ref)

Workflow

  1. Recursively walk through all files of the content folder
  2. Depending on the file type:
    1. Markdown file
      1. Parse the Frontmatter to extract the title and the tags
      2. Parse and clean the content
    2. HTML file
      1. Parse and clean the content
      2. Use the file name as title
  3. Use the file path to build the href (the link to the page), which serves as the Lunr ref
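Step 2.1 can also be done with a single regular expression that only accepts `+++` delimiters at the very top of the file, so a file without front matter keeps its whole text as body and a `+++` inside the body is left alone. This is a minimal, dependency-free sketch assuming TOML front matter; `splitFrontMatter` and `FRONT_MATTER_RE` are hypothetical names, not part of Hugo or of the script below.

```javascript
// Hypothetical helper: split a Hugo Markdown file into its TOML front matter
// and its body. Only a "+++" block at the very top of the file (after an
// optional BOM) counts as front matter.
var FRONT_MATTER_RE =
    /^\uFEFF?\+\+\+[ \t]*\r?\n(?:([\s\S]*?)\r?\n)?\+\+\+[ \t]*(?:\r?\n|$)([\s\S]*)$/;

function splitFrontMatter(raw) {
    var match = FRONT_MATTER_RE.exec(raw);
    if (!match) {
        // No TOML front matter: the whole file is body.
        return { frontMatter: "", body: raw };
    }
    return {
        frontMatter: match[1] || "", // an empty "+++\n+++" block gives ""
        body: match[2]               // later "+++" lines stay in the body
    };
}
```

The front matter part would then go to `toml.parse` and the body part to the string.js cleaning chain.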

Show me the code!

Here is the Gruntfile.js file:

var toml = require("toml");
var S = require("string");

var CONTENT_PATH_PREFIX = "site/content";

module.exports = function(grunt) {

    grunt.registerTask("lunr-index", function() {

        grunt.log.writeln("Build pages index");

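        var indexPages = function() {
            var pagesIndex = [];
            grunt.file.recurse(CONTENT_PATH_PREFIX, function(abspath, rootdir, subdir, filename) {
                // Skip hidden files such as .DS_Store
                if (filename.charAt(0) === ".") {
                    return;
                }
                grunt.verbose.writeln("Parse file:", abspath);
                var pageIndex = processFile(abspath, filename);
                // processFile returns nothing for unsupported file types
                if (pageIndex) {
                    pagesIndex.push(pageIndex);
                }
            });

            return pagesIndex;
        };

        var processFile = function(abspath, filename) {
            var pageIndex;

            if (S(filename).endsWith(".html")) {
                pageIndex = processHTMLFile(abspath, filename);
            } else if (S(filename).endsWith(".md")) {
                pageIndex = processMDFile(abspath, filename);
            }
            // Any other file type (images, data files...) is ignored

            return pageIndex;
        };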

        var processHTMLFile = function(abspath, filename) {
            var content = grunt.file.read(abspath);
            var pageName = S(filename).chompRight(".html").s;
            var href = S(abspath)
                .chompLeft(CONTENT_PATH_PREFIX).s;
            return {
                title: pageName,
                href: href,
                content: S(content).trim().stripTags().stripPunctuation().s
            };
        };

        var processMDFile = function(abspath, filename) {
            var content = grunt.file.read(abspath);
            var pageIndex;
            // First separate the Front Matter from the content and parse it
            content = content.split("+++");
            var frontMatter = {};
            try {
                frontMatter = toml.parse(content[1].trim());
            } catch (e) {
                // Missing or invalid TOML front matter (e.g. an empty _index.md)
                grunt.log.writeln("Skipping front matter of " + abspath + ": " + e.message);
            }

            var href = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(".md").s;
            // href for index.md files stops at the folder name
            if (filename === "index.md") {
                href = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(filename).s;
            }

            // Build Lunr index for this page
            pageIndex = {
                // Fall back to the file name when there is no front matter title
                title: frontMatter.title || S(filename).chompRight(".md").s,
                tags: frontMatter.tags,
                href: href,
                // Re-join in case the body itself contains "+++"
                content: S(content.slice(2).join("+++")).trim().stripTags().stripPunctuation().s
            };

            return pageIndex;
        };

        grunt.file.write("site/static/js/lunr/PagesIndex.json", JSON.stringify(indexPages()));
        grunt.log.ok("Index built");
    });
};
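The href rule above special-cases `index.md` only and assumes `/` separators, while on Windows the walked paths may contain backslashes. Here is a hedged sketch of a variant, assuming Hugo's default pretty URLs and that section list pages come from `_index.md` files; `pathToHref` is a hypothetical helper, not something Hugo or Grunt provides.

```javascript
// Hypothetical helper: turn a Markdown file path under the content folder into
// a page URL, assuming Hugo pretty URLs (no ".md", folder URL for index pages).
var CONTENT_PATH_PREFIX = "site/content";

function pathToHref(abspath) {
    // Normalize Windows separators so the prefix check works everywhere.
    var rel = abspath.replace(/\\/g, "/");
    if (rel.indexOf(CONTENT_PATH_PREFIX) === 0) {
        rel = rel.slice(CONTENT_PATH_PREFIX.length);
    }
    // "/section/index.md" (leaf bundle) or "/section/_index.md" (section page)
    // both map to the folder URL "/section/".
    var folderIndex = /(^|\/)_?index\.md$/;
    if (folderIndex.test(rel)) {
        return rel.replace(folderIndex, "$1") || "/";
    }
    // "/section/page1.md" -> "/section/page1", as in the original Gruntfile.
    return rel.replace(/\.md$/, "");
}
```

It could replace the two chompLeft/chompRight calls in processMDFile. HTML content files are deliberately left out, since the Gruntfile keeps their .html extension.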

The index file looks like:

[{
    "title": "Page1",
    "href": "/section/page1",
    "content": " This is the cleaned content of 'site/content/section/page1.md' "
}, {
    "title": "Page2",
    "tags": ["tag1", "tag2", "tag3"],
    "href": "/section/page2",
    "content": " This is the cleaned content of 'site/content/section/page2.md' "
}, {
    "title": "Page3",
    "href": "/section/page3",
    "content": " This is the cleaned content of 'site/content/section/page3.md' "
}]

Launch the task: grunt lunr-index

Use the index

On the client side here is a small usage example:

<!DOCTYPE html>
<html>

<head>
    <title>Hugo + Lunrjs = &lt;3 search </title>
</head>

<body>
    Search:
    <input id="search" type="text">
    <br> Results:
    <ul id="results">
    </ul>
    <script type="text/javascript" src="https://code.jquery.com/jquery-2.1.3.min.js"></script>
    <script type="text/javascript" src="js/vendor/lunr.min.js"></script>
    <script type="text/javascript">
    var lunrIndex,
        $results,
        pagesIndex;

    // Initialize lunrjs using our generated index file
    function initLunr() {
        // First retrieve the index file
        $.getJSON("js/lunr/PagesIndex.json")
            .done(function(index) {
                pagesIndex = index;
                console.log("index:", pagesIndex);

                // Set up lunrjs by declaring the fields we use
                // Also provide their boost level for the ranking
                lunrIndex = lunr(function() {
                    this.field("title", {
                        boost: 10
                    });
                    this.field("tags", {
                        boost: 5
                    });
                    this.field("content");

                    // ref is the result item identifier (I chose the page URL)
                    this.ref("href");

                    // Feed lunr with each file and let lunr actually index them.
                    // Adding documents inside this function works with Lunr 0.x
                    // and is required since Lunr 2.0, whose index is immutable.
                    pagesIndex.forEach(function(page) {
                        this.add(page);
                    }, this);
                });
            })
            .fail(function(jqxhr, textStatus, error) {
                var err = textStatus + ", " + error;
                console.error("Error getting Hugo index file:", err);
            });
    }

    // Nothing crazy here, just hook up a listener on the input field
    function initUI() {
        $results = $("#results");
        $("#search").keyup(function() {
            $results.empty();

            // Only trigger a search when at least 2 characters have been provided
            var query = $(this).val();
            if (query.length < 2) {
                return;
            }

            var results = search(query);

            renderResults(results);
        });
    }

    /**
     * Trigger a search in lunr and transform the result
     *
     * @param  {String} query
     * @return {Array}  results
     */
    function search(query) {
        // Find the item in our index corresponding to the lunr one to have more info
        // Lunr result: 
        //  {ref: "/section/page1", score: 0.2725657778206127}
        // Our result:
        //  {title:"Page1", href:"/section/page1", ...}
        return lunrIndex.search(query).map(function(result) {
                return pagesIndex.filter(function(page) {
                    return page.href === result.ref;
                })[0];
            });
    }

    /**
     * Display the first 10 results
     *
     * @param  {Array} results to display
     */
    function renderResults(results) {
        if (!results.length) {
            return;
        }

        // Only show the first ten results
        results.slice(0, 10).forEach(function(result) {
            var $result = $("<li>");
            $result.append($("<a>", {
                href: result.href,
                text: "» " + result.title
            }));
            $results.append($result);
        });
    }

    // Let's get started
    initLunr();

    $(document).ready(function() {
        initUI();
    });
    </script>
</body>

</html>
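`search()` above scans the whole `pagesIndex` array once per Lunr result. A sketch of an alternative that builds an `href` to page lookup once, right after loading `PagesIndex.json`; `buildPageLookup` and `resolveResults` are hypothetical names, and `Map` assumes an ES2015-capable browser.

```javascript
// Build the href -> page lookup once, instead of calling filter() per result.
// `pages` is the parsed PagesIndex.json array.
function buildPageLookup(pages) {
    var byHref = new Map();
    pages.forEach(function(page) {
        byHref.set(page.href, page);
    });
    return byHref;
}

// `lunrResults` is what lunrIndex.search(query) returns: {ref, score} objects,
// already sorted by score.
function resolveResults(lunrResults, byHref) {
    return lunrResults
        .map(function(result) { return byHref.get(result.ref); })
        // Drop refs missing from the index file instead of rendering undefined.
        .filter(Boolean);
}
```

In initLunr, one could keep `pagesByHref = buildPageLookup(pagesIndex)` and have search() return `resolveResults(lunrIndex.search(query), pagesByHref)`.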
@JamesMcMahon

This is great! Thanks for writing this up.

Two quick comments:

@sebz
Author

sebz commented Apr 14, 2015

Thanks James, I initially chose the Markdown syntax without naming the gist. Then, just before posting on the forum, I gave it a name without .md, which automatically switched the syntax to Text... 😄

Regarding the license, there's no restriction, feel free to use/improve it.

@digitalcraftsman

Your script works great. Would it be possible to wrap your code and explanation into some sort of tutorial for the Hugo docs? I think other people would benefit from it if they aren't looking straight into the forum.

@sebz
Author

sebz commented Oct 1, 2015

Oops.... I missed your comment @digitalcraftsman... sorry...
It's not an official implementation backed by the Hugo team. I agree, this gist is hard to find...

@sebz
Author

sebz commented Oct 1, 2015

And this solution is far from perfect... In my case the number of pages has dramatically increased and downloading the index on the client side is now an issue (here: https://doc.airvantage.net/av/).

A possibility would be to have a small nodejs webapp serve a search API based on the same lunrjs index. But that implies adding a server, managing the index updates, monitoring this small webapp, and so on...

As we speak, I'm evaluating the integration of Google Custom Search instead of this lunrjs based solution... I know.. It's a shame 😊

@dublx

dublx commented Oct 27, 2015

@sebz, why not run your 'search' API in an AWS Lambda? Upload your index file along with the Lambda code and then set up an API Gateway in front of the Lambda.

@AmrAbdulrahman

You're super (Y)
Thanks @sebz!

@MrRaph

MrRaph commented Jan 25, 2017

Thanks a looooot ! :)

@RichardSage

Hi, thanks for putting this together. I'm finding my script fails because it tries to open the .DS_Store file within my content folder. Is there a way I can stop this? Thanks in advance!

@RichardSage

Ignore me: I added if (filename.startsWith('.')) return; to the recurse callback so it ignores the .DS_Store file.


ghost commented Aug 2, 2017

@sebz I'm receiving a warning from the output of grunt lunr-index. It reads "Warning: conzole is not defined Use --force to continue." and aborts the task. Any thoughts? Forcing the command doesn't do anything either. It doesn't run the actual command.

@micylt

micylt commented Oct 29, 2017

@rniller It's supposed to be console log I think.

@micylt

micylt commented Oct 30, 2017

I suggest that anyone using this make the following change:
    if (S(filename).endsWith(".html")) {
        pageIndex = processHTMLFile(abspath, filename);
    } else {
        pageIndex = processMDFile(abspath, filename);
    }
to:
    if (S(filename).endsWith(".html")) {
        pageIndex = processHTMLFile(abspath, filename);
    } else if (S(filename).endsWith(".md")) {
        pageIndex = processMDFile(abspath, filename);
    }
    return pageIndex;

@gatlinnewhouse

gatlinnewhouse commented Dec 3, 2017

@micylt does conzole.failed(e.message); need to be changed to console.log(e.message);? Because when I try that I get Warning: Cannot read property 'title' of undefined Used --force, continuing.

@eleijonmarck

I get the same for my project. It feels like @gatlinnewhouse and I haven't set it up properly.

@SpiZeak

SpiZeak commented Nov 23, 2018

@gatlinnewhouse I noticed that I had empty _index.md files. All I needed to do was wrap both trim() calls in if statements checking whether content[1] or frontMatter was defined at all.

@cottrell

Any good replacement for the string package? npm is giving audit warnings.
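One dependency-free option, sketched under the assumption that only the string.js calls used in the Gruntfile matter: native `String.prototype.startsWith`/`endsWith` plus a few regular expressions. The helper names are hypothetical, and the regexes only approximate string.js behaviour; note that `\w` is ASCII-only, so accented letters are stripped as well.

```javascript
// Hypothetical stand-ins for the string.js calls used in the Gruntfile.
// S(x).endsWith(y) becomes the native x.endsWith(y) (ES2015).

// Approximates S(str).chompLeft(prefix).s
function chompLeft(str, prefix) {
    return str.startsWith(prefix) ? str.slice(prefix.length) : str;
}

// Approximates S(str).chompRight(suffix).s
function chompRight(str, suffix) {
    return str.endsWith(suffix) ? str.slice(0, str.length - suffix.length) : str;
}

// Approximates S(str).stripTags().s: remove anything that looks like an HTML tag.
function stripTags(str) {
    return str.replace(/<\/?[^<>]*>/g, "");
}

// Approximates S(str).stripPunctuation().s: drop characters that are neither
// word characters nor whitespace (plus underscores), then collapse whitespace.
function stripPunctuation(str) {
    return str.replace(/[^\w\s]|_/g, "").replace(/\s+/g, " ");
}

// Approximates S(content).trim().stripTags().stripPunctuation().s
function cleanContent(content) {
    return stripPunctuation(stripTags(content.trim()));
}
```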

@naile

naile commented Feb 3, 2019

Thanks a bunch this was really helpful! I made a vanilla js version of your search for those who don't want to pull in jquery. https://gist.github.com/naile/47e3e8aa62c6d1410d7b51b80f13bcfe it also works with the latest version of lunrjs as the index since 2.0 is immutable.

@inwardmovement

Thanks @sebz for this!

@sprajagopal

Is there a way to get more fine-grained results? For example, the highlighted line numbers in each file?
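A possible direction, assuming Lunr 2.x: setting `this.metadataWhitelist = ["position"]` inside the `lunr()` config function makes each result carry `matchData.metadata[term][field].position` as `[start, length]` pairs. Because the Gruntfile collapses whitespace in `content`, line numbers cannot be recovered from this index; the hypothetical `snippetsFor` helper returns the text around each match instead. Metadata keys are the indexed (stemmed) terms, not necessarily the words typed.

```javascript
// Build "...before [match] after..." snippets for one Lunr 2.x result.
// `page` is the matching entry from PagesIndex.json, `field` the field name
// (e.g. "content") and `radius` the number of characters kept on each side.
function snippetsFor(result, page, field, radius) {
    var text = page[field] || "";
    var snippets = [];
    var metadata = result.matchData.metadata;
    Object.keys(metadata).forEach(function(term) {
        var fieldData = metadata[term][field];
        if (!fieldData || !fieldData.position) {
            return; // this term did not match in the requested field
        }
        fieldData.position.forEach(function(pos) {
            var start = pos[0];
            var end = start + pos[1];
            snippets.push(
                text.slice(Math.max(0, start - radius), start) +
                "[" + text.slice(start, end) + "]" + // mark the matched token
                text.slice(end, end + radius)
            );
        });
    });
    return snippets;
}
```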

@gocs

gocs commented Mar 6, 2021

> Thanks a bunch this was really helpful! I made a vanilla js version of your search for those who don't want to pull in jquery. https://gist.github.com/naile/47e3e8aa62c6d1410d7b51b80f13bcfe it also works with the latest version of lunrjs as the index since 2.0 is immutable.

This solves the "idx.add is not a function" error.

@radan-magie

My version of Gruntfile.js, which lets Lunr work with a standard goHugo setup (there are still some problems with multiline YAML variables):
https://gist.github.com/radan-magie/b83e34bc9ec2884ff19a8cee23c2f613

@antrax2024

No way!

❯ ~/node_modules/grunt/bin/grunt lunr-index
Running "lunr-index" task
Build pages index
Warning: ENOENT: no such file or directory, scandir 'site/content' Use --force to continue.

Aborted due to warnings.
