@bobmonsour
Created April 13, 2023 04:45
An Eleventy filter that extracts the meta description from within the <head> element of a web page
// getDescription - given a url, this Eleventy filter extracts the meta
// description from within the <head> element of a web page using the cheerio
// library.
//
// The full html content of the page is fetched using the eleventy-fetch plugin.
// If you have a lot of links from which you want to extract descriptions, the
// initial build time will be slow. However, the plugin will cache the content
// for a duration of your choosing (in this example, it's set to 1 day).
//
// The description is extracted from the <meta> element with the name attribute
// of "description".
//
// If no description is found, the filter returns an empty string. In the event
// of an error, the filter logs an error to the console and returns the string
// "(no description available)"
//
// Be sure to create a .cache folder in your project root and add .cache to your
// .gitignore file. See https://www.11ty.dev/docs/plugins/fetch/#installation
//
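const EleventyFetch = require("@11ty/eleventy-fetch");
const cheerio = require("cheerio");

// Place this call inside the function exported from your Eleventy config
// file, where eleventyConfig is that function's parameter.
eleventyConfig.addFilter(
  "getDescription",
  async function getDescription(link) {
    try {
      // Fetch the page's HTML; the response is cached for one day.
      let htmlcontent = await EleventyFetch(link, {
        duration: "1d",
        type: "buffer",
      });
      const $ = cheerio.load(htmlcontent);
      // console.log(
      //   "description: " + $("meta[name=description]").attr("content")
      // );
      // attr() returns undefined when there is no matching element, so fall
      // back to an empty string as described above.
      return $("meta[name=description]").attr("content") || "";
    } catch (e) {
      console.log(
        "Error fetching description for " + link + ": " + e.message
      );
      return "(no description available)";
    }
  }
);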
@bobmonsour
Author

While it's probably obvious, I wanted to note that this can be adapted to extract just about anything from an HTML document. And the item to be extracted could easily be an additional argument to the filter.

@zachleat

I do want to also put in a plug for the excellent https://www.npmjs.com/package/linkedom library for this too!

@bobmonsour
Author

Thanks, Zach. I can't quite understand how to make that work, but I'm still in the early stages of my JavaScript and npm package knowledge journey. Once I understand "cascading asset bucketing" I think I'll be ready to conquer linkedom ;-)

@bobmonsour
Author

For my use case, specifically the 11tybundle.dev site, I have changed the cache duration to '*', meaning that Eleventy will never fetch new data after the first successful fetch. There's no need for me to re-fetch complete blog posts to extract a description; once is quite enough.
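
As a small sketch of that change, only the options object passed to EleventyFetch needs to differ from the gist. The variable name `fetchOptions` is hypothetical.

```javascript
// Options for EleventyFetch: with duration "*", a cached copy never expires,
// so each URL is fetched only once (after the first successful fetch).
const fetchOptions = {
  duration: "*",
  type: "buffer",
};

// Used in the filter in place of the inline options, for example:
//   let htmlcontent = await EleventyFetch(link, fetchOptions);
console.log(fetchOptions);
```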
