@ecairol
Created November 8, 2023 21:02
Scrape Google paginated list
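// Run in the browser console on each results page; entries accumulate in sessionStorage.
// The block scope keeps the snippet re-runnable in the same console session.
{
  const nodes = document.querySelectorAll('#center_col #search [role="heading"]');
  // Load previously stored entries once instead of on every iteration.
  const db = JSON.parse(sessionStorage.getItem('googlecontent') || '[]');
  Array.from(nodes).forEach((node) => {
    const title = node.textContent.slice(0, 100);
    // Optional chaining guards against headings that lack the expected neighbours.
    const publisher = node.previousSibling?.innerText ?? '';
    const link = node.closest('a')?.getAttribute('href') ?? '';
    const dateText = node.parentNode.lastChild?.innerText ?? '';
    const date = new Date(dateText);
    // Fall back to the raw text when it cannot be parsed as a date.
    let formattedDate = dateText;
    if (!Number.isNaN(date.getTime())) {
      const year = date.getFullYear();
      const month = String(date.getMonth() + 1).padStart(2, '0');
      const day = String(date.getDate()).padStart(2, '0');
      formattedDate = `${year}-${month}-${day}`;
    }
    db.push(`
<li>${formattedDate} | ${publisher} | <a href="${link}" target="_blank">${title}</a></li>
`);
  });
  // Persist the accumulated entries once per page.
  sessionStorage.setItem('googlecontent', JSON.stringify(db));
}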
// Final output (Save As from browser)
{
  const output = JSON.parse(sessionStorage.getItem('googlecontent') || '[]');
  // Avoid a global named `print`: assigning it in a console overwrites window.print.
  console.info(output.join(' '));
}
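The snippet above interpolates scraped text straight into HTML, so a title containing `<` or `&` would corrupt the saved list. The following is a minimal sketch of how the row-building step could be separated from the DOM lookups and escaped. The helper names `escapeHtml`, `formatDate`, and `toListItem` are hypothetical and not part of the original gist.

```javascript
// Hypothetical helpers (not in the original gist): pure functions, no DOM required.

// Escape characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Format a Date as YYYY-MM-DD in local time; return null for an invalid date.
function formatDate(date) {
  if (Number.isNaN(date.getTime())) return null;
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, '0');
  const day = String(date.getDate()).padStart(2, '0');
  return `${year}-${month}-${day}`;
}

// Build one <li> row from fields that were already extracted from the page.
// When the date text cannot be parsed, the raw text is used instead.
function toListItem({ dateText, publisher, link, title }) {
  const date = formatDate(new Date(dateText)) ?? dateText;
  return `<li>${escapeHtml(date)} | ${escapeHtml(publisher)} | ` +
    `<a href="${escapeHtml(link)}" target="_blank">${escapeHtml(title)}</a></li>`;
}
```

Inside the loop, `db.push(toListItem({ dateText, publisher, link, title }))` could then take the place of the inline template string, assuming those four values are read from the page as before.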