@danielhaim1
Last active January 25, 2023 11:21
Scraping HTML List & Export to Excel

Scrape a Paginated List and Export to Excel

Example HTML

<ul class="list">
    <li class="list-item">
        <a class="list-item__username" href="/profile/john-doe">John Doe</a>
        <div class="list-item__comment">...</div>
    </li>
    <li class="list-item">
        <a class="list-item__username" href="/profile/jane-doe">Jane Doe</a>
        <div class="list-item__comment">...</div>
    </li>
</ul>

<nav class="list-pagination">
    <a class="list-pagination__link current" href="#">1</a>
    <a class="list-pagination__link" href="#">2</a>
    <a class="list-pagination__link" href="#">3</a>
</nav>

Getting Started

// Each scrapeList() call pushes one array of rows (one page) onto scrapedList.
const scrapedList = [];

function scrapeList() {
    const arr = [];
    const reviewList = document.querySelectorAll(".list-item");
    reviewList.forEach((node) => {
        // The username anchor holds both the display name and the profile link.
        const userAnchor = node.querySelector(".list-item__username");
        const commentNode = node.querySelector(".list-item__comment");

        // Optional chaining keeps one malformed item from aborting the whole scrape;
        // text is trimmed so stray whitespace does not end up in the spreadsheet.
        arr.push({
            userName: userAnchor?.textContent.trim() ?? null,
            userLink: userAnchor?.getAttribute("href") ?? null,
            userComment: commentNode?.textContent.trim() ?? null
        });
    });

    scrapedList.push(arr);
    // console.table(arr);
    // Uncomment to advance to the next page after each scrape (needed for the automated run):
    // document.querySelector(".list-pagination__link.current + a")?.click();
}

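// Copies a matrix (an array of row arrays) to the clipboard as tab-separated text.
const copyToClipboard = (matrix) => {
    // Tabs separate columns and newlines separate rows when pasted into a spreadsheet.
    const textArea = document.createElement("textarea");
    textArea.value = matrix.map(x => x.join('\t')).join('\n');

    // Note: document.execCommand('copy') is deprecated; it copies the current selection.
    document.body.appendChild(textArea);
    textArea.select();
    document.execCommand('copy');
    document.body.removeChild(textArea);

    console.log("Copied to clipboard!");
}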
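
Comment text often contains tabs or line breaks, which would split one cell across several columns or rows once pasted. A minimal sketch of a sanitizing step, assuming it is acceptable to collapse that whitespace into single spaces (`toTsv` is a hypothetical helper, not part of the original gist):

```javascript
// Hypothetical helper: turn a matrix of cell values into tab-separated text,
// making sure no cell can break the row/column layout.
function toTsv(matrix) {
    const cleanCell = (value) =>
        String(value ?? "")               // null/undefined become an empty cell
            .replace(/[\t\r\n]+/g, " ")   // collapse tabs and line breaks into one space
            .trim();

    return matrix
        .map((row) => row.map(cleanCell).join("\t"))
        .join("\n");
}
```

To use it, the `textArea.value` line in `copyToClipboard` could be changed to `textArea.value = toTsv(matrix);`.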

Next Steps ...

Copy-and-paste commands for the console

1. Non-automated function

scrapeList();

2a. Automated function

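Start the automation by running the following commands. The pagination click at the end of scrapeList() must be uncommented first; otherwise every run re-scrapes whatever page is currently displayed.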

const startAutomation = setTimeout(function scrapeAutomation() {
    scrapeList();
    window.repeatAutomation = setTimeout(scrapeAutomation, 1500);
}, 1500);

2b. Stop Automation

Stop the automation by running the following commands:

clearTimeout(startAutomation);
clearTimeout(repeatAutomation);
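
The loop above runs until it is stopped by hand. As a rough alternative, here is a sketch that stops by itself once there is no next page. `startPagedScrape` and `clickNextPage` are hypothetical names, and the sketch assumes the current page link carries the `current` class and that clicking a pagination link updates the list in place (a full page reload would clear the console state):

```javascript
// Hypothetical helper (not part of the original gist): scrape, advance, repeat,
// and stop automatically when there is no next page.
function startPagedScrape(scrapePage, goToNextPage, delayMs = 1500) {
    let timerId = null;
    let stopped = false;

    function tick() {
        if (stopped) return;
        scrapePage();
        // goToNextPage() returns false when there is nothing left to click.
        if (goToNextPage()) {
            timerId = setTimeout(tick, delayMs);
        }
    }

    timerId = setTimeout(tick, delayMs);

    // Call the returned function to cancel early.
    return function stop() {
        stopped = true;
        clearTimeout(timerId);
    };
}

// Assumes the pagination markup from the example HTML.
function clickNextPage() {
    const next = document.querySelector(".list-pagination__link.current + a");
    if (!next) return false;
    next.click();
    return true;
}

// In the browser console:
// const stopScrape = startPagedScrape(scrapeList, clickNextPage);
// stopScrape(); // cancel early
```

With this helper, keep the click line inside scrapeList() commented out so that each page is advanced only once.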

3. Combine Lists

Run this command to flatten the per-page arrays into a single array of rows:

const scrapedListFlat = scrapedList
    .flat()
    .map(({ userName, userLink, userComment }) => [userName, userLink, userComment]);
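
If the same page gets scraped more than once (for example, when scrapeList() runs again before the next page has loaded), the combined list will contain repeated rows. A minimal sketch of flattening with de-duplication, assuming that rows identical in all three fields are safe to drop (`toUniqueRows` and `samplePages` are hypothetical and use made-up sample data):

```javascript
// Sample data in the same shape scrapeList() produces: one inner array per page.
const samplePages = [
    [{ userName: "John Doe", userLink: "/profile/john-doe", userComment: "First" }],
    // The same page scraped a second time:
    [{ userName: "John Doe", userLink: "/profile/john-doe", userComment: "First" }],
    [{ userName: "Jane Doe", userLink: "/profile/jane-doe", userComment: "Second" }]
];

// Hypothetical helper: flatten pages into rows and drop exact duplicate rows.
function toUniqueRows(pages) {
    const seen = new Set();
    const rows = [];
    for (const { userName, userLink, userComment } of pages.flat()) {
        const row = [userName, userLink, userComment];
        const key = JSON.stringify(row); // rows are equal only if all cells match
        if (!seen.has(key)) {
            seen.add(key);
            rows.push(row);
        }
    }
    return rows;
}

const sampleRows = toUniqueRows(samplePages);
// sampleRows now holds two rows: one for John Doe and one for Jane Doe.
```

In the console, `toUniqueRows(scrapedList)` could stand in for the flatten command when duplicates are a concern.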

4. Export List

Run this command to copy the data to your clipboard, then paste it into Excel.

copyToClipboard(scrapedListFlat);