@asowder3943
Last active August 24, 2022 17:59
Crawlee Documentation Question

Hello everyone! I'm new to Crawlee and TypeScript in general.

I know the maintainers have more important things on their plates right now, but I was following this section of the docs when I hit the following compiler error:

error TS2345: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.
  Type 'undefined' is not assignable to type 'string'.
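
In case it clarifies the question, here is a minimal repro of the same error outside of Crawlee (the variable name `maybe` is mine, just for illustration):

import { URL } from 'node:url';

// `maybe` stands in for `request.loadedUrl`, which is typed `string | undefined`.
declare const maybe: string | undefined;

// new URL(maybe); // error TS2345: 'string | undefined' is not assignable
new URL(typeof maybe === 'undefined' ? 'https://example.com/' : maybe); // OK: narrowed to string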


Is narrowing with a typeof check, as in the snippet below, the best way to handle this error? (I sketch one alternative after the snippet.)

This is perhaps a newbie question, but if the change is acceptable, could I make it my first pull request and fix all instances of this error in the docs / website files for each section?

import { CheerioCrawler } from 'crawlee';
import { URL } from 'node:url';

const crawler = new CheerioCrawler({
    // Let's limit our crawls to make our
    // tests shorter and safer.
    maxRequestsPerCrawl: 20,
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        console.log(`The title of "${request.url}" is: ${title}.`);

        const links = $('a[href]')
            .map((_, el) => $(el).attr('href'))
            .get();

        // Besides resolving the URLs, we now also need to
        // grab their hostname for filtering.
        const { hostname } = new URL(
            // Narrow `string | undefined` to `string`. Fall back to the
            // original request URL, because `new URL('')` throws at runtime.
            typeof request.loadedUrl === 'undefined' ? request.url : request.loadedUrl,
        );
        const absoluteUrls = links
            .map((link) => new URL(link, request.loadedUrl));

        // We use the hostname to filter links that point
        // to a different domain, even subdomain.
        const sameHostnameLinks = absoluteUrls
            .filter((url) => url.hostname === hostname)
            .map((url) => ({ url: url.href }));

        // Finally, we have to add the URLs to the queue
        await crawler.addRequests(sameHostnameLinks);
    },
});

await crawler.run(['https://crawlee.dev']);
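
The alternative I was weighing against the typeof check is the nullish coalescing operator; the fallback to request.url below is just my assumption about a sensible default, not something the docs prescribe:

// Sketch of a more concise narrowing, assuming `request.url` is an
// acceptable fallback while `loadedUrl` is still undefined.
const { hostname } = new URL(request.loadedUrl ?? request.url);

If one of these forms is preferred, I'd be glad to use it consistently across the docs in the PR.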