@hackerdem
Last active January 5, 2023 18:22
A simple Python script to check broken links of a website.
@allandequeiroz

Thanks for sharing!

@ssousaleo

Thanks a lot. It's very useful.

@Hasokeyk

Hasokeyk commented Dec 21, 2017

Thanks Bro 👍

@DanielKoohmarey

Check out my fork if anyone needs a Python 2.7 link-checking library.

@wgrv

wgrv commented Jan 25, 2018

Hi, thanks a lot for the script. I ran it on my site, gave it a single hyperlink, and it is running through all the links and giving me the output. There are many links; is there any way to know the base link, i.e. the page on which a broken link was found?

@DanielKoohmarey

@hackerdem what's the license for this file?

@RafaelAMello

Nice!

@Ailothaen

Hey, wonderful script. However, it keeps crawling the other domains that are linked, which is somewhat unwanted behavior (for example, if I include a link to Google on my website, it will scan Google as well...). Could a fix be made to restrict the crawl to the same domain only?

@Pavan2303

With this code, how can I print those links to a document or an Excel sheet?

@hackerdem
Author

Hey, please try implementing some additional code that compares every link's root URL to the base URL; if they don't match, the script won't crawl that link. As for the other question about printing results to another file, I think Python's csv library can be used for that purpose.
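A minimal sketch of both ideas, assuming the crawler already produces absolute URLs; base_url, same_domain, and record_broken below are illustrative names, not part of the gist:

    import csv
    from urllib.parse import urlparse
    from urllib.request import Request, urlopen

    base_url = "http://example.com"          # illustrative site to crawl
    base_netloc = urlparse(base_url).netloc  # domain the crawl should stay on

    def same_domain(link):
        # Follow a link only if its domain matches the base URL's domain,
        # so external sites (e.g. a link to Google) are not crawled.
        return urlparse(link).netloc == base_netloc

    def record_broken(links, out_file="broken_links.csv"):
        # Write every broken link and its status to a CSV file.
        with open(out_file, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["url", "status"])
            for link in links:
                try:
                    status = urlopen(Request(link)).getcode()
                except Exception as exc:
                    status = getattr(exc, "code", str(exc))
                if status != 200:
                    writer.writerow([link, status])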

@saiteja13427

Hey, try including a header with the urlopen request as well. Some websites return 403 Forbidden if the request has no recognizable User-Agent. It should look something like:

    import urllib.request

    url = "https://atomstalk.com"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}
    info = urllib.request.urlopen(urllib.request.Request(url=url, headers=headers))
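Inside the gist's link-checking loop it would be the same pattern; a rough sketch, where link_list is just an illustrative name for whatever list of URLs the script builds:

    import urllib.request
    import urllib.error

    link_list = ["https://atomstalk.com"]  # illustrative; the script collects these itself
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}
    for link in link_list:
        try:
            response = urllib.request.urlopen(urllib.request.Request(link, headers=headers))
            print(link, response.getcode())
        except urllib.error.HTTPError as e:  # server still refused despite the header
            print(link, e.code)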

@Rini-bosco

How do I debug the code if it is not working for a given URL?

@tiffanyveritas

Thanks. It works right out of the box!

@Akanksha0704

Can someone explain each part?
