@reosarevok
Last active September 21, 2021 13:43

Goals

  • Avoid having broken links that are not marked as ended.
  • Automatically mark links that are broken as ended.
  • Avoid marking links as ended unless we have made sure they are broken (they repeatedly fail).
  • For lyrics links, avoid having them at all if they are broken (since even if archived, they would now be unlicensed).

Steps

  • Check each URL in the URL table, one by one.
    • Check if all relationships for that URL are marked as ended. If so, skip.
    • Try to reach the page, record the result.
      • If it's a 200, skip.
      • If it's a redirect, log it so it can potentially be shown to users for research (skip URLs known to be redirects, such as DNB permalinks).
      • If it's an error, check if we have hit the same error for the same URL in a previous pass of the script.
        • If we have, enter edits to mark any relationships to this URL as ended (or to remove them if they should not be kept, e.g. lyrics).
        • If we have not, log the URL and the error so we can check if it keeps failing on the next pass.
    • If we have entered 1000 edits by now, write down where we stopped and continue tomorrow (since there's a bot CoC limit of 1k edits per day).
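
The steps above can be sketched roughly as a single pass over the URL rows. This is only an illustration under stated assumptions, not the actual bot: `UrlRow`, `PassResult`, `run_pass`, the `fetch` callable and the `previous_errors` mapping are all hypothetical names, the real URL table and edit system are not modelled, and each broken URL is counted as one edit even though a URL may have several relationships.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

DAILY_EDIT_LIMIT = 1000  # the bot CoC limit of 1k edits per day mentioned above


@dataclass
class UrlRow:
    """Hypothetical stand-in for one row of the URL table."""
    id: int
    url: str
    all_relationships_ended: bool = False
    is_lyrics: bool = False       # lyrics links get removed rather than ended
    known_redirect: bool = False  # e.g. DNB permalinks


@dataclass
class PassResult:
    edits: List[Tuple[str, int]] = field(default_factory=list)  # (action, URL id)
    redirects: List[int] = field(default_factory=list)          # URL ids to research
    new_errors: Dict[int, str] = field(default_factory=dict)    # first-time failures
    stopped_at: Optional[int] = None                            # URL id to resume from


def run_pass(
    rows: Iterable[UrlRow],
    fetch: Callable[[str], int],
    previous_errors: Dict[int, str],
    edit_limit: int = DAILY_EDIT_LIMIT,
) -> PassResult:
    """Check each URL once; `fetch` is assumed to return an HTTP status code."""
    result = PassResult()
    for row in rows:
        # Daily limit reached: remember where to resume tomorrow.
        if len(result.edits) >= edit_limit:
            result.stopped_at = row.id
            break
        # Nothing to do if every relationship is already marked as ended.
        if row.all_relationships_ended:
            continue
        status = fetch(row.url)
        if status == 200:
            continue
        if 300 <= status < 400:
            # Log redirects for later research, except known ones.
            if not row.known_redirect:
                result.redirects.append(row.id)
            continue
        error = str(status)
        if previous_errors.get(row.id) == error:
            # Same error as in a previous pass: treat the link as broken.
            action = "remove" if row.is_lyrics else "mark_ended"
            result.edits.append((action, row.id))
        else:
            # First time we see this error: log it and re-check on the next pass.
            result.new_errors[row.id] = error
    return result
```

In this sketch the caller would persist `new_errors` and `stopped_at` between runs, so that the next pass can confirm repeated failures and resume from the recorded URL. Passing `fetch` in as a parameter keeps the network access (and any handling of timeouts or DNS failures, which is not shown here) separate from the decision logic.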