@geotheory
Created December 4, 2020 18:10
require(robotstxt)
#> Loading required package: robotstxt
# Download bbc.com's robots.txt once and keep its text in `rt`
rt = robotstxt::get_robotstxt('bbc.com')
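# '/notexists' matches none of the Disallow rules printed by cat(rt), so it is allowed.
# (rt_robotstxt_http_getter appears to expect the download *function*, not the
# file text; a pre-fetched file is more usually passed as robotstxt_list = list(rt).)
paths_allowed('https://bbc.com/notexists', rt_robotstxt_http_getter = rt)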
#> bbc.com
#>
#> [1] TRUE
# This path is caught by the wildcard rule 'Disallow: /sport/videos/*', so it is blocked
paths_allowed('https://bbc.com/sport/videos/notexists', rt_robotstxt_http_getter = rt)
#> bbc.com
#> [1] FALSE
# Print the retrieved robots.txt to see which rules apply
cat(rt)
#>
#> # version: 0c6ff23035a3e313d9af399d0e5caf3503a4b186
#>
#> # HTTPS www.bbc.com
#>
#>
#> User-agent: *
#> Sitemap: https://www.bbc.com/sitemaps/https-index-com-archive.xml
#> Sitemap: https://www.bbc.com/sitemaps/https-index-com-news.xml
#> Sitemap: https://www.bbc.com/sitemaps/https-index-com-archive_video.xml
#> Sitemap: https://www.bbc.com/sitemaps/https-index-com-video.xml
#> Sitemap: https://www.bbc.com/sitemaps/sitemap-com-ws-topics.xml
#>
#> Disallow: /bitesize/search$
#> Disallow: /bitesize/search/
#> Disallow: /bitesize/search?
#> Disallow: /cbbc/search/
#> Disallow: /cbbc/search$
#> Disallow: /cbbc/search?
#> Disallow: /cbeebies/search/
#> Disallow: /cbeebies/search$
#> Disallow: /cbeebies/search?
#> Disallow: /chwilio/
#> Disallow: /chwilio$
#> Disallow: /chwilio?
#> Disallow: /education/blocks$
#> Disallow: /education/blocks/
#> Disallow: /newsround
#> Disallow: /search/
#> Disallow: /search$
#> Disallow: /search?
#> Disallow: /sport/videos/*
#> Disallow: /food/favourites
#> Disallow: /food/search*?*
#> Disallow: /food/recipes/search*?*
#> Disallow: /education/my$
#> Disallow: /education/my/
#> Disallow: /bitesize/my$
#> Disallow: /bitesize/my/
#> Disallow: /food/recipes/*/shopping-list
#> Disallow: /food/menus/*/shopping-list
#> Disallow: /news/0
#> Disallow: /ugc$
#> Disallow: /ugc/
#> Disallow: /ugcsupport$
#> Disallow: /ugcsupport/
#> Disallow: /userinfo/
#> Disallow: /userinfo
#> Disallow: /u5llnop$
#> Disallow: /u5llnop/
#> Disallow: /sounds/search$
#> Disallow: /sounds/search/
#> Disallow: /sounds/search?
#> Disallow: /ws/includes
#> Disallow: /radio/imda
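
# A minimal sketch of the matching rule behind the two results above. It is an
# assumption for illustration, not how robotstxt implements its checks: each
# Disallow value becomes a regular expression in which `*` matches any run of
# characters and a trailing `$` anchors the end of the path, and a path is
# blocked if any rule matches its start. Allow lines and per-user-agent groups
# are ignored. disallow_rules(), rule_to_regex(), is_disallowed() and
# bbc_excerpt are hypothetical names, not part of the robotstxt package.

# Extract the Disallow values from robots.txt text (inline comments stripped)
disallow_rules <- function(txt) {
  lines <- strsplit(as.character(txt), "\n", fixed = TRUE)[[1]]
  lines <- trimws(sub("#.*$", "", lines))
  hits <- grep("^Disallow:", lines, ignore.case = TRUE, value = TRUE)
  rules <- trimws(sub("^Disallow:", "", hits, ignore.case = TRUE))
  rules[nzchar(rules)]  # an empty Disallow value blocks nothing
}

# Translate one rule into a PCRE pattern anchored at the start of the path
rule_to_regex <- function(rule) {
  anchored <- endsWith(rule, "$")
  if (anchored) rule <- substr(rule, 1, nchar(rule) - 1)
  chars <- strsplit(rule, "")[[1]]
  out <- vapply(chars, function(ch) {
    if (ch == "*") {
      ".*"                 # wildcard: any run of characters
    } else if (grepl("[A-Za-z0-9]", ch)) {
      ch                   # letters and digits are already literal
    } else {
      paste0("\\", ch)     # escape punctuation such as '/', '?', '.'
    }
  }, character(1), USE.NAMES = FALSE)
  paste0("^", paste(out, collapse = ""), if (anchored) "$" else "")
}

# TRUE if any Disallow rule matches the path
is_disallowed <- function(path, rules) {
  any(vapply(rules, function(r) grepl(rule_to_regex(r), path, perl = TRUE),
             logical(1)))
}

# Usage on a few rules copied from the file above
bbc_excerpt <- "User-agent: *\nDisallow: /search$\nDisallow: /sport/videos/*\nDisallow: /food/search*?*"
rules <- disallow_rules(bbc_excerpt)
is_disallowed("/notexists", rules)               # FALSE: no rule matches
is_disallowed("/sport/videos/notexists", rules)  # TRUE: '/sport/videos/*'
is_disallowed("/search/anything", rules)         # FALSE in this excerpt: '/search$' is anchored
# With the file fetched earlier: is_disallowed("/sport/videos/notexists", disallow_rules(rt))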