@zkkmin
Created March 28, 2019 06:28
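# Fetch the gzipped sitemap, print every <loc> URL, rewrite .com to .com.sg,
# then request headers only (-I) for each URL as Googlebot, following redirects (-L).
# xq is assumed to be the jq-style XML wrapper shipped with the Python yq package.
curl -s https://www.docdoc.com.sg/medicaltourism_sitemap_profile_1.xml.gz |
  zcat |
  xq -r '.urlset.url | map(.loc) | .[]' |
  sed -e 's/\.com/\.com\.sg/' |
  xargs curl --user-agent "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
    -s -I -L --write-out "============== END ==============\n" |
  tee medicaltourism_sitemap_profile_1_crawl.txt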
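The command above saves every response header block to medicaltourism_sitemap_profile_1_crawl.txt and ends each URL's output with an END marker line. One way to read that file afterwards is to tally the final status code per URL. The following is a minimal sketch, not part of the original gist: the function name summarize_crawl is hypothetical, and it assumes each transfer's last `HTTP/...` status line (after any redirects) comes before its END marker.

```shell
# Hypothetical helper: count final HTTP status codes in a crawl log
# produced by `curl -I -L --write-out "=== END ===\n"`-style output.
summarize_crawl() {
  awk '
    { gsub(/\r/, "") }                 # curl header lines end in CRLF
    /^HTTP\// { status = $2 }          # remember the latest status line (last redirect hop wins)
    /^=+ END =+/ {                     # marker closes one URL
      if (status == "") status = "no-response"
      counts[status]++
      status = ""
    }
    END { for (s in counts) print s, counts[s] }
  ' "$1" | sort
}

# Usage (assumes the crawl file from the command above exists):
#   summarize_crawl medicaltourism_sitemap_profile_1_crawl.txt
```

Each output line is a status code followed by the number of URLs that ended on it, so redirect chains that finish in a 404 are counted as 404 rather than 301.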