@vpadhariya
Created January 23, 2018 07:57
Clone a site and remove query-string values from the downloaded file names on Linux.
# Clone entire site.
wget --content-disposition --execute robots=off --recursive --no-parent --continue --no-clobber http://example.com
# Remove the query string from downloaded file names (e.g. style.css?ver=1.2 -> style.css).
find "${1:-.}" -type f -name '*\?*' | while IFS= read -r f; do mv -- "$f" "${f%%[?]*}"; done
@escaroda

To use the last line as a script, create a file named rename.sh
with the following content:

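#!/bin/bash
# Rename every file under the directory given as $1 (default: current directory),
# dropping everything from the first "?" in its path.
find "${1:-.}" -type f -name '*\?*' -print0 |
  while IFS= read -r -d '' f; do
    mv -- "$f" "${f%%[?]*}"
  done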

Then make the file executable:
chmod +x rename.sh
and call it with the folder path as an argument (here the dot means the current folder):
./rename.sh .
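
One thing the script above does not guard against is two files ending up with the same name (for example app.js and app.js?x=1 in the same folder), in which case mv would overwrite one of them. Here is a minimal sketch of a more defensive variant, written for plain POSIX sh. The function name strip_query_names and the "skip if the target already exists" policy are my own assumptions, not part of the original gist.

```shell
#!/bin/sh
# strip_query_names DIR
# Rename files under DIR (default: current directory) whose names contain "?",
# keeping only the part of the file name before the first "?".
# Hypothetical helper: the name and the skip-on-collision policy are assumptions.
strip_query_names() {
  find "${1:-.}" -type f -name '*\?*' -exec sh -c '
    for f do
      dir=$(dirname -- "$f")        # directory part stays untouched
      base=$(basename -- "$f")      # only the file name is shortened
      target=$dir/${base%%[?]*}     # cut at the first literal "?"
      if [ -e "$target" ]; then
        # Never overwrite an existing file; report it and move on.
        printf "skip (target exists): %s\n" "$f" >&2
      else
        mv -- "$f" "$target"
      fi
    done
  ' sh {} +
}
```

It can be sourced and called as strip_query_names . in the download folder. Skipped files are reported on stderr so they can be handled by hand.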

@vpadhariya
Author

OK, here is the new version for 2024.

Download Site Command

wget \
     --recursive \
     --level 5 \
     --no-clobber \
     --page-requisites \
     --adjust-extension \
     --span-hosts \
     --convert-links \
     --restrict-file-names=windows \
     --domains yourdomain.com \
     --no-parent \
        https://yourdomain.com

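Rename files (works fine on macOS). The echo makes this a dry run that only prints each mv command; remove it to actually rename:

find . -name '*\?*' | while IFS= read -r f; do echo mv "$f" "${f//\?*/}"; done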
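
A caveat worth checking before running the rename: the wget manual says that with --restrict-file-names=windows, wget uses @ instead of ? to separate the query portion of a local file name, so files fetched with the download command above may contain @ rather than ?. Below is a rough preview-only sketch that takes the separator as a parameter. The function name preview_renames and its "old -> new" output format are assumptions for illustration, and it assumes the separator is a single ordinary character. Note that @ can also appear in legitimate names (such as logo@2x.png), so review the output before renaming anything.

```shell
#!/bin/sh
# preview_renames DIR [SEP]
# Print "old -> new" for every file under DIR whose name contains SEP,
# where "new" is the name cut at the first SEP. Nothing is renamed.
# SEP defaults to "?"; pass "@" for downloads made with
# --restrict-file-names=windows. Hypothetical helper, not part of wget.
preview_renames() {
  find "${1:-.}" -type f -name "*[${2:-?}]*" -exec sh -c '
    sep=$1; shift
    for f do
      base=${f##*/}               # file name without its directory
      name=${base%%"$sep"*}       # quoted so SEP is matched literally
      printf "%s -> %s\n" "$f" "${f%/*}/$name"
    done
  ' sh "${2:-?}" {} +
}
```

For example, preview_renames . @ lists what would change for @-separated names, and preview_renames . does the same for ?-separated ones.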
