@tmslnz
Last active February 19, 2024 03:07
Nice command line for HTTrack

Commands

httrack example.com -O ./example.com -N100 -%i0 -I0 --max-rate 0 --disable-security-limits --near -v
httrack example.com -O ./example.com-3 -N100 -I0 -N "%p/%n%[month].%t" --max-rate 0 --disable-security-limits --near -v
# Used for WA fetch of toogood (noted on 2017.02.22)
httrack www.xxx.com -O ./xxx.com -N100 -%i0 -I0 -A0 -%! -n -v

Options

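-N100    Don't put the site in its own domain directory; otherwise mirror the original structure as usual
-I0    Don't make the HTTrack index page
-N "%p/%n%[month].%t"    User-defined file naming: %[month] inserts the value of the month query-string parameter, so files are saved as path/name.html or path/name<month-value>.html. Any query-string parameter works in place of month, e.g. %[page] or %[search].
--near    Also fetch non-HTML files _near_ a page, even when they are hosted externally (scripts, CSS, images, etc.)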
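To make the -N naming template more concrete, here is a rough bash approximation of how "%p/%n%[month].%t" turns a URL into a local file name. The function httrack_name is hypothetical and not part of HTTrack. It assumes the extension comes straight from the URL (falling back to html when there is none) and that directory URLs are saved as index.html; HTTrack itself can also rename files based on their MIME type, and this sketch skips collision handling and URL decoding.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HTTrack): roughly mimics how the
# -N "%p/%n%[month].%t" template names a saved file.
# Simplified: extension taken from the URL, no MIME-based renaming,
# no collision handling, no URL decoding.
httrack_name() {
  local url_path="$1"  # URL path, e.g. /news/archive.html
  local query="$2"     # query string without "?", may be empty
  local param="$3"     # parameter used for %[param], e.g. month

  local file="${url_path##*/}"            # last path segment
  [[ -n "$file" ]] || file="index.html"   # assumption: directory URLs -> index.html

  local dir="${url_path%/*}"   # %p: path without the trailing slash
  dir="${dir#/}"               # make it relative to the mirror root

  local name="${file%.*}"      # %n: file name without its extension
  local type="html"            # %t: assume html when there is no extension
  if [[ "$file" == *.* ]]; then
    type="${file##*.}"
  fi

  # %[param]: value of that parameter in the query string, empty if absent
  local value="" pair
  local -a pairs=()
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    if [[ "$pair" == *=* && "${pair%%=*}" == "$param" ]]; then
      value="${pair#*=}"
    fi
  done

  if [[ -n "$dir" ]]; then
    printf '%s/%s%s.%s\n' "$dir" "$name" "$value" "$type"
  else
    printf '%s%s.%s\n' "$name" "$value" "$type"
  fi
}
```

For example, httrack_name /news/archive.html 'month=may&page=2' month prints news/archivemay.html, while the same path with an empty query string gives news/archive.html.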

.htaccess

Use in conjunction with these .htaccess directives:

Options -Indexes

DirectoryIndex index.html index-2.html index-3.html index-4.html index-5.html index-6.html index-7.html index-8.html

RewriteEngine On
RewriteBase /

# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Redirect index.html to the directory URL
# (also covers HTTrack's index-N.html renames)
RewriteRule ^index(-[0-9])?\.html$ / [R=301,L]
RewriteRule ^(.*)/index(-[0-9])?\.html$ /$1 [R=301,L]

# Disable Automatic Directory detection
DirectorySlash Off

# Hide extension
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

# Redirect .html to non-.html
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.*)\.html$ /$1 [R=301,L]
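The rules above are easier to reason about by tracing a few requests through them. Below is a rough bash model of the decisions they make, in file order. rewrite_action is a hypothetical helper, not an Apache tool. It ignores the www redirect, query strings, DirectorySlash and mod_dir, and takes request paths the way RewriteRule sees them in .htaccess context (relative to the document root, without the leading slash). Treat it as a reading aid only and confirm behaviour against a real server, for example with curl -I.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: models the index, hide-extension and .html redirect
# rules from the .htaccess above. Ignores the www redirect, query strings,
# DirectorySlash and mod_dir. $2 is the path as RewriteRule sees it in
# .htaccess context: relative to the docroot, no leading slash.
rewrite_action() {
  local docroot="$1" path="$2"
  local re_root='^index(-[0-9])?\.html$'
  local re_sub='^(.*)/index(-[0-9])?\.html$'
  local re_html='^(.*)\.html$'

  # index.html / index-N.html redirect to the directory URL ([R=301,L])
  if [[ "$path" =~ $re_root ]]; then
    echo "301 /"
    return
  fi
  if [[ "$path" =~ $re_sub ]]; then
    echo "301 /${BASH_REMATCH[1]}"
    return
  fi

  # Hide extension: internal rewrite to path.html if that file exists.
  # No [L] flag, so the next rule still runs on the rewritten path.
  local target="$path"
  if [[ -f "$docroot/$path.html" ]]; then
    target="$path.html"
  fi

  # Redirect .html to non-.html: THE_REQUEST holds the client's original
  # request, so only requests that literally asked for .html redirect.
  if [[ "$path" == *.html* && "$target" =~ $re_html ]]; then
    echo "301 /${BASH_REMATCH[1]}"
    return
  fi

  echo "serve $target"
}
```

With about.html in the docroot, a request for about is served from about.html, while a request for about.html gets 301 /about. Because the rewrite rule tests THE_REQUEST rather than the rewritten path, the internal rewrite and the redirect do not loop.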

rafilkmp3 commented Oct 7, 2019

This broke my server: I had to remove this section to work around 403 errors on all sublinks:

# Disable Automatic Directory detection
DirectorySlash Off

Thanks for your gist, I used it as a base for my solution here: https://github.com/rafilkmp3/docker-httrack
