
@jsaneu
Last active March 22, 2021 12:44
MediaWiki Crawler

Download the website

wget -nH --reject-regex 'Especial|Special|Ayuda|Help|action|printable|Archivo:' --recursive --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains domain.com --no-parent http://domain.com/wiki

Remove external links (regex find and replace, keeps the link text)

Find: (<a[^>]*href="http)[^"]*("[^>]*>)([^"]*)(</a>)

Replace: $3
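
The find/replace pair above can also be applied from the command line. This is a minimal sketch, assuming a `sed` that supports `-E` (extended regular expressions); the function name `strip_external_links` is hypothetical and not part of the original gist. It reads HTML on stdin and writes the result to stdout, so it never edits files in place.

```shell
# Hypothetical helper: replaces every <a href="http..."> link with its link text.
# Same pattern as the Find/Replace pair above; \3 is sed's spelling of $3.
strip_external_links() {
  sed -E 's#(<a[^>]*href="http)[^"]*("[^>]*>)([^"]*)(</a>)#\3#g'
}

# Example: process one downloaded page into a new file.
# strip_external_links < wiki/Page.html > wiki/Page.stripped.html
```

Because `wget --convert-links` rewrites links between downloaded pages as relative paths, only links that still start with `http` (external ones) should match. Link text that itself contains a double quote, for example a nested tag with attributes, is left untouched by this pattern.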

Hide navigation, header and footer

#mw-navigation {display:none;}
#left-navigation,#mw-head-base{margin-left:0em;}
@media screen {
	#mw-page-base,#mw-head-base,#footer,.mw-editsection {display:none;}
	.mw-body{margin-left:0em;}
}
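
The gist does not say where these rules should live. One possible approach, sketched here under that assumption, is to save them in a separate stylesheet (the name `offline.css` is hypothetical) and link it from each downloaded page. The helper name `inject_css` is also hypothetical. It reads HTML on stdin and adds a `<link>` just before `</head>`.

```shell
# Hypothetical helper: inserts a stylesheet link before the closing </head> tag.
# $1 is the href to the CSS file; it is assumed not to contain '#' or '&'.
inject_css() {
  sed "s#</head>#<link rel=\"stylesheet\" href=\"$1\"></head>#"
}

# Example: add the stylesheet to one downloaded page.
# inject_css offline.css < wiki/Page.html > wiki/Page.styled.html
```

The href is relative to each page, so pages in subdirectories would need an adjusted path such as `../offline.css`.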