@nsbingham
Created March 31, 2013 15:56
Clean up HTML generated by Word with HTML Tidy on OS X
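# Install HTML Tidy (if Homebrew cannot find this formula, try tidy-html5)
brew install tidy
# Export the Word doc as HTML (dirty.htm in the command below)
# Create a config file named tidy-config.txt containing the options listed below
# Find more options at http://tidy.sourceforge.net/
# -config reads the options file, -o names the output file, -i indents the output
tidy -config tidy-config.txt -o cleaned.html -i dirty.htm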
// tidy-config.txt
word-2000: yes
bare: yes
clean: yes
drop-empty-paras: yes
drop-font-tags: yes
join-styles: yes
output-xhtml: yes
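The steps above can be combined into one script. This is only a sketch, not part of the original gist: it assumes the exported file is named dirty.htm and sits in the current directory, and writing the config through a heredoc is a convenience choice rather than anything Tidy requires. The exit-code check relies on Tidy returning 1 for warnings and 2 for errors.

```shell
#!/bin/sh
# Sketch: write the config shown above to tidy-config.txt, then clean one file.
# dirty.htm and cleaned.html are the names used in the gist; adjust as needed.
set -u

# Recreate the config file with the same options as listed above.
cat > tidy-config.txt <<'EOF'
word-2000: yes
bare: yes
clean: yes
drop-empty-paras: yes
drop-font-tags: yes
join-styles: yes
output-xhtml: yes
EOF

if command -v tidy >/dev/null 2>&1 && [ -f dirty.htm ]; then
  tidy -config tidy-config.txt -o cleaned.html -i dirty.htm
  status=$?
  # Exit status 1 only means warnings, which Word exports almost always produce.
  if [ "$status" -ge 2 ]; then
    echo "tidy reported errors (exit $status)" >&2
  fi
else
  # Assumption: skip quietly when tidy or the input file is missing.
  echo "tidy or dirty.htm not found; wrote tidy-config.txt only" >&2
fi
```

Word-generated markup usually triggers warnings, so a nonzero exit status alone does not mean cleaned.html is unusable.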