@omas
Last active August 29, 2015 14:02
Remove HTML tags
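#!/bin/bash
# Strip HTML tags and all spaces/tabs from the file given as $1.
# Newlines are temporarily mapped to the \001 control byte so that sed sees
# the whole file as one line and can match tags that span several lines.
# (Using "@" as the placeholder would turn every literal "@" into a newline.)
# Limitation: script/style/comment bodies that contain "<" are not removed.
tr -d '\r' < "$1" \
| tr '\n' '\001' \
| sed -e 's/<script[^>]*>[^<]*<\/script>//g' \
      -e 's/<style[^>]*>[^<]*<\/style>//g' \
      -e 's/<!--[^<]*-->//g' \
      -e 's/<[^>]*>//g' \
      -e 's/&nbsp;//g' \
      -e 's/&gt;/>/g' \
      -e 's/&lt;/</g' \
| tr -d ' \t' \
| tr -s '\001' '\n'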

omas commented Jun 17, 2014

Removes HTML tags and whitespace.
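To see the tag and whitespace removal on a small input, here is a minimal sketch that wraps the core of the pipeline in a function reading from stdin. The function name `strip_tags` and the sample input are hypothetical and not part of the gist. The sketch uses the `\001` control byte as the newline placeholder so that a literal `@` in the input survives, and it decodes `&lt;` and `&gt;` only after the tags are gone, so escaped text is not mistaken for a tag.

```shell
#!/bin/bash
# strip_tags: hypothetical helper; reads HTML on stdin, writes text on stdout.
strip_tags() {
  tr -d '\r' \
  | tr '\n' '\001' \
  | sed -e 's/<[^>]*>//g' -e 's/&nbsp;//g' -e 's/&gt;/>/g' -e 's/&lt;/</g' \
  | tr -d ' \t' \
  | tr -s '\001' '\n'   # map the placeholder back and squeeze blank lines
}

# Sample input: a tag split across two lines, a literal "@", and an entity.
printf '<p\nclass="x">mail: a@b.example</p>\n<br>\n\n<b>1 &lt; 2</b>\n' | strip_tags
```

With this input the function should print two lines, `mail:a@b.example` and `1<2`: the multi-line `<p ...>` tag is removed, all spaces are dropped, and the empty lines left behind by `<br>` are collapsed.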
