Last active
March 5, 2024 14:03
# sed commands to replace HTML entities with characters
# Usage:
# sed -f htmlentities.sed file.html > newfile.html
# Replace entities in all HTML files, in place:
# sed -i -f htmlentities.sed file.html **/*.html
# ASCII printable characters
s/&#32;/ /g
s/&#33;/!/g
s/&#34;/"/g
s/&#35;/#/g
s/&#36;/$/g
s/&#37;/%/g
s/&#39;/'/g
s/&#40;/(/g
s/&#41;/)/g
s/&#42;/*/g
s/&#43;/+/g
s/&#44;/,/g
s/&#45;/-/g
s/&#46;/./g
s/&#47;/\//g
s/&#48;/0/g
s/&#49;/1/g
s/&#50;/2/g
s/&#51;/3/g
s/&#52;/4/g
s/&#53;/5/g
s/&#54;/6/g
s/&#55;/7/g
s/&#56;/8/g
s/&#57;/9/g
s/&#58;/:/g
s/&#59;/;/g
s/&#60;/</g
s/&lt;/</g
s/&#61;/=/g
s/&#62;/>/g
s/&gt;/>/g
s/&#63;/?/g
s/&#64;/@/g
s/&#65;/A/g
s/&#66;/B/g
s/&#67;/C/g
s/&#68;/D/g
s/&#69;/E/g
s/&#70;/F/g
s/&#71;/G/g
s/&#72;/H/g
s/&#73;/I/g
s/&#74;/J/g
s/&#75;/K/g
s/&#76;/L/g
s/&#77;/M/g
s/&#78;/N/g
s/&#79;/O/g
s/&#80;/P/g
s/&#81;/Q/g
s/&#82;/R/g
s/&#83;/S/g
s/&#84;/T/g
s/&#85;/U/g
s/&#86;/V/g
s/&#87;/W/g
s/&#88;/X/g
s/&#89;/Y/g
s/&#90;/Z/g
s/&#91;/[/g
s/&#92;/\\/g
s/&#93;/]/g
s/&#94;/^/g
s/&#95;/_/g
s/&#96;/`/g
s/&#97;/a/g
s/&#98;/b/g
s/&#99;/c/g
s/&#100;/d/g
s/&#101;/e/g
s/&#102;/f/g
s/&#103;/g/g
s/&#104;/h/g
s/&#105;/i/g
s/&#106;/j/g
s/&#107;/k/g
s/&#108;/l/g
s/&#109;/m/g
s/&#110;/n/g
s/&#111;/o/g
s/&#112;/p/g
s/&#113;/q/g
s/&#114;/r/g
s/&#115;/s/g
s/&#116;/t/g
s/&#117;/u/g
s/&#118;/v/g
s/&#119;/w/g
s/&#120;/x/g
s/&#121;/y/g
s/&#122;/z/g
s/&#123;/{/g
s/&#124;/|/g
s/&#125;/}/g
s/&#126;/~/g
# ISO-8859-1 characters
s/&#192;/À/g
s/&Agrave;/À/g
s/&#193;/Á/g
s/&Aacute;/Á/g
s/&#194;/Â/g
s/&Acirc;/Â/g
s/&#195;/Ã/g
s/&Atilde;/Ã/g
s/&#196;/Ä/g
s/&Auml;/Ä/g
s/&#197;/Å/g
s/&Aring;/Å/g
s/&#198;/Æ/g
s/&AElig;/Æ/g
s/&#199;/Ç/g
s/&Ccedil;/Ç/g
s/&#200;/È/g
s/&Egrave;/È/g
s/&#201;/É/g
s/&Eacute;/É/g
s/&#202;/Ê/g
s/&Ecirc;/Ê/g
s/&#203;/Ë/g
s/&Euml;/Ë/g
s/&#204;/Ì/g
s/&Igrave;/Ì/g
s/&#205;/Í/g
s/&Iacute;/Í/g
s/&#206;/Î/g
s/&Icirc;/Î/g
s/&#207;/Ï/g
s/&Iuml;/Ï/g
s/&#208;/Ð/g
s/&ETH;/Ð/g
s/&#209;/Ñ/g
s/&Ntilde;/Ñ/g
s/&#210;/Ò/g
s/&Ograve;/Ò/g
s/&#211;/Ó/g
s/&Oacute;/Ó/g
s/&#212;/Ô/g
s/&Ocirc;/Ô/g
s/&#213;/Õ/g
s/&Otilde;/Õ/g
s/&#214;/Ö/g
s/&Ouml;/Ö/g
s/&#216;/Ø/g
s/&Oslash;/Ø/g
s/&#217;/Ù/g
s/&Ugrave;/Ù/g
s/&#218;/Ú/g
s/&Uacute;/Ú/g
s/&#219;/Û/g
s/&Ucirc;/Û/g
s/&#220;/Ü/g
s/&Uuml;/Ü/g
s/&#221;/Ý/g
s/&Yacute;/Ý/g
s/&#222;/Þ/g
s/&THORN;/Þ/g
s/&#223;/ß/g
s/&szlig;/ß/g
s/&#224;/à/g
s/&agrave;/à/g
s/&#225;/á/g
s/&aacute;/á/g
s/&#226;/â/g
s/&acirc;/â/g
s/&#227;/ã/g
s/&atilde;/ã/g
s/&#228;/ä/g
s/&auml;/ä/g
s/&#229;/å/g
s/&aring;/å/g
s/&#230;/æ/g
s/&aelig;/æ/g
s/&#231;/ç/g
s/&ccedil;/ç/g
s/&#232;/è/g
s/&egrave;/è/g
s/&#233;/é/g
s/&eacute;/é/g
s/&#234;/ê/g
s/&ecirc;/ê/g
s/&#235;/ë/g
s/&euml;/ë/g
s/&#236;/ì/g
s/&igrave;/ì/g
s/&#237;/í/g
s/&iacute;/í/g
s/&#238;/î/g
s/&icirc;/î/g
s/&#239;/ï/g
s/&iuml;/ï/g
s/&#240;/ð/g
s/&eth;/ð/g
s/&#241;/ñ/g
s/&ntilde;/ñ/g
s/&#242;/ò/g
s/&ograve;/ò/g
s/&#243;/ó/g
s/&oacute;/ó/g
s/&#244;/ô/g
s/&ocirc;/ô/g
s/&#245;/õ/g
s/&otilde;/õ/g
s/&#246;/ö/g
s/&ouml;/ö/g
s/&#248;/ø/g
s/&oslash;/ø/g
s/&#249;/ù/g
s/&ugrave;/ù/g
s/&#250;/ú/g
s/&uacute;/ú/g
s/&#251;/û/g
s/&ucirc;/û/g
s/&#252;/ü/g
s/&uuml;/ü/g
s/&#253;/ý/g
s/&yacute;/ý/g
s/&#254;/þ/g
s/&thorn;/þ/g
s/&#255;/ÿ/g
s/&yuml;/ÿ/g
# ISO-8859-1 symbols
# The replacement for nbsp is a literal no-break space (U+00A0);
# the replacement for shy is a literal soft hyphen (U+00AD).
s/&#160;/ /g
s/&nbsp;/ /g
s/&#161;/¡/g
s/&iexcl;/¡/g
s/&#162;/¢/g
s/&cent;/¢/g
s/&#163;/£/g
s/&pound;/£/g
s/&#164;/¤/g
s/&curren;/¤/g
s/&#165;/¥/g
s/&yen;/¥/g
s/&#166;/¦/g
s/&brvbar;/¦/g
s/&#167;/§/g
s/&sect;/§/g
s/&#168;/¨/g
s/&uml;/¨/g
s/&#169;/©/g
s/&copy;/©/g
s/&#170;/ª/g
s/&ordf;/ª/g
s/&#171;/«/g
s/&laquo;/«/g
s/&#172;/¬/g
s/&not;/¬/g
s/&#173;/­/g
s/&shy;/­/g
s/&#174;/®/g
s/&reg;/®/g
s/&#175;/¯/g
s/&macr;/¯/g
s/&#176;/°/g
s/&deg;/°/g
s/&#177;/±/g
s/&plusmn;/±/g
s/&#178;/²/g
s/&sup2;/²/g
s/&#179;/³/g
s/&sup3;/³/g
s/&#180;/´/g
s/&acute;/´/g
s/&#181;/µ/g
s/&micro;/µ/g
s/&#182;/¶/g
s/&para;/¶/g
s/&#184;/¸/g
s/&cedil;/¸/g
s/&#185;/¹/g
s/&sup1;/¹/g
s/&#186;/º/g
s/&ordm;/º/g
s/&#187;/»/g
s/&raquo;/»/g
s/&#188;/¼/g
s/&frac14;/¼/g
s/&#189;/½/g
s/&frac12;/½/g
s/&#190;/¾/g
s/&frac34;/¾/g
s/&#191;/¿/g
s/&iquest;/¿/g
s/&#215;/×/g
s/&times;/×/g
s/&#247;/÷/g
s/&divide;/÷/g
# Math symbols
s/&#8704;/∀/g
s/&forall;/∀/g
s/&#8706;/∂/g
s/&part;/∂/g
s/&#8707;/∃/g
s/&exist;/∃/g
s/&#8709;/∅/g
s/&empty;/∅/g
s/&#8711;/∇/g
s/&nabla;/∇/g
s/&#8712;/∈/g
s/&isin;/∈/g
s/&#8713;/∉/g
s/&notin;/∉/g
s/&#8715;/∋/g
s/&ni;/∋/g
s/&#8719;/∏/g
s/&prod;/∏/g
s/&#8721;/∑/g
s/&sum;/∑/g
s/&#8722;/−/g
s/&minus;/−/g
s/&#8727;/∗/g
s/&lowast;/∗/g
s/&#8730;/√/g
s/&radic;/√/g
s/&#8733;/∝/g
s/&prop;/∝/g
s/&#8734;/∞/g
s/&infin;/∞/g
s/&#8736;/∠/g
s/&ang;/∠/g
s/&#8743;/∧/g
s/&and;/∧/g
s/&#8744;/∨/g
s/&or;/∨/g
s/&#8745;/∩/g
s/&cap;/∩/g
s/&#8746;/∪/g
s/&cup;/∪/g
s/&#8747;/∫/g
s/&int;/∫/g
s/&#8756;/∴/g
s/&there4;/∴/g
s/&#8764;/∼/g
s/&sim;/∼/g
s/&#8773;/≅/g
s/&cong;/≅/g
s/&#8776;/≈/g
s/&asymp;/≈/g
s/&#8800;/≠/g
s/&ne;/≠/g
s/&#8801;/≡/g
s/&equiv;/≡/g
s/&#8804;/≤/g
s/&le;/≤/g
s/&#8805;/≥/g
s/&ge;/≥/g
s/&#8834;/⊂/g
s/&sub;/⊂/g
s/&#8835;/⊃/g
s/&sup;/⊃/g
s/&#8836;/⊄/g
s/&nsub;/⊄/g
s/&#8838;/⊆/g
s/&sube;/⊆/g
s/&#8839;/⊇/g
s/&supe;/⊇/g
s/&#8853;/⊕/g
s/&oplus;/⊕/g
s/&#8855;/⊗/g
s/&otimes;/⊗/g
s/&#8869;/⊥/g
s/&perp;/⊥/g
s/&#8901;/⋅/g
s/&sdot;/⋅/g
# Greek letters
s/&#913;/Α/g
s/&Alpha;/Α/g
s/&#914;/Β/g
s/&Beta;/Β/g
s/&#915;/Γ/g
s/&Gamma;/Γ/g
s/&#916;/Δ/g
s/&Delta;/Δ/g
s/&#917;/Ε/g
s/&Epsilon;/Ε/g
s/&#918;/Ζ/g
s/&Zeta;/Ζ/g
s/&#919;/Η/g
s/&Eta;/Η/g
s/&#920;/Θ/g
s/&Theta;/Θ/g
s/&#921;/Ι/g
s/&Iota;/Ι/g
s/&#922;/Κ/g
s/&Kappa;/Κ/g
s/&#923;/Λ/g
s/&Lambda;/Λ/g
s/&#924;/Μ/g
s/&Mu;/Μ/g
s/&#925;/Ν/g
s/&Nu;/Ν/g
s/&#926;/Ξ/g
s/&Xi;/Ξ/g
s/&#927;/Ο/g
s/&Omicron;/Ο/g
s/&#928;/Π/g
s/&Pi;/Π/g
s/&#929;/Ρ/g
s/&Rho;/Ρ/g
s/&#931;/Σ/g
s/&Sigma;/Σ/g
s/&#932;/Τ/g
s/&Tau;/Τ/g
s/&#933;/Υ/g
s/&Upsilon;/Υ/g
s/&#934;/Φ/g
s/&Phi;/Φ/g
s/&#935;/Χ/g
s/&Chi;/Χ/g
s/&#936;/Ψ/g
s/&Psi;/Ψ/g
s/&#937;/Ω/g
s/&Omega;/Ω/g
s/&#945;/α/g
s/&alpha;/α/g
s/&#946;/β/g
s/&beta;/β/g
s/&#947;/γ/g
s/&gamma;/γ/g
s/&#948;/δ/g
s/&delta;/δ/g
s/&#949;/ε/g
s/&epsilon;/ε/g
s/&#950;/ζ/g
s/&zeta;/ζ/g
s/&#951;/η/g
s/&eta;/η/g
s/&#952;/θ/g
s/&theta;/θ/g
s/&#953;/ι/g
s/&iota;/ι/g
s/&#954;/κ/g
s/&kappa;/κ/g
s/&#955;/λ/g
s/&lambda;/λ/g
s/&#956;/μ/g
s/&mu;/μ/g
s/&#957;/ν/g
s/&nu;/ν/g
s/&#958;/ξ/g
s/&xi;/ξ/g
s/&#959;/ο/g
s/&omicron;/ο/g
s/&#960;/π/g
s/&pi;/π/g
s/&#961;/ρ/g
s/&rho;/ρ/g
s/&#962;/ς/g
s/&sigmaf;/ς/g
s/&#963;/σ/g
s/&sigma;/σ/g
s/&#964;/τ/g
s/&tau;/τ/g
s/&#965;/υ/g
s/&upsilon;/υ/g
s/&#966;/φ/g
s/&phi;/φ/g
s/&#967;/χ/g
s/&chi;/χ/g
s/&#968;/ψ/g
s/&psi;/ψ/g
s/&#969;/ω/g
s/&omega;/ω/g
s/&#977;/ϑ/g
s/&thetasym;/ϑ/g
s/&#978;/ϒ/g
s/&upsih;/ϒ/g
s/&#982;/ϖ/g
s/&piv;/ϖ/g
# Misc entities
s/&#338;/Œ/g
s/&OElig;/Œ/g
s/&#339;/œ/g
s/&oelig;/œ/g
s/&#352;/Š/g
s/&Scaron;/Š/g
s/&#353;/š/g
s/&scaron;/š/g
s/&#376;/Ÿ/g
s/&Yuml;/Ÿ/g
s/&#402;/ƒ/g
s/&fnof;/ƒ/g
s/&#710;/ˆ/g
s/&circ;/ˆ/g
s/&#732;/˜/g
s/&tilde;/˜/g
# The next replacements are literal space and invisible characters:
# ensp U+2002, emsp U+2003, thinsp U+2009,
# zwnj U+200C, zwj U+200D, lrm U+200E, rlm U+200F.
s/&#8194;/ /g
s/&ensp;/ /g
s/&#8195;/ /g
s/&emsp;/ /g
s/&#8201;/ /g
s/&thinsp;/ /g
s/&#8204;/‌/g
s/&zwnj;/‌/g
s/&#8205;/‍/g
s/&zwj;/‍/g
s/&#8206;/‎/g
s/&lrm;/‎/g
s/&#8207;/‏/g
s/&rlm;/‏/g
s/&#8211;/–/g
s/&ndash;/–/g
s/&#8212;/—/g
s/&mdash;/—/g
s/&#8216;/‘/g
s/&lsquo;/‘/g
s/&#8217;/’/g
s/&rsquo;/’/g
s/&#8218;/‚/g
s/&sbquo;/‚/g
s/&#8220;/“/g
s/&ldquo;/“/g
s/&#8221;/”/g
s/&rdquo;/”/g
s/&#8222;/„/g
s/&bdquo;/„/g
s/&#8224;/†/g
s/&dagger;/†/g
s/&#8225;/‡/g
s/&Dagger;/‡/g
s/&#8226;/•/g
s/&bull;/•/g
s/&#8230;/…/g
s/&hellip;/…/g
s/&#8240;/‰/g
s/&permil;/‰/g
s/&#8242;/′/g
s/&prime;/′/g
s/&#8243;/″/g
s/&Prime;/″/g
s/&#8249;/‹/g
s/&lsaquo;/‹/g
s/&#8250;/›/g
s/&rsaquo;/›/g
s/&#8254;/‾/g
s/&oline;/‾/g
s/&#8364;/€/g
s/&euro;/€/g
s/&#8482;/™/g
s/&trade;/™/g
s/&#8592;/←/g
s/&larr;/←/g
s/&#8593;/↑/g
s/&uarr;/↑/g
s/&#8594;/→/g
s/&rarr;/→/g
s/&#8595;/↓/g
s/&darr;/↓/g
s/&#8596;/↔/g
s/&harr;/↔/g
s/&#8629;/↵/g
s/&crarr;/↵/g
s/&#8968;/⌈/g
s/&lceil;/⌈/g
s/&#8969;/⌉/g
s/&rceil;/⌉/g
s/&#8970;/⌊/g
s/&lfloor;/⌊/g
s/&#8971;/⌋/g
s/&rfloor;/⌋/g
s/&#9674;/◊/g
s/&loz;/◊/g
s/&#9824;/♠/g
s/&spades;/♠/g
s/&#9827;/♣/g
s/&clubs;/♣/g
s/&#9829;/♥/g
s/&hearts;/♥/g
s/&#9830;/♦/g
s/&diams;/♦/g
# The ampersand rules must stay last so that text is decoded only once;
# \& is a literal ampersand (a bare & would mean "the matched text").
s/&#38;/\&/g
s/&amp;/\&/g
Thanks @robin-a-meade, I changed the gist according to your comment.
The script works well! Thanks @Ajnasz.
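sed treats the ampersand specially in the replacement: an unescaped & stands for the whole matched text. See https://unix.stackexchange.com/a/296732
As a result, s/&amp;/&/g replaces every &amp; with itself, so the input comes out unchanged where a bare & was expected.
Need to change
s/&amp;/&/g
to
s/&amp;/\&/g
(the backslash escapes the special meaning of the ampersand).
Also: the replacement of the ampersand must be the last replacement. That is, move the corrected
s/&amp;/\&/g
to be the last line of the sed script. Otherwise text such as &amp;lt; would be decoded twice, first to &lt; and then to <.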
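Both points can be checked from a shell. The following is only a minimal sketch, not the original test from this comment: the sample inputs and the variable names (`unescaped`, `escaped`, `amp_first`, `amp_last`) are made up for illustration, and it assumes a POSIX `sh` with `sed` on the PATH.

```shell
# Sample inputs and variable names are hypothetical, chosen for this demo only.

# 1. An unescaped & in the replacement means "the matched text",
#    so &amp; is replaced by itself and nothing changes.
unescaped=$(printf '%s\n' 'Tom &amp; Jerry' | sed 's/&amp;/&/g')
echo "$unescaped"   # Tom &amp; Jerry

# 2. An escaped \& inserts a literal ampersand.
escaped=$(printf '%s\n' 'Tom &amp; Jerry' | sed 's/&amp;/\&/g')
echo "$escaped"     # Tom & Jerry

# 3. Ampersand rule first: the text "&amp;lt;" is decoded twice.
amp_first=$(printf '%s\n' '&amp;lt;' | sed -e 's/&amp;/\&/g' -e 's/&lt;/</g')
echo "$amp_first"   # <

# 4. Ampersand rule last: exactly one level of decoding.
amp_last=$(printf '%s\n' '&amp;lt;' | sed -e 's/&lt;/</g' -e 's/&amp;/\&/g')
echo "$amp_last"    # &lt;
```

Case 3 is why the order matters: an HTML source that writes `&amp;lt;` means the literal text `&lt;`, not a `<` character.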