@jcubic
Created September 16, 2012 13:30
Print missing HTML entities from an XML file in a format ready to paste into the DOCTYPE

Description

Command line that prints entities (such as &amp; or &quot;) in a form that can be pasted into the <!DOCTYPE declaration when an XML file does not define them and the parser reports that an entity is not defined. The command skips entities that are already defined in the file.
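
The check described above can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not the one-liner itself: `missing_entities` is a hypothetical name, entity names are assumed to be alphanumeric, and declarations are assumed to look like `<!ENTITY name ...>` inside the same file.

```shell
#!/bin/sh
# Hypothetical helper: print the names of entities that are used in an
# XML file but have no matching <!ENTITY name ...> declaration in it.
missing_entities() {
    file=$1
    # Collect every named reference such as &aacute; (numeric references
    # like &#225; start with '#' and are skipped), one unique name per line.
    grep -oE '&[A-Za-z][A-Za-z0-9]*;' "$file" | sed -e 's/[&;]//g' | sort -u |
    while read -r name; do
        # Keep the name only if the file does not already declare it.
        grep -qE "<!ENTITY[[:space:]]+$name[[:space:]]" "$file" || printf '%s\n' "$name"
    done
}
```

Calling `missing_entities foo.xml` prints one undeclared name per line, which can then be turned into declarations.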

ALIGN=18; grep -oE '&[A-Za-z][A-Za-z0-9]*;' foo.xml | sort -u | while read -r entity; do name=$(printf '%s' "$entity" | sed -e 's/[;&]//g'); grep ENTITY foo.xml | grep -q " $name " || echo '<!ENTITY '$(printf '%s' "$entity" | html2text | perl -e "printf '$name %'.($ALIGN-(length '$name')).'s;\'>','\'&#'.(ord <>);"); done
<!ENTITY aacute '&#195;'>
<!ENTITY acute '&#194;'>
<!ENTITY aelig '&#195;'>
<!ENTITY agrave '&#195;'>
<!ENTITY alpha '&#206;'>
<!ENTITY amp '&#38;'>
<!ENTITY apos '&#38;'>
<!ENTITY auml '&#195;'>
<!ENTITY eacute '&#195;'>
<!ENTITY Eacute '&#195;'>
<!ENTITY ecirc '&#195;'>
<!ENTITY egrave '&#195;'>
<!ENTITY empty '&#38;'>
<!ENTITY gt '&#62;'>
<!ENTITY hArr '&#38;'>
<!ENTITY iota '&#206;'>
<!ENTITY iuml '&#195;'>
<!ENTITY kappa '&#206;'>
<!ENTITY lambda '&#206;'>
<!ENTITY Lambda '&#206;'>
<!ENTITY lang '&#38;'>
<!ENTITY larr '&#226;'>
<!ENTITY ldquo '&#38;'>
<!ENTITY lsquo '&#38;'>
<!ENTITY lt '&#60;'>
<!ENTITY mdash '&#38;'>
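
In the listing above, several different characters share one number. 195, 206 and 226 are the leading bytes of the UTF-8 encodings of characters such as á, α and ←, so `ord` appears to be returning the first byte rather than the code point, and `&#38;` shows up where the entity seems to have reached perl undecoded. A minimal sketch of an alternative is to look the code point up by name. `entity_codepoint` and `entity_decl` are hypothetical helpers, and the table is a small hand-written excerpt of the HTML 4 entity list, not a complete mapping.

```shell
#!/bin/sh
# Hypothetical lookup: map an HTML entity name to its Unicode code point.
# Only a handful of names are listed; extend the table as needed.
entity_codepoint() {
    case $1 in
        aacute) echo 225 ;;
        eacute) echo 233 ;;
        Eacute) echo 201 ;;
        alpha)  echo 945 ;;
        Lambda) echo 923 ;;
        lambda) echo 955 ;;
        larr)   echo 8592 ;;
        ldquo)  echo 8220 ;;
        mdash)  echo 8212 ;;
        *)      return 1 ;;   # unknown name: let the caller decide
    esac
}

# Print a declaration ready to paste into the DOCTYPE internal subset,
# padding the name to a fixed column width (default 18, like ALIGN).
entity_decl() {
    name=$1
    width=${2:-18}
    code=$(entity_codepoint "$name") || return 1
    printf "<!ENTITY %-${width}s '&#%s;'>\n" "$name" "$code"
}
```

With this table, `entity_decl aacute` yields a declaration using 225, the code point of á, instead of the byte value 195.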