
NikaZhenya/htmlclean

Last active Apr 17, 2019
Cleans HTML exported by InDesign by removing div tags, overrides, values that begin with '_id', empty attributes, and span tags without attributes.
#!/bin/sh
clean ()
{
# Removes, in this order:
# 1. div tags.
# 2. Overrides.
# 3. Attribute values that begin with '_id'.
# 4. Empty attributes.
# 5. span tags without attributes.
# 6. Double line breaks (perl's -00 paragraph mode squeezes runs of blank lines).
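# Keep a backup, then replace the file only if every step succeeds.
# "$1" is quoted so that file names containing spaces work.
cp "$1" "$1.bak" &&
perl -00 \
-pe 's/\s+<[\/]*div[^<]*?>//g;
s/\s*[A-Za-z]+Override-[0-9]+\s*//g;
s/\s*_id[A-Za-z0-9]+\s*//g;
s/\s*\S+=""//g;
s/<span>(.*?)<\/span>/$1/g;' \
"$1" > "$1.clean" &&
mv "$1.clean" "$1"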
}
usage ()
{
echo "Cleans HTML exported by InDesign by removing div tags, overrides, values that begin with '_id', empty attributes, and span tags without attributes."
echo " Usage: $1 file"
}
# An input file is required
if [ -z "$1" ]; then
usage "$0"
exit 1
fi
# Clean the HTML
clean "$1" || exit $?