@g10guang
Created January 18, 2020 12:33
golang remove html tag from string
package utils

import (
	"regexp"
	"sort"
	"strings"
)

// RemoveHtmlTag matches HTML tags in the input and replaces them with "".
func RemoveHtmlTag(in string) string {
	// Regex matching a run of adjacent HTML tags. The trailing * also
	// allows empty matches, which are filtered out below.
	const pattern = `(<\/?[a-zA-Z]+?[^>]*\/?>)*`
	r := regexp.MustCompile(pattern)
	groups := r.FindAllString(in, -1)
	// Replace longer matches first so that a shorter match cannot
	// break up a longer one before it is removed.
	sort.Slice(groups, func(i, j int) bool {
		return len(groups[i]) > len(groups[j])
	})
	for _, group := range groups {
		// Skip empty matches produced by the pattern.
		if strings.TrimSpace(group) != "" {
			in = strings.ReplaceAll(in, group, "")
		}
	}
	return in
}

ariden83 commented Jan 4, 2024

Your method for removing HTML tags is not very optimized. A simple regex or the bluemonday library is much faster for the same result.

See the benchmark and method comparison here: https://go.dev/play/p/7fkN9Vp-86U
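
For illustration, the "simple regex" approach could look roughly like the sketch below. It is not taken from the linked playground; the name StripTags and the exact pattern are assumptions made for this example. The regex is compiled once at package level, and a single ReplaceAllString call replaces the find, sort, and replace loop.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is compiled once at package level rather than on every call.
// It matches an opening or closing tag: "<", an optional "/", a letter,
// then everything up to the next ">".
var tagPattern = regexp.MustCompile(`</?[a-zA-Z][^>]*>`)

// StripTags is a hypothetical helper (the name is an assumption of this
// sketch) that removes every match of tagPattern in a single pass.
func StripTags(in string) string {
	return tagPattern.ReplaceAllString(in, "")
}

func main() {
	fmt.Println(StripTags("<p>Hello <b>world</b></p>")) // prints "Hello world"
}
```

As with any regex-based approach, this is not an HTML parser: comments, script bodies, and attribute values containing ">" are not handled, which is the kind of input a sanitizer such as bluemonday is meant for.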
