varenc/zshexpn-explained.md

## zshexpn-explained.md

      
The following is taken from a brilliant answer on unix.se. Posting it here for personal reference. The question was:

> What kind of patterns can I use in zsh parameter expansion?

`${var//pattern/replacement}` uses zsh wildcard patterns for `pattern`. These are the same ones used for filename generation (aka globbing), which are a superset of the `sh` wildcard patterns. The syntax is also affected by the `kshglob` and `extendedglob` options. The `${var//pattern/replacement}` form originally comes from the Korn shell.

I'd recommend enabling `extendedglob` (`set -o extendedglob` in your `~/.zshrc`), which gives you the most features (more than standard EREs) at the expense of some backward incompatibility in a few corner cases.

You'll find it documented at `info zsh 'filename generation'`.
A cheat sheet for the mapping between ERE and extended zsh wildcards:

Standard `sh` ones:

- `.` -> `?`
- `.*` -> `*`
- `[...]` -> `[...]`
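
A quick way to check the plain `sh` mappings is to test them with `[[ ... = pattern ]]`. This is a minimal sketch assuming zsh. The sample string `foo.txt` and the `r1`..`r4` variables are made up for illustration.

```zsh
#!/usr/bin/env zsh
# Plain sh wildcards: no extra options needed.
s=foo.txt   # hypothetical sample string

[[ $s = fo?.txt ]]     && r1=match || r1=no  # `?` stands in for ERE `.`
[[ $s = f?.txt ]]      && r2=match || r2=no  # `?` is exactly one character
[[ $s = f*.txt ]]      && r3=match || r3=no  # `*` stands in for ERE `.*`
[[ $s = [a-f]oo.txt ]] && r4=match || r4=no  # bracket expressions are unchanged

print -r -- "$r1 $r2 $r3 $r4"   # match no match match
```
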

zsh extensions:

- `*` -> `#`
- `+` -> `##`
- `{x,y}` -> `(#cx,y)`
- `(...|...)` -> `(...|...)`

Some extra features not available in standard EREs:

- `^pattern` (negation)
- `x~y` (except)
- `<12-234>` matches decimal number ranges
- `(#i)` case-insensitive matching
- `(#a2)` approximate matching allowing up to 2 errors
- many more

Whether wildcard patterns are anchored at the start or end of the subject depends on which operator is used:

- Globs, `case` patterns, `[[ string = pattern ]]` and `${var:#pattern}` are anchored at both ends (`f*.txt` will match `foo.txt`, not `Xfoo.txtY`)
- `${var#pattern}` and `${var##pattern}` are anchored at the start
- `${var%pattern}` and `${var%%pattern}` are anchored at the end
- `${var/pattern/repl}` and `${var//pattern/repl}` are not anchored, but can be made so with `${var/#pattern}` (start) or `${var/%pattern}` (end).

`(#s)` and `(#e)` can also be used as the equivalents of `^`/`$` (ERE) or `\A`/`\z` (PCRE).
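Whether repeating operators (`#`, `##`, `*`, `(#cx,y)`, `<x-y>`) are greedy also depends on the expansion operator. They are greedy with `${var##...}`, `${var%%...}`, `${var//...}` and `${var/...}`, and not with `${var#...}` or `${var%...}`. That can be changed with the `S` parameter expansion flag.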
So for your examples:

- `regexp-replace nname "[^[:alnum:]]" "_"`: `${var//[^[:alnum:]]/_}`
- `regexp-replace nname "_{2,}" "_"`: `${var//_(#c2,)/_}`
- `regexp-replace nname "_+$" ""`: `${var%%_#}` or `${var/%_#}` (here using `#` for the `*` equivalent; you can use `##` for a `+` equivalent, but that won't make any difference in this case).
- `regexp-replace nname "^_+" ""`: `${var##_#}` or `${var/#_#}`

Here, you could combine them with `${${${var//[^[:alnum:]]##/_}#_}%_}` (convert sequences of non-alnums to `_` and remove any leading or trailing `_`).
Another approach could be to extract all the sequences of alnums and join them with `_`, using this hack:
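```zsh
words=()
# (#m) (needs extendedglob) sets $MATCH for each run of alnums; ${name::=value}
# assigns unconditionally, appending the run to $words. `:` discards the result.
: ${var//(#m)[[:alnum:]]##/${words[1+$#words]::=$MATCH}}
var=${(j:_:)words}   # join the collected words with _
```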

`regexp-replace` itself is an autoloadable function that calls `[[ $var =~ pattern ]]` in a loop. Note that, as a result, it doesn't work properly with the `^` anchor, word boundaries or look-behind operators (if using the `rematchpcre` option):
```
$ a='aaab'; regexp-replace a '^a' x; echo "$a"
xxxb
$ a='abab'; regexp-replace a '\<ab' '<$MATCH>'; echo $a
<ab><ab>
```

(In the first example, `^a` is matched in turn against `aaab`, `aab`, `ab`, `b` in that loop.)
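
When all that's needed is an anchored replacement, the native expansion avoids that loop entirely, because `${var/#pattern/repl}` only looks at the start of the whole value. This is a sketch with the same `aaab` input, in plain zsh.

```zsh
#!/usr/bin/env zsh
a=aaab
every=${a//a/x}   # unanchored and global: xxxb
start=${a/#a/x}   # anchored at the start of the whole value: xaab
print -r -- "$every $start"
```
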