@Crystalh
Last active August 29, 2015 14:01
Bash capture groups

data:

locator: urn:asset:a1e2d1df-a741-6547-93e2-e84692c8e981
locator: urn:bbc:cps:asset:27510321

script:

#!/bin/bash

while read l
    do
        if [[ $l =~ [(\w*:\s\w*:\w*:)(\w*-.*)] ]]; then
            for i in "${BASH_REMATCH[@]}"
            do
                echo "Index: $i"
            done
            
            echo "${BASH_REMATCH[2]}"
        fi
done < data.txt
@Crystalh
Author

The special ${BASH_REMATCH} array should contain all the regex capture groups. However, echoing ${BASH_REMATCH[n]} returns an empty string.

@pgchamberlin

#!/bin/bash

while read -r l
    do
        if [[ $l =~ .+:(.*)$ ]]; then
            for i in ${BASH_REMATCH[@]}
            do
                echo "Index: $i"
            done

            echo ${BASH_REMATCH[1]}
        fi
done < data.txt

@pgchamberlin

Not sure why my regex works though! ;-)

@edwellbrook

@pgchamberlin I think yours works because it's matching the whole string, whereas Crystal's is matching only the individual letters.

@Crystalh
Author

@pgchamberlin chars! I knew it could be vastly simplified.

I was using rubular.com to verify the matching. The result was misleading:

Match groups:
1.  locator: urn:asset:
2.  f3e8dae6-d5f5-b34e-999a-e256595c65b5

@Crystalh
Author

@edwellbrook that definitely makes sense, because doing:

for i in ${BASH_REMATCH[@]}
do
    echo "Index: $i"
done

was echoing what looked like one char at a time. We couldn't work out why. Ah well.

@pgchamberlin

I think Crystal's matches because surrounding the whole regex in [ ] makes Bash treat it as a single bracket expression (character class). That class contains characters such as : that occur in every line of the data, so it matches on every line, but only ever one character, and because the parentheses inside [ ] are literal characters it captures nothing. So you're right, it is matching individual characters rather than the whole string.
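
To make the bracket-expression point concrete, here is a minimal sketch using one line from the gist's data. The variable names (line, bracketed, whole_match, group_count) are illustrative, and setting LC_ALL=C is an assumption made only so the *-. range inside the brackets is ordered predictably.

```bash
#!/bin/bash
# Sketch only: variable names are illustrative, not from the original script.
LC_ALL=C   # assumption: byte ordering, so the "*-." range inside [ ] is valid

line='locator: urn:bbc:cps:asset:27510321'

# Keeping the pattern in a variable stops the shell from parsing it.
# The outer [ ] make this ONE bracket expression, so ( ) * \ are literals.
bracketed='[(\w*:\s\w*:\w*:)(\w*-.*)]'

if [[ $line =~ $bracketed ]]; then
    whole_match=${BASH_REMATCH[0]}
    # Element 0 is the whole match; anything after it is a capture group.
    group_count=$(( ${#BASH_REMATCH[@]} - 1 ))
    echo "matched '${whole_match}', length ${#whole_match}, groups: ${group_count}"
fi
```

With a pattern like this the match is a single character and BASH_REMATCH holds no groups beyond element 0, which fits the empty ${BASH_REMATCH[2]} reported above.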

Looking at mine again, of course I had forgotten that .+ matches spaces, so it matches all characters up until the final :, then captures everything up to the end.
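
The greedy behaviour described above can be checked with a small sketch. It feeds the two sample lines from an array instead of data.txt so it runs on its own; the names lines, greedy, and captured are illustrative. It also quotes "${BASH_REMATCH[@]}", because the unquoted form word-splits the full match at the space after "locator:".

```bash
#!/bin/bash
# Sketch only: the array stands in for data.txt; names are illustrative.
lines=(
    'locator: urn:asset:a1e2d1df-a741-6547-93e2-e84692c8e981'
    'locator: urn:bbc:cps:asset:27510321'
)

# .+ is greedy, so it runs up to the LAST colon; (.*) captures the rest.
greedy='.+:(.*)$'

captured=()
for l in "${lines[@]}"; do
    if [[ $l =~ $greedy ]]; then
        # Quoted expansion keeps each array element intact, spaces included.
        for m in "${BASH_REMATCH[@]}"; do
            echo "Index: $m"
        done
        captured+=("${BASH_REMATCH[1]}")
    fi
done

printf 'captured: %s\n' "${captured[@]}"
```

Element 0 is the whole matched line and element 1 is the text after the final colon, so index 1 rather than 2 is the one to echo for this pattern.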

@sthulb

sthulb commented May 23, 2014

Rubular uses Ruby's Perl-style regex engine, whereas Bash's =~ uses POSIX ERE. The differences are subtle, but they're there.
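
As one illustration of that difference: \w and \s are Perl-style shorthands that POSIX ERE does not define, so whether Bash accepts them depends on the system's regex library. A more portable sketch of the two-group match Rubular showed might use POSIX character classes instead. The pattern and variable names below are assumptions for illustration, not the original author's regex.

```bash
#!/bin/bash
# Sketch only: pattern and names are illustrative.
line='locator: urn:asset:a1e2d1df-a741-6547-93e2-e84692c8e981'

# Group 1: everything up to and including the last colon.
# Group 2: the trailing identifier (letters, digits, hyphens).
# [[:alnum:]] and [[:space:]] are the POSIX spellings of roughly \w and \s.
ere='^([[:alnum:]]+:[[:space:]]+[[:alnum:]:]+:)([[:alnum:]-]+)$'

if [[ $line =~ $ere ]]; then
    prefix=${BASH_REMATCH[1]}
    asset_id=${BASH_REMATCH[2]}
    echo "prefix:   $prefix"
    echo "asset id: $asset_id"
fi
```

Storing the pattern in a single-quoted variable and expanding it unquoted on the right of =~ avoids shell quoting surprises, since quoting any part of the pattern there makes it match literally.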
