@brianlayman
Forked from GreenFootballs/amazon_regex.md
Created October 19, 2018 22:12
A PHP regular expression to match Amazon links and extract the ASIN identifier
~
    (?:(smile\.|www\.))?    # optional smile. or www. prefix (capture group $1)
    ama?zo?n\.              # also allow shortened amzn.com URLs
    (?:
        com                 # Amazon domains matched (only the six listed here)
        |
        ca
        |
        co\.uk
        |
        co\.jp
        |
        de
        |
        fr
    )
    /
    (?:                     # here comes the stuff before the ASIN
        exec/obidos/ASIN/   # the possible components of an Amazon URL
        |
        o/
        |
        gp/product/
        |
        (?:                 # the dp/ format may contain a title
            (?:[^"\'/]*)/   # anything but a slash or quote
        )?                  # optional
        dp/
        |                   # if amzn.com format, nothing before the ASIN
    )
    ([A-Z0-9]{10})          # capture group $2 will contain the ASIN
    (?:                     # everything after the ASIN
        (?:/|\?|\#)         # starting with a slash, question mark, or hash
        (?:[^"\'\s]*)       # everything up to a quote or white space
    )?                      # optional
~isx
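
A minimal sketch of how the pattern might be used with `preg_match`, assuming PHP 7.1 or later (for the nullable return type). The helper name `extractAsin` and the sample URLs with the placeholder ASIN `B00EXAMPLE` are illustrative and not part of the original gist. The pattern is the one above with its comments removed and the alternations compacted; it is kept in a nowdoc so PHP applies no string escaping to it.

```php
<?php
// Hypothetical helper (not from the original gist): returns the ASIN
// found in $url, or null when the URL does not match the pattern.
function extractAsin(string $url): ?string
{
    // Same pattern as above, compacted. The x flag ignores the layout
    // whitespace, and the i flag makes the match case-insensitive.
    $pattern = <<<'REGEX'
~
    (?:(smile\.|www\.))?
    ama?zo?n\.
    (?:com|ca|co\.uk|co\.jp|de|fr)
    /
    (?:
        exec/obidos/ASIN/
        |
        o/
        |
        gp/product/
        |
        (?:(?:[^"\'/]*)/)?dp/
        |
    )
    ([A-Z0-9]{10})
    (?:(?:/|\?|\#)(?:[^"\'\s]*))?
~isx
REGEX;

    // Group 1 is the optional smile./www. prefix, so the ASIN is group 2.
    if (preg_match($pattern, $url, $matches) === 1) {
        return $matches[2];
    }
    return null;
}

// Illustrative URLs with a made-up ASIN.
var_dump(extractAsin('https://www.amazon.com/Some-Title/dp/B00EXAMPLE/ref=sr_1_1?ie=UTF8'));
var_dump(extractAsin('http://amzn.com/B00EXAMPLE'));
var_dump(extractAsin('https://example.com/dp/B00EXAMPLE'));
```

Because the optional `smile.`/`www.` prefix sits in its own capture group, the ASIN lands in `$matches[2]`. The `i` flag means a lowercase ASIN also matches and is returned unchanged, so callers that need a canonical form may want to uppercase the result.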