Bash urlencode and urldecode
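urlencode() {
    # urlencode <string>

    # Use the C collation so that [a-z] matches ASCII letters only
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C

    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:$i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            # A leading quote makes printf use the character's numeric value
            *) printf '%%%02X' "'$c" ;;
        esac
    done

    LC_COLLATE=$old_lc_collate
}

urldecode() {
    # urldecode <string>

    # '+' becomes a space, then each %XX becomes \xXX for printf '%b' to expand
    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}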
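
A quick round trip helps confirm how the two functions above behave. The sketch below repeats them so that it runs on its own (making `i` and the saved collation local is an assumption of this example, not part of the gist) and then encodes and decodes an illustrative sample string.

```shell
#!/usr/bin/env bash
urlencode() {
    local old_lc_collate=$LC_COLLATE i c
    LC_COLLATE=C
    for (( i = 0; i < ${#1}; i++ )); do
        c="${1:$i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
    LC_COLLATE=$old_lc_collate
}

urldecode() {
    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}

# Encode a sample value, then decode it again
encoded=$(urlencode 'user@example.com/a b?x=1&y=2')
printf '%s\n' "$encoded"
printf '%s\n' "$(urldecode "$encoded")"
```

The second line printed should be identical to the original sample string.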

@kovagoz kovagoz commented Dec 20, 2011

great native solution!


@dabukalam dabukalam commented Oct 28, 2013

Nice one.


@hezhong002 hezhong002 commented Jan 2, 2014

nice


@gdbtek gdbtek commented Jan 28, 2014

it works! Thanks


@ghost ghost commented May 5, 2014

The decode will be glacial for large input http://stackoverflow.com/q/14967299


@mlromramse mlromramse commented May 9, 2014

awesome!


@mrubin mrubin commented May 15, 2014

The urlencode above wasn't handling newlines in the input correctly - it was creating a '%A' instead of a '%0A'.

I changed

*) printf '%%%X' "'$c"

to

*) printf '%%%02X' "'$c"

to pad the hex character with a 0, if necessary.

(The Bash version I'm using: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu))


@mrubin mrubin commented May 15, 2014

Also, according to http://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20, '+' means a space only in some contexts... whereas it's always safe to use '%20'. I've locally taken out that middle line (which converts ' ' to +) as well.
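
The same distinction matters when decoding: '+' stands for a space only in form-encoded data, so treating it as a space everywhere can corrupt a literal plus sign. A minimal sketch, assuming a hypothetical `--form` switch that is not part of the gist:

```shell
#!/usr/bin/env bash
# urldecode [--form] <string>
# --form is a hypothetical flag for this sketch: only then is '+' read as a space.
urldecode() {
    local input
    if [[ $1 == --form ]]; then
        input="${2//+/ }"
    else
        input="$1"
    fi
    printf '%b' "${input//%/\\x}"
}

urldecode 'a+b%20c'; echo          # plus sign kept as-is
urldecode --form 'a+b%20c'; echo   # plus sign read as a space
```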

Owner Author

@cdown cdown commented May 21, 2014

@mrubin Those are both valid points, thanks.


@meoow meoow commented Jul 2, 2014

printf '%%%02X' "'$c" won't handle wide-characters well:

urlencode 你好世界

%FFFFFFFFFFFFFFE4%FFFFFFFFFFFFFFE5%FFFFFFFFFFFFFFE4%FFFFFFFFFFFFFFE7

You might have to bring external tools like xxd into the script

urlencode() {
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      # Pass $c as data, not as the format string, so '%' and '\' survive
      *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf '%%%s' "$x"; done ;;
    esac
  done
}
urlencode 你好世界

%e4%bd%a0%e5%a5%bd%e4%b8%96%e7%95%8c

This gives the same result as Python, apart from the letter case:

import urllib
urllib.quote('你好世界')

'%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C'


@chutzpah chutzpah commented Aug 7, 2014

I believe you'll want double backslashes in line 18 for urldecode:

printf '%b' "${url_encoded//%/\\x}"

@kenorb kenorb commented Mar 1, 2015

Related: How to decode URL-encoded string in shell? at stackoverflow SE


@csghuser csghuser commented Mar 18, 2015

chutzpah thanks for that, without the backslashes the decoding wasn't working for me.

cdown thanks for the scripts!!!


@stek29 stek29 commented Sep 13, 2015

Hey, on the 9th line it should be printf '%s' "$c", not printf '%s' '$c'.
Bash doesn't substitute variables in single quotes, only in double quotes.
So urldecode $(urlencode 'ЖжШщ i') gives $c$c$c$c$ci, not ЖжШщ i.


@mntnorv mntnorv commented Sep 18, 2015

doesn't work


@alethenorio alethenorio commented Sep 24, 2015

Couldn't quite get this to work until I added an eval in printf. If you are having issues try this instead

urlencode() {
    # urlencode <string>

    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) eval printf '%s' '$c' | xxd -p -c1 |
                   while read c; do printf '%%%s' "$c"; done ;;
        esac
    done
}

@zephred zephred commented Oct 6, 2015

I had to write line 9 with double quotes (instead of single) around $c:

    *) printf '%s' "$c" | xxd -p -c1 |

or special chars would always come out as %24%63 (i.e. $c).

Owner Author

@cdown cdown commented Oct 20, 2015

yeah, that was a typo. sorry about that.


@morgan-greywolf morgan-greywolf commented Nov 18, 2015

You should use stdin for urldecode rather than a parameter. This leaves you open to shell command injection.


@thedward thedward commented Dec 4, 2015

Here's a version that doesn't need xxd, but still handles wide characters. It's much faster.

urlencode() {
    # urlencode <string>

    local LANG=C
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c" ;; 
        esac
    done
}
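
The reason this approach handles wide characters is that in the C locale bash indexes the string byte by byte, so each byte of a multibyte character is encoded on its own. LC_ALL, if set, takes precedence over LANG, so the sketch below sets LC_ALL instead. It also masks the value to eight bits, on the assumption that some bash builds report values outside 0-255 for high bytes (compare the FFFFFFFFFFFFFFE4 output earlier in this thread). The name urlencode_bytes is hypothetical.

```shell
#!/usr/bin/env bash
urlencode_bytes() {
    local LC_ALL=C          # byte semantics for ${#1}, ${1:i:1} and [a-z]
    local i c n out=''
    for (( i = 0; i < ${#1}; i++ )); do
        c=${1:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            *)
                printf -v n '%d' "'$c"                   # numeric value of the byte
                printf -v c '%%%02X' "$(( n & 0xFF ))"   # keep the low 8 bits only
                out+=$c
                ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode_bytes '你好世界'
```

With UTF-8 input this should print the same escapes as the Python example above.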

@mkllnk mkllnk commented Dec 19, 2015

On my computer the character ä is not encoded.

case ä in [a-z]) echo yes ;; esac
# yes

@sunnybear sunnybear commented Jan 5, 2016

Thank you. This one is 4-5x faster

Owner Author

@cdown cdown commented Feb 15, 2016

@morgan-greywolf Using stdin vs. a parameter has no relation to command injection -- if you are having problems with command injection, then they are outside of these functions in the first place.
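
Neither form executes the input as a command. One thing that does differ from a pure percent-decoder is that printf '%b' also expands backslash escapes already present in the input, so a literal \n in the argument becomes a newline. A sketch of a decoder that only touches %XX sequences, assuming the hypothetical name urldecode_literal:

```shell
#!/usr/bin/env bash
urldecode_literal() {
    local src=$1 out='' i c
    for (( i = 0; i < ${#src}; i++ )); do
        c=${src:i:1}
        # Decode only a '%' that is followed by exactly two hex digits
        if [[ $c == '%' && ${src:i+1:2} =~ ^[0-9A-Fa-f]{2}$ ]]; then
            printf -v c "\\x${src:i+1:2}"
            (( i += 2 ))
        fi
        out+=$c
    done
    printf '%s\n' "$out"
}

urldecode_literal 'abc%40abc.com'
urldecode_literal 'a\nb%5Cc'    # the backslash-n in the input stays literal
```

This version leaves '+' untouched and passes malformed sequences such as %zz through unchanged.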


@bigretromike bigretromike commented Aug 26, 2016

@thedward thanks for the code


@lifeofguenter lifeofguenter commented Sep 15, 2016

https://gist.github.com/cdown/1163649#file-gistfile1-sh-L4

Instead of setting LC_COLLATE and restoring the old value afterwards (this method can/will cause race conditions), you can simply declare it as a local variable, which will not overwrite the value in the main shell:

local LC_COLLATE=C
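
In bash, a variable declared with local is restored automatically when the function returns, so no save-and-restore step is needed. The sketch below uses LC_ALL rather than LC_COLLATE, on the assumption that LC_ALL may be set in the caller's environment and would otherwise take precedence. is_unreserved is a hypothetical helper name.

```shell
#!/usr/bin/env bash
is_unreserved() {
    local LC_ALL=C    # scoped to this call; the caller's locale is untouched
    case $1 in
        [a-zA-Z0-9.~_-]) return 0 ;;
        *) return 1 ;;
    esac
}

is_unreserved a && echo "a: unreserved"
is_unreserved 'ä' || echo "ä: must be percent-encoded"
echo "LC_ALL after the calls: ${LC_ALL-unset}"
```

The last line shows whatever LC_ALL held before the calls, which is the point of the local declaration.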

@DarbyCrash DarbyCrash commented Nov 4, 2016

This is my edited version, a bit faster and cleaner:

function urlencode() {
	local LANG=C
	for ((i=0;i<${#1};i++)); do
		if [[ ${1:$i:1} =~ ^[a-zA-Z0-9\.\~\_\-]$ ]]; then
			printf "${1:$i:1}"
		else
			printf '%%%02X' "'${1:$i:1}"
		fi
	done
}
Owner Author

@cdown cdown commented Nov 7, 2016

@lifeofguenter local is not POSIX.


@aljex aljex commented Jul 24, 2017

As long as we're employing bash-isms...
features of this version:

  • don't make it parse a fancy expansion ${1:$i:1} 3 times per character, collect it into $c once
  • set $c to default value every time, then test only overwrites it sometimes
  • use bash-specific printf -v to write to a variable
  • assemble an output variable rather than print each character as it's encountered
    This does two things:
    1 - only one print command at the end instead of one per character
    2 - the code is more easily reused or modified for uses other than direct output, for instance filling other variables without having to fork and collect the output of a function. You can either copy the body of the function straight into your loop and replace the final echo $e with arr[n]=$e, or have the function write to a global variable (and not echo anything): make e non-local in the function, and you can then run urlencode ${original[n]} ;encoded[n]=$e fifty times, with everything being a direct memory operation like foo=bar and no forking or collecting of stdout.
urlencode() {
    local LANG=C i c e=''
    for ((i=0;i<${#1};i++)); do
        c=${1:$i:1}
        [[ "$c" =~ [a-zA-Z0-9\.\~\_\-] ]] || printf -v c '%%%02X' "'$c"
        e+="$c"
    done
    echo "$e"
}
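
The point about avoiding a fork can be sketched by letting the caller pass the name of the variable to fill, so that no command substitution is needed. urlencode_into and its argument order are assumptions made for this example, not part of the version above, and the sample strings are illustrative.

```shell
#!/usr/bin/env bash
# urlencode_into <varname> <string>: store the encoded string in <varname>
urlencode_into() {
    # Underscore-prefixed locals lower the chance of clashing with <varname>
    local LC_ALL=C _i _c _e=''
    for (( _i = 0; _i < ${#2}; _i++ )); do
        _c=${2:$_i:1}
        [[ $_c == [a-zA-Z0-9.~_-] ]] || printf -v _c '%%%02X' "'$_c"
        _e+=$_c
    done
    printf -v "$1" '%s' "$_e"   # assign without spawning a subshell
}

original=('a b' 'x&y' 'plain')
encoded=()
for n in "${!original[@]}"; do
    urlencode_into result "${original[n]}"
    encoded[n]=$result
done
printf '%s\n' "${encoded[@]}"
```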

@linuxmail linuxmail commented Dec 27, 2017

@aljex

can you put your code under a proper license? :-) : Icinga/icinga2#5540


@daniele-pecora daniele-pecora commented Jan 20, 2018

I couldn't get any of these versions to work when the string contains spaces.

This is what I went with, and it works for me:

function rawurlencode {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:${pos}:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER)
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

@madicd madicd commented Jan 20, 2018

Version @aljex posted on Jul 24, 2017 works great for me even with spaces contained in string.
Thanks!


@aljex aljex commented Jan 26, 2018

My code was only a massaging and optimization of someone else's code, but since I have been asked, I'll answer that as far as I'm concerned, the license for this little bash function is MIT. brian@aljex.com


@rbukovansky rbukovansky commented Apr 12, 2018

@aljex Thank you. That code is lifesaver! 👍 👏


@deajan deajan commented Jun 5, 2018

Just my two cents with Busybox, the for loop won't work.
Had to replace it with for i in $(seq 0 $((length-1))); do
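
Besides seq, another option for shells without the C-style for loop is to peel one character off the string per iteration using only POSIX parameter expansion, which also avoids the ${1:i:1} substring syntax. A sketch under that assumption: urlencode_posix is a hypothetical name, and behaviour for non-ASCII input depends on the shell and locale.

```shell
#!/bin/sh
# The ( ... ) function body runs in a subshell, so s, rest, c and LC_ALL
# do not leak into the caller (POSIX sh has no 'local').
urlencode_posix() (
    LC_ALL=C
    s=$1
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
        s=$rest
    done
    printf '\n'
)

urlencode_posix 'a b&c'
```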


@xoyabc xoyabc commented Jul 1, 2018

This works for me.

https://stackoverflow.com/questions/296536/how-to-urlencode-data-for-curl-command

rawurlencode() {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:$pos:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER) 
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

@renyuntao renyuntao commented Jul 25, 2018

What does the ' mean in "'${_print_offset}"?
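
For reference: when a numeric printf argument starts with a single or double quote, printf uses the numeric value of the character that follows the quote. That is how printf '%%%02X' "'$c" turns a character into its hex code in the functions in this thread. A small sketch, with char_code as a hypothetical helper name:

```shell
#!/usr/bin/env bash
char_code() {
    # "'$1" is a quote followed by the character; %d prints that character's value
    printf '%d\n' "'$1"
}

char_code A              # 65
char_code ' '            # 32
printf '%%%02X\n' "'&"   # %26
```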


@egoarka egoarka commented Nov 7, 2018

if you have node js installed:

alias urlencode='node --eval "console.log(encodeURIComponent(process.argv[1]))"'

@callicoder callicoder commented Dec 10, 2018

Thanks for the gist. It was helpful. You can also encode URLs online using URLEncoder.io


@westfly westfly commented Dec 26, 2018

I had a small problem with Chinese characters.

The solution below uses the curl command to encode the URL and works with Chinese characters.

function urlencode() {
    if [[ $# != 1 ]]; then
        echo "Usage: $0 string-to-urlencode"
        return 1
    fi
    local data="$(curl -s -o /dev/null -w %{url_effective} --get --data-urlencode "$1" "")"
    if [[ $? == 0 ]]; then
        echo "${data##/?}"
    fi
    return 0
}
https://gist.github.com/westfly/ed7e25ee4353751d94132f92837a7074


@keith-bennett-gbg keith-bennett-gbg commented Feb 13, 2019

Looks good. Shellcheck complains about the printf printing the character directly. Easily solvable to be shellcheck-clean:

<             [a-zA-Z0-9.~_-]) printf "%s" "$c" ;;
---
>             [a-zA-Z0-9.~_-]) printf "$c" ;;

@benyaminl benyaminl commented Jun 2, 2019

Great script! Thanks!


@vyachkonovalov vyachkonovalov commented Oct 3, 2019

For fish users

function urlencode
  set str (string join ' ' $argv)

  for c in (string split '' $str)
    if string match -qr '[a-zA-Z0-9.~_-]' $c
      env LC_COLLATE=C printf "$c"
    else
      env LC_COLLATE=C printf '%%%02X' "'$c"
    end
  end
end

function urldecode
    set url_encoded (string replace -a '+' ' ' $argv[1])
    printf '%b' (string replace -a '%' '\\x' $url_encoded)
end

@paxsali paxsali commented Oct 9, 2019

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf "%%20" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
    
    LC_COLLATE=$old_lc_collate
}

@d3vopz-net d3vopz-net commented Nov 3, 2019

paxsali's version works like a charm with bash, but not with the Bourne shell :(

This snippet worked with the Bourne shell:

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    local i=1
    local length="${#1}"
    while [ "$i" -le "$length" ]
    do
        # Quote "$1" so a string containing spaces reaches expr as one argument
        local c="$(expr substr "$1" "$i" 1)"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf "%%20" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
        i=`expr $i + 1`
    done

    LC_COLLATE=$old_lc_collate
}

@krin-san krin-san commented Jun 18, 2020

On zsh I was getting unrecognized modifier 'i' until I changed the following line:

-      local c="${1:i:1}"
+      local c="${1:$i:1}"

@abv1992 abv1992 commented Aug 5, 2020

amazing solution. thanks


@rajeshisnepali rajeshisnepali commented Aug 20, 2020

On zsh I was getting unrecognized modifier 'i' until I changed the following line:

-      local c="${1:i:1}"
+      local c="${1:$i:1}"

@cdown, I think this should be replaced on your gist.


@rajeshisnepali rajeshisnepali commented Aug 20, 2020

urldecode "abc%40abc.com" returns "abc%40abc.com".
Not working.

Owner Author

@cdown cdown commented Aug 20, 2020

It works just fine.

$ urldecode() {
>     # urldecode <string>
> 
>     local url_encoded="${1//+/ }"
>     printf '%b' "${url_encoded//%/\\x}"
> }
$ urldecode "abc%40abc.com"
abc@abc.com

@rajeshisnepali rajeshisnepali commented Aug 20, 2020

I've made a small screencast. Would you mind watching this, please?

https://www.dropbox.com/s/dul4wipk59o2ttk/urldecode_not_working_gist_1163649.webm?dl=0


@rajeshisnepali rajeshisnepali commented Aug 20, 2020

No, It's your own gist. I've just modified a line 'local c="${1:$i:1}"' according to @krin-san. I did only because I was getting "unrecognized modifier 'i'" error on zsh.


@Sleepingwell Sleepingwell commented Sep 16, 2020

Lovely! Thank you all!
