Bash urlencode and urldecode
urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:$i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
    LC_COLLATE=$old_lc_collate
}
urldecode() {
    # urldecode <string>
    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}

@kovagoz kovagoz commented Dec 20, 2011

great native solution!

@dabukalam dabukalam commented Oct 28, 2013

Nice one.

@hezhong002 hezhong002 commented Jan 2, 2014

nice

@gdbtek gdbtek commented Jan 28, 2014

it works! Thanks

@ghost ghost commented May 5, 2014

The decode will be glacial for large input http://stackoverflow.com/q/14967299

@mlromramse mlromramse commented May 9, 2014

awesome!

@mrubin mrubin commented May 15, 2014

The urlencode above wasn't handling newlines in the input correctly - it was creating a '%A' instead of a '%0A'.

I changed

*) printf '%%%X' "'$c"

to

*) printf '%%%02X' "'$c"

to pad the hex character with a 0, if necessary.

(The Bash version I'm using: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu))
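
The difference between the two format strings can be checked directly. This is a minimal sketch, assuming bash; the variable name c simply mirrors the one used in the gist.

```shell
c=$'\n'                    # a newline, character code 10 (hex A)
printf '%%%X\n' "'$c"      # unpadded: prints %A
printf '%%%02X\n' "'$c"    # zero-padded to two digits: prints %0A
```

Only the zero-padded form is a valid percent-escape, since a percent-escape always carries exactly two hex digits.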

@mrubin mrubin commented May 15, 2014

Also, according to http://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20, '+' means a space only in some contexts... whereas it's always safe to use '%20'. I've locally taken out that middle line (which converts ' ' to +) as well.
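
The same caveat applies when decoding: the gist's urldecode turns every + into a space, which is only appropriate for form-encoded data. A minimal sketch of a decoder that leaves + alone, assuming bash; urldecode_raw is a hypothetical name, not part of the gist.

```shell
# Hypothetical helper: decodes %XX escapes only and leaves "+" untouched.
urldecode_raw() {
    printf '%b' "${1//%/\\x}"
}

urldecode_raw 'a+b%20c'; echo    # prints: a+b c
```

Like the original, this relies on printf '%b', so backslash sequences that are already present in the input are interpreted as well.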

Owner Author

@cdown cdown commented May 21, 2014

@mrubin Those are both valid points, thanks.

@meoow meoow commented Jul 2, 2014

printf '%%%02X' "'$c" won't handle wide-characters well:

urlencode 你好世界

%FFFFFFFFFFFFFFE4%FFFFFFFFFFFFFFE5%FFFFFFFFFFFFFFE4%FFFFFFFFFFFFFFE7

You might have to bring external tools like xxd into the script

urlencode() {
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf "%%%s" "$x"; done ;;
    esac
  done
}
urlencode 你好世界

%e4%bd%a0%e5%a5%bd%e4%b8%96%e7%95%8c

This gives the same result as Python does (apart from the case of the hex digits):

import urllib
urllib.quote('你好世界')

'%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C'
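
For comparison, here is a sketch that calls an external tool once per string rather than once per character. It is an illustration under stated assumptions, not the gist author's method: it assumes bash 4 or later (for ${h^^}), a POSIX od, and the default IFS, and urlencode_bytes is a hypothetical name.

```shell
# Hypothetical helper: percent-encodes its argument byte by byte.
urlencode_bytes() {
    local h out=''
    # od prints every byte of the argument as two lowercase hex digits.
    for h in $(printf '%s' "$1" | od -An -v -tx1); do
        case $h in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                # Unreserved ASCII (0-9, A-Z, a-z, "-", ".", "_", "~"): keep as is.
                printf -v h "\\x$h" ;;
            *)
                # Everything else becomes %HH with uppercase hex digits.
                h="%${h^^}" ;;
        esac
        out+=$h
    done
    printf '%s\n' "$out"
}

urlencode_bytes '你好世界'    # prints %E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C
```

Because it works on the raw bytes, the result does not depend on the current locale or on how printf treats a multi-byte character after a leading quote.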

@chutzpah chutzpah commented Aug 7, 2014

I believe you'll want double backslashes in line 18 for urldecode:

printf '%b' "${url_encoded//%/\\x}"

@kenorb kenorb commented Mar 1, 2015

Related: "How to decode URL-encoded string in shell?" on Stack Overflow

@csghuser csghuser commented Mar 18, 2015

chutzpah thanks for that, without the backslashes the decoding wasn't working for me.

cdown thanks for the scripts!!!

@stek29 stek29 commented Sep 13, 2015

Hey, on the 9th line it should be printf '%s' "$c", not printf '%s' '$c'.
Bash doesn't substitute variables in single quotes, only in double quotes.
So urldecode $(urlencode 'ЖжШщ i') gives $c$c$c$c$ci, not ЖжШщ i.
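
The quoting difference is easy to reproduce on its own. A minimal sketch, assuming bash; the variable name c mirrors the gist:

```shell
c='Ж'
printf '%s\n' '$c'    # single quotes: prints the literal text $c
printf '%s\n' "$c"    # double quotes: prints Ж
```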

@mntnorv mntnorv commented Sep 18, 2015

doesn't work

@alethenorio alethenorio commented Sep 24, 2015

Couldn't quite get this to work until I added an eval in printf. If you are having issues try this instead

urlencode() {
    # urlencode <string>

    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) eval printf '%s' '$c' | xxd -p -c1 |
                   while read c; do printf '%%%s' "$c"; done ;;
        esac
    done
}

@zephred zephred commented Oct 6, 2015

I had to write line 9 with double quotes (instead of single) around $c:

    *) printf '%s' "$c" | xxd -p -c1 |

or special chars would always come out as %24%63 (i.e. $c).

@cdown cdown commented Oct 20, 2015

yeah, that was a typo. sorry about that.

@morgan-greywolf morgan-greywolf commented Nov 18, 2015

You should use stdin for urldecode rather than a parameter. Taking the input as a parameter leaves you open to shell command injection.

@thedward thedward commented Dec 4, 2015

Here's a version that doesn't need xxd, but still handles wide characters. It's much faster.

urlencode() {
    # urlencode <string>

    local LANG=C
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c" ;; 
        esac
    done
}

@mkllnk mkllnk commented Dec 19, 2015

On my computer the character ä is not encoded.

case ä in [a-z]) echo yes ;; esac
# yes
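
In many non-C locales, bracket ranges such as [a-z] follow the locale's collation order, so accented letters can fall inside them. A minimal sketch of one way around this, assuming bash: is_unreserved is a hypothetical helper, and it sets LC_ALL rather than LC_COLLATE because LC_ALL, when set, takes precedence over the other LC_* variables.

```shell
# Hypothetical helper: succeeds only for unreserved ASCII characters.
is_unreserved() {
    local LC_ALL=C    # force byte-wise ASCII ranges inside this function only
    case $1 in
        [a-zA-Z0-9.~_-]) return 0 ;;
        *) return 1 ;;
    esac
}

is_unreserved a && echo "a: unreserved"
is_unreserved ä || echo "ä: needs encoding"
```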

@sunnybear sunnybear commented Jan 5, 2016

Thank you. This one is 4-5x faster

@cdown cdown commented Feb 15, 2016

@morgan-greywolf Using stdin vs. a parameter has no relation to command injection -- if you are having problems with command injection, then they are outside of these functions in the first place.

@bigretromike bigretromike commented Aug 26, 2016

@thedward thanks for the code

@lifeofguenter lifeofguenter commented Sep 15, 2016

https://gist.github.com/cdown/1163649#file-gistfile1-sh-L4

Instead of setting LC_COLLATE and restoring the old value afterwards (this method can/will cause race conditions), you can just declare it as a local variable, which will not overwrite the value in the main shell:

local LC_COLLATE=C
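
The scoping can be checked with a small experiment. This is a sketch assuming bash; collate_inside and the variables before, inside and after are hypothetical names used only for the demonstration.

```shell
# Hypothetical helper: records the collation setting seen inside the function.
collate_inside() {
    local LC_COLLATE=C
    inside=$LC_COLLATE    # global on purpose, only to report what the function saw
}

before=${LC_COLLATE-unset}
collate_inside
after=${LC_COLLATE-unset}
printf 'before=%s inside=%s after=%s\n' "$before" "$inside" "$after"
```

The before and after values should match, whatever the caller's setting was, while inside is C.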

@DarbyCrash DarbyCrash commented Nov 4, 2016

This is my edited version, a bit faster and cleaner:

function urlencode() {
	local LANG=C
	for ((i=0;i<${#1};i++)); do
		if [[ ${1:$i:1} =~ ^[a-zA-Z0-9\.\~\_\-]$ ]]; then
			printf "${1:$i:1}"
		else
			printf '%%%02X' "'${1:$i:1}"
		fi
	done
}

@cdown cdown commented Nov 7, 2016

@lifeofguenter local is not POSIX.

@ghost ghost commented Jul 24, 2017

As long as we're employing bashisms...
Features of this version:

  • Don't make bash parse the fancy expansion ${1:$i:1} three times per character; collect it into $c once.
  • Set $c to its default value every time; the test then overwrites it only when needed.
  • Use the bash-specific printf -v to write to a variable.
  • Assemble an output variable rather than printing each character as it's encountered.
    This does two things:
    1 - There is only one print command at the end instead of one per character.
    2 - The code is more easily reused or modified for uses other than direct output, for instance filling other variables without having to fork and collect the output of a function. You can either copy the body of the function into your loop and replace the final echo "$e" with arr[n]=$e, or have the function write to a global variable and not echo anything: make e non-local in the function, and then 50 iterations of urlencode "${original[n]}"; encoded[n]=$e are all direct in-memory assignments like foo=bar, with no forking or collecting of stdout.
urlencode() {
	local LANG=C i c e=''
	for ((i=0;i<${#1};i++)); do
		c=${1:$i:1}
		[[ "$c" =~ [a-zA-Z0-9\.\~\_\-] ]] || printf -v c '%%%02X' "'$c"
		e+="$c"
	done
	echo "$e"
}
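
The global-variable variant described in point 2 could look roughly like this. It is a sketch assuming bash: urlencode_to_e and the arrays original and encoded are hypothetical names, and the only change from the function above is that e is no longer local and nothing is echoed.

```shell
# Hypothetical variant: leaves its result in the global variable e.
urlencode_to_e() {
    local LANG=C i c
    e=''
    for ((i = 0; i < ${#1}; i++)); do
        c=${1:i:1}
        [[ $c == [a-zA-Z0-9.~_-] ]] || printf -v c '%%%02X' "'$c"
        e+=$c
    done
}

original=('a b' 'x&y=z')
encoded=()
for n in "${!original[@]}"; do
    urlencode_to_e "${original[n]}"    # no subshell, no captured stdout
    encoded[n]=$e
done
printf '%s\n' "${encoded[@]}"          # prints a%20b and x%26y%3Dz on separate lines
```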

@linuxmail linuxmail commented Dec 27, 2017

@aljex

can you put your code under a proper license? :-) : Icinga/icinga2#5540

@daniele-pecora daniele-pecora commented Jan 20, 2018

I couldn't get any of the versions to work when the string contains spaces.

This is what I went with, and it works for me:

function rawurlencode {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:${pos}:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER)
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

@madicd madicd commented Jan 20, 2018

Version @aljex posted on Jul 24, 2017 works great for me even with spaces contained in string.
Thanks!

@ghost ghost commented Jan 26, 2018

My code was only a massaging and optimization of someone else's code, but since I have been asked, I'll answer that as far as I'm concerned, the license for this little bash function is MIT. brian@aljex.com

@rbukovansky rbukovansky commented Apr 12, 2018

@aljex Thank you. That code is lifesaver! 👍 👏

@deajan deajan commented Jun 5, 2018

Just my two cents: with BusyBox, the for loop won't work.
I had to replace it with for i in $(seq 0 $((length-1))); do

@xoyabc xoyabc commented Jul 1, 2018

This works for me.

https://stackoverflow.com/questions/296536/how-to-urlencode-data-for-curl-command

rawurlencode() {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:$pos:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER) 
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

@renyuntao renyuntao commented Jul 25, 2018

What does the ' mean in "'${_print_offset}"?
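
For readers with the same question: when a numeric conversion such as %d or %X receives an argument that starts with a single or double quote, printf uses the numeric code of the character that follows the quote. A minimal sketch, assuming bash (the POSIX printf utility specifies the same rule):

```shell
printf '%d\n' "'A"        # prints 65
printf '%02X\n' "'/"      # prints 2F
printf '%%%02X\n' "' "    # prints %20
```

This is what lets the functions in this thread turn a character into its hex code without any external tool.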

@egoarka egoarka commented Nov 7, 2018

if you have node js installed:

alias urlencode='node --eval "console.log(encodeURIComponent(process.argv[1]))"'

@callicoder callicoder commented Dec 10, 2018

Thanks for the gist. It was helpful. You can also encode URLs online using URLEncoder.io

@westfly westfly commented Dec 26, 2018

I had a small problem with Chinese characters.

The solution below uses the curl command to encode the URL and works with Chinese characters.

function urlencode() {
    if [[ $# != 1 ]]; then
        echo "Usage: $0 string-to-urlencode"
        return 1
    fi
    # Declare first, assign second, so that $? is curl's exit status and not local's.
    local data
    data="$(curl -s -o /dev/null -w '%{url_effective}' --get --data-urlencode "$1" "")"
    if [[ $? == 0 ]]; then
        echo "${data##/?}"
    fi
    return 0
}
https://gist.github.com/westfly/ed7e25ee4353751d94132f92837a7074

@keith-bennett-gbg keith-bennett-gbg commented Feb 13, 2019

Looks good. Shellcheck complains about the printf printing the character directly. Easily solvable to be shellcheck-clean:

<             [a-zA-Z0-9.~_-]) printf "%s" "$c" ;;
---
>             [a-zA-Z0-9.~_-]) printf "$c" ;;

@benyaminl benyaminl commented Jun 2, 2019

Great script! Thanks!

@crabvk crabvk commented Oct 3, 2019

For fish users

function urlencode
  set str (string join ' ' $argv)

  for c in (string split '' $str)
    if string match -qr '[a-zA-Z0-9.~_-]' $c
      env LC_COLLATE=C printf "$c"
    else
      env LC_COLLATE=C printf '%%%02X' "'$c"
    end
  end
end

function urldecode
    set url_encoded (string replace -a '+' ' ' $argv[1])
    printf '%b' (string replace -a '%' '\\x' $url_encoded)
end

@paxsali paxsali commented Oct 9, 2019

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf "%%20" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
    
    LC_COLLATE=$old_lc_collate
}

@d3vopz-net d3vopz-net commented Nov 3, 2019

paxsali's version works like a charm with bash, but not with the Bourne shell :(

This snippet worked with the Bourne shell:

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    local i=1
    local length="${#1}"
    while [ $i -le $length ]
    do
        local c="$(expr substr "$1" "$i" 1)"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf "%%20" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
        i=`expr $i + 1`
    done

    LC_COLLATE=$old_lc_collate
}

@krin-san krin-san commented Jun 18, 2020

On zsh I was getting unrecognized modifier 'i' until I changed the following line:

-      local c="${1:i:1}"
+      local c="${1:$i:1}"

@abv1992 abv1992 commented Aug 5, 2020

amazing solution. thanks

@rajeshisnepali rajeshisnepali commented Aug 20, 2020

On zsh I was getting unrecognized modifier 'i' until I changed the following line:

-      local c="${1:i:1}"
+      local c="${1:$i:1}"

@cdown, I think this should be replaced on your gist.

@rajeshisnepali rajeshisnepali commented Aug 20, 2020

urldecode "abc%40abc.com" returns "abc%40abc.com".
Not working.

@cdown cdown commented Aug 20, 2020

It works just fine.

$ urldecode() {
>     # urldecode <string>
> 
>     local url_encoded="${1//+/ }"
>     printf '%b' "${url_encoded//%/\\x}"
> }
$ urldecode "abc%40abc.com"
abc@abc.com

@rajeshisnepali rajeshisnepali commented Aug 20, 2020

I've made a small screencast. Would you mind watching this, please?

https://www.dropbox.com/s/dul4wipk59o2ttk/urldecode_not_working_gist_1163649.webm?dl=0

@rajeshisnepali rajeshisnepali commented Aug 20, 2020

No, it's your own gist. I've only modified one line, 'local c="${1:$i:1}"', as suggested by @krin-san. I did that only because I was getting the "unrecognized modifier 'i'" error on zsh.

@Sleepingwell Sleepingwell commented Sep 16, 2020

Lovely! Thank you all!

@mountaineerbr mountaineerbr commented Dec 12, 2020

@rajeshisnepali for urldecode to work with my zsh:
In zsh ${url_encoded//%/\\x} adds a \x to the end but ${url_encoded//\%/\\x} replaces % with \x.
lri - https://unix.stackexchange.com/questions/159253/decoding-url-encoding-percent-encoding

also, in urlencode, $i could be made local
local i length="${#1}"
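
Applied to the gist's urldecode, the escaped form would look like the sketch below. It has been reasoned through for bash only, where \% and % behave the same in this expansion; the zsh behaviour is taken from the comment above and is an assumption here.

```shell
urldecode() {
    # urldecode <string>
    local url_encoded="${1//+/ }"
    # \% instead of %: reported above to be needed on zsh, harmless in bash
    printf '%b' "${url_encoded//\%/\\x}"
}

urldecode 'abc%40abc.com'; echo    # in bash, prints abc@abc.com
```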

@rajeshisnepali rajeshisnepali commented Dec 13, 2020

@mountaineerbr Thanks for the information but I couldn't make the above script work.

But it worked using an alias from the python3 script 👍 (from the link above).

@cjplay02 cjplay02 commented Jan 29, 2021

I'm seeing strings with 2 or more consecutive spaces get shrunk to 1 space. So %20%20 (two spaces) becomes a single ' '. Any thoughts on how to keep these spaces from being collapsed?

@cdown cdown commented Jan 29, 2021

@cjplay02 I'm pretty sure the issue is elsewhere.

$ urldecode() {
    # urldecode <string>

    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}
$ urldecode 'foo%20%20%20%20%20bar'
foo     bar

@cjplay02 cjplay02 commented Jan 29, 2021

Good call @cdown. I had the urldecode call in a command substitution - urldecoded=$(urldecode 's3://...'). Once I removed the function call from the command substitution, the spaces were retained from the encoding. Now I just need to find a better way to declare the result as a variable...

Edit: Double-quoting the variable where it is used in downstream commands fixed my issue, i.e. echo "$varname"
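
The effect is ordinary word splitting and can be reproduced without urldecode at all. A minimal sketch, assuming bash with the default IFS; decoded is a hypothetical variable name.

```shell
decoded=$(printf '%b' 'foo\x20\x20\x20bar')    # foo, three spaces, bar
echo $decoded      # unquoted: split into two words, prints foo bar
echo "$decoded"    # quoted: spaces preserved, prints foo   bar
```

The command substitution itself keeps the spaces; they are only lost when the variable is later expanded without quotes.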

@dicktyr dicktyr commented Jan 31, 2021

just a brief nod to mawk
which is five times faster in my tests
(indeed, often faster than sed)

I know it's not a de facto standard like bash
i.e. installed by default on so many systems
but it should be and it is on my systems

I also notice that bash seems to be catching up with ksh93

@GeekDuanLian GeekDuanLian commented Feb 9, 2021

One line implementation, suitable for storing in .bashrc

urle () { [[ "${1}" ]] || return 1; local LANG=C i x; for (( i = 0; i < ${#1}; i++ )); do x="${1:i:1}"; [[ "${x}" == [a-zA-Z0-9.~_-] ]] && echo -n "${x}" || printf '%%%02X' "'${x}"; done; echo; }
urld () { [[ "${1}" ]] || return 1; : "${1//+/ }"; echo -e "${_//%/\\x}"; }