Bash urlencode and urldecode
urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C

    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done

    LC_COLLATE=$old_lc_collate
}

urldecode() {
    # urldecode <string>

    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}

kovagoz commented Dec 20, 2011

great native solution!

dabukalam commented Oct 28, 2013

Nice one.

hezhong002 commented Jan 2, 2014

nice

gdbtek commented Jan 28, 2014

it works! Thanks

cup commented May 5, 2014

The decode will be glacial for large input http://stackoverflow.com/q/14967299

mlromramse commented May 9, 2014

awesome!

mrubin commented May 15, 2014

The urlencode above wasn't handling newlines in the input correctly - it was creating a '%A' instead of a '%0A'.

I changed

*) printf '%%%X' "'$c"

to

*) printf '%%%02X' "'$c"

to pad the hex character with a 0, if necessary.

(The Bash version I'm using: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu))

mrubin commented May 15, 2014

Also, according to http://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20, '+' means a space only in some contexts... whereas it's always safe to use '%20'. I've locally taken out that middle line (which converts ' ' to +) as well.
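
A minimal sketch of that variant, assuming bash: a decoder that only expands %XX escapes and leaves '+' untouched. The name rawurldecode is made up for this example and is not part of the gist.

```bash
# rawurldecode <string>   (hypothetical name, not part of the gist)
# Decodes %XX escapes only; a literal '+' stays a '+'.
rawurldecode() {
    # Turn every '%' into '\x' so that printf's %b expands the hex escapes.
    printf '%b' "${1//%/\\x}"
}
```

With this, rawurldecode 'a%20b+c' prints a b+c, whereas the urldecode at the top would print a b c.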

Owner Author

cdown commented May 21, 2014

@mrubin Those are both valid points, thanks.

meoow commented Jul 2, 2014

printf '%%%02X' "'$c" won't handle wide-characters well:

urlencode 你好世界

%FFFFFFFFFFFFFFE4%FFFFFFFFFFFFFFE5%FFFFFFFFFFFFFFE4%FFFFFFFFFFFFFFE7

You might have to bring external tools like xxd into the script

urlencode() {
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf "$c" ;;
      # '%s' keeps characters such as % and \ from being read as a format string
      *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf "%%%s" "$x"; done ;;
    esac
  done
}
urlencode 你好世界

%e4%bd%a0%e5%a5%bd%e4%b8%96%e7%95%8c

It got the same result as python does.

import urllib
urllib.quote('你好世界')

'%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C'
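
Where xxd is not available, od can do the same job. A rough sketch, assuming bash 3.1 or later (for printf -v) and a POSIX od; urlencode_od is a hypothetical name. It dumps the whole string to hex once instead of piping every character through an external tool, and it works on raw bytes, so it does not rely on how the shell's printf treats non-ASCII characters.

```bash
# urlencode_od <string>   (hypothetical name, not part of the gist)
urlencode_od() {
    local LC_ALL=C hex out=''
    # od prints every input byte as two hex digits (-An: no offsets,
    # -v: never collapse repeated lines); tr upper-cases the digits.
    for hex in $(printf '%s' "$1" | od -An -v -tx1 | tr 'abcdef' 'ABCDEF'); do
        case $hex in
            # Unreserved bytes: - . _ ~ 0-9 A-Z a-z
            2D|2E|5F|7E|3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A])
                printf -v hex '%b' "\\x$hex"   # hex pair back to the character
                out+=$hex ;;
            *)
                out+=%$hex ;;                  # everything else becomes %XX
        esac
    done
    printf '%s\n' "$out"
}
```

For the string used above, urlencode_od 你好世界 gives the same upper-case result as the Python call, provided the input is UTF-8.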

chutzpah commented Aug 7, 2014

I believe you'll want double backslashes in line 18 for urldecode:

printf '%b' "${url_encoded//%/\\x}"

kenorb commented Mar 1, 2015

Related: How to decode URL-encoded string in shell? at stackoverflow SE

csghuser commented Mar 18, 2015

chutzpah thanks for that, without the backslashes the decoding wasn't working for me.

cdown thanks for the scripts!!!

stek29 commented Sep 13, 2015

Hey, on line 9 it should be printf '%s' "$c", not printf '%s' '$c'.
Bash doesn't substitute variables in single quotes, only in double quotes.
So urldecode $(urlencode 'ЖжШщ i') gives $c$c$c$c$ci, not ЖжШщ i.

mntnorv commented Sep 18, 2015

doesn't work

alethenorio commented Sep 24, 2015

Couldn't quite get this to work until I added an eval in printf. If you are having issues try this instead

urlencode() {
    # urlencode <string>

    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) eval printf '%s' '$c' | xxd -p -c1 |
                   while read c; do printf '%%%s' "$c"; done ;;
        esac
    done
}

zephred commented Oct 6, 2015

I had to write line 9 with double quotes (instead of single) around $c:

    *) printf '%s' "$c" | xxd -p -c1 |

or special chars would always come out as %24%63 (i.e. $c).

Owner Author

cdown commented Oct 20, 2015

yeah, that was a typo. sorry about that.

morgan-greywolf commented Nov 18, 2015

You should use stdin for urldecode rather than a parameter. This leaves you open to shell command injection.

thedward commented Dec 4, 2015

Here's a version that doesn't need xxd, but still handles wide characters. It's much faster.

urlencode() {
    # urlencode <string>

    local LANG=C
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c" ;; 
        esac
    done
}

mkllnk commented Dec 19, 2015

On my computer the character ä is not encoded.

case ä in [a-z]) echo yes ;; esac
# yes
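
That result comes from the locale: outside the C locale a range such as [a-z] can be matched by collation order, so accented letters may fall inside it. A minimal sketch of one way around it, assuming bash; is_unreserved is a hypothetical helper, not part of the gist. It sets LC_ALL rather than LANG, because LC_ALL takes precedence over both LANG and LC_COLLATE.

```bash
# is_unreserved <char>   (hypothetical helper)
# Succeeds only for the unreserved ASCII characters of RFC 3986.
is_unreserved() {
    local LC_ALL=C   # byte-wise matching with plain ASCII ranges, this function only
    case $1 in
        [a-zA-Z0-9.~_-]) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this, is_unreserved a succeeds and is_unreserved ä fails, whatever locale the calling shell uses.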

sunnybear commented Jan 5, 2016

Thank you. This one is 4-5x faster

Owner Author

cdown commented Feb 15, 2016

@morgan-greywolf Using stdin vs. a parameter has no relation to command injection -- if you are having problems with command injection, then they are outside of these functions in the first place.
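
A related caveat that is not command injection: printf '%b' expands every backslash escape in its argument, including ones that were already in the input, so urldecode 'x\ty' prints a tab between x and y. A sketch of one way to avoid that, assuming bash; urldecode_safe is a hypothetical name.

```bash
# urldecode_safe <string>   (hypothetical name, not part of the gist)
urldecode_safe() {
    local s="${1//+/ }"
    # Double any backslash already present so that %b prints it literally.
    s=${s//\\/\\\\}
    # Now turn %XX into \xXX and let %b expand those escapes.
    printf '%b' "${s//%/\\x}"
}
```

Here urldecode_safe 'x\ty' prints x\ty unchanged, while 'a%20b+c' still decodes to a b c.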

bigretromike commented Aug 26, 2016

@thedward thanks for the code

lifeofguenter commented Sep 15, 2016

https://gist.github.com/cdown/1163649#file-gistfile1-sh-L4

Instead of setting LC_COLLATE and restoring the old value afterwards (this method can/will cause race conditions), you can declare it as a local variable, so that it does not overwrite the value in the calling shell:

local LC_COLLATE=C

DarbyCrash commented Nov 4, 2016

This is my edited version, a bit faster and cleaner:

function urlencode() {
	local LANG=C
	for ((i=0;i<${#1};i++)); do
		if [[ ${1:$i:1} =~ ^[a-zA-Z0-9\.\~\_\-]$ ]]; then
			printf "${1:$i:1}"
		else
			printf '%%%02X' "'${1:$i:1}"
		fi
	done
}

Owner Author

cdown commented Nov 7, 2016

@lifeofguenter local is not POSIX.
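
For shells without local, one option is to give the function a subshell body, ( ... ) instead of { ... }, so that the LC_ALL assignment disappears when the function returns. The sketch below is meant to use only POSIX sh constructs (no local, no ${1:i:1}, no arithmetic for loop); urlencode_posix is a hypothetical name, and the cost is one fork per call.

```sh
# urlencode_posix <string>   (hypothetical name, not part of the gist)
urlencode_posix() (
    LC_ALL=C                  # confined to the subshell body
    s=$1
    while [ -n "$s" ]; do
        rest=${s#?}           # everything after the first character
        c=${s%"$rest"}        # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
        s=$rest
    done
)
```

The caller's LC_ALL, s, rest and c are left untouched because all assignments happen in the subshell.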

aljex commented Jul 24, 2017

As long as we're employing bash-isms...
features of this version:

  • don't make it parse a fancy expansion ${1:$i:1} 3 times per character; collect it into $c once
  • set $c to the default value every time, so the test only overwrites it sometimes
  • use the bash-specific printf -v to write to a variable
  • assemble an output variable rather than printing each character as it's encountered
    This does 2 things:
    1 - only one print command at the end instead of one per character
    2 - the code is more easily re-used or modified for other uses besides direct output, for instance to fill other variables without having to fork and collect the output of a function. You can either copy the meat of the function right into your loop and replace the final echo $e with arr[n]=$e etc., and/or have the function write to a global variable (and not echo anything), i.e. make e not local in the function. Then you can run 50 urlencode ${original[n]} ;encoded[n]=$e calls and everything is a direct memory operation like foo=bar, with no forking or collecting of stdout output.
urlencode() {
	local LANG=C i c e=''
	for ((i=0;i<${#1};i++)); do
		c=${1:$i:1}
		[[ "$c" =~ [a-zA-Z0-9\.\~\_\-] ]] || printf -v c '%%%02X' "'$c"
		e+="$c"
	done
	echo "$e"
}
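
The "write to a variable instead of stdout" idea can be sketched like this, assuming bash 3.1 or later for printf -v. The name urlencode_to and its calling convention (target variable name first, string second) are made up for illustration, and the target must not be called i, c or e, since those are local to the function.

```bash
# urlencode_to <varname> <string>   (hypothetical name and interface)
# Stores the encoded string in <varname> without forking or printing.
urlencode_to() {
    local LC_ALL=C i c e=''
    for ((i = 0; i < ${#2}; i++)); do
        c=${2:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) ;;                  # keep the character as it is
            *) printf -v c '%%%02X' "'$c" ;;     # replace it with %XX
        esac
        e+=$c
    done
    printf -v "$1" '%s' "$e"                     # assign to the caller's variable
}
```

A loop can then call urlencode_to encoded "$value" and read $encoded directly, with no $(...) around the call.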

linuxmail commented Dec 27, 2017

@aljex

can you put your code under a proper license? :-) : Icinga/icinga2#5540

daniele-pecora commented Jan 20, 2018

I couldn't get any of these versions to work when the string contains spaces.

This is what I went with, and it works for me:

function rawurlencode {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:${pos}:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER)
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

madicd commented Jan 20, 2018

Version @aljex posted on Jul 24, 2017 works great for me even with spaces contained in string.
Thanks!

aljex commented Jan 26, 2018

My code was only a massaging and optimization of someone else's code, but since I have been asked, I'll answer that as far as I'm concerned, the license for this little bash function is MIT. brian@aljex.com

rbukovansky commented Apr 12, 2018

@aljex Thank you. That code is lifesaver! 👍 👏

deajan commented Jun 5, 2018

Just my two cents: with Busybox, the for loop won't work.
I had to replace it with for i in $(seq 0 $((length-1))); do

xoyabc commented Jul 1, 2018

This works for me.

https://stackoverflow.com/questions/296536/how-to-urlencode-data-for-curl-command

rawurlencode() {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:$pos:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER) 
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

renyuntao commented Jul 25, 2018

What does the ' mean in "'${_print_offset}"?
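
The leading quote is a printf feature: when a numeric argument starts with ' or ", printf uses the numeric value of the character that follows it. That is how '%%%02X' "'$c" gets from a character to its hex code in the functions above. A small demonstration (ord is a hypothetical helper name):

```bash
# ord <char>   (hypothetical helper)
# Prints the character's code in decimal and in hex.
ord() {
    printf '%d 0x%02X\n' "'$1" "'$1"
}
```

For example, ord A prints 65 0x41 and ord ' ' prints 32 0x20.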

egoarka commented Nov 7, 2018

if you have node js installed:

alias urlencode='node --eval "console.log(encodeURIComponent(process.argv[1]))"'

callicoder commented Dec 10, 2018

Thanks for the gist. It was helpful. You can also encode URLs online using URLEncoder.io

westfly commented Dec 26, 2018

I had a small problem with Chinese characters.

The solution below uses the curl command to encode the URL and works with Chinese characters.

function urlencode() {
    if [[ $# != 1 ]]; then
        echo "Usage: $0 string-to-urlencode"
        return 1
    fi
    # Declare separately so that $? below reflects curl's exit status, not local's
    local data
    data="$(curl -s -o /dev/null -w '%{url_effective}' --get --data-urlencode "$1" "")"
    if [[ $? == 0 ]]; then
        echo "${data##/?}"
    fi
    return 0
}
https://gist.github.com/westfly/ed7e25ee4353751d94132f92837a7074

keith-bennett-gbg commented Feb 13, 2019

Looks good. Shellcheck complains about the printf printing the character directly. Easily solvable to be shellcheck-clean:

<             [a-zA-Z0-9.~_-]) printf "%s" "$c" ;;
---
>             [a-zA-Z0-9.~_-]) printf "$c" ;;

benyaminl commented Jun 2, 2019

Great script! Thanks!

vyachkonovalov commented Oct 3, 2019

For fish users

function urlencode
  set str (string join ' ' $argv)

  for c in (string split '' $str)
    if string match -qr '[a-zA-Z0-9.~_-]' $c
      env LC_COLLATE=C printf "$c"
    else
      env LC_COLLATE=C printf '%%%02X' "'$c"
    end
  end
end

paxsali commented Oct 9, 2019

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf "%%20" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
    
    LC_COLLATE=$old_lc_collate
}

d3vopz-net commented Nov 3, 2019

paxsali's version works like a charm with bash, but not with the Bourne shell.

This snippet worked with the Bourne shell:

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C
    local i=1
    local length="${#1}"
    while [ $i -le $length ]
    do
        local c=$(expr substr "$1" "$i" 1)  # quoted so that strings with spaces survive
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf "%%20" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
        i=`expr $i + 1`
    done

    LC_COLLATE=$old_lc_collate
}