"""
base64's `urlsafe_b64encode` uses '=' as padding.
These are not URL safe when used in URL parameters.
Functions below work around this to strip/add back in padding.
See:
https://docs.python.org/2/library/base64.html
https://mail.python.org/pipermail/python-bugs-list/2007-February/037195.html
"""
import base64


def base64_encode(string):
    """
    Removes any `=` used as padding from the encoded string.
    """
    encoded = base64.urlsafe_b64encode(string)
    return encoded.rstrip("=")


def base64_decode(string):
    """
    Adds back in the required padding before decoding.
    """
    # 0 to 3 '=' characters bring the length back to a multiple of 4.
    padding = -len(string) % 4
    string = string + ("=" * padding)
    return base64.urlsafe_b64decode(string)
>>> test = "helloworld"
>>> base64_encode(test)
'aGVsbG93b3JsZA'
>>> e = base64_encode(test)
>>> base64_decode(e)
'helloworld'
>>> test = "Hello World"
>>> encoded = base64_encode(test)
>>> print encoded
SGVsbG8gV29ybGQ
>>> decoded = base64_decode(encoded)
>>> decoded
'Hello World'
>>> decoded == test
True
Thanks a lot!
This suggestion saved me!
Padding cannot be recovered if strings are concatenated:
>>> decode_base64(encode_base64(b'a') + encode_base64(b'aa'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in b64d
  File "/usr/lib/python3.8/base64.py", line 133, in urlsafe_b64decode
    return b64decode(s)
  File "/usr/lib/python3.8/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Invalid base64-encoded string: number of data characters (5) cannot be 1 more than a multiple of 4
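One possible way around this, sketched under the assumption that you control both ends: keep the unpadded values separated by a character that is not in the URL-safe alphabet (here `.`, similar to how JWTs separate their parts) and decode each piece on its own. All four function names below are hypothetical and exist only for this example.

```python
import base64


def encode_unpadded(data: bytes) -> str:
    """URL-safe Base64 without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_unpadded(text: str) -> bytes:
    """Restore the 0 to 3 stripped '=' characters, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def join_tokens(chunks) -> str:
    # '.' never appears in URL-safe Base64 output, so it is an unambiguous separator.
    return ".".join(encode_unpadded(chunk) for chunk in chunks)


def split_tokens(joined: str):
    # Each piece is padded and decoded independently.
    return [decode_unpadded(piece) for piece in joined.split(".")]


joined = join_tokens([b"a", b"aa"])
print(joined)
print(split_tokens(joined))
```

The separator keeps the boundary information that bare concatenation throws away; an empty input list is not handled by this sketch.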
Thank you for this. I was implementing it in a project, tweaked it for my setup, and now have this:
# modified for 3.8.12
import base64
import random
import string


def make_random_string() -> str:
    return "".join(random.choice(string.ascii_lowercase) for _ in range(20))


def base64_encode(value: str) -> str:
    encoded = base64.urlsafe_b64encode(value.encode())
    result = encoded.rstrip(b"=")
    return result.decode()


def base64_decode(value: str) -> str:
    # 0 to 3 '=' characters bring the length back to a multiple of 4.
    padding = -len(value) % 4
    value = value + ("=" * padding)
    result = base64.urlsafe_b64decode(value)
    return result.decode()


def encode_decode_test() -> dict:
    original_str = make_random_string()
    encoded_str = base64_encode(original_str)
    decoded_str = base64_decode(encoded_str)
    assert original_str == decoded_str
    return {
        "original_str": original_str,
        "encoded_str": encoded_str,
        "decoded_str": decoded_str,
    }


result = encode_decode_test()
print(result)
While this is a nice workaround, I don't believe it's the "correct" way to do it anymore. Padding isn't the only thing that can break Base64 in a URL: the standard alphabet uses 0-9, a-z, A-Z, + and /. In a query string, + is a special character (it means a space), and / is of course another special URL character. So you'd have to take care of those yourself as well via string replacement. I don't recommend this route, as the standards may change at some point. Better to let the libraries handle all of this for you. Here are 3 things that might help and be safer.
(1) Instead of stripping the padding you can use the following functions:
base64.urlsafe_b64encode(s)
Encode [bytes-like object](https://docs.python.org/3/glossary.html#term-bytes-like-object) s using the URL- and filesystem-safe alphabet, which substitutes - instead of + and _ instead of / in the standard Base64 alphabet, and return the encoded [bytes](https://docs.python.org/3/library/stdtypes.html#bytes). The result can still contain =.
base64.urlsafe_b64decode(s)
Decode [bytes-like object](https://docs.python.org/3/glossary.html#term-bytes-like-object) or ASCII string s using the URL- and filesystem-safe alphabet, which substitutes - instead of + and _ instead of / in the standard Base64 alphabet, and return the decoded [bytes](https://docs.python.org/3/library/stdtypes.html#bytes).
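To make the alphabet difference concrete, here is a small sketch comparing the two encoders on an input chosen so that the standard alphabet produces both `+` and `/`. The two-byte input is just an illustrative value.

```python
import base64

raw = b"\xfb\xff"  # bit pattern chosen to hit alphabet indexes 62 and 63

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='

# Both variants still emit '=' padding; only '+' and '/' are substituted.
assert base64.urlsafe_b64decode(url_safe) == raw
```

Note that the URL-safe output still ends in `=`, which is the part the gist strips.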
(2) These functions aren't really necessary. The correct way to do this is to not build your URL from raw Base64-encoded values and string concatenation. Instead, do something like:
requests.get("http://example.com", params={"base64_encoded_param": base_64_param})
Requests, along with any other decent URL library, will then percent-encode these for the URL for you. No need to keep track of what needs to be percent-encoded, etc.
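The same idea can be seen without Requests by using the standard library's `urllib.parse`. This is only a sketch of what percent-encoding does to the padding; the parameter name is taken from the example above and the one-byte payload is an arbitrary value.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Keep the padding and let the URL library escape it.
token = base64.urlsafe_b64encode(b"a").decode("ascii")  # 'YQ=='

query = urlencode({"base64_encoded_param": token})
print(query)  # base64_encoded_param=YQ%3D%3D

# The receiving side gets the padded value back after parsing the query string.
recovered = parse_qs(query)["base64_encoded_param"][0]
assert recovered == token
assert base64.urlsafe_b64decode(recovered) == b"a"
```

With this approach nothing is stripped, so the decoder never has to guess how much padding to restore.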
(3) Use Base58 encoding. The allowed charset is A-Z, a-z, and the digits 1-9. Base58 excludes zero, uppercase 'O', uppercase 'I', and lowercase 'l'. In other words, no padding, no funny chars, just letters and digits. It requires an additional library, as I don't think there's a standard lib module yet, but I usually find it well worth it to just deal with ASCII-range and no special-char stuff.
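For illustration only, here is a minimal pure-Python sketch of Base58 using the Bitcoin alphabet ordering. The names `ALPHABET`, `b58encode`, and `b58decode` are defined here for the example and are not a standard library API; in a real project a maintained third-party package is probably the better choice.

```python
# Bitcoin-style Base58 alphabet: no 0, O, I, or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as Base58; each leading zero byte becomes a leading '1'."""
    n_zeros = len(data) - len(data.lstrip(b"\x00"))
    number = int.from_bytes(data, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 58)
        digits.append(ALPHABET[remainder])
    return "1" * n_zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Decode Base58 text; raises ValueError on characters outside the alphabet."""
    n_zeros = len(text) - len(text.lstrip("1"))
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * n_zeros + body


encoded = b58encode(b"hello world")
print(encoded)
print(b58decode(encoded))
```

The trade-off is that Base58 treats the input as one big integer, so it is slower than Base64 on large payloads, but the output needs no padding or escaping in a URL.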
Here are 2 one-liners for encoding and decoding:
(lambda string: urlsafe_b64encode(string).rstrip(b"="))(b"this will be converted into base64!")
(lambda string: urlsafe_b64decode(string + b"=" * (-len(string) % 4)))(b"dGhpcyB3aWxsIGJlIGNvbnZlcnRlZCBpbnRvIGJhc2U2NCE")
Examples:
Encoding:
>>> (lambda string: urlsafe_b64encode(string).rstrip(b"="))(b"this will be converted into base64!")
b'dGhpcyB3aWxsIGJlIGNvbnZlcnRlZCBpbnRvIGJhc2U2NCE'
Decoding:
>>> (lambda string: urlsafe_b64decode(string + b"=" * (-len(string) % 4)))(b'dGhpcyB3aWxsIGJlIGNvbnZlcnRlZCBpbnRvIGJhc2U2NCE')
b'this will be converted into base64!'
Note: this requires base64's `urlsafe_b64encode` and `urlsafe_b64decode` to be imported like this:
from base64 import urlsafe_b64encode, urlsafe_b64decode
This can be easily changed, as all you need to do is change the calls to the functions (right after `(lambda string:`).
`base64.urlsafe_b64encode` returns a bytes object, so your `rstrip()` call needs to be passed a bytes-like object as well. `encoded.rstrip(b"=")` works for me.