
@aoirint
Created March 19, 2023 08:18
def decode_utf8_bytestream(byte_stream):
    # Split a NUL-separated byte stream into strings,
    # walking it one UTF-8 character at a time.
    i = 0
    decoded_strings = []
    decoded_string = b''
    while i < len(byte_stream):
        byte = byte_stream[i]
        if byte == 0x00:
            # NUL ends the current string
            decoded_strings.append(decoded_string)
            decoded_string = b''
            i += 1
        elif byte <= 0x7F:
            # ASCII: 1 byte
            decoded_string += byte_stream[i:i+1]
            i += 1
        elif byte <= 0xDF:
            # treated as a 2-byte sequence (stray continuation bytes are not rejected)
            decoded_string += byte_stream[i:i+2]
            i += 2
        elif byte <= 0xEF:
            # 3-byte sequence
            decoded_string += byte_stream[i:i+3]
            i += 3
        else:
            # 4-byte sequence
            decoded_string += byte_stream[i:i+4]
            i += 4
    # keep a final string that has no trailing NUL
    if len(decoded_string) != 0:
        decoded_strings.append(decoded_string)
    return [s.decode('utf-8') for s in decoded_strings]

byte_str = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\x00\xe3\x81\x8a\xe3\x81\x86\xe3\x81\x8c\xe3\x81\x84\x00'
decoded_strings = decode_utf8_bytestream(byte_str)
print(decoded_strings[0])  # 'あいう'
print(decoded_strings[1])  # 'おうがい'
@aoirint
Author

aoirint commented Mar 19, 2023

# あいう😀おう😁がい
byte_str = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\x00\xf0\x9f\x98\x80\xe3\x81\x8a\xe3\x81\x86\xf0\x9f\x98\x81\xe3\x81\x8c\xe3\x81\x84\x00'
decoded_strings = decode_utf8_bytestream(byte_str)
print(decoded_strings[0]) # 'あいう😀'
print(decoded_strings[1]) # 'おう😁がい'

The actual output is

あいう
😀おう😁がい

so it does not seem to be working yet.
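
One way to check where the split happens is to locate the NUL byte in the test data directly. The sketch below uses only the `byte_str` literal from the comment above; the names `first_nul`, `before`, and `after` are introduced here for illustration. It suggests that the first `\x00` sits right after `あいう` and before the first emoji, which would account for the observed output.

```python
# Test data copied from the comment above; the strings are NUL-terminated.
byte_str = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\x00\xf0\x9f\x98\x80\xe3\x81\x8a\xe3\x81\x86\xf0\x9f\x98\x81\xe3\x81\x8c\xe3\x81\x84\x00'

# Offset of the first NUL byte in the stream.
first_nul = byte_str.index(b'\x00')

# Everything before the first NUL, and everything after it
# (with the trailing terminator stripped), decoded as UTF-8.
before = byte_str[:first_nul].decode('utf-8')
after = byte_str[first_nul + 1:].rstrip(b'\x00').decode('utf-8')

print(first_nul)
print(before)
print(after)
```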

@aoirint
Author

aoirint commented Mar 19, 2023

https://twitter.com/aoirint/status/1637375030031179779

Now that I understand what is described there, this program is garbage.
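
For comparison, here is a minimal sketch of a simpler approach, assuming the input is valid UTF-8 and the strings are NUL-terminated. In UTF-8, every byte of a multi-byte sequence is 0x80 or above, so a 0x00 byte can only be the NUL character itself, and the raw bytes can be split on `b'\x00'` before decoding. The function name `split_nul_terminated` is hypothetical, and this is not necessarily what the linked tweet describes.

```python
from typing import List


def split_nul_terminated(byte_stream: bytes) -> List[str]:
    """Split NUL-terminated UTF-8 strings without walking each character."""
    # 0x00 never occurs inside a multi-byte UTF-8 sequence,
    # so splitting the raw bytes cannot cut a character in half.
    parts = byte_stream.split(b'\x00')
    # A trailing terminator (or empty input) leaves an empty final chunk; drop it.
    if parts and parts[-1] == b'':
        parts.pop()
    return [part.decode('utf-8') for part in parts]


# Same test data as in the earlier comment.
byte_str = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\x00\xf0\x9f\x98\x80\xe3\x81\x8a\xe3\x81\x86\xf0\x9f\x98\x81\xe3\x81\x8c\xe3\x81\x84\x00'
for s in split_nul_terminated(byte_str):
    print(s)
```

Like the original function, this sketch keeps empty strings between consecutive NUL bytes and drops only the empty chunk after a final terminator.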
