@ansrivas
Last active March 17, 2022 21:09
Convert iso-8859-1 to utf-8 in python
# Convert ISO-8859-1 to Unicode and then to UTF-8, where `v` is a byte string encoded in ISO-8859-1.
# (Python 2 `str` or Python 3 `bytes`; a Python 3 `str` has no `.decode()` method.)
v.decode("iso-8859-1").encode("utf-8")
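On Python 3 the same conversion can be sketched as a small function operating on `bytes`. The function name `latin1_to_utf8` and the sample input are illustrative assumptions, not part of the original snippet.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Step 1: interpret the raw bytes as ISO-8859-1 to get a Unicode str.
    text = raw.decode("iso-8859-1")
    # Step 2: serialize that str as UTF-8.
    return text.encode("utf-8")


# Sample input (assumed): "café" with é stored as the single Latin-1 byte 0xE9.
raw = b"caf\xe9"
converted = latin1_to_utf8(raw)
print(converted)
```

When reading from a file instead, passing `encoding="iso-8859-1"` to the built-in `open()` and writing back out with `encoding="utf-8"` achieves the same result without handling bytes directly.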

As a note, here is a basic rule of thumb:

If you have no way of finding out the correct encoding of the file, then try the following encodings, in this order:

utf-8
iso-8859-1 (also known as latin-1)
(This is reportedly the encoding of much census data and of other data produced by government entities.)
utf-16
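The try-in-order rule above could be sketched in Python 3 roughly as follows. The helper name `decode_with_fallback` and the sample byte strings are assumptions made for illustration. One caveat worth knowing: every byte value is valid in ISO-8859-1, so that step never raises an error, and in a loop like this the `utf-16` entry is only reached if the list is reordered.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-1", "utf-16")):
    """Try each encoding in order; return (text, encoding_used).

    Hypothetical helper: a decode that succeeds is not proof that the
    guess was right, only that the bytes are valid in that encoding.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# Valid UTF-8 is accepted on the first attempt.
utf8_result = decode_with_fallback(b"caf\xc3\xa9")

# A lone 0xE9 byte is invalid UTF-8, so the loop falls through to ISO-8859-1.
latin1_result = decode_with_fallback(b"caf\xe9")

print(utf8_result, latin1_result)
```

Because the ISO-8859-1 step accepts anything, it is worth inspecting the decoded text for garbled characters before trusting the result.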
@heranmane
heranmane commented Mar 17, 2022

Hi, I have a tweet dataset and I am wondering how to convert the encoding. An example of the text: 'People saying that he should be removed from : 1) Thatâ<U+0080><U+0099>s another movie. Two wrongs donâ<U+0080><U+0099>t make a right'. I have tried encode().decode(UTF-8), but nothing changed.
