@pashri
Created January 7, 2021 23:16
UTF-8 and Latin-1 character encoding errors.

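Sometimes you've got some incorrectly-encoded text that you want to fix. In Python, you can repair it by encoding the string back to bytes with the encoding it was wrongly decoded as, then decoding those bytes with the encoding they were actually written in. The first line below produces garbled text by decoding UTF-8 bytes as Latin-1; the second reverses the damage.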

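# Produce the garbled text: UTF-8 bytes misread as Latin-1
'Microsoft® Windows™ 10 is Patrick’s favourite OS'.encode('utf-8').decode('latin-1')
# Reverse it: recover the original bytes, then decode them as UTF-8
'MicrosoftÂ® Windowsâ\x84¢ 10 is Patrickâ\x80\x99s favourite OS'.encode('latin-1').decode('utf-8')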
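
The round trip above can be wrapped in a small helper. This is a minimal sketch, not part of the original snippet: the name `fix_mojibake` and its parameters `wrong` and `right` are hypothetical, and it assumes the text was damaged by exactly one wrong decode. If the string cannot be round-tripped that way, it is returned unchanged.

```python
def fix_mojibake(text: str, wrong: str = "latin-1", right: str = "utf-8") -> str:
    """Undo text whose `right`-encoded bytes were decoded as `wrong`.

    Hypothetical helper: assumes a single wrong decode caused the damage.
    """
    try:
        # Recover the original bytes, then decode them correctly.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not damaged in the assumed way; leave it unchanged.
        return text


original = "Microsoft® Windows™ 10 is Patrick’s favourite OS"
garbled = original.encode("utf-8").decode("latin-1")  # simulate the damage
repaired = fix_mojibake(garbled)
```

Here `repaired` compares equal to `original`. If the garbled text shows printable characters such as `â„¢` rather than control characters like `\x84`, the wrong decode was probably Windows-1252, and passing `wrong="cp1252"` may be the better guess.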

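What if you encounter this in a SQL statement? Some databases let you reinterpret a column with a COLLATE clause. Note that COLLATE normally takes a collation name, and the exact name and syntax depend on the database, so treat `latin1` below as a placeholder (MySQL, for example, names collations like latin1_swedish_ci and converts between character sets with CONVERT(... USING ...)).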

SELECT
    name COLLATE latin1
FROM
    people