@sharonhe
Last active November 23, 2016 16:00
Trying to figure out text encoding
What's the difference between the following two lines? The first one is copied from a webpage and the second one is typed out.
None of my text editors understands the first one. I want some way to go from the first line to the second without having to type it out again.
𝙶𝙶𝙲𝙶𝙲
GGCGC
@codeMonkeysBe

codeMonkeysBe commented Nov 23, 2016

I hope you have a bash shell and the file utility

$ echo "GGCGC" | file -i -
/dev/stdin: text/plain; charset=utf-8

The second one is regular ASCII:

$ echo "GGCGC" | file -i -
/dev/stdin: text/plain; charset=us-ascii

The first one is valid UTF-8, but those are not standard letters. The G, for example, is U+1D676 (MATHEMATICAL MONOSPACE CAPITAL G), which you can look up here: http://unicode-table.com/en/1D676/
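
To see the difference at the byte level, here is a minimal sketch that dumps the bytes of each character. The helper name `hex_of` is made up for illustration, and it assumes a POSIX shell with `od` and `tr`, plus a terminal or script file that is itself UTF-8 encoded.

```shell
# Hypothetical helper: print the bytes of its argument as lowercase hex,
# with the spaces and newline from od stripped out.
hex_of() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

hex_of 'G'; echo    # 47        (one byte, plain ASCII)
hex_of '𝙶'; echo    # f09d99b6  (four bytes, U+1D676 in UTF-8)
```

The typed G is the single byte 0x47, while the copied one is a four-byte UTF-8 sequence, which is probably why the two lines look alike but are not treated as the same text.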

So you have a bunch of chars looking like other chars, what now...

So you should transliterate them (from bash, using iconv) like this:

$ echo "GGCGC" | iconv -f utf-8 -t ascii//translit
Meaning that you convert all the characters that look like a G to a real ASCII G (almost sounds like a rap lyric).

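Side note: I already converted the chars in this comment to plain ASCII, because GitHub wouldn't let me write a comment with UTF chars higher than FFFF in it (or so it says). That is why the commands above show ordinary letters where the original characters were.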

@codeMonkeysBe

And you could do something like the following to convert whole files:

iconv -f utf-8 -t ascii//translit originalfile > newfile

Here originalfile is the name of the file with the weird chars, and newfile is the new file the output is written to.
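
To check whether a conversion like the one above left any non-ASCII characters behind, something along these lines might help. It is only a sketch: `count_non_ascii_bytes` is a made-up helper name, and it assumes a POSIX shell with `tr` and `wc`.

```shell
# Hypothetical helper: count the bytes on stdin that fall outside
# the ASCII range 0x00-0x7F. LC_ALL=C makes tr work byte by byte.
count_non_ascii_bytes() {
  LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

printf '%s' 'GGCGC' | count_non_ascii_bytes   # 0
printf '%s' '𝙶𝙶𝙲𝙶𝙲' | count_non_ascii_bytes   # 20 (5 characters x 4 bytes each)
```

Running `count_non_ascii_bytes < newfile` after the conversion should print 0 if every character was transliterated to ASCII.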

@AdySan

AdySan commented Nov 23, 2016

Thanks a lot, this works! (And yeah, GitHub won't let me type those characters in a comment, but somehow they work in the gist.)

$ echo "GGCGC" | iconv -f utf-8 -t ascii//translit
GGCGC
