Last active
November 23, 2016 16:00
Trying to figure out text encoding
What's the difference between the following two lines? The first one is copied from a webpage and the second one is typed out.
None of my text editors understand the first one... I want some way to go from the first line to the second, without having to type it out again.
𝙶𝙶𝙲𝙶𝙲
GGCGC
And you could do something like the following to convert whole files:
iconv -f utf-8 -t ascii//translit originalfile > newfile
Here originalfile is the name of the file with the weird chars, and newfile is the new file you would like the output written to.
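If you would rather not end up with a half-written newfile when iconv gives up partway through, the same command can be wrapped so that the output only appears on success. This is a rough sketch, not a definitive script: convert_to_ascii is a made-up helper name, and it assumes mktemp is available and that your iconv understands //translit (glibc's does, other implementations may not).

```shell
# convert_to_ascii SRC DST  (hypothetical helper, not part of iconv)
# Transliterate SRC from UTF-8 to ASCII and write DST only if iconv succeeds.
convert_to_ascii() {
  src=$1
  dst=$2
  tmp=$(mktemp) || return 1
  if iconv -f utf-8 -t ascii//translit "$src" > "$tmp"; then
    mv "$tmp" "$dst"        # conversion worked: move the result into place
  else
    rm -f "$tmp"            # conversion failed: leave DST untouched
    return 1
  fi
}

# Usage (file names as in the command above):
#   convert_to_ascii originalfile newfile
```

Going through a temporary file also means it is safe to give the same name for both arguments.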
Thanks a lot, this works! (And yeah, GitHub won't let me type those characters in a comment, but somehow they work in the gist.)
$ echo "GGCGC" | iconv -f utf-8 -t ascii//translit
GGCGC
I hope you have a bash shell and the file utility.
The second one is regular ASCII.
The first one is just UTF-8, but those are not standard letters. The G, for example, is U+1D676 (MATHEMATICAL MONOSPACE CAPITAL G) and can be found here: http://unicode-table.com/en/1D676/
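One way to see the difference for yourself is to dump the bytes of each character. A small sketch, assuming od, tr and sed are available; hex_bytes is a made-up helper name, and the fancy G is written with octal escapes so that the snippet itself stays plain ASCII.

```shell
# hex_bytes STRING  (hypothetical helper)
# Print the bytes of STRING as space-separated lowercase hex.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

hex_bytes 'G'                              # plain ASCII G, one byte: 47
hex_bytes "$(printf '\360\235\231\266')"   # U+1D676 in UTF-8, four bytes: f0 9d 99 b6
```

The look-alike G takes four bytes where the real one takes a single byte, which is why the two lines are not equal even though they look the same.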
So you have a bunch of chars looking like other chars, what now...
So you should transliterate (with bash) like this:
$ echo "GGCGC" | iconv -f utf-8 -t ascii//translit
Meaning that you convert all the characters that look like a G to a real ASCII G. (Almost sounds like a rap lyric.)
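To check whether any look-alike characters are still left in a file, you can count the bytes that fall outside the ASCII range. Again only a sketch: non_ascii_count is a made-up helper name, and it simply deletes every ASCII byte and counts what remains.

```shell
# non_ascii_count FILE  (hypothetical helper)
# Print how many bytes in FILE are outside the ASCII range 0x00-0x7F.
non_ascii_count() {
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Usage:
#   non_ascii_count newfile     # prints 0 if the file is pure ASCII
```

Note that this counts bytes, not characters, so a single fancy G adds four to the total.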
Side note: I already converted the chars in this comment, because GitHub wouldn't let me write a comment with UTF chars higher than FFFF in it (or so it says).