@szepeviktor
Last active October 31, 2023 13:20
Sort characters by their Unicode code point
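#!/bin/bash
# Usage: pass a file containing a string of UTF-8 characters.
FILE="$1"
# Left column: code points (-s 2 skips the leading 2-byte byte order mark, -v keeps repeated characters).
# Right column: the characters themselves. Newlines are dropped so both columns stay aligned.
paste <(tr -d '\n' <"${FILE}" | iconv -f UTF-8 -t UNICODE | hexdump -v -s 2 -e '1/2 "U+%04X\n"') \
      <(grep -o '.' "${FILE}") \
| LC_ALL=C sort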

szepeviktor commented Oct 31, 2023

With a file containing ÁÉÍÓÖŐÚÜŰáéíóöőúüű, the output is:

U+00C1  Á
U+00C9  É
U+00CD  Í
U+00D3  Ó
U+00D6  Ö
U+00DA  Ú
U+00DC  Ü
U+00E1  á
U+00E9  é
U+00ED  í
U+00F3  ó
U+00F6  ö
U+00FA  ú
U+00FC  ü
U+0150  Ő
U+0151  ő
U+0170  Ű
U+0171  ű

https://www.compart.com/en/unicode/U+00C1
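
The script above reads one 16-bit unit per character, so it appears to assume that every character lies in the Basic Multilingual Plane. A possible variant, sketched below, converts to UTF-32BE instead and prints fixed-width six-digit code points, which should also cover characters above U+FFFF. The function name `sort_by_codepoint` is hypothetical and not part of the gist. The sketch assumes bash, a UTF-8 locale for non-ASCII input, and the usual `iconv`, `od`, `fold`, `sed`, `grep` and `paste` tools.

```shell
#!/bin/bash
# Sketch only: sort_by_codepoint is a hypothetical helper, not part of the gist.
# Reads UTF-8 text on stdin and prints "U+XXXXXX<TAB>character" lines,
# sorted by code point.
sort_by_codepoint() {
    local text
    # Drop newlines so the code point column and the character column line up.
    text="$(tr -d '\n')"
    paste \
        <(printf '%s' "${text}" | iconv -f UTF-8 -t UTF-32BE \
            | od -An -v -tx1 | tr -d ' \n' | fold -w 8 \
            | sed 's/^00/U+/' | tr 'a-f' 'A-F') \
        <(printf '%s\n' "${text}" | grep -o '.') \
    | LC_ALL=C sort
}
```

It can be used as `sort_by_codepoint < file.txt` or `printf 'cab' | sort_by_codepoint`. Each UTF-32BE unit is eight hex digits whose first two are always `00`, so replacing them with `U+` keeps every code point at the same width, and a plain byte-wise sort then matches numeric code point order.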
