@bortzmeyer
Created December 16, 2015 17:00
The longest TLDs (including Unicode ones), with their length in characters
TRAVELERSINSURANCE 18
VERMöGENSBERATUNG 17
VERMöGENSBERATER 16
SANDVIKCOROMANT 15
CANCERRESEARCH 14
SPREADBETTING 13
INTERNATIONAL 13
VERSICHERUNG 12
SCHOLARSHIPS 12
CONSTRUCTION 12
சிங்கப்பூர் 11
WILLIAMHILL 11
REDUMBRELLA 11
PRODUCTIONS 11
PLAYSTATION 11
PHOTOGRAPHY 11
MOTORCYCLES 11
LAMBORGHINI 11
INVESTMENTS 11
ENTERPRISES 11
ENGINEERING 11
CREDITUNION 11
CONTRACTORS 11
BRIDGESTONE 11
BLACKFRIDAY 11
BARCLAYCARD 11
ACCOUNTANTS 11
VLAANDEREN 10
VISTAPRINT 10
UNIVERSITY 10
TELEFONICA 10
TECHNOLOGY 10
TATAMOTORS 10
RESTAURANT 10
REPUBLICAN 10
PROTECTION 10
PROPERTIES 10
MANAGEMENT 10
INDUSTRIES 10
IMMOBILIEN 10
HEALTHCARE 10
FOUNDATION 10
EUROVISION 10
CUISINELLA 10
CREDITCARD 10
CONSULTING 10
BOEHRINGER 10
BNPPARIBAS 10
ASSOCIATES 10
APARTMENTS 10
ACCOUNTANT 10
@bortzmeyer
Author

The code:

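#!/bin/sh

# Things to keep in mind:
#    * wc -c counts bytes, not characters, hence wc -m (which depends on the locale)
#    * idn ends its output with an end-of-line; capturing it in a variable strips it
#    * the glob iterates over the files in the current directory, one per TLD

for tld in [A-Z]*; do
    # printf is used instead of the non-portable "echo -n"
    s=$(printf '%s' "$tld" | sed 's/^XN--//')
    if [ "$s" != "$tld" ]; then
        # Unicode TLD: decode the Punycode part left after the prefix
        tld=$(printf '%s' "$s" | idn -d)
    fi
    echo "$tld" $(printf '%s' "$tld" | wc -m)
done | sort -k 2 -n -r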
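
As a side note, the prefix test in the loop can also be written with plain shell pattern matching instead of sed. This is only a sketch: `ace_payload` is a hypothetical helper name, not part of the script above, and the decoding step with idn is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the Punycode
# part of a name starting with XN--, or fail for a plain ASCII name.
ace_payload() {
    case "$1" in
        XN--*) printf '%s\n' "${1#XN--}" ;;
        *)     return 1 ;;
    esac
}

# Example names, used only for illustration.
for name in XN--VERMGENSBERATER-CTB INTERNATIONAL; do
    if payload=$(ace_payload "$name"); then
        echo "$name: IDN, Punycode part is $payload"
    else
        echo "$name: plain ASCII"
    fi
done
```

The `case` pattern avoids spawning one sed process per TLD, at the cost of a slightly longer script.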

@Barmy

Barmy commented Dec 16, 2015

Even though it does not come first by number of Unicode characters, our Indian friend still has an impressive number of bytes:

echo "சிங்கப்பூர்" | xxd
0000000: e0ae 9ae0 aebf e0ae 99e0 af8d e0ae 95e0 ................
0000010: aeaa e0af 8de0 aeaa e0af 82e0 aeb0 e0af ................
0000020: 8d0a
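
The dump above shows 34 bytes, the last of which (0a) is the end-of-line added by echo. A minimal sketch of how both counts can be obtained without relying on the locale, assuming the input is valid UTF-8; `utf8_bytes` and `utf8_chars` are hypothetical helper names, not part of the gist:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Number of bytes in $1 (no trailing end-of-line is added).
utf8_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Number of Unicode code points in $1, assuming valid UTF-8:
# delete the continuation bytes (0x80-0xBF) and count what is left.
utf8_chars() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

tld='சிங்கப்பூர்'
echo "$tld: $(utf8_chars "$tld") characters, $(utf8_bytes "$tld") bytes"
# prints: சிங்கப்பூர்: 11 characters, 33 bytes
```

Each of the 11 code points in this name takes three bytes in UTF-8, hence 33 bytes, against 18 for the longest ASCII TLD in the list.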
