@jmandel

jmandel/explanation.md

Last active Dec 20, 2015
Generating value set for C-CDA Language Codes

Picking Language Codes for MU2 Consolidated CDA

Here you'll find:

  • An English-language description of how to choose language codes for MU2 C-CDA
  • An AWK script that generates the complete list of languages in the value set
  • The value set itself, with two tab-separated columns (Code, Language)

Here's the algorithm:

  1. Fetch the complete ISO-639-2 list as a starting point.
  2. Filter it to include only entries that have a 2-letter code.
  3. For each language from step 2, choose a 3-letter code as follows: if there is a "terminologic" 3-letter code (column 2), use that. Otherwise (column 2 is empty), use the "bibliographic" 3-letter code (column 1).
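# Download the ISO 639-2 table. It is pipe-delimited; the first four columns are:
# 1 = bibliographic code, 2 = terminologic code, 3 = 2-letter code, 4 = English name
curl -s http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt | \
awk -F\| '{
  # Keep only languages that have a 2-letter code
  if ($3) {
    # Prefer the terminologic code; fall back to the bibliographic code
    printf("%s\t%s\n", $2 ? $2 : $1, $4)
  }
}'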
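The same three steps can also be wrapped in a small shell function that reads the table from stdin, which makes the logic easy to try on a few sample rows without touching the network. This is only a sketch: the function name `ccda_language_codes` is made up for illustration, and it assumes the download may begin with a UTF-8 byte order mark (which would otherwise stick to the first code), so it keeps just the last three characters of column 1.

```shell
# Hypothetical helper (not part of the original script).
# Reads pipe-delimited ISO 639-2 rows on stdin and writes "code<TAB>name"
# for every row that has a 2-letter code in column 3.
ccda_language_codes() {
  awk -F'|' '
    $3 != "" {
      # Bibliographic codes are exactly 3 ASCII letters, so taking the last
      # 3 characters of column 1 drops a byte order mark if one is present.
      bib = substr($1, length($1) - 2)
      # Prefer the terminologic code (column 2) when it is not empty.
      code = ($2 != "") ? $2 : bib
      printf("%s\t%s\n", code, $4)
    }'
}
```

To regenerate the full list, pipe the download into it: `curl -s http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt | ccda_language_codes`. For a quick check, `printf 'alb|sqi|sq|Albanian|albanais\n' | ccda_language_codes` should print `sqi` and `Albanian` separated by a tab.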
aar Afar
abk Abkhazian
afr Afrikaans
aka Akan
sqi Albanian
amh Amharic
ara Arabic
arg Aragonese
hye Armenian
asm Assamese
ava Avaric
ave Avestan
aym Aymara
aze Azerbaijani
bak Bashkir
bam Bambara
eus Basque
bel Belarusian
ben Bengali
bih Bihari languages
bis Bislama
bos Bosnian
bre Breton
bul Bulgarian
mya Burmese
cat Catalan; Valencian
cha Chamorro
che Chechen
zho Chinese
chu Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic
chv Chuvash
cor Cornish
cos Corsican
cre Cree
ces Czech
dan Danish
div Divehi; Dhivehi; Maldivian
nld Dutch; Flemish
dzo Dzongkha
eng English
epo Esperanto
est Estonian
ewe Ewe
fao Faroese
fij Fijian
fin Finnish
fra French
fry Western Frisian
ful Fulah
kat Georgian
deu German
gla Gaelic; Scottish Gaelic
gle Irish
glg Galician
glv Manx
ell Greek, Modern (1453-)
grn Guarani
guj Gujarati
hat Haitian; Haitian Creole
hau Hausa
heb Hebrew
her Herero
hin Hindi
hmo Hiri Motu
hrv Croatian
hun Hungarian
ibo Igbo
isl Icelandic
ido Ido
iii Sichuan Yi; Nuosu
iku Inuktitut
ile Interlingue; Occidental
ina Interlingua (International Auxiliary Language Association)
ind Indonesian
ipk Inupiaq
ita Italian
jav Javanese
jpn Japanese
kal Kalaallisut; Greenlandic
kan Kannada
kas Kashmiri
kau Kanuri
kaz Kazakh
khm Central Khmer
kik Kikuyu; Gikuyu
kin Kinyarwanda
kir Kirghiz; Kyrgyz
kom Komi
kon Kongo
kor Korean
kua Kuanyama; Kwanyama
kur Kurdish
lao Lao
lat Latin
lav Latvian
lim Limburgan; Limburger; Limburgish
lin Lingala
lit Lithuanian
ltz Luxembourgish; Letzeburgesch
lub Luba-Katanga
lug Ganda
mkd Macedonian
mah Marshallese
mal Malayalam
mri Maori
mar Marathi
msa Malay
mlg Malagasy
mlt Maltese
mon Mongolian
nau Nauru
nav Navajo; Navaho
nbl Ndebele, South; South Ndebele
nde Ndebele, North; North Ndebele
ndo Ndonga
nep Nepali
nno Norwegian Nynorsk; Nynorsk, Norwegian
nob Bokmål, Norwegian; Norwegian Bokmål
nor Norwegian
nya Chichewa; Chewa; Nyanja
oci Occitan (post 1500); Provençal
oji Ojibwa
ori Oriya
orm Oromo
oss Ossetian; Ossetic
pan Panjabi; Punjabi
fas Persian
pli Pali
pol Polish
por Portuguese
pus Pushto; Pashto
que Quechua
roh Romansh
ron Romanian; Moldavian; Moldovan
run Rundi
rus Russian
sag Sango
san Sanskrit
sin Sinhala; Sinhalese
slk Slovak
slv Slovenian
sme Northern Sami
smo Samoan
sna Shona
snd Sindhi
som Somali
sot Sotho, Southern
spa Spanish; Castilian
srd Sardinian
srp Serbian
ssw Swati
sun Sundanese
swa Swahili
swe Swedish
tah Tahitian
tam Tamil
tat Tatar
tel Telugu
tgk Tajik
tgl Tagalog
tha Thai
bod Tibetan
tir Tigrinya
ton Tonga (Tonga Islands)
tsn Tswana
tso Tsonga
tuk Turkmen
tur Turkish
twi Twi
uig Uighur; Uyghur
ukr Ukrainian
urd Urdu
uzb Uzbek
ven Venda
vie Vietnamese
vol Volapük
cym Welsh
wln Walloon
wol Wolof
xho Xhosa
yid Yiddish
yor Yoruba
zha Zhuang; Chuang
zul Zulu