Last active Feb 23, 2021
Working with WALS Data in CLDF

How to work with WALS data in CLDF

This code example accompanies a blog post published on the blog "Computer-Assisted Language Comparison in Practice" (

In order to get started, install the WALS dataset in CLDF format with the help of pip (ideally, make sure to use a fresh virtual environment!).

$ pip install -e git+

Once this has been done, you should be able to run the script by simply typing:

$ python

For details, check

"""
Load WALS data and convert them to a table.
"""
from cldfbench import get_dataset
from collections import OrderedDict
import codecs

# Load the installed WALS dataset and index its tables by ID.
wals = get_dataset("wals").cldf_reader()
languages = {row["ID"]: row for row in wals.iter_rows("LanguageTable")}
parameters = OrderedDict({row["ID"]: row for row in wals.iter_rows("ParameterTable")})
codes = OrderedDict({row["ID"]: row for row in wals.iter_rows("CodeTable")})

# One row per language, one (initially empty) cell per parameter.
parameter_list = list(parameters)
varieties = {
    language["ID"]: ["" for x in parameters] for language in languages.values()
}

# Fill each cell with the name of the code recorded for that language.
for row in wals.iter_rows("ValueTable"):
    pid = parameter_list.index(row["Parameter_ID"])
    varieties[row["Language_ID"]][pid] = codes[row["Code_ID"]]["Name"]

# Print the first 20 non-empty values for the language with ID "aab".
count = 0
for i, param in enumerate(parameters):
    if count == 20:
        break
    if varieties["aab"][i]:
        print(param, varieties["aab"][i])
        count += 1

# Write the table as TSV: language metadata first, then one column per parameter.
# The metadata column labels mirror the fields written in the rows below.
with codecs.open("wals_by_language.tsv", "w", "utf-8") as f:
    f.write(
        "\t".join(["Name", "Glottocode", "Family", "Latitude", "Longitude"])
        + "\t"
        + "\t".join([row["Name"] for row in parameters.values()])
        + "\n"
    )
    for variety, values in varieties.items():
        f.write(
            "\t".join(
                [
                    languages[variety]["Name"] or "",
                    languages[variety]["Glottocode"] or "",
                    languages[variety]["Family"] or "",
                    str(languages[variety]["Latitude"] or ""),
                    str(languages[variety]["Longitude"] or ""),
                ]
            )
            + "\t"
            + "\t".join(values)
            + "\n"
        )
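
The conversion performed by the script (one row per value turned into one row per language with one column per parameter) can be tried out without installing WALS. The following is a minimal sketch under stated assumptions: the language IDs, parameter IDs and values are made up for illustration, the table is written to an in-memory buffer instead of wals_by_language.tsv, and the parameter lookup uses a dictionary rather than the list search of the script. It is not part of the original script.

```python
import csv
import io

# Toy stand-ins for the CLDF tables (all IDs, names and values are made up).
languages = {
    "lang1": {"ID": "lang1", "Name": "Language One"},
    "lang2": {"ID": "lang2", "Name": "Language Two"},
}
parameters = {
    "1A": {"ID": "1A", "Name": "Feature A"},
    "2A": {"ID": "2A", "Name": "Feature B"},
}
values = [
    {"Language_ID": "lang1", "Parameter_ID": "1A", "Value": "Small"},
    {"Language_ID": "lang1", "Parameter_ID": "2A", "Value": "Large"},
    {"Language_ID": "lang2", "Parameter_ID": "2A", "Value": "Average"},
]

# Map each parameter ID to its column position once, so that placing a value
# is a dictionary lookup rather than a search through a list.
column = {pid: i for i, pid in enumerate(parameters)}

# One row per language, one (initially empty) cell per parameter.
wide = {lid: [""] * len(parameters) for lid in languages}
for row in values:
    wide[row["Language_ID"]][column[row["Parameter_ID"]]] = row["Value"]

# Write the wide table as TSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Name"] + [p["Name"] for p in parameters.values()])
for lid, cells in wide.items():
    writer.writerow([languages[lid]["Name"]] + cells)

# Read the table back and count the non-empty features per language.
buffer.seek(0)
coverage = {}
for record in csv.DictReader(buffer, delimiter="\t"):
    name = record.pop("Name")
    coverage[name] = sum(1 for cell in record.values() if cell)

print(coverage)
```

The same csv.DictReader loop should also work on the real output file if the buffer is replaced by open("wals_by_language.tsv", encoding="utf-8", newline=""), assuming the header row starts with a "Name" column as in the script above.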