This Gist offers test sets for phylogenetic network approaches. All data are provided in several formats. The following formats are distinguished:
- tree-representation of the underlying taxa using the Newick format (nwk-file)
- csv-representation of the presence-absence patterns of the data (csv-file)
- nexus-representation of the presence-absence matrix of the data (nex-file)
- wordlist representation of the data, which is needed for additional linguistic analyses (qlc-file)
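
To illustrate how the csv and nexus representations relate, here is a minimal sketch that turns a presence-absence table into a NEXUS data block. It rests on assumptions that are not confirmed by the files in this Gist: the CSV is assumed to have one row per taxon, with the taxon name in the first cell followed by 0/1 states, and the function names `read_presence_absence` and `to_nexus` as well as the sample rows are purely hypothetical. If the actual csv-file is transposed or uses a different delimiter, the parsing step would need to be adapted.

```python
import csv
import io


def read_presence_absence(text, delimiter=","):
    """Parse CSV text into a dict mapping taxon name -> state string.

    Assumed layout (hypothetical): one row per taxon, first cell is the
    taxon name, remaining cells are presence (1) / absence (0) states.
    """
    matrix = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        taxon = row[0].strip()
        matrix[taxon] = "".join(cell.strip() for cell in row[1:])
    return matrix


def to_nexus(matrix):
    """Render the matrix as a minimal NEXUS DATA block with binary states."""
    if not matrix:
        raise ValueError("matrix is empty")
    nchar = len(next(iter(matrix.values())))
    if any(len(states) != nchar for states in matrix.values()):
        raise ValueError("all taxa must have the same number of characters")
    width = max(len(taxon) for taxon in matrix)
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "  MATRIX",
    ]
    for taxon, states in matrix.items():
        # Pad taxon names so the state strings line up in one column.
        lines.append(f"  {taxon.ljust(width)}  {states}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical sample data, not taken from the test set.
sample = "German,1,0,1\nEnglish,1,1,0\n"
matrix = read_presence_absence(sample)
nexus = to_nexus(matrix)
print(nexus)
```

Taxon names containing spaces or punctuation would need quoting before being written to a NEXUS file; the sketch leaves that out for brevity.
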
At the moment, only one test set is offered. It was the basis of our network analysis of 40 Indo-European languages (see https://gist.github.com/LinguList/7475830) and is provided here in the formats specified above. In this dataset, known borrowings have been deliberately reintroduced into the data, in order to see