Johann-Mattis List LinguList

@LinguList
LinguList / README.md
Last active Aug 29, 2015
PhylogeneticNetworkApproaches

Test Sets for Phylogenetic Network Approaches in Historical Linguistics

This GIST offers test sets for phylogenetic network approaches. All data are given in several formats. The following formats are distinguished:

  • tree-representation of the underlying taxa using the Newick format (nwk-file)
  • csv-representation of the presence-absence patterns of the data (csv-file)
  • nexus-representation of the presence-absence matrix of the data (nex-file)
  • wordlist representation of the data which is important for additional linguistic analyses (qlc-format)
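
The CSV and NEXUS files carry the same presence-absence information, so one can in principle be derived from the other. The following is a minimal sketch of such a conversion, not the procedure used to build the files in this gist. It assumes a CSV layout that is not documented here (a header row, then one taxon per row with 0/1 cells), and the function names `read_presence_absence` and `to_nexus` are hypothetical.

```python
import csv
import io


def read_presence_absence(csv_text):
    """Read a presence-absence table into a {taxon: "0101..."} mapping.

    Assumed layout (hypothetical): first row is a header, every other row
    holds a taxon name followed by one 0/1 cell per character.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row with the column labels
    return {row[0]: "".join(row[1:]) for row in reader if row}


def to_nexus(matrix):
    """Render a {taxon: "0101..."} mapping as a NEXUS data block."""
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all taxa need the same number of characters")
    nchar = lengths.pop()
    width = max(len(taxon) for taxon in matrix)
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        'FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "MATRIX",
    ]
    for taxon, row in matrix.items():
        # pad the taxon names so that the character columns line up
        lines.append(f"{taxon.ljust(width)}  {row}")
    lines += [";", "END;"]
    return "\n".join(lines)


# Made-up toy data, only to show the shape of the input and output.
example = "taxon,c1,c2,c3\nGerman,1,0,1\nEnglish,1,1,0\n"
matrix = read_presence_absence(example)
print(to_nexus(matrix))
```

Keeping the matrix as a plain mapping from taxon to character string makes the length check trivial, which matters because NEXUS readers usually reject rows of unequal length.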

At the moment, only one test set is offered in these formats. This test set was the basis of our network analysis of 40 Indo-European languages (see https://gist.github.com/LinguList/7475830). Here, it is offered in the formats specified above. In this dataset, known borrowings have been deliberately reintroduced into the data, in order to see

LinguList / Networks_of_Lexical_Borrowing.md
Last active Dec 28, 2015
MLN reconstruction for Indo-European languages.

Source Code and Data for the Paper: "Networks of lexical borrowing and lateral gene transfer in language and genome evolution"

Usage

Usage is straightforward: Having downloaded all scripts (just clone this gist), cd into the folder and type:

LinguList / ChineseDialectHistory.md
Last active Dec 28, 2015
Python code to accompany the paper "Using Phylogenetic Networks to Model Chinese Dialect History".

Source code for the paper "Using Phylogenetic Networks to Model Chinese Dialect History"

LinguList / SCACognateDetection.md
Last active Dec 29, 2015
SCA Cognate Detection

SCA Cognate Detection Applied to ASJP Data

Carry out cognate detection analyses on ASJP data (http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm). Given a language family or a genus and the parameters for the respective methods, this Python script carries out an automatic cognate detection analysis and outputs the data in aligned HTML format. For an overview of the three different cognate detection analyses, see the paper by List (2012; a PDF version can be downloaded from http://aclweb.org/anthology-new/W/W12/#0200).

LinguList / Sample_Size.md
Last active Jan 2, 2016
Supplementary Material for the Paper "Investigating the Impact of Sample Size on Cognate Detection"

Supplementary Material for the Paper "Investigating the Impact of Sample Size on Cognate Detection"

Format

The data format is basically simple CSV. Additional markup makes it possible to add key-value descriptions of the dataset and to comment out parts of the data by placing the hash character at the beginning of a line. A closer description of the input format, which can be parsed with the help of LingPy (http://www.lingpy.org), is given at http://lingpy.org/tutorial/lingpy.basic.wordlist.html.
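
To make the three kinds of lines concrete (data rows, key-value descriptions, and commented-out lines), here is a small standard-library sketch of a reader. It is an illustration only and not LingPy's own parser. Two details are assumptions that the description above does not state: that cells are separated by tabs, and that key-value lines have the form `@key:value`. The function name `parse_wordlist`, the column names, and the sample rows are hypothetical.

```python
def parse_wordlist(text, delimiter="\t"):
    """Split wordlist text into (metadata, rows).

    Assumptions (not stated in the format description above):
    - cells are separated by `delimiter` (tab by default),
    - key-value lines look like "@key:value",
    - the first remaining line is the header with the column names.
    Lines starting with "#" are treated as commented out and skipped.
    """
    meta, header, rows = {}, None, []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or commented-out line
        if line.startswith("@"):
            key, _, value = line[1:].partition(":")
            meta[key.strip()] = value.strip()
            continue
        cells = [cell.strip() for cell in line.split(delimiter)]
        if header is None:
            header = cells  # first data-like line names the columns
        else:
            rows.append(dict(zip(header, cells)))
    return meta, rows


# Made-up sample in the assumed layout.
sample = "\n".join([
    "# a comment line",
    "@author: hypothetical",
    "ID\tDOCULECT\tCONCEPT\tIPA",
    "1\tGerman\thand\thant",
    "# 2\tEnglish\thand\thænd",
    "3\tEnglish\thand\thænd",
])
meta, rows = parse_wordlist(sample)
print(meta)
print(rows)
```

For real analyses, loading the file through LingPy as described in the tutorial linked above is the safer route, since it also handles the library's own conventions for column types.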

Information

This dataset is part of the larger "Benchmark Database for Cognate Detection", currently hosted at http://quanthistling.info/bdhl/cognates.php.

LinguList / README.md
Created Jun 28, 2016
Vowel Purity and Rhyme Evidence in Old Chinese Reconstruction

Vowel Purity and Rhyme Evidence in Old Chinese Reconstruction

Data

The data contain the rhyme network (in YAML format), the different character readings (missing characters are indicated by a "?"), and the vowel annotations in JSON.

Code

To run the code, make sure you have Python 3 installed, as well as a recent version of NetworkX and the community extension for NetworkX.

LinguList / README.md
Last active Jul 16, 2018
Exporting Sublists from a Wordlist with LingPy and Concepticon

Exporting Sublists from a Wordlist with LingPy and Concepticon

This gist describes how you can extract sublists from a wordlist in LingPy with the help of the pyconcepticon API. See https://calc.hypotheses.org/date/2018/07 for details on the code and additional explanations.

LinguList / README.md
Created Nov 6, 2018
Inferring consonant clusters from CLICS data with LingPy: Data and Code

Inferring consonant clusters from CLICS data with LingPy: Data and Code

This GIST accompanies the blog post explaining the code, which you can find here.

To install and run the code, run the following in your terminal:

$ pip install -r pip-requirements.txt
$ git clone https://github.com/clld/concepticon-data.git
$ cd concepticon-data
LinguList / README.md
Created Dec 11, 2018
Merging datasets with LingPy and the CLDF curation framework
LinguList / README.md
Created Feb 24, 2019
Automatic morpheme segmentation (Open problems in computational diversity linguistics 1)

Automatic morpheme segmentation (Open problems in computational diversity linguistics 1)

This little repository contains the analyses I have done to test the Morfessor software on sparse data. It should be mentioned that I used only the defaults for the computation, so it is quite possible that the results could be further improved.

Requirements

To install Morfessor, just type:

$ pip install morfessor