@LinguList
LinguList / Networks_of_Lexical_Borrowing.md
Last active December 28, 2015 08:59
MLN reconstruction for Indo-European languages.

Source Code and Data for the Paper: "Networks of lexical borrowing and lateral gene transfer in language and genome evolution"

Usage

Usage is straightforward: Having downloaded all scripts (just clone this gist), cd into the folder and type:

LinguList / ChineseDialectHistory.md
Last active December 28, 2015 09:39
Python code to accompany the paper "Using Phylogenetic Networks to Model Chinese Dialect History".

Source code for the paper "Using Phylogenetic Networks to Model Chinese Dialect History"

LinguList / SCACognateDetection.md
Last active December 29, 2015 04:09
SCA Cognate Detection

SCA Cognate Detection Applied to ASJP Data

Carry out cognate detection analyses on ASJP data (http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm). Given a language family or genus and the parameters for the respective methods, this Python script carries out an automatic cognate detection analysis and outputs the data in aligned HTML format. For an overview of the three different cognate detection analyses, see the paper by List (2012; a PDF version can be downloaded from http://aclweb.org/anthology-new/W/W12/#0200).

LinguList / Sample_Size.md
Last active January 2, 2016 02:18
Supplementary Material for the Paper "Investigating the Impact of Sample Size on Cognate Detection"

Format

The data format is essentially plain CSV. Additional markup makes it possible to add key-value descriptions of the dataset and to comment out parts of the data by placing a hash character at the beginning of a line. A more detailed description of the input format, which can be parsed with the help of LingPy (http://www.lingpy.org), is given at http://lingpy.org/tutorial/lingpy.basic.wordlist.html.
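
The following is a minimal sketch of how such a file could be read without LingPy, using only the Python standard library. It rests on two assumptions that are not spelled out in this description: that key-value lines look like `@key:value`, and that columns are separated by tabs. The function name `parse_wordlist` and the sample data are made up for illustration.

```python
import csv
import io


def parse_wordlist(text, delimiter="\t"):
    """Return (metadata, header, rows) from a commented, CSV-like wordlist."""
    metadata, data_lines = {}, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and commented-out data
        if stripped.startswith("@"):
            # Assumed key-value syntax: "@key:value"
            key, _, value = stripped[1:].partition(":")
            metadata[key.strip()] = value.strip()
            continue
        data_lines.append(line)
    reader = csv.reader(
        io.StringIO("\n".join(data_lines)),
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # keep quote characters in transcriptions as-is
    )
    table = list(reader)
    if not table:
        return metadata, [], []
    return metadata, table[0], table[1:]


# Hypothetical sample data, not taken from the dataset itself.
sample = "\n".join([
    "@author:Example Author",
    "# a commented-out line",
    "ID\tDOCULECT\tCONCEPT\tIPA",
    "1\tGerman\thand\thant",
    "#2\tEnglish\thand\thænd",
    "3\tDutch\thand\tɦɑnt",
])
metadata, header, rows = parse_wordlist(sample)
```

For real analyses, LingPy's own wordlist parser (see the tutorial linked above) is the better choice, since it knows the format in full.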

Information

This dataset is part of the larger "Benchmark Database for Cognate Detection", currently hosted at http://quanthistling.info/bdhl/cognates.php.

LinguList / README.md
Last active August 29, 2015 14:02
PhylogeneticNetworkApproaches

Test Sets for Phylogenetic Network Approaches in Historical Linguistics

This gist offers test sets for phylogenetic network approaches. All data is provided in several formats. The following formats are distinguished:

  • tree-representation of the underlying taxa using the Newick format (nwk-file)
  • csv-representation of the presence-absence patterns of the data (csv-file)
  • nexus-representation of the presence-absence matrix of the data (nex-file)
  • wordlist representation of the data which is important for additional linguistic analyses (qlc-format)
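
To illustrate how the presence-absence patterns relate to the nexus representation, here is a minimal sketch that renders a small presence-absence matrix as a NEXUS DATA block. The column layout of the csv-files in this gist is not described here, so the sketch starts from an in-memory dictionary instead of reading a file. The function name `to_nexus` and the example patterns are hypothetical.

```python
def to_nexus(patterns):
    """Render {taxon: "0101..."} as a NEXUS DATA block."""
    taxa = sorted(patterns)
    lengths = {len(patterns[taxon]) for taxon in taxa}
    if len(lengths) != 1:
        raise ValueError("all taxa need the same number of characters")
    nchar = lengths.pop()
    width = max(len(taxon) for taxon in taxa)
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"DIMENSIONS NTAX={len(taxa)} NCHAR={nchar};",
        'FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "MATRIX",
    ]
    # One row per taxon: padded name, two spaces, then the 0/1 pattern.
    lines += [f"{taxon.ljust(width)}  {patterns[taxon]}" for taxon in taxa]
    lines += [";", "END;"]
    return "\n".join(lines)


# Hypothetical presence-absence patterns (1 = cognate set present).
patterns = {"German": "1101", "English": "1100", "Hindi": "0011"}
nexus = to_nexus(patterns)
print(nexus)
```

The nex-files shipped with the gist may use a different block layout or additional options; the sketch only shows the general shape of such a matrix.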

At the moment, only one test set is offered in these formats. This test set was the basis of our network analysis of 40 Indo-European languages (see https://gist.github.com/LinguList/7475830). Here, it is offered in the formats specified above. In this dataset, known borrowings have been deliberately reintroduced into the data, in order to see

LinguList / README.md
Created June 28, 2016 14:11
Vowel Purity and Rhyme Evidence in Old Chinese Reconstruction

Data

The data contains the rhyme network (in YAML format), the different character readings (missing characters are indicated by a "?"), and the vowel annotations in JSON.

Code

To run the code, make sure you have Python 3 installed, as well as a recent version of NetworkX and the community extension for NetworkX.
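
A quick way to check these requirements before running anything is sketched below. It only tests whether the modules can be found, not whether their versions are recent enough. The import name `community` for the community extension is an assumption; adjust it if your installation uses a different module name.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "community" is an assumed import name for the community extension.
required = ["networkx", "community"]

python_ok = sys.version_info.major >= 3
missing = missing_modules(required)

print("Python 3:", python_ok)
print("Missing modules:", ", ".join(missing) if missing else "none")
```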

LinguList / README.md
Last active July 16, 2018 15:08
Exporting Sublists from a Wordlist with LingPy and Concepticon

This gist describes how you can extract sublists from a wordlist in LingPy with the help of the pyconcepticon API. See https://calc.hypotheses.org/date/2018/07 for details on the code and additional explanations.
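
The basic idea can be sketched in plain Python as follows. This is not the code from the gist and calls neither LingPy nor pyconcepticon: the wordlist is a hypothetical list of dictionaries, and the concept list is hard-coded where the real workflow would retrieve it from Concepticon.

```python
def extract_sublist(rows, concepts, column="CONCEPT"):
    """Keep only the rows whose concept occurs in the given concept list."""
    wanted = {concept.casefold() for concept in concepts}
    return [row for row in rows if row[column].casefold() in wanted]


# Hypothetical wordlist rows; a real wordlist would be loaded with LingPy.
rows = [
    {"ID": 1, "DOCULECT": "German", "CONCEPT": "hand", "IPA": "hant"},
    {"ID": 2, "DOCULECT": "German", "CONCEPT": "mountain", "IPA": "bɛrk"},
    {"ID": 3, "DOCULECT": "English", "CONCEPT": "hand", "IPA": "hænd"},
    {"ID": 4, "DOCULECT": "English", "CONCEPT": "fog", "IPA": "fɒg"},
]

# Hypothetical concept list standing in for a list taken from Concepticon.
sublist = {"HAND", "FOG"}

subset = extract_sublist(rows, sublist)
for row in subset:
    print(row["ID"], row["DOCULECT"], row["CONCEPT"], row["IPA"])
```

Matching on raw glosses, as done here, is fragile across datasets. Linking concepts through Concepticon identifiers avoids that problem, which is what the blog post linked above explains.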

LinguList / README.md
Created November 6, 2018 10:50
Inferring consonant clusters from CLICS data with LingPy: Data and Code

This gist accompanies the blog post explaining the code, which you can find here.

To install and run the code, run the following in your terminal:

$ pip install -r pip-requirements.txt
$ git clone https://github.com/clld/concepticon-data.git
$ cd concepticon-data
LinguList / README.md
Created December 11, 2018 12:22
Merging datasets with LingPy and the CLDF curation framework
LinguList / README.md
Created February 24, 2019 21:02
Automatic morpheme segmentation (Open problems in computational diversity linguistics 1)

This small repository contains the analyses I carried out to test the Morfessor software on sparse data. Note that I only used the default settings for the computation, so it is quite possible that the results could be improved further.

Requirements

To install Morfessor, just type:

$ pip install morfessor