Johann-Mattis List LinguList

## README.md

      
LinguList / README.md, last active August 29, 2015 14:02

PhylogeneticNetworkApproaches

Test Sets for Phylogenetic Network Approaches in Historical Linguistics

This gist offers test sets for phylogenetic network approaches. All data are provided in several formats. The following formats are distinguished:

- Tree representation of the underlying taxa in Newick format (nwk file)
- CSV representation of the presence-absence patterns of the data (csv file)
- NEXUS representation of the presence-absence matrix of the data (nex file)
- Wordlist representation of the data, which is important for additional linguistic analyses (qlc format)
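
To illustrate how the presence-absence data relate to the NEXUS representation, here is a minimal sketch that renders a small presence-absence matrix as a NEXUS data block. The function name `to_nexus`, the toy taxa, and the exact layout of the block are assumptions made for illustration; the files in the gist may be organized differently.

```python
def to_nexus(matrix):
    """Render a {taxon: "0101..."} presence-absence mapping as a NEXUS data block."""
    taxa = sorted(matrix)
    lengths = {len(matrix[taxon]) for taxon in taxa}
    if len(lengths) != 1:
        raise ValueError("all taxa need the same number of characters")
    nchar = lengths.pop()
    width = max(len(taxon) for taxon in taxa)
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        "DIMENSIONS NTAX={0} NCHAR={1};".format(len(taxa), nchar),
        'FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "MATRIX",
    ]
    for taxon in taxa:
        # One row per taxon: padded name, then its presence-absence string.
        lines.append("{0} {1}".format(taxon.ljust(width), matrix[taxon]))
    lines += [";", "END;"]
    return "\n".join(lines)


# Hypothetical toy data: three taxa, four cognate sets (1 = present, 0 = absent).
toy = {"German": "1101", "English": "1100", "French": "0011"}
nexus_text = to_nexus(toy)
print(nexus_text)
```

Each column of the matrix stands for one cognate set, so a `1` in a row means that the language has a reflex of that set.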

At the moment, only one test set is offered in these formats. This test set was the basis of our network analysis of 40 Indo-European languages (see https://gist.github.com/LinguList/7475830). Here, it is offered in the formats specified above. In this dataset, known borrowings have been deliberately reintroduced into the data, in order to see

  
## Networks_of_Lexical_Borrowing.md

      
LinguList / Networks_of_Lexical_Borrowing.md, last active December 28, 2015 08:59

MLN reconstruction for Indo-European languages.

Source Code and Data for the Paper: "Networks of lexical borrowing and lateral gene transfer in language and genome evolution"


Author: Johann-Mattis List mattis.list@uni-marburg.de
Date: 2013-11-17

Usage

Usage is straightforward: Having downloaded all scripts (just clone this gist), cd into the folder and type:

  
## ChineseDialectHistory.md

      
LinguList / ChineseDialectHistory.md, last active December 28, 2015 09:39

Python code to accompany the paper "Using Phylogenetic Networks to Model Chinese Dialect History".

Source code for the paper "Using Phylogenetic Networks to Model Chinese Dialect History"


## SCACognateDetection.md

      
LinguList / SCACognateDetection.md, last active December 29, 2015 04:09

SCA Cognate Detection Applied to ASJP Data

Carry out cognate detection analyses on ASJP data
(http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm). Once you define a language
family or a genus and specify the parameters for the respective methods,
this Python script carries out an automatic cognate detection analysis and
outputs the data in aligned HTML format. For an overview of the three different
cognate detection analyses, see the paper by List (2012; a PDF version can be
downloaded from http://aclweb.org/anthology-new/W/W12/#0200).

  
## Sample_Size.md

      
LinguList / Sample_Size.md, last active January 2, 2016 02:18

Supplementary Material for the Paper "Investigating the Impact of Sample Size on Cognate Detection"

Format

The data format is essentially plain CSV. Additional markup lets you add key-value descriptions of the dataset and comment out parts of the data by placing a hash character at the beginning of a line. A closer description of the input format, which can be parsed with the help of LingPy (http://www.lingpy.org), is given at http://lingpy.org/tutorial/lingpy.basic.wordlist.html.
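
As a rough illustration of this format, the following sketch parses such a file with the Python standard library only. It assumes tab-separated columns and key-value lines of the form `@key:value`; both are assumptions made for this example, and the column names and sample rows are hypothetical. For real work, the LingPy parser described in the tutorial linked above is the better choice.

```python
import csv
import io


def parse_wordlist(text, delimiter="\t"):
    """Split a wordlist file into metadata (key-value lines) and data rows."""
    meta, data_lines = {}, []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip empty lines and lines commented out with a leading hash.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumed convention: key-value descriptions look like "@key:value".
        if stripped.startswith("@"):
            key, _, value = stripped[1:].partition(":")
            meta[key.strip()] = value.strip()
            continue
        data_lines.append(line)
    # The first remaining line is treated as the header row.
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter=delimiter)
    return meta, list(reader)


# Hypothetical sample in the assumed layout.
sample = (
    "@author: Jane Doe\n"
    "# this line is ignored\n"
    "ID\tDOCULECT\tCONCEPT\tIPA\n"
    "1\tGerman\thand\thant\n"
    "#2\tDutch\thand\thɑnt\n"
    "3\tEnglish\thand\thænd\n"
)
meta, rows = parse_wordlist(sample)
print(meta)
print(len(rows))
```

Lines that start with a hash are dropped before the CSV reader sees them, which is how a single row can be commented out without being deleted from the file.
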

Information

This dataset is part of the larger "Benchmark Database for Cognate Detection", currently hosted at http://quanthistling.info/bdhl/cognates.php.

  
## README.md

      
LinguList / README.md, created June 28, 2016 14:11

Vowel Purity and Rhyme Evidence in Old Chinese Reconstruction

Data

The data contain the rhyme network (in YAML format), the different character readings (missing characters are indicated by a "?"), and the vowel annotations in JSON.

Code

To run the code, make sure you have Python3 installed, as well as a recent version of NetworkX and the community-extension for NetworkX.
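
Before running the code, it can help to check that these requirements are available. The sketch below uses only the standard library. It assumes that the community extension is importable under the module name `community`, which is not stated in the README and may differ in your installation.

```python
import importlib.util
import sys


def check_requirements(modules=("networkx", "community")):
    """Report whether Python 3 is running and whether the named modules can be found."""
    report = {"python3": sys.version_info[0] >= 3}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


status = check_requirements()
for name, ok in status.items():
    print("{0}: {1}".format(name, "found" if ok else "missing"))
```

The check only confirms that the modules can be located; it does not verify that their versions are recent enough.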

  
## README.md

      
LinguList / README.md, last active July 16, 2018 15:08

Exporting Sublists from a Wordlist with LingPy and Concepticon

This gist describes how you can extract sublists from a wordlist in LingPy with the help of the pyconcepticon API. See https://calc.hypotheses.org/date/2018/07 for details on the code and additional explanations.
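
The basic idea of extracting a sublist can be sketched in plain Python, without LingPy or pyconcepticon. In the gist, the concepts of a sublist come from Concepticon; here they are a hard-coded hypothetical list, and the function name `extract_sublist`, the column names, and the sample rows are likewise assumptions made for illustration.

```python
def extract_sublist(rows, concepts, column="CONCEPT"):
    """Keep only the rows whose concept is part of the requested sublist."""
    wanted = {concept.lower() for concept in concepts}
    return [row for row in rows if row[column].lower() in wanted]


# Hypothetical wordlist rows.
wordlist = [
    {"ID": 1, "DOCULECT": "German", "CONCEPT": "hand", "IPA": "hant"},
    {"ID": 2, "DOCULECT": "German", "CONCEPT": "mountain", "IPA": "bɛrk"},
    {"ID": 3, "DOCULECT": "English", "CONCEPT": "hand", "IPA": "hænd"},
]

# Hypothetical sublist that contains a single concept.
sublist = extract_sublist(wordlist, ["HAND"])
print([row["ID"] for row in sublist])
```

Matching is case-insensitive here because concept glosses are often written in upper case in one source and lower case in another; linking through concept identifiers, as the gist does with Concepticon, is more robust than matching glosses.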

  
## README.md

      
LinguList / README.md, created November 6, 2018 10:50

Inferring consonant clusters from CLICS data with LingPy: Data and Code

This gist accompanies the blog post explaining the code, which you can find here.

To install and run the code, run the following in your terminal:

    $ pip install -r pip-requirements.txt
    $ git clone https://github.com/clld/concepticon-data.git
    $ cd concepticon-data


## README.md

      
LinguList / README.md, created December 11, 2018 12:22

Merging datasets with LingPy and the CLDF curation framework

This gist provides, in a single file, the code underlying the blog post "Merging datasets with LingPy and the CLDF curation framework".

  
## README.md

      
LinguList / README.md, created February 24, 2019 21:02

Automatic morpheme segmentation (Open problems in computational diversity linguistics 1)

This little repository contains the analyses I have done to test the Morfessor software on sparse data. Note that I only used the default settings for the computation, so it is quite possible that the results could be improved further.

Requirements

To install Morfessor, just type:

    $ pip install morfessor