@cmungall
Created October 23, 2018 00:10
Example of bashlog (Datalog compiled to bash) on the Gene Ontology (GO)
rel(S,P,O) :~ cat relationships.tsv
anc(S,O) :- rel(S,_,O).
anc(S,O) :- anc(S,Z),anc(Z,O).
main(X) :- anc(X, "GO:0005634").
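
The `:~` rule loads `rel` from the output of `cat relationships.tsv`. The two `anc` rules make `anc` the transitive closure of `rel` over every predicate: the `_` discards P, so subclass and any other relation in the file are followed alike. `main` then keeps every X that has GO:0005634 (nucleus) as an ancestor. To illustrate what that query returns, here is a small bash/awk sketch that answers it for one target without building the whole closure, by walking edges backwards from the target. This is a swapped-in technique, not what bashlog generates. The function `descendants_of`, the toy IDs and the edges are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print every term X with anc(X, target), i.e. every
# subject that reaches the target through one or more edges of any predicate.
# A backward breadth-first walk from one target, NOT the code bashlog emits.
descendants_of() {
  local target="$1" edges="$2"
  awk -F $'\t' -v target="$target" '
    # Index the S-P-O rows by object: children[O] lists each S with S -> O.
    { children[$3] = children[$3] " " $1 }
    END {
      queue[1] = target; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        n = split(children[node], kids, " ")
        for (i = 1; i <= n; i++) {
          if (!(kids[i] in seen)) {   # visit each term once, so cycles terminate
            seen[kids[i]] = 1
            queue[++tail] = kids[i]
            print kids[i]
          }
        }
      }
    }' "$edges" | LC_ALL=C sort
}

# Made-up toy edges: T:a -> T:b -> T:root, T:c -> T:root (part_of), T:d elsewhere.
edges_file=$(mktemp)
trap 'rm -f "$edges_file"' EXIT
printf '%s\t%s\t%s\n' \
  T:a subclass T:b \
  T:b subclass T:root \
  T:c part_of T:root \
  T:d subclass T:other > "$edges_file"

descendants_of T:root "$edges_file"   # prints T:a, T:b, T:c (one per line)
```

Like the Datalog rule, the walk ignores the predicate column; filtering on `$2` inside the awk program would restrict it to, say, subclass edges only.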

cmungall commented Oct 23, 2018

The input, relationships.tsv, is a tab-separated file of subject-predicate-object (S-P-O) edges, one per line, e.g.:

S P O
GO:0000001 subclass GO:0048308
GO:0000001 subclass GO:0048311
GO:0000002 subclass GO:0007005
GO:0000003 subclass GO:0008150
GO:0000006 subclass GO:0005385
GO:0000007 subclass GO:0005385
GO:0000009 subclass GO:0000030
GO:0000010 subclass GO:0016765
GO:0000011 subclass GO:0007033
GO:0000011 subclass GO:0048308
...
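
The generated script splits every line on tab characters, so the three fields must be separated by tabs (the excerpt above shows them as spaces). A minimal sketch, assuming bash, that writes three of the rows above as a real TSV and checks that every row has exactly three fields; the file name `sample_relationships.tsv` and the helper `check_spo_tsv` are hypothetical.

```shell
#!/bin/bash
# Write three rows from the excerpt as a real tab-separated file
# (printf reuses its format string for each group of three arguments).
printf '%s\t%s\t%s\n' \
  GO:0000001 subclass GO:0048308 \
  GO:0000001 subclass GO:0048311 \
  GO:0000002 subclass GO:0007005 > sample_relationships.tsv

# Hypothetical check: report any row that does not split into exactly
# three tab-separated fields, and exit non-zero if there was one.
check_spo_tsv() {
  awk -F $'\t' '
    NF != 3 { printf "%s:%d: expected 3 fields, got %d\n", FILENAME, NR, NF; bad = 1 }
    END { exit bad + 0 }' "$1"
}

check_spo_tsv sample_relationships.tsv && echo "sample_relationships.tsv: OK"
```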

Compiling the program to bash with bashlog yields:

#!/bin/bash
###############################################################
# This script was generated by bashlog
# For more information, visit thomasrebele.org/projects/bashlog
###############################################################

export LC_ALL=C
mkdir -p tmp
rm -f tmp/*
if type mawk > /dev/null; then awk="mawk"; else awk="awk"; fi
sort="sort "
check() { grep -- $1 <(sort --help) > /dev/null; }
check "--buffer-size" && sort="$sort --buffer-size=34% "
check "--parallel"    && sort="$sort --parallel=2 "

read_ntriples() { $awk -F" " '{ sub(" ", "\t"); sub(" ", "\t"); sub(/ \.$/, ""); print $0 }' "$@"; }
conv_ntriples() { $awk -F$'\t' '{ print $1 " " $2 " " $3 " ." }'; }


$sort -t $'\t' -k 1 -u \
<($awk -v FS=$'\t' ' 
      BEGIN { 
       out0_cond1["GO:0005634"] = "1"; 
      }
    
     (($2) in out0_cond1){ print $1 } 
      ' \
    <($sort -t $'\t' -k 1 -k 2 -u \
            <($awk -v FS=$'\t' '  { print $1 FS $3} 
                  ' relationships.tsv) \
             | tee tmp/full0 > tmp/delta0
        while 
        
        $sort -t $'\t' -k 1 -k 2 -u \
            <(join -t $'\t' -1 2 -2 1 -o 1.1,2.2 \
                <($sort -t $'\t' -k 2 tmp/delta0) \
                <($sort -t $'\t' -k 1 tmp/full0)) \
             | comm -23 - tmp/full0 > tmp/new0;
        
        mv tmp/new0 tmp/delta0 ; 
        $sort -u --merge -o tmp/full0 tmp/full0 tmp/delta0 ; 
        [ -s tmp/delta0 ]; 
        do continue; done
        
        rm tmp/delta0
        cat tmp/full0))

 rm -f tmp/*
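
The generated script computes `anc` as a semi-naive fixpoint: tmp/full0 holds every pair found so far and tmp/delta0 only the pairs found in the last round. Each round joins delta with full, keeps the pairs not yet in full with `comm -23`, merges them into full, and the loop stops once delta is empty. Below is a hand-written, stripped-down sketch of that loop with the same sort/join/comm steps, run on a made-up chain; the function `closure` and its temporary files are hypothetical names.

```shell
#!/bin/bash
# Hand-written sketch of the semi-naive loop inside the generated script.
# Input: a file of S<TAB>O pairs. Output: their transitive closure, sorted.
export LC_ALL=C   # byte-order collation so sort, join and comm agree

closure() {
  local pairs="$1" dir
  dir=$(mktemp -d)
  # Round 0: full = delta = the input pairs, deduplicated and sorted.
  sort -u "$pairs" > "$dir/full"
  cp "$dir/full" "$dir/delta"
  while [ -s "$dir/delta" ]; do
    # Join each new pair (a, z) with every known pair (z, b) to derive (a, b),
    # then keep only the derived pairs that are not already in full.
    join -t $'\t' -1 2 -2 1 -o 1.1,2.2 \
        <(sort -t $'\t' -k 2,2 "$dir/delta") \
        <(sort -t $'\t' -k 1,1 "$dir/full") |
      sort -u |
      comm -23 - "$dir/full" > "$dir/new"
    mv "$dir/new" "$dir/delta"
    # Fold this round's new pairs into full (both inputs are already sorted).
    sort -u -m -o "$dir/full" "$dir/full" "$dir/delta"
  done
  cat "$dir/full"
  rm -rf "$dir"
}

# Made-up chain T:a -> T:b -> T:c -> T:d; prints all six ancestor pairs.
pairs_file=$(mktemp)
trap 'rm -f "$pairs_file"' EXIT
printf 'T:a\tT:b\nT:b\tT:c\nT:c\tT:d\n' > "$pairs_file"
closure "$pairs_file"
```

Joining only delta with full, as the generated script also does, is enough here: full always contains the original edges, so every pair found in one round is extended by one more edge in the next, and nothing reachable is missed.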
