@nickynicolson
Forked from PeterHovenkamp/Taxonomic-mind-mapper
Last active August 29, 2015 14:01
= The taxonomic mind-mapper
Nicky Nicolson, Peter Hovenkamp
:neo4j-version: 2.1.0
:author: Nicky Nicolson
:twitter: nickynicolson
== Domain
We investigate the scheme described in https://docs.google.com/document/d/1FIxNrrGrIZs0l4QJEdGfctXbYVL4cQhKspmMEHmKAGg/edit, using the taxonomy of Diplazium tomentosum as a use case (https://docs.google.com/document/d/1vni44RBwGNZ7iRCFf243NtcD-7xwjHrDeAPehato7KY/edit?usp=sharing).
== Data model
=== Node types
* CollectingEvent - shown in red. Properties:
** source (e.g. "L0051073")
** collector (e.g. "Blume")
** locality (e.g. "Java")
** country (e.g. "Indonesia")
* Specimen - shown in purple. Properties:
** id (e.g. "L0051073")
** url (e.g. "http://plants.jstor.org/specimen/l0051073")
** heldIn (e.g. "L")
* Name - shown in green. Properties:
** name (e.g. "Diplazium tomentosum Blume")
** ipniid (e.g. "17089100-1")
** url (e.g. "http://biodiversitylibrary.org/page/31163025")
=== Relationship types
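* ConspecificWith
** (Specimen)<-[:ConspecificWith]-(Specimen), or (Specimen)<-[:ConspecificWith]-(CollectingEvent) when a specimen is linked to type material as a whole
** assertedBy (e.g. "PH") or asIdentifiedBy (e.g. "M.G.Price 1989")
* DerivedFrom
** (Specimen)<-[:DerivedFrom]-(CollectingEvent)
** assertedBy (e.g. "PH")
* TypeOf
** (CollectingEvent)-[:TypeOf]->(Name) or (Specimen)-[:TypeOf]->(Name)
** assertedBy (e.g. "Morton C.V. in http://www.biodiversitylibrary.org/page/409837#page/305")
** typeOfType (e.g. "Lecto")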
== Use case
Our case starts with one of the types from the Leiden herbarium (http://plants.jstor.org/specimen/l0051073). It is the type of Diplazium tomentosum Blume:
//setup
[source,cypher]
----
CREATE (c:CollectingEvent
{source: 'L0051073'
,collector:'Blume'
,locality:"Java"
,country:"Indonesia"}
)
,(s:Specimen
{id:"L0051073"
,url:'http://plants.jstor.org/specimen/l0051073'
,heldIn:'L'}
)
,(n:Name
{name:"Diplazium tomentosum Blume"
,ipniid: "17089100-1"
,url:"http://biodiversitylibrary.org/page/31163025"}
)
,(s)<-[:DerivedFrom]-(c)
,(c)-[:TypeOf]->(n)
RETURN s, n, c
----
//table
Now we add details of another type specimen from Berlin: http://plants.jstor.org/specimen/b%2020%200051655
This time, there are no data about the collecting event, so there is no collecting event to attach the specimen to.
//setup
[source,cypher]
----
CREATE (s:Specimen
{id:'B-200051655'
,heldIn:'B'}
)
RETURN s
----
//table
But an unconnected specimen is not what they describe in Berlin. They say it is a type of Diplazium tomentosum Blume.
We model that as an assertion by Berlin that it is derived from the same collecting event as the Leiden specimen.
//setup
[source,cypher]
----
MATCH ({name:"Diplazium tomentosum Blume"})<-[:TypeOf]-(c)
MATCH (s:Specimen
{id:'B-200051655'}
)
CREATE (s) <-[r:DerivedFrom{assertedBy: 'Berlin'}]- (c)
RETURN s, c, r
----
//graph
The graph now shows the basic relations between the three different types of nodes.
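The same relations can also be read back as a table. The following is a minimal sketch that assumes the relationship directions used in the setup queries above; the column aliases are illustrative only. The Leiden link was created without an assertedBy property, so that column stays empty for it.

[source,cypher]
----
// List every specimen derived from the collecting event that typifies the name,
// together with whoever asserted the derivation (null when no one is recorded).
MATCH (n:Name {name: "Diplazium tomentosum Blume"})<-[:TypeOf]-(c:CollectingEvent)-[r:DerivedFrom]->(s:Specimen)
RETURN s.id AS specimen, s.heldIn AS herbarium, r.assertedBy AS assertedBy
----
//table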
We next add several other specimens from different herbaria, which all have been identified as Diplazium tomentosum,
starting with this one (http://plants.jstor.org/specimen/gh00022916) from the GH herbarium.
We have to introduce a new name to accommodate the type status of the specimen. We model the identification of the specimen as a ConspecificWith link, carrying an asIdentifiedBy property, from the type collecting event. Once we have examined the specimen ourselves we can confirm the identification,
which we express by adding a ConspecificWith link with an assertedBy property to one of the specimens derived from that collecting event.
//setup
[source,cypher]
----
MATCH ({name:"Diplazium tomentosum Blume"})<-[:TypeOf]-(c2)-[:DerivedFrom]->(s3:Specimen{heldIn: 'L'})
CREATE (c1:CollectingEvent
{collNumber:'386'
,collector:'H. Cuming'}
)
,(s1:Specimen{id:'GH00022916'
,heldIn:'GH'
,url:'http://plants.jstor.org/specimen/gh00022916'}
)
, (n1:Name
{name:'Asplenium deflexum Mett.'
,ipniid:'17042170-1'}
)
, (s1)<-[:ConspecificWith
{asIdentifiedBy:'M.G.Price 1989'}]-(c2)
, (s1)<-[:DerivedFrom]- (c1)
, (s1) <-[:ConspecificWith
{assertedBy: 'PH'}] -(s3)
, (c1)-[:TypeOf]->(n1)
RETURN s1, s3, c2, n1, c1
----
//graph
//table
We find that a duplicate of this specimen is at the Michigan herbarium (http://plants.jstor.org/specimen/mich1190057)
and this can now easily be added:
//setup
[source,cypher]
----
MATCH (c:CollectingEvent
{collector:'H. Cuming'
,collNumber: '386'}
)
CREATE (s:Specimen
{id:'MICH1190057'
,url:'http://plants.jstor.org/specimen/mich1190057'
,heldIn:'MICH'})
, (s)<-[:DerivedFrom]-(c)
RETURN s, c
----
//graph
//table
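Because duplicates share a collecting event, they can be found without knowing their identifiers. This is a sketch under the assumption that every duplicate is attached with a DerivedFrom link as above; the aliases are illustrative.

[source,cypher]
----
// Group specimens by collecting event and keep only events
// from which more than one specimen is derived.
MATCH (c:CollectingEvent)-[:DerivedFrom]->(s:Specimen)
WITH c, collect(s.heldIn) AS herbaria, count(s) AS n
WHERE n > 1
RETURN c.collector AS collector, c.collNumber AS collNumber, herbaria
----
//table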
Next stop is Brussels, where another specimen is held (http://plants.jstor.org/specimen/br0000006990008).
The annotations that come with this specimen are quite extensive and introduce a number of new nodes.
//setup
[source,cypher]
----
MATCH ({name:"Diplazium tomentosum Blume"})<-[:TypeOf]-(c2)
CREATE (c:CollectingEvent
{collector: 'Roxburgh W.'
,collNumber:'S.N.'}
)
,(s:Specimen{id:'BR0000006990008'
,url:'http://plants.jstor.org/specimen/br0000006990008'
,heldIn:'BR'}
)
, (n:Name{ipniid: '17044840-1',name:'Asplenium hemionitoides Roxb.'})
, (s)<-[:DerivedFrom]- (c)
, (s)<-[conspec:ConspecificWith
{assertedBy:'Morton C.V. 1970/7/1'}]-(c2)
, (s) <-[:ConspecificWith
{assertedBy: 'PH'}] -(c2)
, (s)-[type:TypeOf
{assertedBy: 'Morton C.V. in http://www.biodiversitylibrary.org/page/409837#page/305'
, typeOfType: "Lecto"}]->(n)
RETURN n,s, c, c2, conspec, type
----
//graph
//table
And to accommodate the annotations by Morton, again a new specimen has to be introduced, from the Geneva herbarium.
//setup
[source,cypher]
----
MATCH (c:CollectingEvent
{collector: 'Roxburgh W.'
,collNumber:'S.N.'}
)
CREATE (s:Specimen
{collector: 'Christopher Smith'
,locality: 'Amboina'
,heldIn: 'G'}
)
,(s)<-[:DerivedFrom]-(c)
RETURN s,c
----
//graph
//table
Next stop is Sweden, where an interesting specimen from Taiwan is held, under the same name, but marked as type of another name:
//setup
[source,cypher]
----
CREATE (c:CollectingEvent
{collector:'Faurie, U.J.'
,collNumber:'168'
,locality:'Formosa. Urai.'
,country:'Taiwan'
,eventDate:'1914/4'}
)
,(s:Specimen{id:'SP10962'
,url:'http://plants.jstor.org/specimen/s-p-10962'
,heldIn:'SP'}
)
, (n:Name
{name:'Diplazium crenato-serratum (Blume) T.Moore var. hirta Rosenst.'}
)
, (s)<-[:DerivedFrom]-(c)
, (c)-[:TypeOf]->(n)
RETURN n, c, s
----
//graph
//table
To model the identification of this specimen, we link it to the type material of D. tomentosum:
//setup
[source,cypher]
----
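// Follow only DerivedFrom so that s binds the specimen,
// not the Name node that the collecting event also points to via TypeOf.
MATCH (:CollectingEvent {collector:'Faurie, U.J.'
,collNumber:'168'})-[:DerivedFrom]->(s:Specimen)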
MATCH ({name:"Diplazium tomentosum Blume"})<-[:TypeOf]-(c)
CREATE (s)<-[:ConspecificWith
{asIdentifiedBy: 'S'}] -(c)
RETURN s.heldIn
----
//graph
//table
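Identifications taken from herbarium labels can be separated from conspecificity we asserted ourselves, since only the former carry asIdentifiedBy. A minimal sketch, using the has() property test that also appears in the final query of this gist; the aliases are illustrative.

[source,cypher]
----
// Show only the ConspecificWith links that record an identification,
// i.e. those with an asIdentifiedBy property.
MATCH ()-[r:ConspecificWith]->(s:Specimen)
WHERE has(r.asIdentifiedBy)
RETURN s.id AS specimen, s.heldIn AS herbarium, r.asIdentifiedBy AS identifiedBy
----
//table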
And finally a specimen from Sumatra, also in Sweden, where it is labeled as type of D. burchardii. The only type specimen we know of for this species is in L, at least according to the annotations in Geneva by Morton, who has seen and photographed it; it bears the number "Rosenst. fil. sumatranae exs. 22".
But we know that Rosenstock frequently renumbered specimens in the series he distributed, so we can express this as an assertion that the Swedish specimen indeed derives from the same collection as the Leiden one.
//setup
[source,cypher]
----
CREATE (s1:Specimen{id:'SP4702'
,url:'http://plants.jstor.org/specimen/s-p-4702'
,heldIn:'SP'}
)
,(s2:Specimen{
heldIn:'L'
,collNumber:'Rosenst. fil. sumatranae exs. 22'}
)
,(c:CollectingEvent{
collector:'Burchard, O.'
,collNumber:'121'
,locality:'Sumatera: Indragiri, inter Tjinaco et Pukan Herun.'
,eventDate:'1907'}
)
, (n:Name{
name:'Diplazium burchardii Rosenst.'
,ipniid:'17246030-1'}
)
, (s1)<-[:DerivedFrom
{assertedBy: 'S'}]-(c)
, (s2)<-[:DerivedFrom
{assertedBy: 'PH'}]-(c)
, (c)-[:TypeOf]->(n)
RETURN s1, s2, n, c
----
//graph
//table
Knowing this, we could ask someone in L to examine the specimen and compare it with
any of the other specimens that are in L and that we have already connected to D. tomentosum. A Conspecificity link with any of the specimens in the network would include the name D. burchardii in the list of available names.
//setup
[source,cypher]
----
MATCH (n1:Name{name:"Diplazium burchardii Rosenst."})<--()-->(s2{heldIn: "L"})
MATCH (n2:Name{name:"Diplazium tomentosum Blume"})<--()-->(s1{heldIn: "L"})
CREATE(s1)<-[r:ConspecificWith
{assertedBy: "someone in L"}]-(s2)
RETURN s1, s2, r
----
//table
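The list of available names can be sketched as a traversal from the original Blume type over ConspecificWith and DerivedFrom links to anything that is the type of a name. The upper bound of 6 hops is an arbitrary assumption that suits this small graph, and the alias is illustrative.

[source,cypher]
----
// Walk conspecificity and derivation links in either direction,
// then collect the names typified by whatever node is reached.
MATCH (typeL:Specimen {id: "L0051073"})-[:ConspecificWith|DerivedFrom*1..6]-()-[:TypeOf]->(n:Name)
RETURN DISTINCT n.name AS availableName
----
//table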
We may now list all the specimens that are connected by Conspecificity links to the original Blume type specimen.
[source,cypher]
----
MATCH (typeL:Specimen{id: "L0051073"})-[conspec:ConspecificWith*1..10]-()-[:DerivedFrom*1..10]-(s:Specimen)
WHERE s <> typeL
RETURN DISTINCT s, conspec
----
//table
Instead of listing a path, we could just output the names of the people that asserted the conspecificity links connected to the original Blume type specimen.
[source,cypher]
----
MATCH (typeL:Specimen{id: "L0051073"})-[conspec:ConspecificWith*1..10]-()-[:DerivedFrom*1..10]-(s:Specimen)
WHERE s <> typeL
RETURN DISTINCT s, reduce(x=[], r in conspec | x + (CASE WHEN (r.assertedBy IN x OR r.assertedBy IS NULL) THEN [] ELSE [r.assertedBy] END)) as namesOfAsserters
----
//table
By applying stricter criteria to the paths traversed to arrive at these results, we are in principle able to recover different taxon concepts and specify these as specimen lists with associated names.
[source,cypher]
----
MATCH (typeL:Specimen{id: "L0051073"})-[conspec:ConspecificWith*1..10{assertedBy:"PH"}]-()-[:DerivedFrom*1..10]-(s:Specimen)
WHERE s <> typeL
RETURN DISTINCT s, conspec
----
//table
The same restriction can be widened to a set of trusted asserters and applied to the DerivedFrom links as well:
[source,cypher]
----
MATCH (typeL:Specimen{id: "L0051073"})-[conspec:ConspecificWith*1..10]-()-[deriv:DerivedFrom*1..10]-(s:Specimen)
WHERE s <> typeL
AND all(x in conspec WHERE x.assertedBy IN ['PH', 'Morton C.V. 1970/7/1', 'Morton C.V. in http://www.biodiversitylibrary.org/page/409837#page/305'])
AND all(x in conspec WHERE has(x.assertedBy))
AND all(x in deriv WHERE x.assertedBy IN ['PH', 'Morton C.V. 1970/7/1', 'Morton C.V. in http://www.biodiversitylibrary.org/page/409837#page/305'])
AND all(x in deriv WHERE has(x.assertedBy))
RETURN typeL, conspec, deriv, s
----
//table
//graph