//
// DEPRECATED
// Use: https://github.com/living-atlases/gbif-taxonomy-for-la
//
const parse = require("csv-parse/lib/es5");
const transform = require('stream-transform');
const fs = require('fs');

const readStream = fs.createReadStream("./gbif-backbone/Taxon.tsv.orig");

// Taxon.tsv is tab-separated and unquoted
const parser = parse({
  quote: null,
  delimiter: '\t'
});

let atFirstLine = true;

const transformer = transform(function(record) {
  if (atFirstLine) {
    // the header line is passed through unchanged
    atFirstLine = false;
  } else {
    // columns: 5 = scientificName, 6 = scientificNameAuthorship, 7 = canonicalName
    const hasAuthor = record[6].length > 0;
    const hasCanonicalName = record[7].length > 0;
    const sciName = record[5];
    if (hasAuthor) {
      if (hasCanonicalName) {
        // prefer canonicalName, which carries no authorship
        record[5] = record[7];
      } else {
        // no canonicalName, so we try to remove the author from sciName if it's there
        const pos = sciName.lastIndexOf(" " + record[6]);
        if (pos !== -1) {
          record[5] = sciName.substring(0, pos);
        }
      }
    }
  }
  return record.join('\t') + '\n';
});

readStream.on('open', function () {
  readStream.pipe(parser).pipe(transformer).pipe(process.stdout);
});

readStream.on('error', function(err) {
  // a ReadStream has no end() method, so report the error and exit instead
  console.error(err);
  process.exit(1);
});
If my memory doesn't fail me, I tried canonicalName without success because it is sometimes empty.
But you are right @rpfigueira:
https://especies.gbif.es/species/1000111
I'll fix it ASAP. Thanks indeed for the feedback.
In this case I see that it is caused by the Author field being
Balch & Wolfe, 1981 (Smith & Hungate, 1958)
while it is appended to the full name as:
(Smith & Hungate, 1958) Balch & Wolfe, 1981
so it's not detected correctly.
Maybe the best way to fix this is to use canonicalName when possible and, if it is empty, to fall back to the truncate option as a last resort.
What do you think @rpfigueira? cc @djtfmartin.
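To make the proposal concrete, here is a minimal sketch of the "canonicalName first, truncate as a last resort" idea. The helper name `stripAuthorship` is hypothetical, not something from the gist, and the second example name is a made-up placeholder; the sketch only mirrors the logic discussed above.

```javascript
// Hypothetical helper: returns the scientific name without its authorship.
function stripAuthorship(scientificName, authorship, canonicalName) {
  if (!authorship) return scientificName;   // nothing to strip
  if (canonicalName) return canonicalName;  // preferred: already author-free
  // Last resort: cut the trailing " <authorship>" if it appears verbatim.
  const pos = scientificName.lastIndexOf(' ' + authorship);
  return pos !== -1 ? scientificName.substring(0, pos) : scientificName;
}

const sciName = 'Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981';
const author = 'Balch & Wolfe, 1981 (Smith & Hungate, 1958)';

// Truncation alone cannot find the swapped authorship, so the name stays unchanged.
const truncated = stripAuthorship(sciName, author, '');
// With canonicalName available, the author-free name is used directly.
const cleaned = stripAuthorship(sciName, author, 'Methanobrevibacter ruminantium');

console.log(truncated === sciName); // true
console.log(cleaned);               // Methanobrevibacter ruminantium
```

Truncation still helps for rows where canonicalName is empty and the authorship matches the tail of the name exactly.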
I wasn't aware of empty values in canonicalName. I wonder why that happens, and will check in my case. Also, I wonder why the scientificNameAuthorship is inverted in those cases (question to @timrobertson100). About the approach: it will work, although my last solution was simply to rename, in taxon.tsv, the column scientificName to nameComplete and canonicalName to scientificName.
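The column renaming could be sketched roughly as below. `renameHeader` is a hypothetical helper, and the header used here is shortened for illustration; the real taxon.tsv has more columns.

```javascript
// Hypothetical helper: renames columns in a tab-separated header line.
// Each column is mapped at most once, so the new "scientificName"
// (formerly canonicalName) is not renamed again.
function renameHeader(headerLine) {
  const renames = {
    scientificName: 'nameComplete',
    canonicalName: 'scientificName'
  };
  return headerLine
    .split('\t')
    .map((col) => (Object.prototype.hasOwnProperty.call(renames, col) ? renames[col] : col))
    .join('\t');
}

// Shortened, illustrative header
const header = ['taxonID', 'scientificName', 'scientificNameAuthorship', 'canonicalName'].join('\t');
const renamed = renameHeader(header);

console.log(renamed.split('\t'));
```

Only the header line changes with this approach; the data rows are left untouched.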
If I'm not wrong, after some hours running the nameindexer, you'll get an error similar to https://atlaslivingaustralia.slack.com/archives/CCTFGEU1G/p1573075450135000 when it finds a name (canonicalName or scientificName) that is empty.
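One way to check for this before a long indexing run might be to scan the parsed rows for empty names. This is only a sketch: `findEmptyNames` is a hypothetical helper, the column indexes follow the gist (5 = scientificName, 7 = canonicalName), column 0 is assumed to hold the taxonID, and the sample rows are invented placeholders.

```javascript
// Hypothetical helper: lists data rows whose scientificName (index 5)
// or canonicalName (index 7) is empty. Assumes column 0 is the taxonID.
function findEmptyNames(rows) {
  return rows
    .map((record, i) => ({ record, row: i + 1 }))
    .filter(({ record }) => !record[5] || !record[7])
    .map(({ record, row }) => ({ row, taxonID: record[0] }));
}

// Invented sample rows (header already removed)
const rows = [
  ['1', '', '', '', '', 'Aus bus L., 1758', 'L., 1758', 'Aus bus'],
  ['2', '', '', '', '', 'Aus cus Smith', 'Smith', '']
];

const suspicious = findEmptyNames(rows);
console.log(suspicious);
```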
Well, I was lucky! I already ran it and didn't get that error!
Thanks for alerting me @rpfigueira
Looks like combination authors and basionym authors are swapped in the authorship field, but they are correct in the scientificName field... I'll log an issue.
@rpfigueira lucky you! So I have to reindex my personal index ;-)
Back to this. I updated my gist following @rpfigueira's comment.
Before:
nameindexer --testSearch "Methanobrevibacter ruminantium"
(...)
Search for name: Methanobrevibacter ruminantium
ID: 1000111
GUID: 1000111
Classification: "Balch & Wolfe, 1981 (Smith & Hungate, 1958)",Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter
Scientific name: Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981
Authorship: Balch & Wolfe, 1981 (Smith & Hungate, 1958)
Rank: SPECIES
Synonym: null
Match type: exactMatch
and now:
nameindexer --testSearch "Methanobrevibacter ruminantium"
(...)
Search for name: Methanobrevibacter ruminantium
ID: 1000111
GUID: 1000111
Classification: "Balch & Wolfe, 1981 (Smith & Hungate, 1958)",Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter
Scientific name: Methanobrevibacter ruminantium
Authorship: Balch & Wolfe, 1981 (Smith & Hungate, 1958)
Rank: SPECIES
Synonym: null
Match type: exactMatch
Thanks!
The issue reported by Tim: gbif/checklistbank#100
When the scientificName has two authors, the first one in brackets, the script doesn't seem to work, as with "Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981". I am using this script, but copying the canonicalName (column 8) to the scientificName.