@vjrj
Last active December 13, 2022 08:13
A node parse-transform utility to remove authors from scientificNames in GBIF Backbone to use in ALA
//
// DEPRECATED
// Use: https://github.com/living-atlases/gbif-taxonomy-for-la
//
const parse = require("csv-parse/lib/es5");
const transform = require('stream-transform');
const fs = require('fs');

const readStream = fs.createReadStream("./gbif-backbone/Taxon.tsv.orig");

// Taxon.tsv is tab-separated and unquoted, so quote handling is disabled
const parser = parse({
  quote: null,
  delimiter: '\t'
});

// Column indexes as used by this script:
// 5 = scientificName, 6 = authorship, 7 = canonicalName
let atFirstLine = true;
const transformer = transform(function(record) {
  if (atFirstLine) {
    // the header row is passed through unchanged
    atFirstLine = false;
  } else {
    const hasAuthor = record[6].length > 0;
    const hasCanonicalName = record[7].length > 0;
    if (hasAuthor) {
      if (hasCanonicalName) {
        record[5] = record[7];
      } else {
        // no canonicalName, so we try to remove the author from sciName if it's there
        const pos = record[5].lastIndexOf(" " + record[6]);
        if (pos !== -1) {
          record[5] = record[5].substring(0, pos);
        }
      }
    }
  }
  return record.join('\t') + '\n';
});

readStream.on('open', function () {
  readStream.pipe(parser).pipe(transformer).pipe(process.stdout);
});
readStream.on('error', function(err) {
  // a read stream has no end() method, so report the error and exit instead
  console.error(err);
  process.exit(1);
});
@rpfigueira

It doesn't seem to work when the scientificName has two authors, the first one in brackets, as in "Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981". I am using this script, but copying the canonicalName (column 8) to the scientificName.

@vjrj
Author

vjrj commented Jan 20, 2020

If my memory doesn't fail me, I tried canonicalName without success because it is sometimes empty.

But you are right @rpfigueira:
https://especies.gbif.es/species/1000111

I'll fix it ASAP. Thanks indeed for the feedback.

@vjrj
Author

vjrj commented Jan 20, 2020

In this case I see that it happens because the Author field is
Balch & Wolfe, 1981 (Smith & Hungate, 1958)
but it is appended to the full name as:
(Smith & Hungate, 1958) Balch & Wolfe, 1981
so it's not detected correctly.
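
To make the mismatch concrete, here is a small sketch using the two values from this record. It only uses String.prototype.lastIndexOf, the same call the script relies on; the variable names are just for illustration.

```javascript
// Values taken from the record discussed above (taxon 1000111).
const sciName = "Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981";
const authorship = "Balch & Wolfe, 1981 (Smith & Hungate, 1958)";

// The script searches for " " + authorship inside the scientific name.
// The two authorship parts appear in the opposite order in sciName,
// so the search finds nothing and the name would be left untouched.
const pos = sciName.lastIndexOf(" " + authorship);
console.log(pos); // -1
```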

Maybe the best way to fix this is to use canonicalName when possible and, if it is empty, to fall back to truncating the scientificName as a last resort.

What do you think @rpfigueira? cc @djtfmartin.
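
A minimal sketch of that idea, assuming the three values have already been read from one row. The helper name nameWithoutAuthor and its signature are hypothetical, and the second call uses an illustrative name rather than a row from the backbone.

```javascript
// Hypothetical helper: prefer canonicalName, fall back to truncation.
function nameWithoutAuthor(scientificName, authorship, canonicalName) {
  // No authorship: there is nothing to strip.
  if (!authorship) return scientificName;
  // Preferred path: canonicalName already excludes the authorship.
  if (canonicalName) return canonicalName;
  // Last resort: cut the authorship off the end, if it is found verbatim.
  const pos = scientificName.lastIndexOf(" " + authorship);
  return pos !== -1 ? scientificName.substring(0, pos) : scientificName;
}

// The record from this thread: canonicalName is present, so it is used.
console.log(nameWithoutAuthor(
  "Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981",
  "Balch & Wolfe, 1981 (Smith & Hungate, 1958)",
  "Methanobrevibacter ruminantium"
)); // Methanobrevibacter ruminantium

// Illustrative row with an empty canonicalName: truncation is used.
console.log(nameWithoutAuthor("Puma concolor (Linnaeus, 1771)", "(Linnaeus, 1771)", ""));
// Puma concolor
```

If canonicalName is empty and the authorship is not found verbatim, the sketch returns the scientificName unchanged, so such rows would still need to be checked by hand.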

@rpfigueira

I wasn't aware of empty values in canonicalName. I wonder why that happens, and will check in my case. Also I wonder why the scientificNameAuthorship is inverted in those cases? (question to @timrobertson100). About the approach, it will work, although my last solution was to simply rename in taxon.tsv the column scientificName to nameComplete and canonicalName to scientificName.
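
The renaming approach only touches the header line, so it can be sketched in a few lines. The helper name renameHeader is hypothetical, and the header used here is shortened for illustration; the real Taxon.tsv has more columns.

```javascript
// Hypothetical helper: renames two columns in a tab-separated header line.
// Each column is mapped once, so the new scientificName is not renamed again.
function renameHeader(headerLine) {
  return headerLine
    .split('\t')
    .map(function (col) {
      if (col === 'scientificName') return 'nameComplete';
      if (col === 'canonicalName') return 'scientificName';
      return col;
    })
    .join('\t');
}

// Shortened header, for illustration only.
const header = 'taxonID\tscientificName\tscientificNameAuthorship\tcanonicalName';
console.log(renameHeader(header).split('\t'));
```

With this approach the data rows stay untouched, which also means any row with an empty canonicalName ends up with an empty scientificName.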

@vjrj
Author

vjrj commented Jan 20, 2020

If I'm not wrong, after some hours running the nameindexer, you'll get an error similar to: https://atlaslivingaustralia.slack.com/archives/CCTFGEU1G/p1573075450135000 when some name (canonical or scientificName) is empty.
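
One way to find such rows before spending hours on indexing is to check each parsed record first. This is only a sketch: the helper name hasEmptyName is hypothetical, the column indexes follow the script above, and exactly which empty values the nameindexer rejects is an assumption based on this comment. The second sample row is made up for illustration.

```javascript
// Hypothetical helper: true when a parsed row lacks a usable name.
// Column indexes follow the script above: 5 = scientificName, 7 = canonicalName.
function hasEmptyName(record) {
  const sciName = (record[5] || '').trim();
  const canonicalName = (record[7] || '').trim();
  return sciName.length === 0 || canonicalName.length === 0;
}

// The record from this thread (other columns left empty for brevity).
const complete = ['1000111', '', '', '', '',
  'Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981',
  'Balch & Wolfe, 1981 (Smith & Hungate, 1958)',
  'Methanobrevibacter ruminantium'];

// Made-up row with an empty canonicalName.
const incomplete = ['0', '', '', '', '', 'Some name Author, 1900', 'Author, 1900', ''];

console.log(hasEmptyName(complete));   // false
console.log(hasEmptyName(incomplete)); // true
```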

@rpfigueira

Well, I was lucky! I already ran it and didn't get that error!

@timrobertson100

Thanks for alerting me, @rpfigueira.

Looks like combination authors and basionym authors are swapped in the authorship field, but they are correct in the scientificName field... I'll log an issue.

@vjrj
Author

vjrj commented Jan 20, 2020

@rpfigueira lucky you! So I have to reindex my personal index of errors ;-)

@vjrj
Author

vjrj commented Jun 18, 2020

Back to this. I updated my gist following @rpfigueira's comment.
Before:

nameindexer --testSearch "Methanobrevibacter ruminantium"
(...)
Search for name: Methanobrevibacter ruminantium
ID: 1000111
GUID: 1000111
Classification: "Balch & Wolfe, 1981 (Smith & Hungate, 1958)",Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter
Scientific name: Methanobrevibacter ruminantium (Smith & Hungate, 1958) Balch & Wolfe, 1981
Authorship: Balch & Wolfe, 1981 (Smith & Hungate, 1958)
Rank: SPECIES
Synonym: null
Match type: exactMatch

and now:

nameindexer --testSearch "Methanobrevibacter ruminantium"
(...)
Search for name: Methanobrevibacter ruminantium
ID: 1000111
GUID: 1000111
Classification: "Balch & Wolfe, 1981 (Smith & Hungate, 1958)",Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter
Scientific name: Methanobrevibacter ruminantium
Authorship: Balch & Wolfe, 1981 (Smith & Hungate, 1958)
Rank: SPECIES
Synonym: null
Match type: exactMatch

Thanks!

@vjrj
Author

vjrj commented Nov 24, 2022

The issue reported by Tim: gbif/checklistbank#100
