@rossmounce
Last active August 29, 2015 14:12
Reply to Rod Page (having technical problems posting this at PeerJ PrePrints)
Thanks for your feedback Rod. I really value it.
I don't pretend to have all the answers. All of the academic content discovery
services are fairly murky about how they actually index things,
as I'm sure you know (Google Scholar perhaps being the most open-ish about how it does things?).
> how comparable are PLoS and Zootaxa from the perspective of search engines?
I am not a search engine. I am a human researcher. Whether a paper is
published in Nature, Science, PLOS ONE or Zootaxa, it is the same to me;
this is a logical and defensible position. I get what you're asking, but as
I've never worked at a search engine, I'm afraid I don't have much insight
there.
> you used a complete set of Zootaxa PDFs obtained from the NHM?
Yes, that information is in the paper. Metadata about those PDFs is in the
supplementary materials on figshare. As you know, I cannot easily 'prove' I
had the full set of PDFs, because copyright restrictions do not allow me to
repost the entire dataset publicly online; doing so would infringe the copyright
of Magnolia Press. I can, however, repost the entire set of PLOS ONE articles
analysed, as they were all published under CC BY or CC0.
> articles that are both open access and behind a paywall?
Yes. This is acknowledged in the paper. Regardless of whether a paper is open
access to the general public, it could still be privately indexed by content
search providers & that private full-text indexing made available during
search. Discoverability is not access. Paywalls can be made semi-permeable,
letting through requests from known IP addresses (e.g. those of Google Scholar's
indexing crawlers and bots) whilst denying access to non-subscribers at other IP addresses.
> Perhaps a better question is how the open access subset of Zootaxa compares to PLoS?
I'm sorry I didn't make the hypothesis I was testing clearer. I want to
test the discoverability of articles (regardless of whether they are OA). Yes, it does
seem reasonable to presuppose that open access articles might be advantaged,
but until that is shown with data I can't just make that assumption. If you
know of any other research that demonstrates the superior discoverability
of OA research (not citations, views or downloads), then please let me know; I
should cite it in this paper.
> confounding different media (PDF versus HTML) with different degrees of access?
I agree. This could certainly be one of the causative mechanisms of the
observed low recall of Zootaxa in Google Scholar. The point is, the observed
effect (poor discoverability in Google Scholar) is real regardless of the
cause [You're welcome to dispute the data given in the tables, but since I
did the searches only a few days ago I doubt the results have changed]. If
the cause is that Zootaxa does not provide HTML, then the obvious solution is
that Zootaxa should provide HTML full-text. Or just accept low
discoverability in Google Scholar :S
> Did you talk to Zhi-Qiang Zhang (editor of Zootaxa)?
Yes. I emailed him this morning.
I'm very pleased Magnolia Press have recently adopted DOIs, are moving to the
OJS platform, and have adopted the CC BY licence for hybrid open access
articles. These are all good moves towards better publishing. Given the
results here, perhaps they should also look at providing full text HTML or
XML, to continue their progress. They are an extremely important publisher of
taxonomy.
> You are making various statements about how you think search engines access
content, it would be interesting to actually know.
I agree, and I also feel uncomfortable about the lack of evidence, but services
like Scopus, WoK, MAS and MS *are* opaque, proprietary systems. I
can't really change that. I certainly see that as a problem. Academia sorely
needs an open, transparent system of indexing peer-reviewed published content.
> ...there is a world of difference...
Yes. I agree there is a vast difference in funding between fields. I'm not
entirely sure that difference prevents Magnolia Press from publishing full-text
HTML on their OJS platform. Other, similar "shoe-string" (your words, not
mine!) operations also produce full-text HTML on OJS, albeit not quite at
the scale of Zootaxa & Phytotaxa. But surely this research could be used as
evidence to ask for more funding? Here is objective evidence showing that
more money is needed to do more useful taxonomic publishing to maximize
return on investment. (?)
Prior to this research, I was not aware of anything (aside from the cited papers
on the OA citation, download and view advantages) that proves with real data that
publishing in PLOS ONE provides excellent discoverability of research (in
Google Scholar), substantially better than at other journals. That's why I've
published this. I think people need to know about this. I think it's
important. Incidentally, this paper doesn't directly test whether
discoverability has anything to do with OA. That needs follow-up work to
demonstrate.
This is merely a first-pass demonstration that born-digital journal content
can have substantially different discoverability in academic search engines,
depending on where it's published. (I'm making a conscious effort here not to
overstate what I've done.)