@zuphilip
Last active January 15, 2018 16:24
LOC-DB Journals Analysis
@anlausch

Hi Philipp,
thanks for the nice script. Here are my comments:

  • For each calculation, I added a min/max estimate.
    In the loop:
    number_of_references_per_article.append(nref)
    and then later:
    print("Max", max(number_of_references_per_article))
    print("Min", min(number_of_references_per_article))
  • The has-references filter is not a misclassification; as I understand the documentation, it just means that the references are not openly accessible via the API. But you are right: for our purposes it is interesting, no matter what we call it.
  • The estimate for books is a bit tricky. Why do we need it per page (the journal references are not calculated per page either)? Without identifiers, one could filter by resource type and year:
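from crossref.restful import Works  # crossrefapi; assuming the same import as in the script

# my_etiquette is assumed to be defined as in the script
works = Works(etiquette=my_etiquette).filter(has_references="true").filter(from_pub_date="2011").filter(until_pub_date="2011").filter(type="monograph")
number_of_references_per_resource = []
k = 0      # number of monographs seen
total = 0  # running sum of reference counts
for work in works:
    nref = work['reference-count']
    k += 1
    total += nref
    number_of_references_per_resource.append(nref)
print("Number of monographs returned: ", k)
print("Sum of references: ", total)
if k > 0:  # max(), min() and the division fail on an empty result
    print("Max", max(number_of_references_per_resource))
    print("Min", min(number_of_references_per_resource))
    print("Average: ", total / k)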

We could do this for all Crossref resource types that you subsume under "book", i.e. book, monograph (more?). The same can be done for book chapters, of course.
The problem is that this is neither related to the UniMA purchases nor domain-specific. Maybe the category-name filter could help with the latter.
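
To get the same numbers for several resource types in one pass, something like the following could work. It is only a sketch: it assumes the works have already been fetched (from one or more queries) and that each record carries the Crossref fields type and reference-count. The function name reference_stats_by_type and the sample records are made up for illustration, and treating a missing reference-count as 0 is an assumption that may not be what we want.

```python
from collections import defaultdict


def reference_stats_by_type(works):
    """Group works by their 'type' field and summarize 'reference-count'."""
    counts = defaultdict(list)
    for work in works:
        # Assumption: a record without 'reference-count' counts as 0 references.
        counts[work.get("type", "unknown")].append(work.get("reference-count", 0))
    # Every list in counts has at least one entry, so min/max/division are safe.
    return {
        resource_type: {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "average": sum(values) / len(values),
        }
        for resource_type, values in counts.items()
    }


# Hypothetical sample records standing in for an API result.
sample_works = [
    {"type": "monograph", "reference-count": 120},
    {"type": "monograph", "reference-count": 80},
    {"type": "book", "reference-count": 200},
    {"type": "book-chapter", "reference-count": 30},
]

for resource_type, stats in reference_stats_by_type(sample_works).items():
    print(resource_type, stats)
```

Since the function only needs an iterable of dictionaries, the result of a filtered Works query could be passed in directly instead of sample_works.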
