@copystar
Last active December 22, 2016 08:10
How to use bibliograph.parsing?
I'm hoping to do some citation analysis.
I want to count from a set of Web of Science records:
How many come from a particular institution? How many from that set are from a particular publisher?
I know this is possible if I learn BibTeX.
But that sounds like a lot of work for really just counting items in a set.
I found a Python module called bibliograph.parsing that sounds promising.
It takes citations of various formats and returns them in a Python dictionary.
And I know that there are a lot of Python commands for counting items within dictionaries.
The module is here: https://github.com/collective/bibliograph.parsing
The trouble is that there is no documentation, or even an example, that would let me figure out how to use this module.
This suggests that it's using some sort of protocol (piping, I guess)
that is obvious to experienced programmers and invisible to others.
So given that I have a file called citations.txt, citations.bib, or citations.enl,
what would I type in to return a dictionary of citations using bibliograph.parsing?
@smoynes

smoynes commented May 10, 2015

This is what I figured out.

Install the package and the packages it depends on:

$ brew install bibutils
$ pip install zope.component
$ pip install zope.schema
$ pip install bibliograph.rendering
$ pip install bibliograph.parsing

I'm not sure why those packages aren't declared as dependencies. 😏
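
Since the dependencies are not declared, a quick check that everything can be found may save some confusion. This is a rough Python 3 sketch, not something the package provides: it assumes the module names match the pip package names above and that bibutils puts a `bib2xml` executable on the PATH. `is_importable` and `missing_dependencies` are hypothetical helper names.

```python
import importlib.util
import shutil

# Assumption: module names match the pip package names listed above.
REQUIRED_MODULES = [
    "zope.component",
    "zope.schema",
    "bibliograph.rendering",
    "bibliograph.parsing",
]


def is_importable(name):
    """Return True if Python can locate the module called `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. no "zope" at all) raises
        # instead of returning None, so treat that as "not found".
        return False


def missing_dependencies(modules=REQUIRED_MODULES, tool="bib2xml"):
    """List the modules, plus the command-line tool, that cannot be found."""
    missing = [name for name in modules if not is_importable(name)]
    # Assumption: bibutils installs an executable named bib2xml.
    if shutil.which(tool) is None:
        missing.append(tool)
    return missing
```

Calling `missing_dependencies()` returns an empty list when everything was found, and otherwise the names that still need installing.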

>>> from bibliograph.parsing.parsers.bibtex import BibtexParser
>>> bib = open('bibtex_test.bib').read()
>>> parser = BibtexParser()
>>> source = parser.preprocess(bib)
>>> result = parser.parseEntry(source)
>>> from pprint import pprint
>>> pprint(result)
{'ABSTRACT': 'Plone is a free, open source Content Management System. The focus of   Plone is to provide value at every level of an organization. It comes with a workflow   engine, pre-configured security and roles, a set of content types and multi-lingual   support. There are many developers, writers and testers from all over the world,   contributing to Plone everyday. Plone is based on the Content Management Framework.',
 'ADDRESS': 'Sebastoplol, CA',
 'ANNOTE': 'I really like it.',
 'AUTHORURLS': 'http://www.agmweb.ca/ and http://plone.org',
 'DOI': '1-23-345',
 'EDITION': '2nd  @BookLattier2001',
 'ISBN': '3874402436',
 'NOTE': 'unfinished but already quite substantial',
 'PUBLISHER': 'New Riders',
 'TITLE': 'The Plone Book.',
 'URL': 'http://plone.org/documentation/book/',
 'YEAR': '2003',
 'abstract': 'Plone is a free, open source Content Management System. The focus of   Plone is to provide value at every level of an organization. It comes with a workflow   engine, pre-configured security and roles, a set of content types and multi-lingual   support. There are many developers, writers and testers from all over the world,   contributing to Plone everyday. Plone is based on the Content Management Framework.',
 'address': 'Sebastoplol, CA',
 'annote': 'I really like it.',
 'author': ['Mark Lutz',
            'Amos Latteier and Michel Pelletier',
            'Andy McKay and Various Developers'],
 'authors': [{'firstname': 'Mark', 'lastname': 'Lutz', 'middlename': ''},
             {'firstname': 'Amos', 'lastname': 'Latteier', 'middlename': ''},
             {'firstname': 'Michel',
              'lastname': 'Pelletier',
              'middlename': ''},
             {'firstname': 'Andy', 'lastname': 'McKay', 'middlename': ''},
             {'firstname': 'Various',
              'lastname': 'Developers',
              'middlename': ''}],
 'authorurls': 'http://www.agmweb.ca/ and http://plone.org',
 'doi': '1-23-345',
 'edition': '2nd  @BookLattier2001',
 'identifiers': [{'label': 'ISBN', 'value': '3874402436'},
                 {'label': 'DOI', 'value': '1-23-345'}],
 'isbn': '3874402436',
 'note': 'unfinished but already quite substantial',
 'pid': 'Lutz2001',
 'publication_month': '',
 'publication_url': 'http://plone.org/documentation/book/',
 'publication_year': '2003',
 'publisher': 'New Riders',
 'reference_type': 'BookReference',
 'title': 'The Plone Book.',
 'url': 'http://plone.org/documentation/book/',
 'year': '2003'}

That's a simple example. Let me know if you have any trouble getting that far or if you get stuck on something else. 😺 😺 😺

@smoynes

smoynes commented May 10, 2015

Hrmm. I notice this only parses a single entry from the file 😦 I'll dig a little further.

@smoynes

smoynes commented May 10, 2015

Ah ha! 😊

>>> result = parser.getEntries(source)
>>> pprint(result)
[{'ABSTRACT': "This edition has been updated for Python version 2.   More significantly, it is also an almost entirely new advanced Python topics   book–a complete rewrite from the ground up. Very roughly, this edition includes:      * 240 pages (4 chapters) on Systems Programming     * 260 pages (4 chapters) on GUI Programming     * 400 pages (6 chapters) on Internet Scripting     * 150 pages (3 chapters) on Databases, Objects, and Text     * 100 pages (2 chapters) on Python/C Integration    To give you some idea of this edition's scope, it spans 1256 pages in its final   published format, and includes some 446 program file examples, and 302 screen shots   (broken down here). See the table of contents link above for more content details.    The book also includes a brand new CD-ROM with book examples, Python version 2.x   release packages and installers, and related open source packages (NumPy, SWIG, PIL,   PMW, and so on). Among other things, the book examples distribution on CD includes   self-launching and Python-coded clocks, text editors, image viewers, email clients,   calculators, and more.",
  'ADDRESS': 'Sebastoplol, CA',
  'AUTHORURLS': 'http://home.rmi.net/~lutz/',
  'EDITION': '2nd',
  'PUBLISHER': "O'Reilly",
  'TITLE': 'Programming Python.',
  'URL': 'http://home.rmi.net/~lutz/about-pp2e.html',
  'YEAR': '2001',
  'abstract': "This edition has been updated for Python version 2.   More significantly, it is also an almost entirely new advanced Python topics   book–a complete rewrite from the ground up. Very roughly, this edition includes:      * 240 pages (4 chapters) on Systems Programming     * 260 pages (4 chapters) on GUI Programming     * 400 pages (6 chapters) on Internet Scripting     * 150 pages (3 chapters) on Databases, Objects, and Text     * 100 pages (2 chapters) on Python/C Integration    To give you some idea of this edition's scope, it spans 1256 pages in its final   published format, and includes some 446 program file examples, and 302 screen shots   (broken down here). See the table of contents link above for more content details.    The book also includes a brand new CD-ROM with book examples, Python version 2.x   release packages and installers, and related open source packages (NumPy, SWIG, PIL,   PMW, and so on). Among other things, the book examples distribution on CD includes   self-launching and Python-coded clocks, text editors, image viewers, email clients,   calculators, and more.",
  'address': 'Sebastoplol, CA',
  'author': ['Mark Lutz'],
  'authors': [{'firstname': 'Mark', 'lastname': 'Lutz', 'middlename': ''}],
  'authorurls': 'http://home.rmi.net/~lutz/',
  'edition': '2nd',
  'pid': 'Lutz2001',
  'publication_month': '',
  'publication_url': 'http://home.rmi.net/~lutz/about-pp2e.html',
  'publication_year': '2001',
  'publisher': "O'Reilly",
  'reference_type': 'BookReference',
  'title': 'Programming Python.',
  'url': 'http://home.rmi.net/~lutz/about-pp2e.html',
  'year': '2001'},
 {'ABSTRACT': "The Zope Book is an authoritative guide to Zope, an open-source   Web application server. Zope goes beyond server-side scripting languages like PHP   by providing a complete object framework, a built-in Web server, a Web-based management   interface, and load-balancing through ZEO (Zope Enterprise Objects). That's a considerable   punch, and Zope is attracting increasing interest from developers looking for an   alternative to heavyweight commercial application servers. Zope is implemented in   Python, an object-oriented scripting language, and runs on Windows, Linux, and Solaris.",
  'NOTE': '2.6 edition forthcoming (see URL)',
  'PUBLISHER': 'New Riders',
  'TITLE': 'The ZOPE Book.',
  'URL': 'http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition',
  'YEAR': '2001',
  'abstract': "The Zope Book is an authoritative guide to Zope, an open-source   Web application server. Zope goes beyond server-side scripting languages like PHP   by providing a complete object framework, a built-in Web server, a Web-based management   interface, and load-balancing through ZEO (Zope Enterprise Objects). That's a considerable   punch, and Zope is attracting increasing interest from developers looking for an   alternative to heavyweight commercial application servers. Zope is implemented in   Python, an object-oriented scripting language, and runs on Windows, Linux, and Solaris.",
  'author': ['Amos Latteier and Michel Pelletier'],
  'authors': [{'firstname': 'Amos',
               'lastname': 'Latteier',
               'middlename': ''},
              {'firstname': 'Michel',
               'lastname': 'Pelletier',
               'middlename': ''}],
  'note': '2.6 edition forthcoming (see URL)',
  'pid': 'Lattier2001',
  'publication_month': '',
  'publication_url': 'http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition',
  'publication_year': '2001',
  'publisher': 'New Riders',
  'reference_type': 'BookReference',
  'title': 'The ZOPE Book.',
  'url': 'http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition',
  'year': '2001'},
 {'ABSTRACT': 'Plone is a free, open source Content Management System. The focus of   Plone is to provide value at every level of an organization. It comes with a workflow   engine, pre-configured security and roles, a set of content types and multi-lingual   support. There are many developers, writers and testers from all over the world,   contributing to Plone everyday. Plone is based on the Content Management Framework.',
  'ANNOTE': 'I really like it.',
  'AUTHORURLS': 'http://www.agmweb.ca/ and http://plone.org',
  'DOI': '1-23-345',
  'ISBN': '3874402436',
  'NOTE': 'unfinished but already quite substantial',
  'TITLE': 'The Plone Book.',
  'URL': 'http://plone.org/documentation/book/',
  'YEAR': '2003',
  'abstract': 'Plone is a free, open source Content Management System. The focus of   Plone is to provide value at every level of an organization. It comes with a workflow   engine, pre-configured security and roles, a set of content types and multi-lingual   support. There are many developers, writers and testers from all over the world,   contributing to Plone everyday. Plone is based on the Content Management Framework.',
  'annote': 'I really like it.',
  'author': ['Andy McKay and Various Developers'],
  'authors': [{'firstname': 'Andy', 'lastname': 'McKay', 'middlename': ''},
              {'firstname': 'Various',
               'lastname': 'Developers',
               'middlename': ''}],
  'authorurls': 'http://www.agmweb.ca/ and http://plone.org',
  'doi': '1-23-345',
  'identifiers': [{'label': 'ISBN', 'value': '3874402436'},
                  {'label': 'DOI', 'value': '1-23-345'}],
  'isbn': '3874402436',
  'note': 'unfinished but already quite substantial',
  'pid': 'McKay2003',
  'publication_month': '',
  'publication_url': 'http://plone.org/documentation/book/',
  'publication_year': '2003',
  'reference_type': 'WebpublishedReference',
  'title': 'The Plone Book.',
  'url': 'http://plone.org/documentation/book/',
  'year': '2003'}]
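
With a list of dictionaries like this, the counting from the original question can be done with `collections.Counter`. A minimal sketch, assuming each entry looks like the output above (a `publisher` key that may be missing, an `address` string that may be missing). The sample entries are trimmed copies of that output, and `count_by_field` and `count_matching` are hypothetical helper names, not part of bibliograph.parsing.

```python
from collections import Counter

# Trimmed copies of the entries printed above; real input would be
# the list returned by parser.getEntries(source).
entries = [
    {"pid": "Lutz2001", "publisher": "O'Reilly", "address": "Sebastoplol, CA"},
    {"pid": "Lattier2001", "publisher": "New Riders"},
    {"pid": "McKay2003"},  # this entry has no publisher key
]


def count_by_field(entries, field, missing="(none)"):
    """Count how often each value of `field` occurs across the entries."""
    return Counter(entry.get(field) or missing for entry in entries)


def count_matching(entries, field, needle):
    """Count entries whose `field` contains `needle`, ignoring case."""
    needle = needle.lower()
    return sum(
        1 for entry in entries
        if needle in str(entry.get(field, "")).lower()
    )


publishers = count_by_field(entries, "publisher")
print(publishers.most_common())                            # every publisher with its count
print(count_matching(entries, "address", "sebastoplol"))   # entries from one place
```

For Web of Science records the institution may end up under a different key than `address`; printing `entries[0].keys()` should show which fields are actually available.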

@copystar
Author

Thanks SOOOO much! Unfortunately, for some reason, the line "source = parser.preprocess(bib)" results in an error message:

>>> source = parser.preprocess(bib)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'parser' is not defined

(I'm on Ubuntu rather than a Mac, so instead of "brew install bibutils" I used "apt-get install bibutils". I'm not sure whether that difference matters.)

@smoynes

smoynes commented May 11, 2015

Oops! I missed copy-pasting this line:

>>> parser = BibtexParser()

I've added it to my comment above.

apt-get install bibutils should be totally fine.
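
Put together, the corrected session could be wrapped in a small function like the sketch below. `load_entries` is a hypothetical helper name. It only uses the `preprocess` and `getEntries` calls shown earlier in this thread and takes the parser as an argument, so it assumes nothing more about bibliograph.parsing than the session above does.

```python
def load_entries(path, parser):
    """Read a citation file and return the parser's list of entry dicts.

    `parser` is any object with preprocess() and getEntries() methods,
    for example the BibtexParser() instance created in the session above.
    """
    # The with-block closes the file even if parsing fails afterwards.
    with open(path) as handle:
        raw = handle.read()
    source = parser.preprocess(raw)
    return parser.getEntries(source)
```

Usage would be something like `entries = load_entries('citations.bib', BibtexParser())`. Only the BibTeX parser is demonstrated in this thread, so a .enl or .txt file would need whichever parser class handles that format.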