@dleehr
Created November 17, 2017 18:41
crossref CN lookups
#!/bin/bash
# Fetch an APA-formatted citation for the DOI given as the first argument.
curl -SLH "Accept: text/bibliography; style=apa" "http://dx.doi.org/$1"
#!/usr/bin/env python
from __future__ import print_function
from habanero import cn
import sys
# Look up the DOI given as the first argument and print its APA citation.
doi_name = 'doi:' + sys.argv[1]
print(cn.content_negotiation(ids=doi_name, format="text", style="apa"))
#!/usr/bin/env python
from __future__ import print_function
import requests
import sys
# Ask the DOI resolver for an APA-formatted citation via content negotiation.
headers = {'Accept': 'text/bibliography; style=apa'}
response = requests.get('http://dx.doi.org/' + sys.argv[1], headers=headers)
# requests reports ISO-8859-1 for this response, so decode the raw bytes as UTF-8 rather than using response.text.
print(response.content.decode('utf-8'))

dleehr commented Nov 17, 2017

Using habanero to do content negotiation can result in mangled character encoding.

For example, looking up doi:10.1093/bioinformatics/btp324 results in different output from habanero.cn vs a standard curl:

lookup_curl.sh:

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760. doi:10.1093/bioinformatics/btp324

lookup_habanero.py:

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754�1760. doi:10.1093/bioinformatics/btp324

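Requests marks the response encoding as ISO-8859-1, so the response text is decoded with that charset even though the body appears to be UTF-8. While cn returns the response text, it does not pass along the encoding. The third script above works around this by calling requests directly and decoding the body as UTF-8.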
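
The mismatch can be reproduced offline. The following is a minimal sketch, assuming the server sends UTF-8 bytes and the client decodes them as ISO-8859-1. The helper name `repair_mojibake` is hypothetical and is not part of habanero or requests. It shows how the en dash in the page range gets mangled and how re-encoding recovers it:

```python
# Illustrative only: simulates the suspected encoding mismatch without any network call.

EN_DASH = u"\u2013"  # the character between 1754 and 1760 in the citation
original = u"25(14), 1754" + EN_DASH + u"1760"

# Bytes as a UTF-8 server would send them (assumption about the server).
raw_bytes = original.encode("utf-8")

# What a client produces if it assumes ISO-8859-1: one en dash becomes three characters.
mangled = raw_bytes.decode("iso-8859-1")


def repair_mojibake(text, assumed="iso-8859-1", actual="utf-8"):
    """Hypothetical helper: undo a wrong decode by reversing it and decoding correctly."""
    return text.encode(assumed).decode(actual)


repaired = repair_mojibake(mangled)

print(repr(mangled))
print(repaired == original)
```

Applied to text returned by `cn.content_negotiation`, the same round trip should recover the intended characters, provided the text really was decoded as ISO-8859-1 from UTF-8 bytes.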
