Karol M. Langner langner

  • Google
  • Mountain View
@langner
langner / pubmed_search.py
Last active January 29, 2022 14:52
A class that searches Pubmed for a list of PMIDs via the BioPython Entrez module and returns the results in a simpler dictionary format.
"""Tools for searching Pubmed for a list of PMIDs.
The goal here is to search for many PMIDs at once, since searching
sequentially can take a long time. Using the BioPython Entrez module
is super convenient to this end.
The results are returned in a simple dictionary format.
"""
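The batching idea in the docstring above can be sketched roughly as follows. This is not the gist's actual class, only an illustration under stated assumptions: Biopython is installed, the parsed result exposes a "PubmedArticle" list (as recent Biopython releases do), and the names `chunked`, `fetch_pubmed_records`, `email` and `batch_size` are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_pubmed_records(pmids: Sequence[str], email: str,
                         batch_size: int = 200) -> Dict[str, dict]:
    """Fetch many PMIDs with one request per batch (hypothetical helper).

    Returns a dictionary mapping each PMID to its parsed MedlineCitation.
    """
    # Imported here so the batching helper above works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    results = {}
    for batch in chunked(pmids, batch_size):
        # One efetch call covers the whole batch of comma-separated PMIDs.
        handle = Entrez.efetch(db="pubmed", id=",".join(batch), retmode="xml")
        try:
            parsed = Entrez.read(handle)
        finally:
            handle.close()
        # Assumption: the parsed result has a "PubmedArticle" list.
        for article in parsed["PubmedArticle"]:
            citation = article["MedlineCitation"]
            results[str(citation["PMID"])] = citation
    return results
```

The batch size of 200 is an arbitrary choice for the sketch; a sensible value depends on NCBI's current usage guidelines.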
langner / similar_doi.py
Last active August 29, 2015 14:05
Python function for testing similarity of two DOIs
import difflib

def similar_dois(d1, d2, accuracy=1.00):
    """Determine whether DOIs are similar.
    DOIs basically need to be identical, but are case insensitive.
    """
    # Missing or empty DOIs never match.
    if not (d1 and d2):
        return False
    d1 = d1.lower().strip()
    d2 = d2.lower().strip()
    # Treat accuracy as a minimum ratio; the default of 1.00 still requires identity.
    return difflib.SequenceMatcher(None, d1, d2).ratio() >= accuracy
langner / similar_pages.py
Last active August 29, 2015 14:05
Python function for testing similarity of two article pages fields
def similar_pages(pages1, pages2):
    """Determine whether two pages strings are similar.
    Redundant digits in the end page should be ignored -- for example, 1660-1661 can be
    reduced to 1660-1 -- and the end page (and hyphen) can be skipped if it's a single page.
    Additionally, for some journals, WoK can also replace the end page with something else,
    for example: 241-+ instead of 241-247.e9 (supp info), or O1125-U144 (no idea what that is),
    and they have said this cannot change for technical reasons. Oh well.
    Additional exceptions:
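The first rule in the docstring above (abbreviated end pages and single pages) could be sketched like this. It is an illustration only, not the gist's implementation: the helper names `expand_pages` and `pages_match` are hypothetical, and the WoK-specific exceptions are deliberately left unhandled.

```python
def expand_pages(pages: str) -> str:
    """Expand an abbreviated end page, so '1660-1' becomes '1660-1661'.

    Only plain numeric pages are handled; anything else is returned
    stripped but otherwise unchanged.
    """
    pages = pages.strip()
    start, sep, end = pages.partition("-")
    if not sep or not end:
        # Single page, possibly with a dangling hyphen.
        return start
    if not (start.isdigit() and end.isdigit()):
        # Non-numeric forms such as '241-+' are left alone in this sketch.
        return pages
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[:len(start) - len(end)] + end
    if start == end:
        # A range covering one page collapses to that page.
        return start
    return start + "-" + end


def pages_match(pages1: str, pages2: str) -> bool:
    """Compare two pages strings after expanding both."""
    return expand_pages(pages1) == expand_pages(pages2)
```

Expanding both sides to the long form, rather than abbreviating them, keeps the comparison a plain string equality.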
langner / similar_titles.py
Last active May 7, 2021 09:55
Python function for testing similarity of two article title fields
import difflib
import string
def similar_titles(t1, t2, accuracy=1.00, debug=None):
    """Determine whether two titles are similar.
    As a rule, we want titles to be identical after removing whitespace
    and punctuation. Other discrepancies should be dealt with manually by
    ensuring the titles are correct, or by replacing strings in all titles,
    in this function, before comparing them.
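The rule stated in the docstring above (titles identical after removing whitespace and punctuation) might look roughly like this. It is a sketch, not the gist's code: `normalize_title` and `titles_match` are hypothetical names, only ASCII punctuation is stripped, and treating `accuracy` as a minimum ratio is an assumption.

```python
import difflib
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize_title(title: str) -> str:
    """Remove whitespace and ASCII punctuation from a title."""
    return title.translate(_STRIP)


def titles_match(t1: str, t2: str, accuracy: float = 1.0) -> bool:
    """Compare normalized titles; accuracy is an assumed minimum ratio."""
    n1, n2 = normalize_title(t1), normalize_title(t2)
    return difflib.SequenceMatcher(None, n1, n2).ratio() >= accuracy
```

With the default `accuracy` of 1.0, the normalized titles must be identical; lowering it would tolerate small discrepancies.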
langner / logcheck_ubuntu
Last active January 7, 2017 19:48
Additional logcheck rules for Ubuntu 10/12 workstations and servers
# amavis messages
amavis\[[0-9]+\]: \([-0-9]+\) Passed (CLEAN|BAD-HEADER|SPAM|BANNED)
# avahi daemon: warnings about invalid responses and such
avahi-daemon\[[0-9]+\]: Invalid (query packet|legacy unicast query packet|response packet from host)
avahi-daemon\[[0-9]+\]: Received response from host [.0-9]+ with invalid source port [0-9]+ on interface
avahi-daemon\[[0-9]+\]:( last)? message repeated [0-9]+ times
avahi-daemon\[[0-9]+\]: server.c: Packet too short or invalid while reading response record.
avahi-daemon\[[0-9]+\]: dbus-protocol.c: Too many objects for client
langner / nuclear_repulsion_energy.py
Created October 21, 2014 02:45
Example of how to calculate the nuclear repulsion energy from parsed cclib data
import cclib
import numpy
import urllib
def nuclear_repulsion_energy(data):
    nre = 0.0
    for i in range(data.natom):
        ri = data.atomcoords[0][i]
        zi = data.atomnos[i]
        for j in range(i+1, data.natom):
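The preview above is cut off inside the pair loop. A self-contained sketch of the same pairwise sum, E = sum over i<j of Z_i Z_j / r_ij, is shown here without cclib. Assumptions: coordinates are given in Angstrom (as cclib's `atomcoords` are documented to be), the result is wanted in hartree, and the function name `nuclear_repulsion` and the conversion constant are illustrative rather than taken from the gist.

```python
import math

# Assumed conversion factor: 1 bohr is approximately 0.52917721092 Angstrom.
BOHR_IN_ANGSTROM = 0.52917721092


def nuclear_repulsion(atomnos, coords_angstrom):
    """Sum Z_i * Z_j / r_ij over unique atom pairs, in hartree."""
    energy = 0.0
    natom = len(atomnos)
    for i in range(natom):
        # Start at i + 1 so that each pair is counted exactly once.
        for j in range(i + 1, natom):
            r_angstrom = math.dist(coords_angstrom[i], coords_angstrom[j])
            r_bohr = r_angstrom / BOHR_IN_ANGSTROM
            energy += atomnos[i] * atomnos[j] / r_bohr
    return energy
```

With parsed cclib data, this would presumably be called as `nuclear_repulsion(data.atomnos, data.atomcoords[0])`, using the first geometry as the original snippet does.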
langner / namedatalen-256.patch
Created October 21, 2014 17:29
Patch that increases NAMEDATALEN to 256 in postgresql-9.1.14-0ubuntu0.12.04 (use with https://gist.github.com/langner/12a032a8793c2df80f5d)
Index: postgresql-9.1-9.1.14/src/include/pg_config_manual.h
===================================================================
--- postgresql-9.1-9.1.14.orig/src/include/pg_config_manual.h 2014-10-14 16:55:38.000000000 -0400
+++ postgresql-9.1-9.1.14/src/include/pg_config_manual.h 2014-10-14 16:56:01.598940653 -0400
@@ -17,7 +17,7 @@
  *
  * Changing this requires an initdb.
  */
-#define NAMEDATALEN 64
+#define NAMEDATALEN 256
langner / postgresql-namedatalen.sh
Last active June 13, 2017 07:10
Script that builds postgresql-9.1 packages on Ubuntu with increased NAMEDATALEN patch (use with patch https://gist.github.com/langner/5c7bc1d74a8b957cab26)
#!/bin/bash
# Create work directory and empty it.
ROOTDIR="postgresql-namedatalen"
mkdir -p "$ROOTDIR"
# Abort if the directory change fails, so the rm below cannot run elsewhere.
cd "$ROOTDIR" || exit 1
rm -rf ./*
# Get the sources and determine current version.
PKGNAME="postgresql-9.1"
langner / clean_debconfigs.sh
Created November 11, 2014 21:26
Purge all packages that still have configuration files installed
aptitude purge $(aptitude search '~c' | awk '{ print $2 }')
greek_alphabet = {
    u'\u0391': 'Alpha',
    u'\u0392': 'Beta',
    u'\u0393': 'Gamma',
    u'\u0394': 'Delta',
    u'\u0395': 'Epsilon',
    u'\u0396': 'Zeta',
    u'\u0397': 'Eta',
    u'\u0398': 'Theta',
    u'\u0399': 'Iota',