Karol M. Langner langner

  • Google
  • Mountain View
@langner
langner / prufer_code.py
Created September 21, 2015 22:13
Transformation of trees into Prufer sequence and back
"""Transformation of trees into Prufer sequence and back.
The Prufer sequence of a labeled tree is a unique sequence associated
with that tree, of length n-2, where n is the number of vertices in the tree.
More information: https://en.wikipedia.org/wiki/Prüfer_sequence
Copyright 2015 Karol M. Langner, Google Inc.
Licensed under the Apache License, Version 2.0
http://www.apache.org/licenses/LICENSE-2.0
"""
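The preview above stops at the docstring, so the following is only a minimal sketch of the two directions of the transformation, not the gist's actual code. It assumes vertices are labeled 0..n-1 and that the tree is given as a list of edges; the function names `tree_to_prufer` and `prufer_to_tree` are hypothetical.

```python
import heapq


def tree_to_prufer(edges, n):
    """Return the Prufer sequence of a labeled tree on vertices 0..n-1 (assumed labeling)."""
    adjacency = {v: set() for v in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [v for v in range(n) if len(adjacency[v]) == 1]
    heapq.heapify(leaves)
    sequence = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbor = adjacency[leaf].pop()
        adjacency[neighbor].remove(leaf)
        sequence.append(neighbor)
        if len(adjacency[neighbor]) == 1:
            heapq.heappush(leaves, neighbor)
    return sequence


def prufer_to_tree(sequence):
    """Rebuild the edge list of the tree encoded by a Prufer sequence."""
    n = len(sequence) + 2
    # Each vertex appears in the sequence (degree - 1) times.
    degree = [1] * n
    for v in sequence:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in sequence:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges
```

Encoding a tree and decoding the result should give back the same set of edges, which is a quick way to check both functions against each other.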
langner / similar_first_author.py
Last active May 7, 2021 09:34
Python function for testing similarity between the first authors in two article author fields
def similar_first_author(author1, author2):
    """Determine whether two first authors have the same names.
    Since there can be various fluctuations in first names and initials, we will
    only check the first word in the author string and the first letter of the
    second word. Although the second word is usually the first name, there will
    be exceptions for multi-word last names, but this will be a small minority
    and still passes our test. In case there is just a single word, use just that.
    """
    author1 = author1.lower().decode('utf-8')
    greek_alphabet = {
        u'\u0391': 'Alpha',
        u'\u0392': 'Beta',
        u'\u0393': 'Gamma',
        u'\u0394': 'Delta',
        u'\u0395': 'Epsilon',
        u'\u0396': 'Zeta',
        u'\u0397': 'Eta',
        u'\u0398': 'Theta',
        u'\u0399': 'Iota',
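The preview is cut off inside the Greek-letter table, so the comparison itself is not visible. As a rough sketch of the rule stated in the docstring (first word plus the first letter of the second word), one could write something like the following in Python 3. The helper names are hypothetical, stripping commas is an assumption, the Greek-letter handling is left out, and each input is assumed to already hold only the first author. Treating a single-word name as matching on the first word alone is one reading of the docstring, not necessarily the author's.

```python
def first_author_key(author):
    """Reduce an author string to (first word, initial of second word)."""
    # Assumption: commas separate last name from first name and can be dropped.
    words = author.lower().replace(",", " ").split()
    if not words:
        return ()
    if len(words) == 1:
        return (words[0],)
    return (words[0], words[1][0])


def similar_first_author_sketch(author1, author2):
    """Compare two author strings on the shared part of their keys."""
    key1 = first_author_key(author1)
    key2 = first_author_key(author2)
    # If either name has a single word, only the first word is compared.
    shared = min(len(key1), len(key2))
    return shared > 0 and key1[:shared] == key2[:shared]
```

For example, `similar_first_author_sketch("Langner, Karol M.", "langner k")` compares `("langner", "k")` with itself and returns `True`.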
langner / clean_debconfigs.sh
Created November 11, 2014 21:26
Purge all packages that still have configuration files installed
aptitude purge $(aptitude search '~c' | awk '{ print $2 }')
langner / postgresql-namedatalen.sh
Last active June 13, 2017 07:10
Script that builds postgresql-9.1 packages on Ubuntu with increased NAMEDATALEN patch (use with patch https://gist.github.com/langner/5c7bc1d74a8b957cab26)
#!/bin/bash
# Create work directory and empty it.
ROOTDIR="postgresql-namedatalen"
mkdir -p "$ROOTDIR"
# Stop here if the directory cannot be entered, so rm never runs elsewhere.
cd "$ROOTDIR" || exit 1
rm -rf ./*
# Get the sources and determine current version.
PKGNAME="postgresql-9.1"
langner / namedatalen-256.patch
Created October 21, 2014 17:29
Patch that increases NAMEDATALEN to 256 in postgresql-9.1.14-0ubuntu0.12.04 (use with https://gist.github.com/langner/12a032a8793c2df80f5d)
Index: postgresql-9.1-9.1.14/src/include/pg_config_manual.h
===================================================================
--- postgresql-9.1-9.1.14.orig/src/include/pg_config_manual.h 2014-10-14 16:55:38.000000000 -0400
+++ postgresql-9.1-9.1.14/src/include/pg_config_manual.h 2014-10-14 16:56:01.598940653 -0400
@@ -17,7 +17,7 @@
  *
  * Changing this requires an initdb.
  */
-#define NAMEDATALEN 64
+#define NAMEDATALEN 256
langner / nuclear_repulsion_energy.py
Created October 21, 2014 02:45
Example of how to calculate the nuclear repulsion energy from parsed cclib data
import cclib
import numpy
import urllib
def nuclear_repulsion_energy(data):
    nre = 0.0
    for i in range(data.natom):
        ri = data.atomcoords[0][i]
        zi = data.atomnos[i]
        for j in range(i+1, data.natom):
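The preview ends inside the inner loop. A self-contained sketch of the same pairwise sum, E = sum over i<j of Z_i Z_j / r_ij, might look like the following. It takes plain lists instead of a cclib data object so it runs without cclib or numpy. The function name and the `scale` parameter are hypothetical, and the conversion to bohr assumes the coordinates are given in angstroms, which is my understanding of cclib's `atomcoords` but is not confirmed by the snippet.

```python
import math

# Assumed unit conversion: coordinates in angstroms, energy wanted in hartree.
BOHR_PER_ANGSTROM = 1.0 / 0.52917721092


def nuclear_repulsion_energy_sketch(atomnos, coords, scale=BOHR_PER_ANGSTROM):
    """Sum Z_i * Z_j / r_ij over all unique atom pairs."""
    nre = 0.0
    natom = len(atomnos)
    for i in range(natom):
        for j in range(i + 1, natom):
            # Distance between atoms i and j, converted to bohr.
            distance = math.dist(coords[i], coords[j]) * scale
            nre += atomnos[i] * atomnos[j] / distance
    return nre
```

With parsed cclib data, the call would presumably be `nuclear_repulsion_energy_sketch(data.atomnos, data.atomcoords[0])`, mirroring the attributes used in the original loop.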
langner / logcheck_ubuntu
Last active January 7, 2017 19:48
Additional logcheck rules for Ubuntu 10/12 workstations and servers
# amavis messages
amavis\[[0-9]+\]: \([-0-9]+\) Passed (CLEAN|BAD-HEADER|SPAM|BANNED)
# avahi daemon: warnings about invalid responses and such
avahi-daemon\[[0-9]+\]: Invalid (query packet|legacy unicast query packet|response packet from host)
avahi-daemon\[[0-9]+\]: Received response from host [.0-9]+ with invalid source port [0-9]+ on interface
avahi-daemon\[[0-9]+\]:( last)? message repeated [0-9]+ times
avahi-daemon\[[0-9]+\]: server.c: Packet too short or invalid while reading response record.
avahi-daemon\[[0-9]+\]: dbus-protocol.c: Too many objects for client
langner / similar_titles.py
Last active May 7, 2021 09:55
Python function for testing similarity of two article title fields
import difflib
import string
def similar_titles(t1, t2, accuracy=1.00, debug=None):
    """Determine whether two titles are similar.
    As a rule, we want titles to be identical after removing whitespace
    and punctuation. Other discrepancies should be dealt with manually by
    ensuring the titles are correct, or by replacing strings in all titles,
    in this function, before comparing them.
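The body of the function is not shown in the preview. A minimal Python 3 sketch of the rule in the docstring (titles equal once whitespace and punctuation are removed) could look like this. How the original uses `accuracy` is not visible; falling back to a `difflib.SequenceMatcher` ratio when `accuracy` is below 1 is an assumption based only on the `difflib` import, and the helper names are hypothetical.

```python
import difflib
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize_title(title):
    """Remove whitespace and punctuation from a title."""
    return title.translate(_STRIP_TABLE)


def similar_titles_sketch(t1, t2, accuracy=1.00):
    """Compare normalized titles exactly, or by ratio if accuracy < 1."""
    n1 = normalize_title(t1)
    n2 = normalize_title(t2)
    if accuracy >= 1.0:
        return n1 == n2
    # Assumed behavior: accept titles whose similarity ratio reaches the threshold.
    return difflib.SequenceMatcher(None, n1, n2).ratio() >= accuracy
```

Note that this sketch is case-sensitive and only strips ASCII punctuation; typographic quotes or dashes in titles would need extra handling.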
langner / similar_pages.py
Last active August 29, 2015 14:05
Python function for testing similarity of two article pages fields
def similar_pages(pages1, pages2):
    """Determine whether two pages strings are similar.
    Redundant digits in the end page should be ignored -- for example, 1660-1661 can be
    reduced to 1660-1 -- and the end page (and hyphen) can be skipped if it's a single page.
    Additionally, for some journals, WoK can also replace the end page with something else,
    for example: 241-+ instead of 241-247.e9 (supp info), or O1125-U144 (no idea what that is),
    and they have said this cannot change for technical reasons. Oh well.
    Additional exceptions:
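The list of exceptions and the function body are cut off in the preview. The first two rules in the docstring (abbreviated end pages and single pages) can be sketched as follows. The helper names are hypothetical, only a plain ASCII hyphen is treated as the separator, and the WoK special cases such as `241-+` or `O1125-U144` are deliberately not handled here.

```python
def expand_pages(pages):
    """Return (start, end) with an abbreviated end page written out in full."""
    start, _, end = pages.strip().partition("-")
    if not end:
        # A single page is treated as a range that starts and ends on it.
        return (start, start)
    if start.isdigit() and end.isdigit() and len(end) < len(start):
        # Restore the leading digits dropped from the end page, e.g. 1660-1 -> 1660-1661.
        end = start[:len(start) - len(end)] + end
    return (start, end)


def similar_pages_sketch(pages1, pages2):
    """Compare two pages strings after expanding both to full ranges."""
    return expand_pages(pages1) == expand_pages(pages2)
```

Under these assumptions, `1660-1661` and `1660-1` expand to the same pair, as do `241` and `241-241`.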