Joseph Turian (turian)

turian / extractors.py
Created August 5, 2012 02:11 — forked from osiloke/extractors.py
A scrapy link extractor that uses BeautifulSoup
"""
Modified from: https://gist.github.com/1142142
to include the text of the link.
"""
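import re
from urllib.parse import urljoin  # Python 3 location (Python 2 used the urlparse module)
from scrapy.link import Link
from bs4 import BeautifulSoup  # BeautifulSoup 4; the original imported Python 2-only BeautifulSoup 3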
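This listing shows only the gist's header and imports. The idea is to collect each link's absolute URL together with its anchor text, and it can be sketched without third-party packages. This version swaps BeautifulSoup for the standard library's `html.parser` and returns plain `(url, text)` tuples instead of scrapy `Link` objects. `extract_links` and `_AnchorCollector` are hypothetical names, and nested or unclosed `<a>` tags are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Hypothetical helper: records (href, text) for each <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs, in document order
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href (e.g. <a name=...>) are skipped
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "First <b>link</b>" -> "First link".
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html, base_url):
    """Hypothetical sketch: return [(absolute_url, link_text), ...]."""
    parser = _AnchorCollector()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs against the page URL, as a link extractor must.
    return [(urljoin(base_url, href), text) for href, text in parser.links]
```

To hand the results to scrapy, each pair could be wrapped as `Link(url, text=text)`. Treat that as an assumption about how the original extractor builds its output, not a description of it.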

Keybase proof

I hereby claim:

  • I am turian on github.
  • I am turian (https://keybase.io/turian) on keybase.
  • I have a public key whose fingerprint is B8C4 F42B D493 3F85 FA4E 2E94 7384 B7F2 7748 51AB

To claim this, I am signing this object:

turian / annoytune.py
Last active June 19, 2019 05:16
Tune search_k for annoy library.
"""
Tune search_k for annoy library.
This is for people who want their nearest-neighbor results to stay above a
certain precision threshold, but otherwise want queries to run as fast as
possible.
AUTHOR: Joseph Turian
LICENSE: Apache License 2.0
"""
turian / pytorch-lightning-siamese.ipynb
Last active June 25, 2020 22:47
Pytorch Lightning Siamese
import timeit
import logging
from collections import OrderedDict

# Number of inputs
NIN = 1000
# Number of hidden units
NHID = 10
# Number of examples
EXAMPLES = 100000
turian / pandas_first_by_column.py
Created August 6, 2020 18:44
Pandas function to take the first row for each value of a column
import pandas as pd


def first_by_column(df, colname):
    """
    Pandas function to take the first row for each value of a column.
    Useful if the table is sorted in descending order and you want to group
    over a particular column to find the max for each value of that field.
    """
    newdf = []
    # Distinct values of the column, in order of first appearance
    colvals = pd.unique(df[colname])
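The listing cuts the function off after `pd.unique`. Assuming the intent is to keep the first row for each distinct value of `colname` while preserving row order, pandas can do this in one call with `DataFrame.drop_duplicates(subset=..., keep="first")`. This is a swapped-in idiom rather than the gist's own loop, and `first_row_per_value` is a hypothetical name.

```python
import pandas as pd


def first_row_per_value(df: pd.DataFrame, colname: str) -> pd.DataFrame:
    """Hypothetical helper: keep the first row for each distinct value of colname.

    Row order is preserved, so if df is sorted in descending order by a score,
    each kept row holds the maximum score for its colname value.
    """
    return df.drop_duplicates(subset=[colname], keep="first")


# Example: the highest score per user.
scores = pd.DataFrame({"user": ["a", "b", "a", "b"], "score": [3, 7, 9, 1]})
best = first_row_per_value(scores.sort_values("score", ascending=False), "user")
# best has rows (a, 9) and (b, 7)
```

Because `drop_duplicates` keeps the original index labels, call `.reset_index(drop=True)` on the result if a fresh 0..n-1 index is needed.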