
@binshengliu
binshengliu / random_choice_noreplace.py
Created June 8, 2020 05:44
Random choice without replacement along an axis
# https://stackoverflow.com/a/45438143/955952
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output; argsort of random values
    # gives a permutation along `axis`, i.e. sampling without replacement.
    return np.random.rand(m, n).argsort(axis=axis)
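A quick usage sketch continuing from the snippet above (the printed values are illustrative, since they are random):

idx = random_choice_noreplace(3, 5)
print(idx.shape)  # (3, 5)
print(idx)  # e.g. [[2 0 4 1 3], [1 3 0 4 2], [4 2 1 0 3]]; each row is a permutation of 0..4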
@binshengliu
binshengliu / conda-perl-for-rouge.sh
Created April 6, 2020 08:55
Solve `Can't locate DB_File.pm in @INC` when running ROUGE-1.5.5.pl by installing the Perl modules with conda
conda config --append channels bioconda
conda install perl perl-db-file perl-xml-parser perl-libwww-perl  # installs a large number of dependency packages
# Rewrite the shebang to use `env perl` so the conda-installed Perl is picked up.
sed -i '1 s/perl.*/env perl/' ~/.files2rouge/ROUGE-1.5.5.pl
@binshengliu
binshengliu / product_ndarray.py
Created March 25, 2020 08:42
All combinations of elements from two ndarrays
# Cartesian product of 1-D arrays arr0 and arr1, one (arr0[i], arr1[j]) pair per row
arr = np.array(np.meshgrid(arr0, arr1)).T.reshape(-1, 2)
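A self-contained sketch with illustrative inputs (assuming arr0 and arr1 are 1-D arrays):

import numpy as np

arr0 = np.array([1, 2, 3])
arr1 = np.array([10, 20])
arr = np.array(np.meshgrid(arr0, arr1)).T.reshape(-1, 2)
print(arr)
# [[ 1 10]
#  [ 1 20]
#  [ 2 10]
#  [ 2 20]
#  [ 3 10]
#  [ 3 20]]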
@binshengliu
binshengliu / dedup-trec-dup.sh
Created March 22, 2020 05:25
Work around duplicate documents in a TREC run
# Keep only the first line for each key (fields 1-2) before evaluating.
sort -u -k 1,2 run.monobert.dev.small.tsv | trec_eval -m recip_rank_cut.10 qrels.dev.small.tsv -
sort -u -k 1,2 run.duobert.dev.small.tsv | trec_eval -m recip_rank_cut.10 qrels.dev.small.tsv -

# Separate snippet: sample all CPUs at 99 Hz with call graphs, then render a flame graph.
perf record -F 99 -g -a <command>
perf script | ~/src/FlameGraph/stackcollapse-perf.pl | ~/src/FlameGraph/flamegraph.pl > perf.svg
google-chrome perf.svg
@binshengliu
binshengliu / extra_legend.py
Created February 27, 2020 00:29
Avoid the legend being cut off when saving a matplotlib figure
# https://stackoverflow.com/a/10154763/955952
# Pass artists that sit outside the axes (e.g. an external legend) so that
# bbox_inches='tight' accounts for them when computing the bounding box.
fig.savefig('samplefigure', bbox_extra_artists=(lgd, text), bbox_inches='tight')
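A minimal sketch of the typical setup, assuming a legend anchored outside the axes (the data and file name are illustrative):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label='quadratic')
# Anchor the legend outside the axes; without bbox_extra_artists it would be clipped.
lgd = ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1.0))
fig.savefig('samplefigure.png', bbox_extra_artists=(lgd,), bbox_inches='tight')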
@binshengliu
binshengliu / fit_figure.tex
Last active February 27, 2020 00:28
Fit a figure to slide size.
% https://tex.stackexchange.com/q/32886/124998
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{myfig}
@binshengliu
binshengliu / groupby_mp.py
Last active January 9, 2020 06:41
Apply a function to pandas groups with multiprocessing
from more_itertools import always_iterable
from typing import Callable, Union
import multiprocessing
import pandas as pd

def groupby_mp(groupby_df: pd.core.groupby.DataFrameGroupBy,
               func: Callable[[pd.DataFrame], Union[pd.DataFrame, pd.Series]],
               num_cpus: int = multiprocessing.cpu_count() // 2,
               chunksize: int = 1) -> pd.DataFrame:
    # The gist preview is truncated here; the body below is an assumed sketch:
    # apply `func` (which must be picklable, i.e. defined at module level) to
    # each group in a process pool and concatenate the results.
    with multiprocessing.Pool(num_cpus) as pool:
        results = pool.map(func, (g for _, g in groupby_df), chunksize=chunksize)
    return pd.concat(results)
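A usage sketch under the same assumption (total_score is an illustrative helper; it must live at module level so it can be pickled):

import pandas as pd

def total_score(group: pd.DataFrame) -> pd.DataFrame:
    return group.assign(total=group['score'].sum())

df = pd.DataFrame({'qid': [1, 1, 2], 'score': [0.3, 0.7, 0.5]})
out = groupby_mp(df.groupby('qid'), total_score, num_cpus=2)
print(out)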
@binshengliu
binshengliu / rank_metrics.py
Created December 24, 2019 07:54 — forked from bwhite/rank_metrics.py
Ranking Metrics
"""Information Retrieval metrics
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
@binshengliu
binshengliu / tqdmf.py
Created December 10, 2019 04:28
Iterate over a huge file line by line with a tqdm progress bar
import os
from tqdm import tqdm

def tqdmf(path, *args, **kwargs):
    # Yield lines from `path` while showing a byte-based progress bar.
    with tqdm(
            total=os.stat(path).st_size,
            unit='B',
            unit_scale=True,
            unit_divisor=1024,
            *args,
            **kwargs) as bar:
        with open(path, 'r') as f:
            for line in f:
                # Assumed completion (the gist preview is truncated here):
                # advance the bar by the line length and yield the line.
                bar.update(len(line))
                yield line
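A usage sketch (the file name is illustrative):

n_lines = 0
for line in tqdmf('huge_corpus.tsv', desc='reading'):
    n_lines += 1
print(n_lines)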