looking for an internship for summer/fall 2021

Vadim Kantorov vadimkantorov

@vadimkantorov
vadimkantorov / find.sh
Last active January 1, 2024 10:40
Various find tricks
# cat all files outside the .git directory
find -not -iwholename '*/.git/*' -type f -exec cat {} +
# cat all .tsx files outside the .git directory, each preceded by a "==> file <==" header
find -not -iwholename '*/.git/*' -type f -name '*.tsx' -exec tail -v --lines=+1 {} +
# unpack several downloaded tar files and then delete the original archives
find -name '*.tar.gz' -type f -exec tar -xf {} \; -delete
# search recursively in all CMakeLists.txt files for the substring 1.78
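Below is a rough Python counterpart to the `find` one-liners above. It is a sketch and not part of the original gist. `os.walk` with in-place pruning never enters a directory named `.git`, which is roughly what the `-not -iwholename` filter is meant to achieve. A plain substring test stands in for the truncated CMakeLists.txt search. The helper names `iter_files` and `grep_files` are hypothetical.

```python
import os


def iter_files(root=".", skip_dirs=(".git",)):
    """Yield paths of regular files under root, never descending into skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into e.g. .git at all.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: skip symlinks and anything that is not a regular file.
            if not os.path.islink(path) and os.path.isfile(path):
                yield path


def grep_files(needle, root=".", filename="CMakeLists.txt"):
    """Return sorted paths of files named `filename` whose text contains `needle`."""
    hits = []
    for path in iter_files(root):
        if os.path.basename(path) != filename:
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            if needle in f.read():
                hits.append(path)
    return sorted(hits)
```

For example, running `grep_files("1.78")` from a source checkout would list every CMakeLists.txt that mentions 1.78, such as a pinned Boost version. Pruning `dirnames` in place keeps `os.walk` out of `.git` entirely, which avoids walking it and filtering the paths afterwards.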
vadimkantorov / readwiktionary.py
Last active February 19, 2024 21:49
Read a Wiktionary dump in Python
# https://dumps.wikimedia.org/wikidatawiki/entities/ https://dumps.wikimedia.org/ruwiktionary/ https://dumps.wikimedia.org/ruwiktionary/20231201/
#
# wget -L https://dumps.wikimedia.org/wikidatawiki/entities/20231213/wikidata-20231213-lexemes.json.bz2 https://dumps.wikimedia.org/ruwiktionary/20231201/ruwiktionary-20231201-pages-meta-current.xml.bz2
# bzcat wikidata-20231213-lexemes.json.bz2 | wc -l # 1198580
# bzcat wikidata-20231213-lexemes.json.bz2 | head -n 2
# bzcat ruwiktionary-20231201-pages-meta-current.xml.bz2 | wc -l # 196257893
# bzcat ruwiktionary-20231201-pages-meta-current.xml.bz2 | head -n 100
# bzgrep '<page>' ruwiktionary-20231201-pages-meta-current.xml.bz2 | wc -l # 2814450
# time python3 readwiktionary.py ruwiktionary-20231201-pages-meta-current.xml.bz2 ruwiktionary-20231201-pages-meta-current.xml.bz2 # real 11m15.868s # user 9m36.938s # sys 0m5.656s
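The gist's own parser is not shown here, so what follows is only a minimal streaming sketch. It makes two assumptions. The first is the standard MediaWiki export layout: `<page>` elements hold a `<title>` and a `<revision>` with `<text>`, and everything sits inside an XML namespace. The second is that the Wikidata lexeme dump is a single JSON array with one entity per line, which fits the `wc -l` and `head -n 2` checks above. `iter_pages` and `iter_lexemes` are hypothetical names.

```python
import bz2
import json
import xml.etree.ElementTree as ET


def _local(tag):
    # MediaWiki exports namespace every tag, e.g.
    # {http://www.mediawiki.org/xml/export-0.10/}page; keep only the local name.
    return tag.rsplit("}", 1)[-1]


def iter_pages(fileobj):
    """Stream (title, wikitext) pairs from a pages-meta-current.xml dump."""
    context = ET.iterparse(fileobj, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <mediawiki>
    for event, elem in context:
        if event != "end" or _local(elem.tag) != "page":
            continue
        title = text = None
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        root.clear()  # drop finished pages so memory stays roughly flat


def iter_lexemes(lines):
    """Yield entities from a JSON dump laid out as '[', one entity per line, ']'."""
    for line in lines:
        line = line.strip()
        if line in ("", "[", "]"):
            continue
        yield json.loads(line.rstrip(","))  # every entity line but the last ends in ','
```

Usage would look like `with bz2.open("ruwiktionary-20231201-pages-meta-current.xml.bz2", "rb") as f: for title, text in iter_pages(f): ...`. For the lexemes, use `bz2.open(..., "rt", encoding="utf-8")` so that lines arrive as `str`. Clearing the root after each page matters with about 2.8 million pages, because otherwise `iterparse` keeps the whole tree in memory.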
vadimkantorov / buildopusenczoo.sh
Last active November 30, 2023 11:18
A zoo of fast and slow opusenc versions
# bash buildopusenczoo.sh cc &> buildopusenczoo.sh.cc.txt
# bash buildopusenczoo.sh muslc &> buildopusenczoo.sh.muslc.txt # need to yum/apt install musl-tools
# cp /proc/cpuinfo cpuinfo.txt
if [ "$1" = "cc" ]; then
    export SUFFIX=cc
else
    export CFLAGS="-U_FORTIFY_SOURCE"
    export LDFLAGS="--static -static -static-libgcc -lm -lc"
    export CC=musl-gcc
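This is a hypothetical harness for comparing the resulting builds. It times each encoder binary on the same input and reports the median wall-clock time. The binary paths (`./opusenc_cc`, `./opusenc_muslc`) and `input.wav` are placeholders, not names the script above is known to produce. The only thing assumed about opusenc itself is its basic `opusenc INPUT OUTPUT` invocation.

```python
import statistics
import subprocess
import time


def time_commands(commands, repeats=3):
    """Run each argv `repeats` times; return {label: median wall-clock seconds}."""
    results = {}
    for label, argv in commands.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            subprocess.run(argv, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            timings.append(time.perf_counter() - start)
        results[label] = statistics.median(timings)
    return results


# Placeholder binary names (hypothetical); point them at the actual builds.
ZOO = {
    "cc": ["./opusenc_cc", "input.wav", "out_cc.opus"],
    "muslc": ["./opusenc_muslc", "input.wav", "out_muslc.opus"],
}

if __name__ == "__main__":
    for label, seconds in sorted(time_commands(ZOO).items(), key=lambda kv: kv[1]):
        print(f"{label}: {seconds:.3f}s")
```

Taking the median of a few runs damps noise from the page cache and CPU frequency scaling. Keeping the copied /proc/cpuinfo next to the timings records which machine produced them.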
vadimkantorov / csvsubset.py
Last active January 10, 2024 17:40
Filter a subset of columns and rows from a CSV file
# -*- coding: utf-8 -*-
#
# Usage:
# python csvsubset.py mycsv.csv --ignorecase --grep "donat" --key "First Name" "Last Name" "Email" "Mailing Postal Code" "Mailing City" --output-encoding=utf-16le > mycsvsubset.csv
import sys
import csv
import argparse
import re
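Here is a minimal sketch of the filtering that the usage line describes. It assumes that `--grep` keeps rows where any field matches the regular expression and that `--key` selects the output columns in the given order. The real script's semantics may differ, and `csv_subset` is a hypothetical name.

```python
import csv
import re


def csv_subset(src, dst, keys, grep=None, ignorecase=False):
    """Copy only the `keys` columns from src to dst; with `grep`, keep only
    rows in which at least one field matches that regular expression."""
    pattern = re.compile(grep, re.IGNORECASE if ignorecase else 0) if grep else None
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keys)
    writer.writeheader()
    for row in reader:
        # Short rows yield None for missing fields; treat them as empty strings.
        fields = [row[k] or "" for k in reader.fieldnames]
        if pattern and not any(pattern.search(v) for v in fields):
            continue
        writer.writerow({k: row.get(k) or "" for k in keys})
```

To write UTF-16LE output, as `--output-encoding=utf-16le` does, one option is to pass `io.TextIOWrapper(sys.stdout.buffer, encoding="utf-16le", newline="")` as `dst`. The input file should be opened with `newline=""` so that the `csv` module handles line endings itself.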
vadimkantorov / postproczoom.sh
Last active November 16, 2023 23:14
Postprocess a Zoom recording: cut part of the intro, overlay text and lightboxes (to hide participants' faces), and optionally freeze a frame near the end
# unanswered so far :(
# https://video.stackexchange.com/questions/36992/concat-a-video-and-an-audio-using-ffmpeg-freezing-the-last-video-frame-without
# https://superuser.com/questions/1816897/freeze-a-frame-using-ffmpeg
# here are two different approaches:
# 1: boxoverlay #
# - drawtext on video top for the whole video
# - two different box overlays at different video intervals
ffmpeg -i video.mp4 -ss 00:03:10 -vf "drawtext=fontsize=30:fontcolor=pink:x=w*0.4:y=h*0.01:text='https\://exil-solidaire.fr',drawbox=x=0.83*in_w:y=0.48*in_h:w=0.17*in_w:h=0.35*in_h:color=lightblue:t=fill:enable='gt(t,189)',drawbox=y=0.045*in_h:w=in_w:h=in_h:color=lightblue:t=fill:enable='gt(t,1570)'" intro_and_qa_boxoverlay.mp4
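Long `-vf` chains like the one above are easy to mistype. One way to assemble them is sketched here with a hypothetical `drawbox` helper. It builds each filter from keyword arguments and joins the filters with commas, which is the separator for a linear ffmpeg filter chain. The option `enable='gt(t,N)'` switches a box on only after N seconds. The values below reproduce the command above.

```python
def drawbox(after=None, color="lightblue", **geometry):
    """Build one ffmpeg drawbox filter; geometry keys (x, y, w, h) keep their order."""
    parts = [f"{k}={v}" for k, v in geometry.items()]
    parts += [f"color={color}", "t=fill"]  # t=fill paints a solid box
    if after is not None:
        # Timeline editing: only draw the box once the timestamp exceeds `after`.
        parts.append(f"enable='gt(t,{after})'")
    return "drawbox=" + ":".join(parts)


def build_vf(*filters):
    # Filters in a single linear chain are separated by commas.
    return ",".join(filters)


vf = build_vf(
    "drawtext=fontsize=30:fontcolor=pink:x=w*0.4:y=h*0.01"
    ":text='https\\://exil-solidaire.fr'",
    drawbox(x="0.83*in_w", y="0.48*in_h", w="0.17*in_w", h="0.35*in_h", after=189),
    drawbox(y="0.045*in_h", w="in_w", h="in_h", after=1570),
)
```

You can pass `vf` as one element of an argv list, as in `subprocess.run(["ffmpeg", "-i", "video.mp4", "-ss", "00:03:10", "-vf", vf, "intro_and_qa_boxoverlay.mp4"])`. This sends ffmpeg the same string the shell command does, without a second layer of shell quoting around the single quotes.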
vadimkantorov / csvsetdiff.py
Last active November 21, 2023 16:40
Print the set difference of one CSV file with another, based on specified join fields
# -*- coding: utf-8 -*-
#
# Usage:
# python csvsetdiff.py mycsv1.csv - mycsv2.csv --key1 "Email" --key2 "Email acheteur" > mycsvsetdiff.csv
# python3 csvsetdiff.py Contacts.csv - adherez.csv --delimiter1="," --delimiter2=";" --key1 "Email" "Email Home" --key2 "Email payeur" "Champ complémentaire 2 Email" --encoding=utf-16 --delimiter=$'\t' --ignorecase --translate .= > test3.csv
import sys
import csv
import argparse
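The set-difference logic can be sketched as follows, under these assumptions:

- A row of the first file is dropped if any of its `--key1` values equals any `--key2` value in the second file.
- Empty keys never match.
- `--ignorecase` lowercases keys before comparing.

`csv_setdiff` is a hypothetical name, and the `--translate` option is not modeled.

```python
import csv


def csv_setdiff(src1, src2, key1, key2, delimiter1=",", delimiter2=",", ignorecase=False):
    """Return the rows of src1 whose key1 values match none of src2's key2 values."""
    def norm(value):
        value = (value or "").strip()
        return value.lower() if ignorecase else value

    # Collect every non-empty join value from the second file.
    seen = {norm(row.get(k))
            for row in csv.DictReader(src2, delimiter=delimiter2)
            for k in key2}
    seen.discard("")

    kept = []
    for row in csv.DictReader(src1, delimiter=delimiter1):
        values = {norm(row.get(k)) for k in key1} - {""}
        if not values & seen:  # no overlap: the row belongs to the difference
            kept.append(row)
    return kept
```

Matching on any of several key columns lets a contact with both "Email" and "Email Home" count as present when either address appears in the second file. Building a set first keeps the comparison linear in the size of both files.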
vadimkantorov / torch_global_isfinite_hook.py
Last active November 16, 2023 13:35
An example of detecting NaNs / infs in module outputs for some basic debugging
import torch
class Good(torch.nn.Module):
    def forward(self, x):
        return x + torch.ones_like(x)

class Bad(torch.nn.Module):
    def forward(self, x):
        return x + torch.full_like(x, float('nan'))
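One way to extend the `Good`/`Bad` example into the global check the title describes is sketched below. It assumes a PyTorch version that provides `torch.nn.modules.module.register_module_forward_hook`, a global hook that fires after every module's forward. The hook records the class name of every module whose tensor output contains a NaN or an inf. `isfinite_hook` and `flagged` are hypothetical names.

```python
import torch


class Good(torch.nn.Module):
    def forward(self, x):
        return x + torch.ones_like(x)


class Bad(torch.nn.Module):
    def forward(self, x):
        return x + torch.full_like(x, float('nan'))


flagged = []  # class names of modules that produced non-finite outputs, in call order


def isfinite_hook(module, inputs, output):
    # This sketch only inspects plain tensor outputs (not tuples or dicts).
    if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
        flagged.append(type(module).__name__)


handle = torch.nn.modules.module.register_module_forward_hook(isfinite_hook)
try:
    model = torch.nn.Sequential(Good(), Bad(), Good())
    model(torch.zeros(3))
finally:
    handle.remove()  # global hooks affect every module, so always unregister
```

The first flagged name, `Bad`, points at the culprit. The later entries, `Good` and `Sequential`, only received the NaN downstream. Replacing `flagged.append(...)` with a `raise` would stop at the first offending module instead.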
; https://github.com/gnu-octave/octave/blob/default/liboctave/numeric/lo-arpack-proto.h
; https://docs.octave.org/latest/Standalone-Programs.html
; nm -D /lib/x86_64-linux-gnu/libarpack.so.2.1.0 | grep dsaupd
; ltrace -x '*' echo hi
; the following line fails :(
; LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/octave/6.4.0 ltrace --config libarpack.ltrace.conf -x '*@libarpack*' ./standalonebuiltin
typedef complex64 = struct(double, double);