Vadim Kantorov vadimkantorov
vadimkantorov / git.sh
Last active May 5, 2024 13:38
Various git tricks
# fork your own repo, example of my repo https://github.com/vadimkantorov/eventmap
git clone git@github.com:vadimkantorov/eventmapexample.git
cd eventmapexample
git remote add upstream git@github.com:vadimkantorov/eventmap.git
git pull upstream gh-pages
git checkout gh-pages
git push -u origin gh-pages
# update git to latest version on ubuntu
sudo add-apt-repository -y ppa:git-core/ppa
vadimkantorov / yaml_loads.py
Last active May 5, 2024 00:14
Simple string-valued parser for YAML supporting only strings, dicts and lists
# supports only strings, dicts and lists
# does not support multiline strings as the first list-item key `- run: |`
# does not support record parsing into a dict: `- {asd: foobar, foo: "bar"}`
def yaml_loads(content):
    procval = lambda val: (val[1:-1] if len(val) >= 2 and ((val[0] == val[-1] == '"') or (val[0] == val[-1] == "'")) else val.split('#', maxsplit = 1)[0].strip()) if val else ''
    lines = content.strip().splitlines()
    res = {}
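
The `procval` lambda above does the per-value clean-up, and its rules are easier to see when spelled out. The following is a minimal sketch of the same logic as a plain function; the name `normalize_value` is hypothetical and is not part of the gist.

```python
# Same rules as the `procval` lambda in the preview above, written as a
# regular function. `normalize_value` is a hypothetical name for illustration.
def normalize_value(val):
    if not val:
        return ''
    # A value fully wrapped in matching quotes loses only the quotes.
    is_quoted = len(val) >= 2 and val[0] == val[-1] and val[0] in ('"', "'")
    if is_quoted:
        return val[1:-1]
    # Otherwise everything after the first '#' is treated as a comment.
    return val.split('#', maxsplit=1)[0].strip()

print(normalize_value('"foo # bar"'))
print(normalize_value('plain value # trailing comment'))
```

One consequence of these rules: a quoted value followed by a comment, such as `"foo" # note`, does not end with a quote, so it goes through the comment-stripping branch and keeps its quotes.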
vadimkantorov / feed_xml.py
Created April 26, 2024 21:16
[WIP] Generate an RSS feed.xml from a posts collection
import xml.dom.minidom
def feed_write(ctx, path, generator_name = 'minimapython', generator_uri = 'https://github.com/vadimkantorov/minima', generator_version = 'https://github.com/vadimkantorov/minimapython'):
    site = ctx.get('site', {})
    site__lang = site.get('lang')
    page__url__absolute_url = ''
    root__absolute_url = ''
    site__time__date_to_xml_schema = ''
    page__url__absolute_url__xml_escape = ''
vadimkantorov / ctc_alignment_targets.py
Last active April 9, 2024 03:03
An implementation of CTC re-formulation via cross-entropy with pseudo-labels, following "A Novel Re-weighting Method for Connectionist Temporal Classification"
# Vanilla CTC and CTC via cross-entropy give equal loss values and equal gradients. The cross-entropy reformulation makes it easier to experiment with modifications of CTC.
# References on CTC regularization:
# "A Novel Re-weighting Method for Connectionist Temporal Classification", Li et al, https://arxiv.org/abs/1904.10619
# "Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets", Feng et al, https://www.hindawi.com/journals/complexity/2019/9345861/
# "Improved training for online end-to-end speech recognition systems", Kim et al, https://arxiv.org/abs/1711.02212
import torch
import torch.nn.functional as F
## generate example data
vadimkantorov / timezone_localtime_to_cet.py
Last active March 9, 2024 13:56
Given a list of "lat,lng" pairs, prints a local timestamp (20h) in all of the timezones
# python -m pip install timezonefinder pytz --user
import timezonefinder
import pytz
import datetime
latlnglist = '''
43.0010092,41.0208743
42.9972303,41.0089412
43.0125911,40.9705287
42.9991332,41.0408331
vadimkantorov / example_leaflet_openstreetmap.html
Last active March 9, 2024 11:43
Example of using a LeafletJS map with OpenStreetMap tiles to display a list of events using circle markers and simple popups
<html><body>
<link href="https://tile.openstreetmap.org/{z}/{x}/{y}.png" id="link_tiles" />
<!--
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.8.0/dist/leaflet.css"
integrity="sha512-hoalWLoI8r4UszCkZ5kL8vayOGVae1oxXe/2A4AO6J9+580uKHDO3JdHb7NzwwzK5xr/Fs0W40kiNHxM9vyTtQ=="
crossorigin=""/>
<script src="https://unpkg.com/leaflet@1.8.0/dist/leaflet.js"
vadimkantorov / socialmediacard.py
Created February 28, 2024 16:00
Fetches meta/og social media tags from a URL (based on https://gist.github.com/vstoykov/6028987 and updated for Python 3)
# based on https://gist.github.com/vstoykov/6028987
# python socialmediacard.py 'https://meduza.io/feature/2024/02/28/ya-sdelayu-vse-chtoby-zlo-otstupilo-a-prekrasnoe-buduschee-prishlo'
import html.parser
import urllib.request
class SeoParser(html.parser.HTMLParser):
    CONTENT_TAGS = ('p', 'h1', 'h2', 'h3', 'h4')
    ALLOWED_INLINE_TAGS = ('b', 'u', 'strong', 'em', 'br')
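
The preview of `SeoParser` is cut off before any handler methods, so the following is only a rough sketch of how meta/og tags can be collected with `html.parser.HTMLParser`. The class `MetaTagParser`, its `meta` attribute, and the sample HTML are assumptions for illustration and are not taken from the gist.

```python
import html.parser

# Hypothetical minimal parser: collects <meta> tags keyed by their
# `property` (e.g. og:title) or `name` attribute.
class MetaTagParser(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        key = attrs.get('property') or attrs.get('name')
        if key and attrs.get('content') is not None:
            self.meta[key] = attrs['content']

# Made-up sample page, parsed from a string so no network access is needed.
sample_html = '''<html><head>
<meta property="og:title" content="Example title">
<meta name="description" content="Example description">
</head><body><p>Hello</p></body></html>'''

parser = MetaTagParser()
parser.feed(sample_html)
print(parser.meta)
```

To parse a live page, the string passed to `feed` would be the decoded response body, for example from `urllib.request.urlopen(url).read().decode()`.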
vadimkantorov / sitemap.py
Last active February 23, 2024 23:29
Print all URLs of a standardized XML sitemap
# https://sitemaps.org/protocol.html
import sys
import xml.dom.minidom
import urllib.request
def sitemapindex_urlset_concat(url):
    sitemapindex = xml.dom.minidom.parse(urllib.request.urlopen(url))
    for sitemap in sitemapindex.getElementsByTagName('sitemap'):
        urlset = xml.dom.minidom.parse(urllib.request.urlopen(sitemap.getElementsByTagName('loc')[0].firstChild.nodeValue))
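
The preview stops inside the loop over child sitemaps. As a sketch of the remaining step, here is one way to pull the `<loc>` values out of a single `urlset` document with `xml.dom.minidom`. The helper `urlset_locs` and the sample XML are hypothetical, and the sketch parses a string so that it runs without network access.

```python
import xml.dom.minidom

# Made-up urlset document in the sitemaps.org format.
sample_sitemap = '''<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>'''

def urlset_locs(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    doc = xml.dom.minidom.parseString(xml_text)
    return [
        loc.firstChild.nodeValue.strip()
        for loc in doc.getElementsByTagName('loc')
        if loc.firstChild is not None  # skip empty <loc/> elements
    ]

for url in urlset_locs(sample_sitemap):
    print(url)
```

A sitemap index file uses the same `<loc>` element inside `<sitemap>` entries, so the same extraction applies to it before each child sitemap is fetched.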
vadimkantorov / readwiktionary.py
Last active February 19, 2024 21:49
Read Wiktionary dump in Python
# https://dumps.wikimedia.org/wikidatawiki/entities/ https://dumps.wikimedia.org/ruwiktionary/ https://dumps.wikimedia.org/ruwiktionary/20231201/
#
# wget -L https://dumps.wikimedia.org/wikidatawiki/entities/20231213/wikidata-20231213-lexemes.json.bz2 https://dumps.wikimedia.org/ruwiktionary/20231201/ruwiktionary-20231201-pages-meta-current.xml.bz2
# bzcat wikidata-20231213-lexemes.json.bz2 | wc -l # 1198580
# bzcat wikidata-20231213-lexemes.json.bz2 | head -n 2
# bzcat ruwiktionary-20231201-pages-meta-current.xml.bz2 | wc -l # 196257893
# bzcat ruwiktionary-20231201-pages-meta-current.xml.bz2 | head -n 100
# bzgrep '<page>' ruwiktionary-20231201-pages-meta-current.xml.bz2 | wc -l # 2814450
# time python3 readwiktionary.py ruwiktionary-20231201-pages-meta-current.xml.bz2 ruwiktionary-20231201-pages-meta-current.xml.bz2 # real 11m15.868s # user 9m36.938s # sys 0m5.656s
vadimkantorov / perlin.py
Last active February 15, 2024 10:36
Perlin noise in PyTorch
# ported from https://github.com/pvigier/perlin-numpy/blob/master/perlin2d.py
import torch
import math
def rand_perlin_2d(shape, res, fade = lambda t: 6*t**5 - 15*t**4 + 10*t**3):
    delta = (res[0] / shape[0], res[1] / shape[1])
    d = (shape[0] // res[0], shape[1] // res[1])
    grid = torch.stack(torch.meshgrid(torch.arange(0, res[0], delta[0]), torch.arange(0, res[1], delta[1])), dim = -1) % 1
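
The default `fade` argument above is the quintic polynomial 6t^5 - 15t^4 + 10t^3. A small sketch in plain Python (no PyTorch needed) checks the properties that make it suitable for blending between grid cells: it maps 0 to 0 and 1 to 1, and its first derivative is zero at both ends. The name `fade_derivative` is hypothetical and added only for this check.

```python
# The same quintic fade curve as the default argument of rand_perlin_2d.
def fade(t):
    return 6 * t**5 - 15 * t**4 + 10 * t**3

# First derivative of fade, written out by hand: 30t^4 - 60t^3 + 30t^2.
# Hypothetical helper, used only to inspect the slope at the endpoints.
def fade_derivative(t):
    return 30 * t**4 - 60 * t**3 + 30 * t**2

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, fade(t), fade_derivative(t))
```

Because the slope vanishes at t = 0 and t = 1, the interpolation weights change smoothly across cell borders.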