Vadim Markovtsev vmarkovtsev

@vmarkovtsev
vmarkovtsev / draw_clones.py
Last active August 29, 2015 14:07
clonedigger CPD XML visualization
import os
import sys
import matplotlib
matplotlib.use('cairo')
from matplotlib import pyplot
from matplotlib.colors import LinearSegmentedColormap
import numpy
from scipy.cluster.hierarchy import linkage, leaves_list
import xmltodict
A large collection of free icon web fonts and their generators.
https://www.google.com/design/icons/
(License: CC BY 4.0)
https://linearicons.com/free
(License: Custom)
https://octicons.github.com/
vmarkovtsev / ml_sapi_usecases.md
Last active August 21, 2017 17:34
ML Spark API usecases

Domains

First of all, ML has two quite different activity domains:

  1. Running something on many repositories.
  2. Running something on a single repository.

Depending on the size of the repository in (2), launching Spark may or may not make sense. For example, consider the topic model application scenario:

Recently, GitHub changed how ATX headers are parsed in Markdown files.

##Wrong

Correct
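The difference between the two cases above is the space after the leading `#` characters, which CommonMark requires for an ATX heading. As a rough sketch of how affected lines could be located, the snippet below flags lines where the `#` run is followed directly by text. The function name `find_broken_headers` and the fence handling are assumptions made for illustration, and the check is a heuristic: it will also flag lines such as `#hashtag` that were never meant to be headings.

```python
import re

# One to six '#' characters (after at most three spaces of indent)
# followed immediately by a non-space, non-'#' character.
LEGACY_ATX_RE = re.compile(r"^ {0,3}#{1,6}[^#\s]")


def find_broken_headers(markdown_text):
    """Return (line_number, line) pairs that look like headers missing the space."""
    broken = []
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        # Skip fenced code blocks, where '#' usually starts a comment.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if not in_fence and LEGACY_ATX_RE.match(line):
            broken.append((lineno, line))
    return broken


sample = "# Title\n##Wrong\n## Correct\n```\n#comment\n```\n"
print(find_broken_headers(sample))
```

Running it over a collection of README files would give a first estimate of how many documents render differently after the change.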

While this change follows the spec, it breaks many existing repositories. I took the README dataset which we created at source{d} and ran a simple

vmarkovtsev / identifier_split.py
Created May 26, 2017 10:39
Identifier splitting algorithm from the paper "Topic modeling of public repositories at scale using names in source code"
import re

NAME_BREAKUP_RE = re.compile(r"[^a-zA-Z]+")


def extract_names(token):
    token = token.strip()
    prev_p = [""]

    def ret(name):
        r = name.lower()
        if len(name) >= 3:
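The preview above is cut off, so the full algorithm is not visible here. As a simplified sketch of the general idea of identifier splitting (not the paper's exact procedure), the snippet below breaks a token on non-letter characters, then on camelCase boundaries, and keeps lowercase parts of at least three characters. `split_identifier`, `CAMEL_RE`, and `min_len` are hypothetical names introduced for illustration; only `NAME_BREAKUP_RE` and the length threshold of 3 come from the preview.

```python
import re

# Same pattern as in the gist preview: runs of non-letter characters.
NAME_BREAKUP_RE = re.compile(r"[^a-zA-Z]+")
# Assumption (not from the gist): acronyms, capitalized words, lowercase runs.
CAMEL_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def split_identifier(token, min_len=3):
    """Split an identifier into lowercase name parts of at least min_len letters."""
    parts = []
    for chunk in NAME_BREAKUP_RE.split(token.strip()):
        for piece in CAMEL_RE.findall(chunk):
            if len(piece) >= min_len:
                parts.append(piece.lower())
    return parts


print(split_identifier("parseHTTPResponse_v2"))
print(split_identifier("draw_clones"))
```

Short fragments such as the `v` in `_v2` are dropped by the length filter, which mirrors the `len(name) >= 3` check visible in the preview.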
vmarkovtsev / hercules.go
Created September 27, 2018 08:08
Imports from internal packages
package hercules

import (
	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
	"gopkg.in/src-d/hercules.v4/internal/core"
	"gopkg.in/src-d/hercules.v4/internal/plumbing"
	"gopkg.in/src-d/hercules.v4/internal/plumbing/identity"
	"gopkg.in/src-d/hercules.v4/internal/plumbing/uast"
	"gopkg.in/src-d/hercules.v4/internal/yaml"
vmarkovtsev / config.json
Last active March 29, 2019 17:31
GitLab developer experience clusters
{
  "embeddings": [
    {
      "tensorName": "GitLab developer experience clusters",
      "tensorShape": [
        173,
        258
      ],
      "tensorPath": "https://gist.githubusercontent.com/vmarkovtsev/4bd05fdd761e49a7bbbca87381e65c9c/raw/38801e9740529ac2c9ac7884fc3e8623bb37caad/data.tsv",
      "metadataPath": "https://gist.githubusercontent.com/vmarkovtsev/4bd05fdd761e49a7bbbca87381e65c9c/raw/38801e9740529ac2c9ac7884fc3e8623bb37caad/meta.tsv"
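The fields in this preview resemble the configuration read by the TensorFlow Embedding Projector, where `tensorShape` is usually given as [number of points, number of dimensions]. Assuming that is the intended consumer, a minimal sanity check of such a file might look like the sketch below. The function name `summarize_projector_config` and the `example.com` URLs are hypothetical and stand in for the real, truncated file.

```python
import json


def summarize_projector_config(config_text):
    """Return (name, points, dims) for every embedding listed in the config."""
    config = json.loads(config_text)
    summary = []
    for emb in config["embeddings"]:
        shape = emb["tensorShape"]
        # Expect exactly two positive integers: points and dimensions.
        if len(shape) != 2 or not all(isinstance(n, int) and n > 0 for n in shape):
            raise ValueError("unexpected tensorShape: %r" % (shape,))
        summary.append((emb["tensorName"], shape[0], shape[1]))
    return summary


# Placeholder config with hypothetical URLs; only the name and shape
# are taken from the preview above.
EXAMPLE = """
{
  "embeddings": [
    {
      "tensorName": "GitLab developer experience clusters",
      "tensorShape": [173, 258],
      "tensorPath": "https://example.com/data.tsv",
      "metadataPath": "https://example.com/meta.tsv"
    }
  ]
}
"""

print(summarize_projector_config(EXAMPLE))
```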
vmarkovtsev / id2vec_legacy.ipynb
Created October 6, 2017 14:11
Source code identifier embeddings - legacy demonstration
from typing import Tuple, Union
import numpy as np
from skimage.draw import line_aa
import tensorflow as tf
def create_motion_blur_kernel(dim: int, angle: float) -> tf.Tensor:
    # Define a disk which contains the dim x dim square
    radius = np.sqrt(0.5 * dim**2)
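The function preview stops at the radius, so the rest of the kernel construction is not shown. The radius itself is half the diagonal of the square: sqrt((dim/2)^2 + (dim/2)^2) = sqrt(0.5 * dim^2). The small standard-library sketch below only checks that a disk of this radius reaches all four corners; `enclosing_radius` and `square_fits_in_disk` are hypothetical helper names, not part of the original code.

```python
import math


def enclosing_radius(dim):
    """Radius of the smallest centred disk that contains a dim x dim square."""
    return math.sqrt(0.5 * dim ** 2)


def square_fits_in_disk(dim):
    """Check that every corner of the centred square lies within the disk."""
    r = enclosing_radius(dim)
    half = dim / 2
    corners = [(sx * half, sy * half) for sx in (-1, 1) for sy in (-1, 1)]
    # Small tolerance for floating-point rounding at the boundary.
    return all(math.hypot(x, y) <= r + 1e-9 for x, y in corners)


print(enclosing_radius(25))
print(square_fits_in_disk(25))
```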
import matplotlib.pyplot as plt
plt.rcParams["image.cmap"] = "hot"
# 25x25 kernel pointing straight to the east
dim = 25
kernel = np.squeeze(create_motion_blur_kernel(dim, (90/180) * np.pi).numpy()[:, :, 0, 0])
plt.contourf(np.linspace(-dim//2, dim//2, dim), np.linspace(-dim//2, dim//2, dim), kernel)
plt.colorbar()
plt.show()