Marian Boda marianboda

  • Bratislava, Slovakia
marianboda / .gitconfig
Last active July 21, 2021 16:05
dotfiles
[alias]
st = status -s
ls = log --pretty=format:"%C(yellow)%h%Cred%d\\ %Creset%s%Cblue\\ [%cn]" --decorate
ll = log --pretty=format:"%C(yellow)%h%Cred%d\\ %Creset%s%Cblue\\ [%cn]" --decorate --numstat
lds = log --pretty=format:"%C(yellow)%h\\ %ad%Cred%d\\ %Creset%s%Cblue\\ [%cn]" --decorate --date=short
le = log --oneline --decorate
lo = log --date=format:\"%Y-%m-%d %H:%M\" --pretty=format:\"%ad %C(yellow)%h%Creset %an: %Cblue%s%Creset\"

marianboda / rfcount.sh
Created May 3, 2016 06:33
Recursively count files for every directory inside current dir
find . -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -n 1 sh -c 'printf "%s\t%s\n" "$(find "$1" -type f | wc -l)" "$1"' sh | sort -rn | less

version 1.0.3

Text Analysis and Entity Resolution

#### Entity resolution is a common, yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products.

Entity Resolution, or "[Record linkage][wiki]", is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with records from another that describe the same entity. Other terms with the same meaning include "entity disambiguation/linking", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration", and "conflation".
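The matching step can be illustrated with a small sketch. It uses plain, single-machine Python rather than the lab's Spark code. Each product description is tokenized and represented as a bag of words. Every record in one source is then paired with its most similar record in the other source by cosine similarity. The helper names `tokenize`, `cosine_similarity` and `resolve`, the 0.5 `threshold`, and the sample records are hypothetical choices made for this example only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of two bag-of-words vectors stored as Counters."""
    # A Counter returns 0 for missing tokens, so absent terms add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve(source_a, source_b, threshold=0.5):
    """Pair each record of source_a with its most similar record in source_b.

    Both sources map a record id to a text description (a hypothetical
    input shape). Only pairs scoring at least `threshold` are returned,
    as (id_a, id_b, score) tuples.
    """
    # Vectorize the second source once instead of once per comparison.
    vectors_b = {key: Counter(tokenize(text)) for key, text in source_b.items()}
    matches = []
    for key_a, text_a in source_a.items():
        vec_a = Counter(tokenize(text_a))
        best_key, best_score = None, 0.0
        # Brute-force comparison against every record of the other source.
        for key_b, vec_b in vectors_b.items():
            score = cosine_similarity(vec_a, vec_b)
            if score > best_score:
                best_key, best_score = key_b, score
        if best_key is not None and best_score >= threshold:
            matches.append((key_a, best_key, best_score))
    return matches
```

For example, `resolve({"a1": "Adobe Photoshop CS3 for Windows"}, {"b1": "adobe photoshop cs3 windows"})` pairs `a1` with `b1` at a similarity of about 0.89. The all-pairs loop grows quadratically with the number of records. That cost is why real datasets call for a scalable engine such as Spark, usually combined with a blocking or indexing step.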

Entity Resol

marianboda / .bash_profile
Created February 23, 2015 20:29
OS X .bash_profile
if [ -x /usr/bin/tput ] && tput setaf 1 >& /dev/null; then
  c_git_clean=$(tput setaf 2)
  c_git_dirty=$(tput setaf 1)
  c_git_semi_dirty=$(tput setaf 3)
  c_reset=$(tput sgr0)
else
  c_git_clean=
  c_git_dirty=
  c_reset=
  c_git_semi_dirty=

a =
  name: 'n1'
  items: [
    {
      name: 'n1.1'
      items: [
        {name: 'n1.1.1', items: []}
        {
          name: 'n1.1.2', items: [
            { name: 'n1.1.2.1', items: []}

Front-end build setup

Tools used

  • [Node.js][node] - JavaScript runtime
  • [Gulp][gulp] - task runner
  • [Bower][bower] - package manager

Configure environment