Daud Ahmad daudahmad

daudahmad / cloudSettings
Last active January 3, 2019 00:34
Visual Studio Code Settings Sync Gist
{"lastUpload":"2018-12-19T02:29:36.253Z","extensionVersion":"v3.2.4"}
daudahmad / 0_reuse_code.js
Last active September 8, 2015 15:01
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
Cygwin and Windows Git stopped playing nicely together after a recent Windows update. There are many recommendations for fixing this on Stack Overflow and elsewhere, but this post makes the most sense. In a nutshell, msys-1.0.dll (installed into your Program Files\Git\bin directory) is not built to be position independent. Use the DLL rebaser to load it at a new base address, like so:
$ rebase.exe -b 0x50000000 msys-1.0.dll
# A useful sort-key function: it concatenates the key and value of a tuple into one string, so that records with equal keys are ordered by value. Handy when keys (e.g. ratings) are not unique.
def sortFunction(tup):
    """ Construct the sort string (does not perform actual sorting)
    Args:
        tup: (rating, MovieName)
    Returns:
        sortString: the value to sort with, 'rating MovieName'
    """
    key = '%.3f' % tup[0]  # rating with three decimals, e.g. 4.5 -> '4.500' (Python 3 str is already Unicode)
    value = tup[1]
    return key + ' ' + value
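The idea behind `sortFunction` can be tried outside Spark with a minimal, self-contained sketch. It assumes Python 3 and ratings with a single integer digit (e.g. 1.0 to 5.0); the name `sort_key` and the sample `movies` list are hypothetical.

```python
def sort_key(rating_and_name):
    """Build a string such as '4.500 Casablanca' from a (rating, name) tuple.

    Formatting the rating with a fixed three decimals makes string order match
    numeric order, but only while every rating has the same number of integer
    digits ('10.000' would sort before '9.000'). Equal ratings are then
    ordered by name.
    """
    rating, name = rating_and_name
    return "%.3f %s" % (rating, name)


# Hypothetical (rating, movie name) pairs with duplicate ratings.
movies = [(4.5, "Vertigo"), (3.0, "Psycho"), (4.5, "Casablanca"), (3.0, "Jaws")]

print(sorted(movies, key=sort_key))
# [(3.0, 'Jaws'), (3.0, 'Psycho'), (4.5, 'Casablanca'), (4.5, 'Vertigo')]
```

The same kind of key function can be passed as `key=` to PySpark's `RDD.takeOrdered`. Since Python tuples already compare element by element, a tuple key such as `(rating, name)` gives the same tie-breaking without the digit-width caveat.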
  • Business people meet the developers daily
  • Motivated teams
  • Self-organizing teams with less micromanagement
  • Face-to-face conversations
  • Retrospective sessions to tune and adjust team behaviours
  • The team as a whole takes ownership of delivery
daudahmad / lab3.md
Last active August 29, 2015 14:27 — forked from marianboda/lab3.md

version 1.0.3

Text Analysis and Entity Resolution

#### Entity resolution is a common, yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products.

Entity Resolution, or "[Record linkage][wiki]", is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with records from another that describe the same entity. Other terms with the same meaning include "entity disambiguation/linking", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration", and "conflation".
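One common way to score candidate matches between two product datasets is TF-IDF weighting followed by cosine similarity. The sketch below is plain Python (no Spark) and assumes tiny in-memory lists; the product strings are made up, and the helper names (`tokenize`, `idf_weights`, `tfidf`, `cosine`, `best_matches`) are hypothetical, not necessarily the lab's own functions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(docs):
    """IDF per token as log(N / document frequency); one common variant of IDF."""
    n = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {tok: math.log(n / df) for tok, df in doc_freq.items()}


def tfidf(tokens, idfs):
    """Weight each token by its term frequency in the record times its IDF."""
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: (c / total) * idfs[tok] for tok, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matches(records_a, records_b):
    """For each record in A, return (index of most similar record in B, score)."""
    tokens_a = [tokenize(r) for r in records_a]
    tokens_b = [tokenize(r) for r in records_b]
    # IDF is computed over both datasets together.
    idfs = idf_weights(tokens_a + tokens_b)
    vecs_a = [tfidf(t, idfs) for t in tokens_a]
    vecs_b = [tfidf(t, idfs) for t in tokens_b]
    results = []
    for va in vecs_a:
        scores = [cosine(va, vb) for vb in vecs_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        results.append((j, scores[j]))
    return results


# Made-up sample records standing in for the two product datasets.
products_a = ["Apple iPod nano 8GB silver", "Canon PowerShot SD1000 digital camera"]
products_b = ["Canon PowerShot SD1000 7.1MP camera", "apple ipod nano 8 gb (silver)"]

for record, (j, score) in zip(products_a, best_matches(products_a, products_b)):
    print(f"{record!r} -> {products_b[j]!r} (cosine {score:.2f})")
```

This compares every pair of records, which is quadratic in the dataset sizes, and it always reports a best match even when the score is low; a real pipeline would typically add a similarity threshold to decide whether two records describe the same entity.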

Entity Resol