Daud Ahmad daudahmad

daudahmad / 0_reuse_code.js
Last active September 8, 2015 15:01
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
daudahmad / lab3.md
Last active August 29, 2015 14:27 — forked from marianboda/lab3.md

version 1.0.3

Text Analysis and Entity Resolution

#### Entity resolution is a common, yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products.

Entity Resolution, or "[Record linkage][wiki]", is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity. Other terms with the same meaning include "entity disambiguation/linking", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration", and "conflation".
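
To make the idea of "joining records that describe the same entity" concrete, here is a minimal plain-Python sketch. It is an illustration only, not the method this lab uses: it assumes records are short text descriptions, scores each cross-source pair by Jaccard similarity of their word sets, and links pairs above a threshold. The names `tokenize`, `jaccard`, `link_records`, the threshold of 0.5, and the sample records are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_records(source_a, source_b, threshold=0.5):
    """Return (id_a, id_b, score) for each cross-source pair scoring >= threshold.

    This compares every pair, so it is quadratic in the number of records
    and only suitable for tiny inputs.
    """
    links = []
    for id_a, text_a in source_a.items():
        for id_b, text_b in source_b.items():
            score = jaccard(tokenize(text_a), tokenize(text_b))
            if score >= threshold:
                links.append((id_a, id_b, score))
    return links


# Made-up product records from two hypothetical catalogs.
catalog_a = {"a1": "Acme Wireless Mouse M100", "a2": "Zenith USB Keyboard"}
catalog_b = {"b1": "acme m100 wireless mouse (black)", "b2": "Orion HD Webcam"}

matches = link_records(catalog_a, catalog_b)
print(matches)
```

Here only `a1` and `b1` are linked, since they share four of five distinct tokens. The all-pairs comparison is the part that does not scale, which is one reason a distributed engine such as Spark is attractive for this problem.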

Entity Resol