@jzajpt
Last active May 18, 2020 10:54

Programming Exercise - Grouping

The goal of this exercise is to identify rows in a CSV file that may represent the same person based on a provided Matching Type (definition below).

The resulting program should allow us to test at least three matching types:

  • one that matches records with the same email address
  • one that matches records with the same phone number
  • one that matches records with the same email address OR the same phone number
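
The third matching type is transitive: if row A shares an email with row B, and row B shares a phone number with row C, all three rows likely belong to one person. One possible way to handle this is a union-find (disjoint set) structure. The sketch below is only an illustration under assumptions: the function name `group_rows` and the idea of passing each row as a list of pre-extracted keys are hypothetical, not part of the exercise.

```python
from typing import Dict, Hashable, List, Sequence


def group_rows(keys_per_row: Sequence[Sequence[Hashable]]) -> List[int]:
    """Return one group id per row; rows sharing any key get the same id.

    Hypothetical helper: each element of keys_per_row holds the match keys
    (for example normalized emails and phone numbers) of one CSV row.
    """
    parent = list(range(len(keys_per_row)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lower row index as the root so results are stable.
            if root_a < root_b:
                parent[root_b] = root_a
            else:
                parent[root_a] = root_b

    first_row_with_key: Dict[Hashable, int] = {}
    for row_index, keys in enumerate(keys_per_row):
        for key in keys:
            if key in first_row_with_key:
                union(row_index, first_row_with_key[key])
            else:
                first_row_with_key[key] = row_index

    # Number the groups 1, 2, 3, ... in order of first appearance.
    group_ids: Dict[int, int] = {}
    result: List[int] = []
    for row_index in range(len(keys_per_row)):
        root = find(row_index)
        result.append(group_ids.setdefault(root, len(group_ids) + 1))
    return result
```

Rows with no keys at all (for example, a blank email under an email-only matching type) never join another row, so each one ends up in its own group.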

Guidelines

  • Only use code that you have a license to use
  • Don't hesitate to ask us any questions to clarify the project

Resources

CSV Files

Three sample input files are included. All files should be successfully processed by the resulting code.

Matching Type

A matching type is a declaration of what logic should be used to compare the rows.

For example: A matching type named same_email might make use of an algorithm that matches rows based on email columns.
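
A matching type can be modelled as a named function that extracts comparable keys from a row. The following is a rough sketch, not a prescribed design. Only the name `same_email` comes from the text above; `same_phone`, `same_email_or_phone`, the helper names, the normalization rules (lowercased emails, digits-only phone numbers), and the assumption that relevant column headers contain "email" or "phone" are all illustrative guesses about the sample files.

```python
import re
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]


def _values(row: Row, needle: str) -> List[str]:
    # Assumption: relevant columns have the needle in their header,
    # e.g. "Email", "Email1", "Phone2". Blank cells are skipped.
    return [
        value
        for column, value in row.items()
        if needle in column.lower() and value and value.strip()
    ]


def email_keys(row: Row) -> List[Key]:
    # Assumption: emails are compared case-insensitively.
    return [("email", value.strip().lower()) for value in _values(row, "email")]


def phone_keys(row: Row) -> List[Key]:
    keys: List[Key] = []
    for value in _values(row, "phone"):
        digits = re.sub(r"\D", "", value)  # keep digits only
        if digits:
            keys.append(("phone", digits))
    return keys


def email_or_phone_keys(row: Row) -> List[Key]:
    return email_keys(row) + phone_keys(row)


# Hypothetical registry mapping a matching type name to its key extractor.
MATCHING_TYPES: Dict[str, Callable[[Row], List[Key]]] = {
    "same_email": email_keys,
    "same_phone": phone_keys,
    "same_email_or_phone": email_or_phone_keys,
}
```

Tagging each key with its kind (`"email"` or `"phone"`) keeps an email value from ever being compared against a phone value. Adding a new matching type then only requires one more entry in the registry.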

Interface

At a high level, the program should take two parameters: the input file and the matching type.
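
One way this two-parameter interface could look on the command line is sketched below using Python's standard `argparse` module. The argument names `input_file` and `matching_type`, and the matching type names other than `same_email`, are assumptions made for illustration.

```python
import argparse
from typing import List, Optional

# Hypothetical names; only "same_email" appears in the exercise text.
MATCHING_TYPE_NAMES = ["same_email", "same_phone", "same_email_or_phone"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Group CSV rows that may represent the same person."
    )
    parser.add_argument("input_file", help="path to the CSV file to process")
    parser.add_argument(
        "matching_type",
        choices=MATCHING_TYPE_NAMES,
        help="which matching logic to apply",
    )
    # argv=None makes argparse read sys.argv[1:], the usual CLI behaviour.
    return parser.parse_args(argv)
```

Restricting `matching_type` with `choices` means an unknown matching type is rejected with a usage message before any file is read.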

Output

The expected output is a copy of the original CSV file with the unique identifier of the person each row represents prepended to the row.
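
As a minimal sketch of this output step, the function below copies a CSV stream and prepends an identifier column. It assumes the input has a header row and that one identifier per data row has already been computed, in row order. The function name `prepend_ids` and the column header `PersonId` are hypothetical.

```python
import csv
from typing import Sequence, TextIO


def prepend_ids(
    source: TextIO,
    dest: TextIO,
    person_ids: Sequence[int],
    id_header: str = "PersonId",  # hypothetical column name
) -> None:
    """Copy a CSV from source to dest, adding an id as the first column."""
    reader = csv.reader(source)
    writer = csv.writer(dest, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write

    writer.writerow([id_header] + header)
    # Assumption: person_ids has exactly one entry per data row.
    # zip stops at the shorter of the two, so extra rows would be dropped.
    for person_id, row in zip(person_ids, reader):
        writer.writerow([person_id] + row)
```

Using the `csv` module for both reading and writing keeps quoted fields (for example names containing commas) intact in the copy.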
