Skip to content

Instantly share code, notes, and snippets.

@jnsprnw
Last active August 21, 2017 18:32
Show Gist options
  • Save jnsprnw/0f4b24e7d9f80a9b1fa6b22a6db65e77 to your computer and use it in GitHub Desktop.
Save jnsprnw/0f4b24e7d9f80a9b1fa6b22a6db65e77 to your computer and use it in GitHub Desktop.
Cross-reference values in OpenRefine based on multiple criteria

Cross-reference values in OpenRefine based on multiple columns

Let’s say you have these two tables:

SourceTable

name type age height
Kant Cat 2 50
Hegel Cat 2 10
Descartes Cat 5 30
Marx Dog 2 50
Nietzsche Dog 7 50

TargetTable

type age height
Dog 7 50
Cat 2 10

In your TargetTable you want to find the names for your two pets based on the table SourceTable. Commonly you would use this command to make a lookup based on the column type in TargetTable:

cell.cross("SourceTable", "type")[0].cells["name"].value

But by selecting only the first result ([0]) you would end up with the incorrect results since the age and height do not match. Instead you could use this command to filter through the results and exclude rows based on the conditions in and and then selecting the first result.

filter(
  cell.cross("SourceTable", "type"),
  row,
  and(
    row.cells["age"] == cells['age'],
    row.cells["height"] == cells['height']
  )
)[0].cells["name"].value

Of course, you could add more conditions to the and-function to check more columns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment