Skip to content

Instantly share code, notes, and snippets.

@noklam
Last active August 14, 2020 01:58
Show Gist options
  • Save noklam/737fb0eaf56afd8bf0def565b7aa57fc to your computer and use it in GitHub Desktop.
Save noklam/737fb0eaf56afd8bf0def565b7aa57fc to your computer and use it in GitHub Desktop.
Join nearest key for time series pandas dataframe
# # Using pandas merge_as_of (similar to a merge join) to do inexact join (join the nearest key instead)
import pandas as pd
# %%
table_a = pd.DataFrame([('2020-01-01'), ('2020-01-03'), ('2020-01-06')],
columns=['PK'])
table_b = pd.DataFrame([('2020-01-01', 'A'), ('2020-01-02', 'A'),
('2020-01-04', 'B'), ('2020-01-05', 'B')],
columns=['FK', 'Category'])
# %%
table_a['value'] = 10
# %%
"""
Perform an asof merge. This is similar to a left-join except that we
match on nearest key rather than equal keys.
Both DataFrames must be sorted by the key.
"""
# %%
table_a = table_a.sort_values('PK')
table_b = table_b.sort_values('FK')
table_a['PK'] = pd.to_datetime(table_a['PK'])
table_b['FK'] = pd.to_datetime(table_b['FK'])
pd.merge_asof(table_a, table_b, left_on='PK', right_on='FK', direction='backward')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment