Skip to content

Instantly share code, notes, and snippets.

@wpm
Last active May 31, 2023 10:57
Show Gist options
  • Save wpm/e4f4001e6252264dec28 to your computer and use it in GitHub Desktop.
Save wpm/e4f4001e6252264dec28 to your computer and use it in GitHub Desktop.
Pandas multi-table join
import pandas
"""
Join an arbitrary number of data frames, using a multi-index label for each data frame.
For example say you have three data frames each of which lists the classroom and
number of students a teacher has in a given period.
Classroom Students
Teacher
Mary 53A 19
John 99B 25
You want to combine them into a single data frame with a top level index indicating the period.
Period 1 Period 2 Period 3
Classroom Students Classroom Students Classroom Students
Teacher
Mary 53A 19 18B 27 36D 12
John 99B 25 23C 15 15B 30
Do this by placing each input data frame under a top level index, and then joining all the tables
using the reduce function.
"""
def merge_reduce(fs, label):
for i, f in enumerate(fs, 1):
f.columns = pandas.MultiIndex.from_tuples([("%s %d" % (label, i), c) for c in f.columns])
return reduce(lambda m, f: m.join(f), fs)
p1 = pandas.DataFrame(
data={"Teacher": ["Mary", 'John'], "Classroom": ['53A', '99B'], "Students": [19, 25]}).set_index("Teacher")
p2 = pandas.DataFrame(
data={"Teacher": ["Mary", 'John'], "Classroom": ['18B', '23C'], "Students": [27, 15]}).set_index("Teacher")
p3 = pandas.DataFrame(
data={"Teacher": ["Mary", 'John'], "Classroom": ['36D', '15B'], "Students": [12, 30]}).set_index("Teacher")
m = merge_reduce([p1, p2, p3], "Period")
print(m)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment