Skip to content

Instantly share code, notes, and snippets.

@luisprox
Last active April 21, 2021 01:40
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save luisprox/76087afb8d6930a202bcc03f90ee119e to your computer and use it in GitHub Desktop.
Save luisprox/76087afb8d6930a202bcc03f90ee119e to your computer and use it in GitHub Desktop.
[Combining dataframes with pandas] Summary of pandas merge, join and concat functions to combine data #python #pandas #dataanalysis

Merge

The same as SQL join operation.

How

Select the type of merge through how parameter.

inner

This is the default option. Resulting dataframe will have only the rows with same values in the selected key column. If more than one key column is selected, all selected keys must be the same for the row to be selected.

outer

Resulting dataframe will have all rows from both dataframes. Key column(s) with different values will remain, and empty fields will receive NaN values.

left

Resulting dataframe will have all rows from left dataframe, while discarding non matching rows from the right dataframe.

right

Resulting dataframe will have all rows from right dataframe, while discarding non matching rows from the left dataframe.

Key Columns

Select which columns will be the keys through on parameter.

default

All columns with the same name will be used as keys.

select specific columns

Specify columns names as a list to the on parameter. Columns with the same names that are not specified will be renamed with _x and _y sufixes.

Join

The same as a left merge on indices. If you set key columns as indices, join will result the same as a left merge on those columns. It is more efficient and faster than the equivalent merge command.

Key Columns

The left dataframe key column is always the index. You can select the key columns from the right dataframe with on parameter.

Concat

Concatenation is like "stitching" or appending two dataframes.

axis

rows concatenation

Concatenation is done through appending rows by default. The rows from one dataframe are "appended" to the other. If there are different columns in each dataframe, NaN values will be filled.

columns concatenation

To concatenate columns, select option axis=1. In this case, .concat() will "append" adtitional columns to the rows. Empty fields from different rows in each dataframe will be filled with NaN

how

outer

Concatenation will preserve all data by default, as a set union, which is the join='outer' option.

inner

To eliminate data that does not have a match, as a set intersection, use the option join='inner'.

Append

There is an .append() function which is the same as a default .concat() with default valus (outer join over row axis).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment