@michaelguia
Created March 6, 2017 22:26
Polynomial features labeled in a dataframe
```python
import pandas as pd
import sklearn.preprocessing as pp


def PolynomialFeatures_labeled(input_df, power):
    '''A wrapper around sklearn.preprocessing.PolynomialFeatures.

    The problem with the sklearn function is that if you give it a labeled
    DataFrame, it returns a plain NumPy array: the column labels are lost and
    the output can contain a whole bunch of unlabeled columns.

    Inputs:
    input_df = your labeled pandas DataFrame (the x's, not raised to any power)
    power = the highest polynomial order you want (use the same value you would
            pass to pp.PolynomialFeatures(power) directly)

    Output:
    A labeled pandas DataFrame. The labels are built from the powers_ matrix
    that the fitted PolynomialFeatures object exposes.
    '''
    poly = pp.PolynomialFeatures(power)
    output_nparray = poly.fit_transform(input_df)
    # One row per output column; entry [j, i] is the exponent of input feature i.
    powers_nparray = poly.powers_
    input_feature_names = list(input_df.columns)
    # With the default include_bias=True, the first row of powers_ is all zeros
    # (the constant column), so it is labeled here and skipped in the loop.
    target_feature_names = ["Constant Term"]
    for feature_distillation in powers_nparray[1:]:
        final_label = ""
        for i in range(len(input_feature_names)):
            if feature_distillation[i] == 0:
                continue  # this feature does not appear in this term
            variable = input_feature_names[i]
            exponent = feature_distillation[i]  # renamed so it no longer shadows `power`
            intermediary_label = "%s^%d" % (variable, exponent)
            if final_label == "":  # first factor of this term
                final_label = intermediary_label
            else:
                final_label = final_label + " x " + intermediary_label
        target_feature_names.append(final_label)
    # Keep the input's index so the rows stay aligned with the original data.
    output_df = pd.DataFrame(output_nparray, columns=target_feature_names,
                             index=input_df.index)
    return output_df


# Example usage, where input_df is your own labeled DataFrame:
output_df = PolynomialFeatures_labeled(input_df, 2)
```
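To see how the powers_ matrix turns into labels, here is a small pure-Python sketch that does not need scikit-learn. The helper names `powers_matrix` and `label_from_powers` are hypothetical, and `powers_matrix` assumes sklearn's default term ordering (bias row first, then the terms of each degree enumerated as combinations with replacement of the feature indices); check `poly.powers_` on your own installation if the exact order matters.

```python
from itertools import combinations_with_replacement


def powers_matrix(n_features, degree):
    """Hypothetical helper: build exponent rows in the order assumed to match
    sklearn's default PolynomialFeatures output (bias row first)."""
    rows = []
    for d in range(degree + 1):
        # Each combination lists which features are multiplied together;
        # counting occurrences gives the exponent of each feature.
        for combo in combinations_with_replacement(range(n_features), d):
            row = [0] * n_features
            for idx in combo:
                row[idx] += 1
            rows.append(row)
    return rows


def label_from_powers(feature_names, powers):
    """Hypothetical helper: one label per exponent row, using the gist's rules
    ("name^exp" factors joined by " x ", all-zero row -> "Constant Term")."""
    labels = []
    for row in powers:
        parts = ["%s^%d" % (name, exp)
                 for name, exp in zip(feature_names, row) if exp != 0]
        labels.append(" x ".join(parts) if parts else "Constant Term")
    return labels


powers = powers_matrix(2, 2)
print(powers)
print(label_from_powers(["a", "b"], powers))
```

For features `a` and `b` at degree 2 this gives the exponent rows [0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2] and the labels Constant Term, a^1, b^1, a^2, a^1 x b^1, b^2. Unlike the gist, `label_from_powers` labels any all-zero row as the constant term instead of assuming it is the first row; with the default `include_bias=True` the two approaches agree.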
jeffcav commented Jul 28, 2023

Awesome, works perfectly!