Created
March 6, 2017 22:26
michaelguia/a87d76eb6722a90893f375bff87260f7
Polynomial features labeled in a dataframe
import pandas as pd
from sklearn import preprocessing as pp

def PolynomialFeatures_labeled(input_df, power):
    '''Wrapper around sklearn's PolynomialFeatures that keeps column labels.

    By default, PolynomialFeatures.fit_transform returns an unlabeled array even
    when it is given a labeled dataframe, which can leave you with a whole bunch
    of unlabeled columns.

    Inputs:
    input_df = your labeled pandas dataframe (the x's, not raised to any power)
    power = the maximum polynomial degree (the same value you would pass to pp.PolynomialFeatures(power) directly)

    Output:
    A labeled pandas dataframe. The labels are built from the powers_ matrix,
    an attribute of the fitted transformer.
    '''
    poly = pp.PolynomialFeatures(power)
    output_nparray = poly.fit_transform(input_df)
    # One row per output column, one exponent per input feature
    powers_nparray = poly.powers_

    input_feature_names = list(input_df.columns)
    # The first output column is the bias term (every exponent is 0)
    target_feature_names = ["Constant Term"]
    for feature_distillation in powers_nparray[1:]:
        final_label = ""
        for i in range(len(input_feature_names)):
            if feature_distillation[i] == 0:
                continue  # this input feature is not part of the term
            variable = input_feature_names[i]
            # Named exponent so it does not shadow the power argument
            exponent = feature_distillation[i]
            intermediary_label = "%s^%d" % (variable, exponent)
            if final_label == "":  # If the final label isn't yet specified
                final_label = intermediary_label
            else:
                final_label = final_label + " x " + intermediary_label
        target_feature_names.append(final_label)
    output_df = pd.DataFrame(output_nparray, columns=target_feature_names)
    return output_df

# Example usage: input_df is your own labeled dataframe
output_df = PolynomialFeatures_labeled(input_df, 2)
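
The label-building step can be looked at on its own, without sklearn or pandas. The sketch below is not part of the original gist: `labels_from_powers` is a hypothetical helper, and `example_powers` is a hand-written exponent matrix assumed to mirror what `powers_` holds for two features at degree 2. Unlike the function above, it labels any all-zero row as the constant term instead of assuming it is the first row.

```python
def labels_from_powers(feature_names, powers):
    """Build one readable label per row of an exponent matrix.

    Hypothetical helper for illustration; each row of `powers` holds one
    exponent per input feature, in the same order as `feature_names`.
    """
    labels = []
    for row in powers:
        # Keep only the features that actually appear in this term.
        factors = ["%s^%d" % (name, exponent)
                   for name, exponent in zip(feature_names, row)
                   if exponent != 0]
        # A row of all zeros is the bias column.
        labels.append(" x ".join(factors) if factors else "Constant Term")
    return labels

# Hand-written stand-in for poly.powers_ (assumed layout: 2 features, degree 2).
example_powers = [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
example_labels = labels_from_powers(["a", "b"], example_powers)
print(example_labels)
# → ['Constant Term', 'a^1', 'b^1', 'a^2', 'a^1 x b^1', 'b^2']
```

Each label lines up with one output column, so a list like this can be passed as `columns` when building the dataframe.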
Thank you very much for this function. It is very helpful.