Created April 29, 2020 11:51
Search for given keywords in a feature
import re


def extract_features_from_description(df,
                                      column_name,
                                      new_feature_name,
                                      extract_words):
    # Flag rows whose column_name text contains any of the
    # words in extract_words (whole-word, case-sensitive match).
    # ASSUMPTION: There are no NA values
    # in the description feature
    check_regex = (r'\b(?:{})\b'
                   .format('|'
                           .join(
                               map(re.escape,
                                   extract_words))))
    # 1 if any keyword matches, 0 otherwise
    df[new_feature_name] = (df[column_name]
                            .str
                            .contains(check_regex,
                                      regex=True)
                            .astype('uint8'))
    return df
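
The pattern built inside the function can be checked on its own with the standard `re` module, which is roughly what `.str.contains` applies to each row. The sketch below is illustrative only: the keyword list and the sample descriptions are hypothetical and not part of the original gist.

```python
import re

# Hypothetical keyword list, for illustration only
extract_words = ["pool", "a/c"]

# Same pattern construction as in the function above:
# escape each keyword, join with "|", wrap in word boundaries
check_regex = r'\b(?:{})\b'.format('|'.join(map(re.escape, extract_words)))

# Hypothetical description values
descriptions = [
    "house with a pool",        # whole-word match
    "daily carpool available",  # "pool" is only part of a word
    "Pool in the backyard",     # different case
    "flat with a/c included",   # keyword with a special character
]

# 1 if any keyword is found, 0 otherwise
flags = [int(bool(re.search(check_regex, text))) for text in descriptions]
print(flags)  # [1, 0, 0, 1]
```

The word boundaries keep `carpool` from counting as `pool`, and the match is case-sensitive, so `Pool` is not flagged. If case should be ignored, one option would be to pass `flags=re.IGNORECASE` to `re.search` here.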