This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
aug_rf = RandomForestClassifier() | |
aug_rf.fit(X_aug_train, y_aug_train) | |
aug_rf_pred = aug_rf.predict(X_test) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from sklearn.ensemble import RandomForestClassifier | |
base_rf = RandomForestClassifier() | |
base_rf.fit(X_raw_train, y_raw_train) | |
base_rf_pred = base_rf.predict(X_test) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
X_synth = synth_df.loc[:, synth_df.columns != "FraudFound"] | |
X_aug_train = X_raw_train.append(X_synth) | |
y_synth = synth_df["FraudFound"] | |
y_aug_train = y_raw_train.append(y_synth) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
num_to_gen = sum(y_raw_train == 0) - sum(y_raw_train == 1) | |
synth_df = synthesizer.sample(n_samples=num_to_gen, | |
condition_on={ | |
"FraudFound": { | |
"categories": [1] | |
} | |
}).to_pandas() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
synthesizer = RegularSynthesizer() | |
synthesizer.fit(data, metadata=metadata, | |
condition_on=["FraudFound"]) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from ydata.connectors import LocalConnector | |
from ydata.connectors.filetype import FileType | |
from ydata.synthesizers.regular.model import RegularSynthesizer | |
from ydata.labs import DataSources | |
from ydata.metadata import Metadata | |
connector = LocalConnector() | |
data = connector.read_file(path='car_claims_training.csv', file_type=FileType.CSV) | |
metadata = Metadata(data) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
car_claims_training = X_raw_train.copy() | |
car_claims_training['FraudFound'] = y_raw_train | |
car_claims_training.to_csv('car_claims_training.csv', index=False) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from sklearn.model_selection import train_test_split | |
X_raw = car_claims_prepared.loc[:, car_claims_prepared.columns != "FraudFound"] | |
y_raw = car_claims_prepared["FraudFound"] | |
X_raw_train, X_test, y_raw_train, y_test = train_test_split(X_raw, y_raw, test_size=0.1, random_state=1) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas as pd | |
car_claims_raw = pd.read_csv('car_claims_raw.csv') | |
car_claims_raw_temp = encode_fraud(car_claims_raw) | |
car_claims_prepared = encode_categorical(car_claims_raw_temp) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from category_encoders import TargetEncoder | |
def encode_categorical(df): | |
te = TargetEncoder() | |
cols = df.select_dtypes('object').columns | |
for col in cols: | |
df.loc[:, col] = te.fit_transform(X=df[col], y=df['FraudFound']) | |
return df |
NewerOlder