import librosa
import pandas as pd

# `c` is the project's config module; its import is not shown in the source.
def extract_feature_means(audio_file_path: str) -> pd.DataFrame:
    # Config settings
    number_of_mfcc = c.NUMBER_OF_MFCC
    # 1. Load a single audio file (samples `y`, sampling rate `sr`)
    y, sr = librosa.load(audio_file_path)
    # Trim leading and trailing silence from the audio signal
    signal, _ = librosa.effects.trim(y)
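To make the trimming step concrete, here is a minimal, library-free sketch of the idea. It is a simplification, not librosa's implementation: `librosa.effects.trim` works on framed energy, while this hypothetical `trim_silence` helper compares individual samples against a threshold set `top_db` decibels below the peak amplitude.

```python
def trim_silence(samples, top_db=60.0):
    """Drop leading/trailing samples quieter than `top_db` dB below the peak.

    Simplified per-sample illustration; not how librosa computes it.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Entirely silent (or empty) input: nothing to keep.
        return []
    # Convert the decibel margin into a linear amplitude threshold.
    threshold = peak * 10 ** (-top_db / 20.0)
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    # Keep everything between the first and last loud sample, inclusive.
    return samples[loud[0]:loud[-1] + 1]


trimmed = trim_silence([0.0, 0.0, 0.5, -0.2, 0.0, 0.3, 0.0])
```

Quiet samples between the first and last loud sample are kept, so only the edges of the signal are affected.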

REGION="europe-west1"
ZONE="europe-west1-b"
TEMPLATE_ID="download_production_table"
dev_dataproc_assets_bucket="gs://your-dataproc-assets-bucket/production/"
dev_project=your-gcp-project-id

# Copy the job script to the assets bucket. `gsutil cp` has no --region or
# --project flags; the bucket URL alone identifies the destination.
upload_assets:
	gsutil cp main.py ${dev_dataproc_assets_bucket}

REGION=europe-west1
ZONE=europe-west1-b
CLUSTER_NAME=dev-cluster
SERVICE_ACCOUNT=your_service_account_name@your-gcp-project.iam.gserviceaccount.com
BUCKET_NAME=your-dataproc-staging-bucket
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region ${REGION} \
    --zone ${ZONE} \

jobs:
- pysparkJob:
    args:
    - dataset
    - entity_name
    - gcs_output_bucket
    - materialization_gcp_project_id
    - materialization_dataset
    - output_parquet
    - is_partitioned

import sys

from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.session import SparkSession

YES_TOKEN = "Yes"

# Reuse the active SparkContext (or create one) and wrap it in a SparkSession
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)
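The rest of the job script is not shown. As a rough sketch of how the positional arguments listed in the workflow template could be read, the hypothetical helper below maps them to names and turns `is_partitioned` into a boolean by comparing it with `YES_TOKEN`. Both the argument order and that comparison are assumptions; in the real job the list would come from `sys.argv`.

```python
# Argument names in the order the workflow template lists them (assumed order).
ARG_NAMES = [
    "dataset",
    "entity_name",
    "gcs_output_bucket",
    "materialization_gcp_project_id",
    "materialization_dataset",
    "output_parquet",
    "is_partitioned",
]
YES_TOKEN = "Yes"


def parse_job_args(argv):
    """Map positional job arguments to names; argv[0] is the script path."""
    values = argv[1:]
    if len(values) != len(ARG_NAMES):
        raise ValueError(
            f"expected {len(ARG_NAMES)} arguments, got {len(values)}"
        )
    args = dict(zip(ARG_NAMES, values))
    # Assumption: the flag is passed as the literal string "Yes" when set.
    args["is_partitioned"] = args["is_partitioned"] == YES_TOKEN
    return args


# Placeholder values for illustration only.
parsed = parse_job_args([
    "main.py", "sales", "orders", "gs://bucket/out",
    "proj", "ds", "orders.parquet", "Yes",
])
```

Failing fast on a wrong argument count makes a misconfigured template show up at job start rather than deep inside the Spark code.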

import numpy as np
from hyperopt import hp

# Integer and string parameters, used with hp.choice()
bootstrap_type = [{'bootstrap_type': 'Poisson'},
                  {'bootstrap_type': 'Bayesian',
                   'bagging_temperature': hp.loguniform('bagging_temperature', np.log(1), np.log(50))},
                  {'bootstrap_type': 'Bernoulli'}]
LEB = ['No', 'AnyImprovement']  # 'Armijo' is left out; add it only when using GPU
grow_policy = [
    {'grow_policy': 'SymmetricTree'},
    # {'grow_policy': 'Depthwise'},
    {'grow_policy': 'Lossguide',

from sklearn.model_selection import train_test_split

def ensemble_search(params):
    # Hold out 20% of the data (X, y are defined elsewhere) for early stopping
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)
    model = EnsembleModel(params)
    evaluation = [(X_test, y_test)]
    model.fit(X_train, y_train,
              eval_set=evaluation,
              early_stopping_rounds=100, verbose=False)

from lightgbm import LGBMClassifier

class EnsembleModel:
    def __init__(self, params):
        """
        LGB + XGB + CatBoost model
        """
        # One parameter dict per member model, keyed by library
        self.lgb_params = params['lgb']
        self.xgb_params = params['xgb']
        self.cat_params = params['cat']
        self.lgb_model = LGBMClassifier(**self.lgb_params)
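The snippet stops before showing how the three models are combined. One common option, shown here purely as an assumption about what such an ensemble might do, is to average the class probabilities of the member models. `ConstantModel` is a hypothetical stand-in for a fitted classifier so the sketch runs without LightGBM, XGBoost, or CatBoost installed.

```python
class ConstantModel:
    """Hypothetical stand-in for a fitted classifier exposing predict_proba."""

    def __init__(self, positive_probability):
        self.positive_probability = positive_probability

    def predict_proba(self, rows):
        p = self.positive_probability
        return [[1.0 - p, p] for _ in rows]


def average_probabilities(models, rows):
    """Average the class probabilities predicted by each model, row by row."""
    all_predictions = [model.predict_proba(rows) for model in models]
    averaged = []
    # zip(*...) groups the predictions for the same row across all models.
    for per_model in zip(*all_predictions):
        n_classes = len(per_model[0])
        averaged.append([
            sum(pred[k] for pred in per_model) / len(per_model)
            for k in range(n_classes)
        ])
    return averaged


models = [ConstantModel(0.2), ConstantModel(0.4), ConstantModel(0.9)]
probabilities = average_probabilities(models, rows=[[0.0], [1.0]])
```

Equal weighting is the simplest choice; per-model weights could be tuned alongside the other hyperparameters.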

# Log-transform these columns
log_cols = {'cont5': 'log', 'cont8': 'log', 'cont7': 'log'}
train_copy = FW.FE_transform_numeric_columns(train_copy, log_cols)
test_copy = FW.FE_transform_numeric_columns(test_copy, log_cols)
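The internals of `FW.FE_transform_numeric_columns` are not shown here. As a minimal sketch of what a column-to-transform mapping like `log_cols` expresses, the hypothetical `transform_columns` helper below applies a named function to each listed column of plain-dict rows. It uses `math.log1p` for `'log'`, which is an assumption and may differ from what the library applies.

```python
import math


def transform_columns(rows, column_transforms):
    """Return copies of `rows` with the named transform applied per column."""
    # Assumption: 'log' means log(1 + x), which is defined at x = 0.
    functions = {"log": math.log1p}
    transformed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for column, name in column_transforms.items():
            new_row[column] = functions[name](row[column])
        transformed.append(new_row)
    return transformed


rows = [
    {"cont5": 0.0, "cont7": 9.0},
    {"cont5": math.e - 1.0, "cont7": 99.0},
]
result = transform_columns(rows, {"cont5": "log"})
```

Applying the same mapping to the training and test sets, as the snippet above does, keeps the two feature spaces consistent.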

# Create groupby aggregates of the following numeric columns
agg_nums = ['cont1', 'cont3']
groupby_vars = ['cat2', 'cat4']
train_add, test_add = FW.FE_add_groupby_features_aggregated_to_dataframe(
    train[agg_nums + groupby_vars],
    agg_types=['mean', 'std'],
    groupby_columns=groupby_vars,
    ignore_variables=[],
    test=test[agg_nums + groupby_vars])
# Join the aggregated features to the main training and test dataframes
train_copy = train.join(train_add.drop(groupby_vars + agg_nums, axis=1))
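For readers unfamiliar with groupby aggregate features, here is a small library-free sketch of the idea: compute the mean and standard deviation of a numeric column within each category, then attach those statistics to every row of that category. The `add_groupby_aggregates` helper and its output column names are hypothetical and do not reflect how the `FW` function names or computes its features.

```python
from collections import defaultdict
from statistics import mean, stdev


def add_groupby_aggregates(rows, group_key, value_key):
    """Return copies of `rows` with per-group mean and std of `value_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    stats = {}
    for key, values in groups.items():
        # Sample std needs at least two values; fall back to 0.0 otherwise.
        std = stdev(values) if len(values) > 1 else 0.0
        stats[key] = (mean(values), std)

    enriched = []
    for row in rows:
        group_mean, group_std = stats[row[group_key]]
        new_row = dict(row)
        new_row[f"{value_key}_by_{group_key}_mean"] = group_mean
        new_row[f"{value_key}_by_{group_key}_std"] = group_std
        enriched.append(new_row)
    return enriched


rows = [
    {"cat2": "A", "cont1": 1.0},
    {"cat2": "A", "cont1": 3.0},
    {"cat2": "B", "cont1": 10.0},
]
enriched = add_groupby_aggregates(rows, "cat2", "cont1")
```

Using 0.0 for single-member groups is a choice made for this sketch; other tools may report a missing value there instead.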