@jrzaurin
jrzaurin / lightgbm_vs_dl_fb_comments.csv
Last active June 13, 2021 09:24
LightGBM vs DL for the Facebook comments volume dataset
model rmse r2 runtime best_epoch_or_ntrees
lightgbm 5.529 0.823 6.52 687.0
tabmlp 5.908 0.798 250.48 43.0
tabtransformer 5.926 0.797 533.39 27.0
tabresnet 6.214 0.777 70.47 9.0
tabnet 6.428 0.761 935.02 59.0
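As a rough way to read the table above, the sketch below expresses each deep learning model's RMSE and runtime relative to the lightgbm row. The numbers are copied from the table. The helper names `parse_table` and `relative_to_baseline` are illustrative and not part of the original gists. The runtime unit is not stated in the source, so only unitless ratios are computed.

```python
# Values copied from the comparison table above (whitespace-separated).
TABLE = """\
model rmse r2 runtime best_epoch_or_ntrees
lightgbm 5.529 0.823 6.52 687.0
tabmlp 5.908 0.798 250.48 43.0
tabtransformer 5.926 0.797 533.39 27.0
tabresnet 6.214 0.777 70.47 9.0
tabnet 6.428 0.761 935.02 59.0
"""


def parse_table(text):
    """Parse the whitespace-separated table into {model: {column: float}}."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        # First field is the model name; the rest are numeric columns.
        rows[fields[0]] = dict(zip(header[1:], map(float, fields[1:])))
    return rows


def relative_to_baseline(rows, baseline="lightgbm"):
    """Return the RMSE increase (percent) and runtime ratio versus the baseline."""
    base = rows[baseline]
    out = {}
    for name, row in rows.items():
        if name == baseline:
            continue
        out[name] = {
            "rmse_increase_pct": 100.0 * (row["rmse"] - base["rmse"]) / base["rmse"],
            "runtime_ratio": row["runtime"] / base["runtime"],
        }
    return out


rows = parse_table(TABLE)
gaps = relative_to_baseline(rows)
for name, gap in gaps.items():
    print(f"{name}: RMSE +{gap['rmse_increase_pct']:.1f}%, "
          f"runtime x{gap['runtime_ratio']:.1f}")
```

The same two functions can be pointed at any of the other comparison tables with the same column layout by swapping the `TABLE` string.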
jrzaurin / fb_comments_tabtransformer.csv
Last active June 13, 2021 09:22
Results for the Facebook Comments Volume dataset with TabTransformer
embed_dropout,full_embed_dropout,shared_embed,add_shared_embed,frac_shared_embed,input_dim,n_heads,n_blocks,dropout,ff_hidden_dim,transformer_activation,mlp_hidden_dims,mlp_activation,mlp_batchnorm,mlp_batchnorm_last,mlp_linear_first,with_wide,lr,batch_size,weight_decay,optimizer,lr_scheduler,base_lr,max_lr,div_factor,final_div_factor,n_cycles,val_loss_or_metric
0.0,False,False,False,8,16,2,4,0.1,,relu,None,relu,False,False,False,False,0.0005,1024,0.0,Adam,CyclicLR,0.0005,0.01,25,10000.0,10.0,33.0956
0.0,False,False,False,8,16,2,4,0.1,,relu,None,relu,False,False,False,False,0.0005,4096,0.0,AdamW,OneCycleLR,0.001,0.01,25,1000.0,5.0,33.1283
0.0,False,False,False,8,16,2,4,0.1,,relu,None,relu,False,False,False,False,0.001,1024,0.0,Adam,ReduceLROnPlateau,0.001,0.01,25,10000.0,5.0,33.2175
0.0,False,False,False,8,16,2,4,0.1,,relu,same,relu,False,False,False,False,0.001,1024,0.0,Adam,ReduceLROnPlateau,0.001,0.01,25,10000.0,5.0,33.4698
0.0,False,False,False,8,16,4,4,0.1,,relu,None,relu,False,False,False,False,0.001,10
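The last row above is cut off mid-record: it has 19 fields against a 28-column header. A minimal sketch of how such a file could be loaded while setting incomplete rows aside, using only Python's standard `csv` module. The function name `split_complete_rows` is hypothetical, and `SAMPLE` contains just the header, the first data row, and the truncated row as they appear above.

```python
import csv
import io

# Header, first complete row, and the truncated last row, copied from above.
SAMPLE = """\
embed_dropout,full_embed_dropout,shared_embed,add_shared_embed,frac_shared_embed,input_dim,n_heads,n_blocks,dropout,ff_hidden_dim,transformer_activation,mlp_hidden_dims,mlp_activation,mlp_batchnorm,mlp_batchnorm_last,mlp_linear_first,with_wide,lr,batch_size,weight_decay,optimizer,lr_scheduler,base_lr,max_lr,div_factor,final_div_factor,n_cycles,val_loss_or_metric
0.0,False,False,False,8,16,2,4,0.1,,relu,None,relu,False,False,False,False,0.0005,1024,0.0,Adam,CyclicLR,0.0005,0.01,25,10000.0,10.0,33.0956
0.0,False,False,False,8,16,4,4,0.1,,relu,None,relu,False,False,False,False,0.001,10
"""


def split_complete_rows(csv_text):
    """Separate rows that match the header width from rows that do not.

    Returns (header, complete_rows, short_rows), where complete_rows is a
    list of dicts and short_rows is a list of (line_number, field_count).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    complete, short = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) == len(header):
            complete.append(dict(zip(header, fields)))
        else:
            short.append((line_no, len(fields)))
    return header, complete, short


header, complete, short = split_complete_rows(SAMPLE)
print(f"{len(header)} columns, {len(complete)} complete row(s)")
for line_no, n_fields in short:
    print(f"line {line_no}: {n_fields} fields, skipped")
```

Values are kept as strings here. Nothing is guessed for the missing fields of the truncated row.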
jrzaurin / fb_comments_tabnet.csv
Last active June 13, 2021 09:22
Results for the Facebook Comments Volume dataset with TabNet
n_steps step_dim attn_dim ghost_bn virtual_batch_size momentum gamma dropout embed_dropout lr batch_size weight_decay lambda_sparse optimizer lr_scheduler base_lr max_lr div_factor final_div_factor n_cycles val_loss_or_metric
5 16 16 False 128 0.98 1.5 0.0 0.0 0.03 512 0.0 0.0001 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5 35.8122
3 16 16 False 128 0.98 1.5 0.2 0.0 0.03 512 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 37.6417
5 16 16 False 128 0.98 1.5 0.0 0.0 0.03 512 0.0 0.0001 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5 38.9771
5 16 16 False 128 0.98 1.5 0.2 0.0 0.03 512 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 39.5899
5 16 16 False 128 0.98 1.5 0.0 0.0 0.03 256 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 40.9462
jrzaurin / fb_comments_tabresnet.csv
Last active June 13, 2021 09:22
Results for the Facebook Comments Volume dataset with TabResnet
blocks_dims blocks_dropout mlp_hidden_dims mlp_activation mlp_dropout mlp_batchnorm mlp_batchnorm_last mlp_linear_first embed_dropout lr batch_size weight_decay optimizer lr_scheduler base_lr max_lr div_factor final_div_factor n_cycles val_loss_or_metric
[100, 100, 100] 0.1 None relu 0.1 False False False 0.0 0.0005 512 0.0 Adam CyclicLR 0.0005 0.03 25 10000.0 10.0 34.4972
[100, 100, 100] 0.1 None relu 0.1 False False False 0.0 0.0005 512 0.0 AdamW CyclicLR 0.0005 0.03 25 10000.0 10.0 34.8520
[100, 100, 100] 0.1 None relu 0.1 False False False 0.0 0.0005 512 0.0 Adam CyclicLR 0.0005 0.03 25 10000.0 10.0 34.95044
[100, 100, 100] 0.1 None relu 0.1 False False False 0.0 0.0005 512 0.0 Adam CyclicLR 0.0005 0.01 25 10000.0 10.0 35.1667
[100, 100, 100] 0.1 None relu 0.1 False False False 0.0 0.0005 512 0.0 AdamW CyclicLR 0.0005 0.01 25 10000.0 10.0 35.2503
jrzaurin / fb_comments_tabmlp.csv
Last active June 13, 2021 09:21
Results for the Facebook Comments Volume dataset with TabMlp
mlp_hidden_dims mlp_activation mlp_dropout mlp_batchnorm mlp_batchnorm_last mlp_linear_first embed_dropout lr batch_size weight_decay optimizer lr_scheduler base_lr max_lr div_factor final_div_factor n_cycles val_loss_or_metric
[100,50] relu 0.1 False False True 0.0 0.001 512 0.0 RAdam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 32.5931
[100,50] relu 0.1 False False False 0.0 0.001 512 0.0 RAdam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 33.3515
[200, 100] relu 0.1 False False False 0.0 0.001 256 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 33.4140
[200, 100] relu 0.1 False False False 0.1 0.001 256 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 33.5679
[200, 100] relu 0.1 False False False 0.0 0.001 512 0.0 RAdam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 33.6284
jrzaurin / lightgbm_vs_dl_nyc_taxi.csv
Last active June 13, 2021 09:20
LightGBM vs DL for the NYC taxi trip duration dataset
model rmse r2 runtime best_epoch_or_ntrees
lightgbm 262.710 0.804 42.721 504.0
tabmlp 271.342 0.791 568.431 24.0
tabresnet 292.891 0.757 471.265 24.0
tabtransformer 336.582 0.679 5779.031 54.0
tabnet 376.053 0.599 1844.472 15.0
jrzaurin / nyc_taxi_tabtransformer.csv
Last active June 13, 2021 09:19
Results for the NYC Taxi ride duration dataset with TabTransformer
embed_dropout,full_embed_dropout,shared_embed,add_shared_embed,frac_shared_embed,input_dim,n_heads,n_blocks,dropout,ff_hidden_dim,transformer_activation,mlp_hidden_dims,mlp_activation,mlp_batchnorm,mlp_batchnorm_last,mlp_linear_first,with_wide,lr,batch_size,weight_decay,optimizer,lr_scheduler,base_lr,max_lr,div_factor,final_div_factor,n_cycles,val_loss_or_metric
0.0,False,False,False,8,16,4,4,0.1,,relu,None,relu,False,False,False,False,0.01,1024,0.0,Adam,ReduceLROnPlateau,0.001,0.01,25,10000.0,5,180162.4086
0.0,False,False,False,8,16,4,4,0.1,,relu,None,relu,False,False,False,False,0.01,256,0.0,Adam,ReduceLROnPlateau,0.001,0.01,25,10000.0,5,186017.1888
0.0,False,False,False,8,16,4,4,0.1,,relu,None,relu,False,False,False,False,0.01,512,0.0,Adam,ReduceLROnPlateau,0.001,0.01,25,10000.0,5,196144.0674
0.0,False,False,False,8,32,8,4,0.4,,relu,None,relu,False,False,False,False,0.01,1024,0.0,Adam,ReduceLROnPlateau,0.001,0.01,25,10000.0,5,357869.3703
0.0,False,False,False,8,64,16,4,0.4,,relu,None,relu,False,False,False
jrzaurin / nyc_taxi_tabnet.csv
Last active June 13, 2021 09:18
Results for the NYC Taxi ride duration dataset with TabNet
n_steps step_dim attn_dim ghost_bn virtual_batch_size momentum gamma dropout embed_dropout lr batch_size weight_decay lambda_sparse optimizer lr_scheduler base_lr max_lr div_factor final_div_factor n_cycles val_loss_or_metric
5 8 8 False 128 0.75 1.5 0.0 0.0 0.01 1024 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 144819.1190
5 8 8 False 128 0.98 1.5 0.0 0.0 0.01 1024 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 146057.8078
5 8 8 False 128 0.5 1.5 0.0 0.0 0.01 1024 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 146201.3771
5 16 16 False 128 0.98 1.5 0.0 0.0 0.01 1024 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 146461.7343
5 8 8 False 128 0.25 1.5 0.0 0.0 0.01 1024 0.0 0.0001 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 148636.8888
jrzaurin / nyc_taxi_tabresnet.csv
Last active June 13, 2021 09:18
Results for the NYC Taxi ride duration dataset with TabResnet
blocks_dims blocks_dropout mlp_hidden_dims mlp_activation mlp_dropout mlp_batchnorm mlp_batchnorm_last mlp_linear_first embed_dropout lr batch_size weight_decay optimizer lr_scheduler base_lr max_lr div_factor final_div_factor n_cycles val_loss_or_metric
same 0.5 auto relu 0.2 False False False 0.0 0.01 2048 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 97015.1182
same 0.2 auto relu 0.1 False False False 0.0 0.01 1024 0.0 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5 98266.4310
same 0.5 auto relu 0.2 False False False 0.0 0.04 2048 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 100332.3569
same 0.2 auto relu 0.1 False False False 0.0 0.01 1024 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5 103006.5603
same 0.5 auto relu 0.2 False False False 0.0 0.01 2048 0.0 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5 105967.2628
jrzaurin / nyc_taxi_tabmlp.csv
Last active June 13, 2021 09:18
Results for the NYC Taxi ride duration dataset with TabMlp
mlp_hidden_dims mlp_activation mlp_dropout mlp_batchnorm mlp_batchnorm_last mlp_linear_first embed_dropout lr batch_size weight_decay optimizer lr_scheduler base_lr max_lr div_factor final_div_factor n_cycles val_loss_or_metric
auto relu 0.1 False False True 0.0 0.01 1024 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 79252.7786
auto relu 0.1 False False True 0.0 0.01 1024 0.0 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 79440.6025
auto relu 0.1 False False False 0.1 0.01 1024 0.0 Adam ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 79477.5653
auto relu 0.1 False False False 0.1 0.01 1024 0.0 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 79710.8550
auto relu 0.1 False False False 0.0 0.01 1024 0.0 AdamW ReduceLROnPlateau 0.001 0.01 25 10000.0 5.0 80214.7197