David Zhao DavidykZhao

## multi-face.ipynb

      
Created November 21, 2020 19:55, forked from yang-zhang/multi-face.ipynb

Multi-task Deep Learning Experiment using fastai Pytorch
    
## ensemble_search.py
class ensemble_search:
    def __init__(self, X_train, y_train, X_test, y_test,
                 size_pop=20, epochs=5, verbose=True):

        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.size_pop = size_pop
        self.epochs = epochs

## LightGBM Bookmarks
Detailed Information about LGBM Parameters

https://medium.com/@pushkarmandot/https-medium-com-pushkarmandot-what-is-lightgbm-how-to-implement-it-how-to-fine-tune-the-parameters-60347819b7fc

LGBM Hyperparameter Optimisation and Visualisation

https://github.com/WillKoehrsen/hyperparameter-optimization/blob/master/Bayesian%20Hyperparameter%20Optimization%20of%20Gradient%20Boosting%20Machine.ipynb

https://www.kaggle.com/willkoehrsen/intro-to-model-tuning-grid-and-random-search

## Pyspark.md

      
Last active September 9, 2020 13:26

[Pyspark related] #Pyspark

Pyspark


Trivia knowledge:

Some simple configs to put at the beginning of the notebook to speed up execution
Check the number of cores in the cluster programmatically
A good systematic reference source I have found useful most of the time:
%sh allows us to execute shell commands on the driver
Magic command: %run
dbutils.widgets


dbutils.fs.ls(..)


## rapids-colab.ipynb

      
Created August 29, 2020 20:54, forked from gumdropsteve/rapids-colab.ipynb

Script to Install RAPIDS in Google Colab
    
## cross_validate_xgboost.py
def cross_validate_xgboost(train_data, train_output,
                           n_folds, param_grid,
                           type_dict,
                           fixed_param_dict = {'objective': 'binary:logistic', 'eval_metric': ['auc']},
                           metric_func_dict = {'auc': sklearn.metrics.roc_auc_score},
                           other_metrics_dict = None, keep_data = True, **kwargs):

    """
    Perform k-fold cross-validation with xgboost hyperparameters
    Get the average performance across folds and save all of the results
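
The docstring above describes k-fold cross-validation over a grid of hyperparameters, averaging performance across folds and keeping every result. A minimal sketch of that loop follows, using only the standard library. It is not the gist's implementation: `expand_grid`, `k_fold_indices`, `cross_validate` and the `fit_and_score` callback are hypothetical names introduced here, and the scoring function is a placeholder rather than a real xgboost model.

```python
from itertools import product
from statistics import mean


def expand_grid(param_grid):
    """Yield one dict per combination of the values in param_grid."""
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def k_fold_indices(n_rows, n_folds):
    """Yield (train, test) index lists for contiguous, near-equal folds."""
    sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, test
        start = stop


def cross_validate(n_rows, n_folds, param_grid, fit_and_score):
    """Score every parameter combination on every fold and keep all results."""
    results = []
    for params in expand_grid(param_grid):
        fold_scores = [fit_and_score(params, train, test)
                       for train, test in k_fold_indices(n_rows, n_folds)]
        results.append({"params": params,
                        "fold_scores": fold_scores,
                        "mean_score": mean(fold_scores)})
    return results


# Placeholder scorer (an assumption): a real one would train a model on the
# train indices and return a metric such as AUC on the test indices.
def dummy_score(params, train_idx, test_idx):
    return params["max_depth"] / 10


results = cross_validate(10, 3, {"max_depth": [2, 4], "eta": [0.1]}, dummy_score)
best = max(results, key=lambda r: r["mean_score"])
```

Passing the model-fitting step in as a callback keeps the fold and grid bookkeeping independent of any particular library, so the same loop could wrap an xgboost training call.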

## r-to-python-data-wrangling-basics.md

      
Created December 26, 2019 21:47, forked from conormm/r-to-python-data-wrangling-basics.md

R to Python: Data wrangling with dplyr and pandas

R to Python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
Whilst transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs:
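
As a rough sketch, assuming the six verbs meant here are the usual `filter`, `select`, `arrange`, `mutate`, `summarise` and `group_by`, each one maps onto a pandas operation roughly as follows. The toy DataFrame and its column names are made up for illustration and do not come from the original snippets.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the verb mapping.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "group": ["x", "x", "y", "y"],
    "value": [1, 4, 2, 8],
})

# filter(): keep the rows that satisfy a condition
filtered = df[df["value"] > 1]

# select(): keep a subset of columns
selected = df[["name", "value"]]

# arrange(): sort rows, here in descending order of value
arranged = df.sort_values("value", ascending=False)

# mutate(): add a column derived from existing ones
mutated = df.assign(double=df["value"] * 2)

# group_by() + summarise(): one aggregated row per group
summary = df.groupby("group", as_index=False).agg(total=("value", "sum"))
```

Each step returns a new DataFrame instead of modifying `df`, which mirrors how dplyr verbs are chained in R.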