{
"cells": [
{
"cell_type": "markdown",
"id": "354a3185-9c30-4eac-8511-4d661e989c0e",
"metadata": {},
"source": [
"# Bayesian Optimization - Intro and Implementation From Scratch\n",
"\n",
"## Step 1 - Import Libraries\n",
"\n",
"This step is straightforward. We start with importing some of the necessary libraries, as follows:\n",
"\n",
"- `numpy` for numerical computing, which is one of the usual suspects in Data Science\n",
"- `scipy.stats` is a library for statistical functions\n",
"- `load_iris` is a function from `scikit-learn` that loads the Iris dataset\n",
"- `GaussianProcessRegressor` is a class from `scikit-learn` that implements a Gaussian process regression model\n",
"- `Matern` is a class from `scikit-learn` that implements the Matern kernel for the Gaussian process\n",
"\n",
"```\n",
"import numpy as np\n",
"import scipy.stats as sps\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.gaussian_process import GaussianProcessRegressor\n",
"from sklearn.gaussian_process.kernels import Matern\n",
"```\n",
"\n",
"With our libraries imported, let's go ahead and define the objective function. \n",
"\n",
"## Step 2: Define Objective Function\n",
"\n",
"The objective function takes a set of hyperparameters parameters of C and gamma as input and returns the negative accuracy of a support vector classifier with RBF kernel on the Iris dataset. C is the regularization parameter and gamma is the kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. Details of the kernel coefficient are not critical for our process and can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). Then, the Iris dataset is loaded using `load_iris` and we split the data into train and test sets. Once the data is ready, a support vector classifier is trained and the negative accuracy on the test set is returned.\n",
"\n",
"```\n",
"def objective(params):\n",
" C, gamma = params\n",
" X, y = load_iris(return_X_y=True)\n",
" np.random.seed(0)\n",
" indices = np.random.permutation(len(X))\n",
" X_train = X[indices[:100]]\n",
" y_train = y[indices[:100]]\n",
" X_test = X[indices[100:]]\n",
" y_test = y[indices[100:]]\n",
" from sklearn.svm import SVC\n",
" clf = SVC(C=C, gamma=gamma)\n",
" clf.fit(X_train, y_train)\n",
" return -clf.score(X_test, y_test)\n",
"\n",
"```\n",
"\n",
"## Step 3: Define Parameter Bounds\n",
"\n",
"At this step, we define the boundaries of the search space for the hyperparameters. We create a NumPy array bounds of shape (2,2), where each row corresponds to a hyperparameter and each column corresponds to the lower and upper bound of that hyperparameter. In our case, the first hyperparameter is C and the second one is gamma, both of which are used in training the support vector classifier. \n",
"\n",
"The purpose of setting bounds is to restrict the hyperparameter search space to avoid testing values that are unlikely to be optimal and to focus the optimization on the most promising regions of the hyperparameter space. We defined the boundaries randomly for this exercise but in tasks where the range of hyperparameters is known, this becomes an important factor. \n",
"\n",
"```\n",
"bounds = np.array([[1e-3, 1e3], [1e-5, 1e-1]])\n",
"\n",
"```\n",
"\n",
"## Step 4: Define Acquisition Function\n",
"\n",
"This step defines the acquisition function that we had discussed before and determines the next point to be evaluated in the search space. In this specific example, the acquisition function is the Expected Improvement (EI) function. It measures the expected improvement in the objective function over the current best observation, given the current surrogate model (Gaussian process). The acquisition function is defined as follows:\n",
"\n",
"1. The Gaussian process predicts the mean and standard deviation at the point x using `gp.predict()`\n",
"2. The function finds the best objective function value observed so far (`f_best`)\n",
"3. Calculates the improvement over `f_best` as `improvement = f_best - mu`\n",
"4. Calculates the standard score `Z = improvement / sigma` if sigma is positive, or sets Z to 0 if sigma is 0\n",
"5. Uses the cumulative distribution function of the standard normal distribution (`sps.norm.cdf`) and the probability density function (`sps.norm.pdf`) to calculate the expected improvement (`ei`) at point x\n",
"6. Returns the expected improvement\n",
"\n",
"```\n",
"def acquisition(x):\n",
" mu, sigma = gp.predict(x.reshape(1,-1), return_std=True)\n",
" f_best = np.min(y_samples)\n",
" improvement = f_best - mu\n",
" with np.errstate(divide='warn'):\n",
" Z = improvement / sigma if sigma > 0 else 0 \n",
" ei = improvement * sps.norm.cdf(Z) + sigma * sps.norm.pdf(Z) \n",
" ei[sigma == 0.0] == 0.0 \n",
" return ei\n",
"```\n",
"\n",
"## Step 5: Initialize Samples and Surrogate Function\n",
"\n",
"Before starting the Bayesian optimization loop, we need to initialize the Gaussian process surrogate model with some initial samples. As discussed previously, the surrogate function is used to approximate the unknown objective function in order to optimize it efficiently. The Gaussian process is a probabilistic model that defines a prior over functions. It allows for Bayesian inference to be used to update the model as new data is acquired. Specifically, `x_samples` are initial points randomly sampled from the search space defined by the bounds array. The `y_samples` are the corresponding objective function evaluations of these initial points. These samples are used to train the Gaussian process and improve its surrogate modeling.\n",
"\n",
"The steps we take here are:\n",
"\n",
"1. Wet the number of initial samples (`n_init_samples`) to 5\n",
"2. Generate `n_init_samples` random samples from the parameter space defined by bounds using `np.random.uniform` (bounds[:, 0] and bounds[:, 1] extract the lower and upper bounds for each parameter, respectively). The size parameter specifies the shape of the output array, which is (n_init_samples, bounds.shape[0]) in this case. This generates an array of shape (n_init_samples, 2) where each row represents a set of parameters\n",
"\n",
"3. Evaluate the objective function objective(x) for each set of parameters in `x_samples` using a list comprehension and store the results in the array `y_samples`\n",
"\n",
"4. Initialize a Gaussian process regressor with a Matern kernel with parameter `nu=2.5`, which is a smoothness parameter that controls the flexibility of the model. Matern kernel is a popular choice for Bayesian optimization due to its ability to handle noisy and discontinuous functions\n",
"\n",
"```\n",
"# Initialize samples and surrogate function (Gaussian process with Matern kernel)\n",
"n_init_samples = 5 \n",
"x_samples = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_init_samples, bounds.shape[0])) \n",
"y_samples = np.array([objective(x) for x in x_samples]) \n",
"kernel = Matern(nu=2.5) \n",
"gp = GaussianProcessRegressor(kernel=kernel)\n",
"\n",
"```\n",
"\n",
"## Step 6: Run Bayesian Optimization Loop\n",
"\n",
"We have finally arrived at the Bayesian optimization loop. In this step, the Bayesian optimization loop is run for a specified number of iterations (`n_iter`). In each iteration, the Gaussian process model is updated with the existing samples (i.e. `x_samples` and `y_samples`), using the gp.fit() method. Then, the next sample to be evaluated by the objective function is chosen by optimizing the acquisition function using a large number of random points (i.e. `x_random_points`), generated from the parameter space. The acquisition function is evaluated at each of these points and the point with the maximum acquisition function value is chosen as the next sample (i.e. `x_next`). The acquisition function value at this point is recorded as the `best_acq_value`. Finally, the objective function is evaluated at the chosen point and the resulting value is added to the existing samples by updating `x_samples` and `y_samples`. The process is repeated for the specified number of iterations (i.e. `n_iter`) and the results for each iteration are printed.\n",
"\n",
"```\n",
"# Run Bayesian optimization loop for n_iter iterations \n",
"n_iter = 10 \n",
"for i in range(n_iter): \n",
" # Update Gaussian process with existing samples \n",
" gp.fit(x_samples, y_samples) \n",
"\n",
" # Find the next sample by optimizing acquisition function \n",
" x_next = None \n",
" best_acq_value = -np.inf \n",
" \n",
" # Sample a large number of random points from the parameter space \n",
" n_random_points=10000 \n",
" x_random_points=np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_random_points,bounds.shape[0])) \n",
"\n",
" # Evaluate acquisition function at each point and find the maximum \n",
" acq_values=np.array([acquisition(x) for x in x_random_points]) \n",
" max_acq_index=np.argmax(acq_values) \n",
" max_acq_value=acq_values[max_acq_index] \n",
"\n",
" if max_acq_value > best_acq_value: \n",
" best_acq_value=max_acq_value \n",
" x_next=x_random_points[max_acq_index] \n",
"\n",
" print(f\"Iteration {i+1}: next sample is {x_next}, acquisition value is {best_acq_value}\") \n",
"\n",
" # Evaluate objective function at next sample and add it to existing samples \n",
" y_next=objective(x_next) \n",
" x_samples=np.vstack((x_samples,x_next)) \n",
" y_samples=np.append(y_samples,y_next)\n",
"\n",
"```\n",
"\n",
"## Step 7: Print Results\n",
"\n",
"Lastly, we print the best parameters and the best accuracy found during the Bayesian optimization loop. The best parameters are the ones that correspond to the minimum value of the objective function, which is why `np.argmin` is used to find the index of the sample with the minimum value of `y_samples`.\n",
"\n",
"```\n",
"# Print final results \n",
"best_index=np.argmin(y_samples) \n",
"best_x=x_samples[best_index] \n",
"best_y=y_samples[best_index] \n",
"\n",
"print(f\"Best parameters: C={best_x[0]}, gamma={best_x[1]}\") \n",
"print(f\"Best accuracy: {-best_y}\")\n",
"```\n",
"\n",
"Code block below puts everything together."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e33ce0e0-2b3e-472a-b2ae-9da1ec88888f",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/mafarzad/opt/anaconda3/lib/python3.9/site-packages/sklearn/gaussian_process/kernels.py:430: ConvergenceWarning: The optimal value found for dimension 0 of parameter length_scale is close to the specified upper bound 100000.0. Increasing the bound and calling fit again may find a better value.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 1: next sample is [0.07469936 0.02543147], acquisition value is [1.16428264e-05]\n",
"Iteration 2: next sample is [3.81199483e+02 9.68627918e-02], acquisition value is [0.01092942]\n",
"Iteration 3: next sample is [9.99977952e+02 5.18619413e-02], acquisition value is [0.0022596]\n",
"Iteration 4: next sample is [3.31927803e+02 4.15319881e-03], acquisition value is [0.00178856]\n",
"Iteration 5: next sample is [7.24678205e+02 9.39208240e-02], acquisition value is [0.00227819]\n",
"Iteration 6: next sample is [4.66016062e+02 9.90568419e-02], acquisition value is [0.00112711]\n",
"Iteration 7: next sample is [7.83905693e+02 9.69585414e-02], acquisition value is [0.00054979]\n",
"Iteration 8: next sample is [6.56939305e+02 2.96978687e-03], acquisition value is [0.00039799]\n",
"Iteration 9: next sample is [9.00367551e+02 9.83276538e-02], acquisition value is [0.00026291]\n",
"Iteration 10: next sample is [4.23032560e+02 9.57563609e-03], acquisition value is [0.00024746]\n",
"Best parameters: C=576.1577582605024, gamma=0.05920827270787119\n",
"Best accuracy: 0.94\n"
]
}
],
"source": [
"# Bayesian optimization from scratch, Iris data set\n",
"\n",
"# Import libraries\n",
"import numpy as np\n",
"import scipy.stats as sps\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.gaussian_process import GaussianProcessRegressor\n",
"from sklearn.gaussian_process.kernels import Matern\n",
"\n",
"# Define objective function (accuracy of a classifier)\n",
"def objective(params):\n",
" # Unpack parameters\n",
" C, gamma = params\n",
" \n",
" # Load iris data set\n",
" X, y = load_iris(return_X_y=True)\n",
" \n",
" # Split data into train and test sets\n",
" np.random.seed(0)\n",
" indices = np.random.permutation(len(X))\n",
" X_train = X[indices[:100]]\n",
" y_train = y[indices[:100]]\n",
" X_test = X[indices[100:]]\n",
" y_test = y[indices[100:]]\n",
" \n",
" # https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\n",
" # Train a support vector classifier with RBF kernel\n",
" # C: regularization parameter, gamma: kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’\n",
" from sklearn.svm import SVC\n",
" clf = SVC(C=C, gamma=gamma)\n",
" clf.fit(X_train, y_train)\n",
" \n",
" # Return the negative accuracy on test set\n",
" return -clf.score(X_test, y_test)\n",
"\n",
"# Define bounds for parameters\n",
"bounds = np.array([[1e-3, 1e3], [1e-5, 1e-1]])\n",
"\n",
"# Define acquisition function (expected improvement)\n",
"def acquisition(x):\n",
" # Evaluate surrogate function (Gaussian process) at x\n",
" mu, sigma = gp.predict(x.reshape(1,-1), return_std=True)\n",
" \n",
" # Find the best observed value so far\n",
" f_best = np.min(y_samples)\n",
"\n",
" # Calculate improvement over best value\n",
" improvement = f_best - mu\n",
" \n",
" # Calculate expected improvement under normal distribution assumption \n",
" with np.errstate(divide='warn'):\n",
" Z = improvement / sigma if sigma > 0 else 0 \n",
" ei = improvement * sps.norm.cdf(Z) + sigma * sps.norm.pdf(Z) \n",
" ei[sigma == 0.0] == 0.0 \n",
" return ei\n",
"\n",
"# Initialize samples and surrogate function (Gaussian process with Matern kernel)\n",
"n_init_samples = 5 \n",
"x_samples = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_init_samples, bounds.shape[0])) \n",
"y_samples = np.array([objective(x) for x in x_samples]) \n",
"kernel = Matern(nu=2.5) \n",
"gp = GaussianProcessRegressor(kernel=kernel)\n",
"\n",
"# Run Bayesian optimization loop for n_iter iterations \n",
"n_iter = 10\n",
"for i in range(n_iter): \n",
" # Update Gaussian process with existing samples \n",
" gp.fit(x_samples, y_samples) \n",
"\n",
" # Find the next sample by optimizing acquisition function \n",
" x_next = None \n",
" best_acq_value = -np.inf \n",
" \n",
" # Sample a large number of random points from the parameter space \n",
" n_random_points=10000 \n",
" x_random_points=np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_random_points,bounds.shape[0])) \n",
"\n",
" # Evaluate acquisition function at each point and find the maximum \n",
" acq_values=np.array([acquisition(x) for x in x_random_points]) \n",
" max_acq_index=np.argmax(acq_values) \n",
" max_acq_value=acq_values[max_acq_index] \n",
"\n",
" if max_acq_value > best_acq_value: \n",
" best_acq_value=max_acq_value \n",
" x_next=x_random_points[max_acq_index] \n",
"\n",
" print(f\"Iteration {i+1}: next sample is {x_next}, acquisition value is {best_acq_value}\") \n",
"\n",
" # Evaluate objective function at next sample and add it to existing samples \n",
" y_next=objective(x_next) \n",
" x_samples=np.vstack((x_samples,x_next)) \n",
" y_samples=np.append(y_samples,y_next)\n",
"\n",
"# Print final results \n",
"best_index=np.argmin(y_samples) \n",
"best_x=x_samples[best_index] \n",
"best_y=y_samples[best_index] \n",
"\n",
"print(f\"Best parameters: C={best_x[0]}, gamma={best_x[1]}\") \n",
"print(f\"Best accuracy: {-best_y}\")"
]
},
{
"cell_type": "markdown",
"id": "13c3f86a-cac0-4640-8ae2-58a40fa32d8c",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}