@c-bata
Last active January 11, 2023 02:26

Summary

Optuna uses CachedStorage, a wrapper class around the BaseStorage interface, because BaseStorage API calls tend to be expensive. However, the implementation of CachedStorage is too complex, and it is not actually a pure wrapper of BaseStorage, since we've introduced some private storage APIs just for CachedStorage, such as storage._check_and_set_param_distribution() and storage._get_trials().

So I implemented a prototype of a new simple caching mechanism to remove CachedStorage from Optuna. The change is only about 50 lines, but it is more efficient in many situations than CachedStorage. https://github.com/optuna/optuna/compare/master...c-bata:add-simple-inmemory-cache?expand=1
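To illustrate the general idea of a small in-memory cache, here is a minimal sketch. It is not the code on the linked branch. It assumes that finished trials are immutable and therefore never need to be refetched. `Trial`, `SlowStorage`, and `FinishedTrialCache` are hypothetical names made up for this example, and `rows_read` is just a counter standing in for database cost.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Trial:
    """Hypothetical stand-in for a stored trial record."""
    number: int
    finished: bool


class SlowStorage:
    """Hypothetical backend that counts how many trial rows it reads."""

    def __init__(self) -> None:
        self._trials: Dict[int, Trial] = {}
        self.rows_read = 0

    def add_trial(self, finished: bool = False) -> int:
        number = len(self._trials)
        self._trials[number] = Trial(number, finished)
        return number

    def finish_trial(self, number: int) -> None:
        self._trials[number] = Trial(number, True)

    def read_trials(self, excluded: Set[int]) -> List[Trial]:
        """Read every trial whose number is not in `excluded`."""
        result = [t for n, t in self._trials.items() if n not in excluded]
        self.rows_read += len(result)
        return result


class FinishedTrialCache:
    """Keeps finished trials in memory (assumed immutable once finished)."""

    def __init__(self, backend: SlowStorage) -> None:
        self._backend = backend
        self._finished: Dict[int, Trial] = {}

    def get_all_trials(self) -> List[Trial]:
        # Only ask the backend for trials that are not cached yet.
        fetched = self._backend.read_trials(excluded=set(self._finished))
        for trial in fetched:
            if trial.finished:
                self._finished[trial.number] = trial
        merged = {**self._finished, **{t.number: t for t in fetched}}
        return [merged[n] for n in sorted(merged)]


def demo(n_trials: int = 100) -> int:
    """Run trials one by one, reading all trials once per trial."""
    backend = SlowStorage()
    cache = FinishedTrialCache(backend)
    for _ in range(n_trials):
        number = backend.add_trial()
        cache.get_all_trials()  # sampler-style read while the trial runs
        backend.finish_trial(number)
    return backend.rows_read
```

In this toy run, `demo()` reads 199 rows for 100 sequential trials, versus 5050 rows if every call refetched all trials. The real savings depend on the sampler and the storage backend.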

To benchmark the new caching mechanism, I used the same scenario as optuna/optuna#1140.

Benchmark Script

import math
import time
import optuna


def objective(trial: optuna.Trial) -> float:
    return sum([
        math.sin(trial.suggest_float('param-{}'.format(i), 0, math.pi * 2))
        for i in range(30)
    ])


if __name__ == "__main__":
    storage_url = "mysql+pymysql://optuna:password@127.0.0.1:3306/optuna"
    storage = optuna.storages.RDBStorage(storage_url)
    # storage = optuna.storages._CachedStorage(storage)

    sampler = optuna.samplers.TPESampler(seed=1)
    study = optuna.create_study(storage=storage, sampler=sampler)
    optuna.logging.set_verbosity(optuna.logging.ERROR)

    start = time.time()
    study.optimize(objective, n_trials=100)
    elapsed = time.time() - start
    print(f"Number of trials: {len(study.get_trials())}")
    print(f"Best params: {study.best_params}")
    print(f"Elapsed: {elapsed:.4f}s")

    optuna.delete_study(study_name=study.study_name, storage=storage)
  • Number of studies: 1
  • Number of trials: 100
  • Number of parameters: 30
  • Database: MySQL 8.0 on Docker ( $ docker run -d --rm -p 3306:3306 -e MYSQL_USER=optuna -e MYSQL_DATABASE=optuna -e MYSQL_PASSWORD=password -e MYSQL_ALLOW_EMPTY_PASSWORD=yes --name optuna-mysql mysql:8.0)

Please note that you need to comment out https://github.com/optuna/optuna/blob/master/optuna/storages/__init__.py#L45-L46 to check the performance without CachedStorage.

Benchmark Results

| branch | RDBStorage with CachedStorage | RDBStorage without CachedStorage |
| --- | --- | --- |
| master | 81.8842s | 368.3950s |
| add-simple-inmemory-cache | 49.0378s | 53.0252s |

Please note that the "without CachedStorage" results require removing the CachedStorage wrapping lines mentioned above.
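For a quick read of the first results table, the speedup of the new branch over master can be computed directly from the elapsed times. This is plain arithmetic on the numbers reported above. The `ELAPSED` dictionary and the `speedup` helper are names introduced only for this example.

```python
# Elapsed seconds copied from the benchmark table above.
ELAPSED = {
    "master": {"with_cache": 81.8842, "without_cache": 368.3950},
    "add-simple-inmemory-cache": {"with_cache": 49.0378, "without_cache": 53.0252},
}


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    return baseline / candidate


if __name__ == "__main__":
    for column in ("with_cache", "without_cache"):
        ratio = speedup(
            ELAPSED["master"][column],
            ELAPSED["add-simple-inmemory-cache"][column],
        )
        print(f"{column}: x{ratio:.2f} faster than master")
```

With these numbers, the new branch comes out at roughly x1.67 with CachedStorage and x6.95 without it.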

c-bata commented Jan 11, 2023

| branch | RDBStorage with CachedStorage | RDBStorage without CachedStorage |
| --- | --- | --- |
| v3.0.5 | 112.2084s | 390.2305s |
| w/o all trials cache | 99.5154s (x1.12 faster) | 285.4613s (x1.36 faster) |
| w/ all trials cache | 62.4925s (x1.54 faster) | 90.9808s (x4.28 faster) |
