@hmaarrfk
Created September 20, 2018 01:43
Benchmarking for scikit-image

Writing benchmarks for scikit-image

If you are requesting a change for the sake of performance, it helps to provide a benchmark showing the improvement of the algorithm in the use case that you are considering.

To create a benchmark:

  1. Install airspeed velocity
pip install asv

or

conda install -c conda-forge asv

from your development environment.

  2. Add a new file to the benchmarks folder. Here is a short example that shows the structure of the file.
import numpy as np
from skimage.feature import greycomatrix
from skimage import img_as_ubyte


class GreyCoMatrixSuite:
    """Benchmark for the greycomatrix in scikit-image."""
    
    # All parameter combinations will be tested.
    params = [[(50, 50), (100, 100)],  # shape
              [True, False],           # symmetric
              [True, False],           # normed
             ]
              
    # These are friendly names that will appear on the graphs.
    param_names = ['shape', 'symmetric', 'normed']

    def setup(self, shape, symmetric, normed):
        # Unless you need random data to show the performance of a
        # particular algorithm, it is probably fastest to
        # allocate the array with ``np.full``.
        # This ensures that the memory is directly available to the
        # routine without continuously page faulting.
        # In this case, I want to make sure that we are hitting all
        # combinations of distances in the co-occurrence matrix.
        # self.image = np.full(shape, fill_value=1)
        self.image = img_as_ubyte(np.random.random(shape))

    # You need to include the shape parameter even if you don't use it
    # in your function
    def time_greycomatrix(self, shape, symmetric, normed):
        greycomatrix(self.image, [1], [0, np.pi/4, np.pi/2, 3*np.pi/4],
                     symmetric=symmetric, normed=normed)
  3. Run asv machine from within the skimage directory.
  4. Update the version of master in your cloned repository
git pull upstream master
  5. Run your benchmark in development mode to catch any errors
asv dev -b time_greycomatrix
  6. Run your benchmarks in your existing environment to track your progress
asv run -E existing -b time_greycomatrix
  7. Compare your results to those of the master branch
asv continuous -E conda:3.6 -b time_greycomatrix master HEAD

Creating the environments for the first time takes a long time.
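
As a rough illustration of what "all parameter combinations" means for the params list in the example file, the sketch below expands the same grid with itertools.product. It only mimics the expansion in plain Python and does not call asv itself. The variable name combinations is made up for this example.

```python
import itertools

# Same parameter grid as in the GreyCoMatrixSuite example.
params = [[(50, 50), (100, 100)],  # shape
          [True, False],           # symmetric
          [True, False]]           # normed
param_names = ['shape', 'symmetric', 'normed']

# Cartesian product of the three lists: one tuple per benchmark run.
# (Illustration only; this is not asv's own code.)
combinations = list(itertools.product(*params))

for combo in combinations:
    # Pair each value with its friendly name, as it would be labeled.
    print(dict(zip(param_names, combo)))

print(len(combinations), "combinations in total")
```

With two shapes and two boolean flags, the grid grows to 2 x 2 x 2 runs, so adding values to any list multiplies the total benchmark time.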

@MattWenham

  • Mention the filename at step 2 and that it will be used in steps 5-7.
  • Does the last def need to have the same name as the filename? I set the filename to benchmark_greycomatrix.py and the function to benchmark_greycomatrix and got a No benchmarks selected error. Setting both to time_greycomatrix worked as per your example.
  • You may want to add a warning that multiple environments are created in the .asv folder which can quickly take up multiple GB...!

@hmaarrfk
Author

  1. I don't think the filename matters so much. It does some funny pattern detection. I think GreyCoMatrixSuite would work, and so would GreyCoMatrixSuite.time_greycomatrix.
  2. Filename of the benchmark file doesn't matter so much. Follow the project convention I guess.
  3. Man, maybe that is why my computer was crashing. The scikit-image project has too many dependencies. I just use run with existing so as not to create huge projects. Admittedly, I was benchmarking numpy, which is much lighter.
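
On the naming question: asv's documentation says benchmark functions and methods are recognized by their name prefix (time_, timeraw_, mem_, peakmem_, track_). That would explain why benchmark_greycomatrix was not selected while time_greycomatrix was. The sketch below only illustrates that prefix rule. The helper looks_like_benchmark is hypothetical and is not part of asv, whose real discovery logic is more involved.

```python
# Name prefixes that asv documents for benchmark functions and methods.
ASV_PREFIXES = ("time_", "timeraw_", "mem_", "peakmem_", "track_")


def looks_like_benchmark(name):
    """Hypothetical helper: True if `name` starts with a documented prefix."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return name.startswith(ASV_PREFIXES)


for name in ("time_greycomatrix", "benchmark_greycomatrix"):
    print(name, looks_like_benchmark(name))
```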

@MattWenham

My .asv folder is currently 10.9 GB (11,756,526,196 bytes), which I can get down to 7.77 GB (8,351,432,704 bytes) with NTFS compression. Not a trivial amount...

@hmaarrfk
Author

I'm not sure it actually takes up that much space. They are mostly hard links, and Windows has a hard time detecting that they don't each take up a distinct amount of space.
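
One way to check the hard-link theory is to add up file sizes twice, once per directory entry and once per unique file. The sketch below does that with the standard library. The function name directory_sizes is made up for this example, and it assumes the platform reports meaningful st_dev and st_ino values. It only de-duplicates links inside the folder being scanned, so files that are also linked from somewhere else (for example a package cache) are still counted once here.

```python
import os


def directory_sizes(root):
    """Return (apparent_bytes, unique_bytes) for all files under `root`.

    apparent_bytes adds up every directory entry; unique_bytes counts
    files that share an inode (hard links) only once.
    """
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            apparent += info.st_size
            key = (info.st_dev, info.st_ino)  # identifies the underlying file
            if key not in seen:
                seen.add(key)
                unique += info.st_size
    return apparent, unique


if __name__ == "__main__":
    # ".asv" is the folder discussed above; adjust the path as needed.
    apparent, unique = directory_sizes(".asv")
    print("apparent: %d bytes, unique: %d bytes" % (apparent, unique))
```

If the two numbers differ a lot, most of the reported size comes from hard links rather than distinct data on disk.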
