@rusty1s
Last active January 27, 2023 10:02
[Community Sprint] Improving Code Coverage

We are kicking off our third community sprint!

This community sprint revolves around improving test coverage across the PyG code base. Currently, our tests cover 85.68% of all code in PyG. The goal of the community sprint is to bump this number into the high 90s (and to help you become more familiar with the various parts of the code base).

The sprint begins Friday, January 27th, and will last two weeks. If you are interested in helping out, please also join our PyG Slack channel #community-sprint-code-coverage for more information.

🚀 Improving Code Coverage

Example

Take a look at the current code coverage report of PyG. For example, we can see that we never test the copy() function of the InMemoryDataset class, see here:

As such, we create a test_in_memory_dataset_copy() test function in test/data/test_dataset.py to add a corresponding test:

import torch
from torch_geometric.data import Data

# `MyTestDataset` is assumed to be the `InMemoryDataset` helper class
# already defined in test/data/test_dataset.py.


def test_in_memory_dataset_copy():
    data_list = [Data(x=torch.randn(5, 16)) for _ in range(4)]
    dataset = MyTestDataset(data_list)

    copied_dataset = dataset.copy()
    # Test that `copy()` returns a new object:
    assert id(copied_dataset) != id(dataset)

    # Test that the copy holds the same number of examples:
    assert len(copied_dataset) == len(dataset) == 4

    # Test that the data is identical:
    for copied_data, data in zip(copied_dataset, dataset):
        assert torch.equal(copied_data.x, data.x)

Furthermore, we see in the code coverage report that copy() utilizes different code paths, depending on whether the dataset should be filtered before copying. As such, we test this functionality as well:

def test_in_memory_dataset_copy():
    ...
    
    copied_dataset = dataset.copy([1, 2])
    assert len(copied_dataset) == 2
    assert torch.equal(copied_dataset[0].x, data_list[1].x)
    assert torch.equal(copied_dataset[1].x, data_list[2].x)

We can check that everything works by running pytest test/data/test_dataset.py -k test_in_memory_dataset_copy:

test/data/test_dataset.py .

========================== 1 passed, 9 deselected in 0.07s =========================

Guide to contributing

See here for a basic example to follow.

  1. Ensure you have read our contributing guidelines.
  2. Claim the test you want to improve. More information on this will follow soon.
  3. Implement the test changes as in pyg-team/pytorch_geometric#6523. For this, look closely at the parts of the model or function you want to cover, and think about test cases that would increase the coverage. If you stumble upon a bug in untested code paths, try to fix the bug on your own, create a GitHub issue, or discuss it with us in our PyG Slack channel #community-sprint-code-coverage.
  4. Open a PR to the PyG repository and name it: "[Code Coverage] {model_name/function_name}". Afterwards, add your PR number to the "Improved code coverage" line in CHANGELOG.md.

Tips for making your PR

  • If you are unfamiliar with how the current test pipeline works, you can read more about it here. We use pytest to run all tests.
  • The corresponding tests of PyG models and functions can be found in the test/ directory. For example, tests for torch_geometric/utils/isolated.py can be found in test/utils/test_isolated.py. You can run individual test files via pytest test/utils/test_isolated.py. You can run individual test functions via pytest test/utils/test_isolated.py -k test_contains_isolated_nodes.
  • You can use @pytest.mark.parametrize('arg_name', [1, 2, 3]) to test different configurations inside your test. See here for an example.
  • There exist special test decorators in torch_geometric/testing/decorators.py, e.g., the @withPackage('networkx') decorator, which only runs a test if the specified packages are installed.
  • For code paths that are nearly impossible to test, consider adding a # pragma: no cover comment, e.g., for @overload routines.
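
As a rough illustration of the parametrization and `# pragma: no cover` tips above, here is a minimal sketch. The `scale()` helper and its test are hypothetical stand-ins and not part of PyG; only `pytest.mark.parametrize`, `typing.overload` and the pragma comment are taken from the tips.

```python
from typing import List, Union, overload

import pytest


# Hypothetical helper (not part of PyG), used only for illustration.
# The overload stubs are never executed, so they carry the pragma comment.
@overload
def scale(value: float, factor: float) -> float:  # pragma: no cover
    ...


@overload
def scale(value: List[float], factor: float) -> List[float]:  # pragma: no cover
    ...


def scale(value: Union[float, List[float]], factor: float):
    # Two code paths: each one needs its own test case to count as covered.
    if isinstance(value, list):
        return [v * factor for v in value]
    return value * factor


# One test function, executed once per (value, expected) pair:
@pytest.mark.parametrize('value,expected', [
    (2.0, 4.0),                # scalar path
    ([1.0, 3.0], [2.0, 6.0]),  # list path
])
def test_scale(value, expected):
    assert scale(value, 2.0) == expected
```

Parametrizing over inputs that hit different branches keeps the test short while still exercising every code path of the function.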

Tests to update

This list may be incomplete. If you find another function with missing code coverage, please let us know or add it on your own.

  • data/lightning/datamodule.py
  • data/data.py
  • data/datapipes.py
  • data/dataset.py
  • data/feature_store.py
  • data/graph_store.py
  • data/hetero_data.py
  • data/storage.py
  • data/temporal.py
  • data/view.py
  • explain/explainer.py
  • explain/gnn_explainer.py
  • explain/attention_explainer.py
  • explain/captum_explainer.py
  • explain/pg_explainer.py
  • loader/dataloader.py
  • loader/imbalanced_sampler.py
  • loader/temporal_dataloader.py
  • loader/neighbor_sampler.py
  • profile/profile.py
  • profile/profiler.py
  • profile/utils.py
  • sampler/utils.py
  • utils/assortativity.py
  • utils/convert.py
  • utils/loop.py
  • utils/negative_sampling.py
  • utils/num_nodes.py
  • transforms/compose.py
  • transforms/add_self_loops.py
  • transforms/largest_connected_components.py
  • transforms/linear_transformation.py
  • transforms/random_link_split.py
  • transforms/remove_training_classes.py
  • transforms/sign.py
  • transforms/svd_feature_reduction.py
  • transforms/to_device.py
  • transforms/gcn_norm.py
  • nn/conv/appnp.py
  • nn/conv/eg_conv.py
  • nn/conv/fa_conv.py
  • nn/conv/gen_conv.py
  • nn/conv/hetero_conv.py
  • nn/conv/pna_conv.py
  • nn/conv/rgat_conv.py
  • nn/conv/transformer_conv.py
  • nn/aggr/scaler.py
  • nn/dense/linear.py
  • nn/models/correct_and_smooth.py
  • nn/models/dimenet.py
  • nn/models/label_prop.py
  • nn/models/rect.py
  • nn/models/schnet.py
  • nn/models/tgn.py
  • nn/inits.py
  • experimental.py
  • home.py
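
To locate the test file for an entry in this list, the naming convention from the tips above (torch_geometric/utils/isolated.py is tested in test/utils/test_isolated.py) can be sketched as a small helper. `guess_test_path()` is a hypothetical function written for illustration, and it assumes the convention holds for every listed file, which may not always be the case.

```python
from pathlib import PurePosixPath


def guess_test_path(source_path: str) -> str:
    """Map e.g. 'utils/isolated.py' to 'test/utils/test_isolated.py'.

    Hypothetical helper: assumes the test file mirrors the source layout
    and only prefixes the file name with 'test_'.
    """
    path = PurePosixPath(source_path)
    return str(PurePosixPath('test') / path.parent / f'test_{path.name}')


print(guess_test_path('utils/isolated.py'))
print(guess_test_path('data/dataset.py'))
```

The guessed path can then be passed to pytest to run only the tests for that file.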