Testing Tasks in Flyte

Motivation

There are a million ways to test code (and a million names for those test types, often overlapping). Coacervate has come a long way and been through a few rewrites (when does a project earn a version number??) since I wrote my first tests. After settling on Flyte and modularizing things properly instead of cramming everything into a hacky Docker image, it came time to write my first tests... again... hopefully for the last time.

Flyte offers a few utilities to make mocking external services more ergonomic (a minimal sketch follows the list below). There are also some great examples of unit tests in Flytekit's own repo that leverage the ubiquitous pytest framework. While these are great for writing tests covering the more granular functionality of a codebase, I chose to write e2e/integration tests first. The main reasons for this are two-fold:

  1. The more foundational tasks of my workflow are much less likely to change compared to the smaller utility functions that support them. I know what goes in and what needs to come out; testing the glue between them can come later.
  2. It was easier! I wrote a fair number of mocking functions over the course of developing the workflow so I wouldn't have to rerun the whole thing to test one step. Although those functions will need tests of their own eventually, they came in handy when defining inputs and outputs.
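
Here's that minimal sketch of the mocking helpers, with hypothetical task and workflow names, loosely modeled on the Flytekit testing examples:

from flytekit import task, workflow
from flytekit.testing import task_mock

@task
def fetch_annotations(url: str) -> str:
    # Imagine this calls an external service we don't want to hit in tests.
    ...

@workflow
def annotate_wf(url: str) -> str:
    return fetch_annotations(url=url)

def test_annotate_wf():
    # task_mock swaps the task body for a MagicMock during local execution.
    with task_mock(fetch_annotations) as mock:
        mock.return_value = "stubbed-annotations"
        assert annotate_wf(url="https://example.com") == "stubbed-annotations"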

Local or Remote?

The next question regarding the tests that cover the heavy lifting in my workflow was: can I run these locally? Despite being a container-first framework, Flyte offers excellent local development options for a faster iteration cycle, such as abstracting the object store away to your /tmp folder and letting you pass workflow args directly on the command line.
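
For example (a toy workflow, not from Coacervate), local execution is just calling the entity like a regular Python function, or handing args to pyflyte run:

from flytekit import task, workflow

@task
def shout(name: str) -> str:
    return f"hello, {name}!"

@workflow
def hello_wf(name: str = "world") -> str:
    return shout(name=name)

if __name__ == "__main__":
    # Local execution: no cluster required; file/directory outputs get staged
    # under a temporary local sandbox instead of the object store.
    print(hello_wf(name="flyte"))
    # Roughly equivalent from the shell: pyflyte run hello.py hello_wf --name flyte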

The only downside to this approach is that your dependencies must be present in your local Python virtual environment, and non-Python dependencies have to exist on your PATH. The Coacervate Requestor depends on a few binaries, jar files, and an appropriate Java runtime. I could install these on my local machine, but why reinvent the wheel? I have a perfectly good image with all of that baked in. Overhead be damned: instead of dealing with "it works on my machine", these tests are going to be closer to production behavior.

Strategy

Leveraging Flyte's Subworkflows, I settled on the strategy detailed below:

Alongside my workflows dir I have a tests dir that contains a test_step.py for every major step of my workflow. Here's test_genotype.py:

from flytekit import workflow
from run.tasks.calling import genotype
from run.tasks.utils import get_dir, dir_to_vcfs
from run.tests.helpers import compare_dirs
from run import config

@workflow
def test_genotype_wf():
    # Known-good inputs staged in the object store from a previous run.
    db_dir = get_dir(dirpath='s3://my-s3-bucket/test-assets/combine-region-expected')
    # Run the task under test in isolation.
    actual = genotype(db_dir=db_dir, reg='chr21', ref_loc=config['reference_location'])
    # Compare its output against the expected results for this step.
    expected = get_dir(dirpath='s3://my-s3-bucket/test-assets/genotype-expected')
    equivalent = compare_dirs(actual=actual, expected=expected)

Not much to it. I pull in some imports, most notably the task being tested and some helpers, and define the workflow. All the workflow does is pull in some inputs, run the task in isolation, and compare the result against some expected outputs. The get_dir convenience function is used throughout the project, translating str paths in the object store to FlyteDirectory at the task boundary. Also of note is compare_dirs, one of a handful of comparison tasks that check the consistency of the outputs. Here's its code:

import filecmp
from flytekit import task
from flytekit.types.directory import FlyteDirectory
from run import config

@task(container_image=config['current_image'])
def compare_dirs(actual: FlyteDirectory, expected: FlyteDirectory) -> bool:
    actual.download()
    expected.download()
    # True only if no common files differ between the two directories.
    return len(filecmp.dircmp(actual, expected).diff_files) == 0

It's just a very basic Flyte-flavored wrapper around Python's built-in filecmp library, returning True if no files differ between the expected and actual directories.
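
get_dir itself isn't shown in this gist; a stripped-down sketch of that kind of helper (using the same imports as above) looks roughly like this:

@task(container_image=config['current_image'])
def get_dir(dirpath: str) -> FlyteDirectory:
    # Hand the object-store prefix to Flyte; downstream tasks receive it
    # as a FlyteDirectory and only download it when they need the contents.
    return FlyteDirectory(path=dirpath)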

As for the aforementioned Subworkflows, I collect the workflow from each test_step.py module into one test runner called register.py:

from flytekit import workflow
from run.tests.test_index_cram import test_index_cram_wf
from run.tests.test_split_cram import test_split_cram_wf
from run.tests.test_golem_call_variants import test_golem_call_variants_wf
from run.tests.test_combine_region import test_combine_region_wf
from run.tests.test_genotype import test_genotype_wf
from run.tests.test_gather_vcfs import test_gather_vcfs_wf

@workflow
def registered():
    test_index_cram_wf()
    test_split_cram_wf()
    test_golem_call_variants_wf()
    test_combine_region_wf()
    test_genotype_wf()
    test_gather_vcfs_wf()

The imports are pretty verbose, and there are more parsimonious ways of naming all these things, but the end result looks good in the management GUI, and I can easily comment out tests I'm not focusing on while debugging something problematic.

[Screenshot: the registered test workflow and its subworkflows in the Flyte console]
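
If I want to kick the whole suite off programmatically rather than from the console, something along these lines should work with FlyteRemote (the project, domain, and registered workflow name here are placeholders):

from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Placeholder project/domain and workflow name; adjust for your deployment.
remote = FlyteRemote(
    config=Config.auto(),
    default_project="coacervate",
    default_domain="development",
)
registered_tests = remote.fetch_workflow(name="run.tests.register.registered")
# wait=True blocks until every test subworkflow has finished.
execution = remote.execute(registered_tests, inputs={}, wait=True)
print(f"finished: {execution.id.name}")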

Closing Thoughts

There are many more tests to write, especially those covering the small, but oh-so-crucial, glue functions holding all the big movers together. Luckily, I'll be able to run those locally using pytest, so I can blast them out quickly with every small change. However, having these bigger ones squared away with known inputs and outputs lets me change a task signature with confidence that nothing is going awry.
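
Since tasks are still plain Python callables when invoked directly, those pytest tests can exercise something like compare_dirs without any cluster at all; a rough sketch:

from flytekit.types.directory import FlyteDirectory
from run.tests.helpers import compare_dirs

def _write_dirs(tmp_path, left: str, right: str):
    # Build two small directories with a single file each.
    a, b = tmp_path / "a", tmp_path / "b"
    a.mkdir()
    b.mkdir()
    (a / "calls.vcf").write_text(left)
    (b / "calls.vcf").write_text(right)
    return FlyteDirectory(path=str(a)), FlyteDirectory(path=str(b))

def test_compare_dirs_identical(tmp_path):
    actual, expected = _write_dirs(tmp_path, "REF=A\n", "REF=A\n")
    # Calling the @task-decorated function runs it locally, no cluster needed.
    assert compare_dirs(actual=actual, expected=expected)

def test_compare_dirs_different(tmp_path):
    actual, expected = _write_dirs(tmp_path, "REF=A\n", "REF=T\n")
    assert not compare_dirs(actual=actual, expected=expected)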

Prior to Flyte, I managed my workflows with Snakemake, which I love. However, as I moved towards orchestrating things with K8s, it made sense to transition to something that supports it natively. Abstracting files away to an object store was a tricky change at first, but it has turned out to be a huge bonus, especially for writing these tests. Previously, I had to make some awkward choices to lay out test assets on my local filesystem so they closely resembled the production layout. With Minio in the background, I can just point any given test at any given asset and it doesn't care what's above or below it. Another benefit is that I can tar up the test-assets prefix, version it, and unpack it anywhere I want to run these tests.

As I mentioned at the top, there are a million ways to test your code. This may not be the absolute best approach, but I think it will fit nicely into a larger testing suite. I want to quickly shout out Jay Ganbat for his thoughts in the Flyte Slack when I first started mulling over this stuff. As always, comments and constructive criticism are encouraged below! Please let me know if you find this useful and/or how you would improve it. Happy testing!
