Created
March 29, 2016 19:16
Jepsen
* client w/ # of threads running on test machine
* nemesis - orchestration of the faults on the nodes
* operations that....
onyx-peers.jobs.basic-test
* jobs
* no nemesis
* close
* read ledger back
* invoke client manually on each client
* run checker on the events
  - which properties have held over the test
* write to BookKeeper ledgers
* close the ledgers
* wait for completion of jobs
* read peer log back
* pass full history
Checker
* takes test-setup, peer-config, #peers, #jobs
* look at history of all events run on client
* is that history valid?
* plays back log
* count # peers on replica
* check pulse of peers in ZooKeeper
* check invariants to make sure they are all true
* job-invariants
  - check whether there was an exception for the job
  - looks up all ledgers that were read back
  - checks that results are from the correct job
  - so a segment did not go through the wrong job
  - values that were written were the ones that were read
  - the Checker knows which invariants should be applied per job
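The job-invariants above can be sketched roughly as follows. This is an illustrative Python sketch, not the Checker's actual code: the record shape (`id`, `exception`, `written`, `read`) and the function names are assumptions made up for the example.

```python
def check_job_invariants(job):
    """Return a dict of invariant name -> bool for one job's results.

    `job` is a hypothetical record with:
      - "id": the job's own id
      - "exception": None, or the error the job raised
      - "written": values the clients wrote to the job's input ledgers
      - "read": list of (job_id, value) pairs read back from the ledgers
    """
    read_values = [value for _, value in job["read"]]
    return {
        # the job must not have thrown
        "no-exception": job["exception"] is None,
        # no segment went through the wrong job
        "correct-job": all(job_id == job["id"] for job_id, _ in job["read"]),
        # the values read back are the values written
        # (sets are compared, so duplicates are not examined here)
        "written-equals-read": set(job["written"]) == set(read_values),
    }


def valid(jobs):
    """The history is valid only if every invariant holds for every job."""
    return all(all(check_job_invariants(job).values()) for job in jobs)
```

Returning the per-invariant map rather than a single boolean makes it easy to report which property failed for which job.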
onyx-jepsen.onyx-basic-test
* :random-halves nemesis (covered in blog)
* :awake-ms - ms peers will be connected before partitions occur
* generator
  - generator to set frequency at which jobs are submitted
  - staggered at 1/10 s - so bursts, then nothing
  - start-stop-nemesis-seq - partitions halves, stops, starts, etc
* jepsen-test
  - os, client, generator, nemesis, etc
  - checker is always the same (adds different invariants based on the job)
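A rough sketch of the two timing ideas in this section, in Python for illustration only. It assumes the stagger draws each delay uniformly between 0 and twice the mean, and that the nemesis sequence simply alternates start and stop; `partition_ms` and both function names are hypothetical and not taken from the test code.

```python
import random
from itertools import islice


def staggered_delays(mean_s, count, seed=None):
    """Delays (seconds) between operations, uniform in [0, 2 * mean_s].

    Random delays around a mean produce bursts followed by quiet gaps.
    """
    rng = random.Random(seed)
    return [rng.uniform(0, 2 * mean_s) for _ in range(count)]


def start_stop_nemesis_seq(awake_ms, partition_ms):
    """Yield (sleep_ms, op) pairs forever: wait, partition, wait, heal.

    awake_ms: how long peers stay connected before a partition starts.
    partition_ms: hypothetical duration a partition is held before healing.
    """
    while True:
        yield (awake_ms, "start")      # stay connected, then partition
        yield (partition_ms, "stop")   # hold the partition, then heal
```

For example, `list(islice(start_stop_nemesis_seq(200, 100), 4))` gives two full partition/heal cycles.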
onyx-jepsen.onyx-client
* sets up client with all these parameters (ledger-handle, etc)
* 5 clients, see different events, but have a full history of what happened to the cluster
* client gets client events
* nemesis gets nemesis events
* end
  - heal network
  - read ledger
  - read jobs?
  - then Checker
onyx-jepsen.simple-job
* build-job
  - each client has ledger id
  - one thread writing to one ledger
  - when a client sees a submit-job event, it builds the job based on the client ledger
  - generate a job based on those ledgers
  - add-read-ledgers
  - write ledger per client
  - one task per ledger - with ledger id (e.g. read-ledger-5)
  - creates links in workflow
* simple-job will end up looking like the visualization in the blog post
* 5 clients that can write to BookKeeper ledger, can create jobs
  - then read linear job history
  - checker checks
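The "one task per ledger, then link into the workflow" step might look like this. It is a Python sketch of the idea only: the real code is not shown in these notes, and the downstream task name `downstream-task` is a placeholder invented for the example.

```python
def add_read_ledgers(ledger_ids, downstream="downstream-task"):
    """Return (task_names, workflow) with one read task per ledger.

    The workflow is a list of [from, to] edges, mirroring the shape of an
    Onyx workflow. `downstream` is a placeholder name, not a real task.
    """
    tasks = [f"read-ledger-{ledger_id}" for ledger_id in ledger_ids]
    workflow = [[task, downstream] for task in tasks]
    return tasks, workflow
```

With ledger ids 5 and 7 this yields the tasks `read-ledger-5` and `read-ledger-7`, each linked to the same downstream task.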
Orchestration
* docker images customized for Onyx with ZooKeeper installed
* 5 nodes
* adding to container saves time in test setup
* upload uberjarred peers
* run-peer script
  - starts jar
  - launch-prod-peers entrypoint
  - with Aeron settings to make sure Aeron doesn't die during startup
onyx-jepsen.onyx-aggregation-test
* also add a segment with a unique ID
  - random age and event ID
* add window-job
* build-window-state-job
  - window on annotate job which is global
  - uses conj aggregation
  - task has a uniqueness key
  - shouldn't see a segment twice, but should see all segments
  - peer partitioned by - segment should be seen on new peer
  - trigger - every one element - only written at :job-completed event
  - add entry to BookKeeper ledger
* window-state-job-invariants
  - at playback, look for BookKeeper entries, finds trigger ledger
  - does it have all my results? only once?
  - then all the invariants from the simple test
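The "all my results? only once?" question can be phrased as a small check over the window state read back from the trigger ledger. This is a hedged Python sketch: the segment shape (a dict with an `id` key) and the function name are assumptions for illustration, not the invariant code itself.

```python
from collections import Counter


def check_window_state(written_ids, window_state):
    """Compare the ids the clients wrote with the aggregated window state.

    written_ids: unique ids of every segment the clients wrote.
    window_state: list of segments (dicts with an "id" key) as collected
    by a conj-style aggregation and read back at playback.
    """
    seen = Counter(segment["id"] for segment in window_state)
    return {
        # segments that never reached the window
        "missing": sorted(set(written_ids) - set(seen)),
        # segments that were aggregated more than once
        "duplicated": sorted(i for i, n in seen.items() if n > 1),
    }
```

The test passes this check only when both lists are empty.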
Future work:
* granular trigger work
* improve the testing of the windows
* does it do the trigger - only once?
Outputs
* store directory - the directory per test, timestamped
* copy any logs as well as stdout
* then (by hand) add issues to onyx-issues-log.txt along with timestamp, directories for context
  - jepsen history: results.edn - all kinds of info in there
  - onyx.log - use timestamps to correlate
  - flight recorder files
Testing workflow
* ./scripts/start-containers.sh
  - starts docker-in-docker
* then run the test script (in README)
* wait for it to finish
* it reports success/failure
* then /store directory is shared with localhost
  - so look at those files to see what happened
* future: maybe make running the test a single step, which would help with CI