@EnigmaCurry
Created March 20, 2015 14:52

Keep trunk ready to release all the time

  • Daily Release Candidate

    • Every night, tag a release candidate in a TE branch and feed that tag into both the functional and performance tests (a minimal tagging sketch follows this item)
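
A minimal sketch of the nightly tagging step, assuming a cron-driven Python job; the TE branch name and the `rc-YYYYMMDD` tag format are placeholders, not an agreed convention:

```python
#!/usr/bin/env python
"""Nightly RC tagger (sketch). Branch and tag names are assumptions."""
import datetime
import subprocess

TE_BRANCH = "te/trunk"  # hypothetical Test Engineering branch name

def tag_nightly_rc(repo_dir):
    tag = "rc-" + datetime.date.today().strftime("%Y%m%d")
    subprocess.check_call(["git", "fetch", "origin"], cwd=repo_dir)
    subprocess.check_call(["git", "checkout", TE_BRANCH], cwd=repo_dir)
    subprocess.check_call(["git", "tag", "-a", tag, "-m", "nightly RC"], cwd=repo_dir)
    subprocess.check_call(["git", "push", "origin", tag], cwd=repo_dir)
    return tag  # hand this tag to the functional and performance jobs

if __name__ == "__main__":
    print(tag_nightly_rc("/path/to/cassandra"))
```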
  • Functional Tests

    • Nightly, run utest/dtest against the tagged RC
    • Provide a webpage where devs can input their current story branches to run functional regressions on. Runs should start as soon as commits come in, or at least nightly. We can keep the existing IRC interface, but a webpage will be nicer for viewing, adding, and removing entries.
    • Email notifications of regressions (a notifier sketch follows this list)
    • Implement tests for areas with bad coverage - https://docs.google.com/spreadsheets/d/1KgKIHgFxL0nGJqjfRE0kCpuWyjFpOHUtirGIRH1so-M/edit?usp=sharing
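
A minimal sketch of the regression notifier; the result format (test name mapped to a pass/fail boolean per run) and the SMTP details are assumptions:

```python
"""Regression notifier (sketch): email the tests that passed against
the last RC but fail now."""
import smtplib
from email.mime.text import MIMEText

def new_failures(baseline, current):
    """Tests that passed in the baseline run but fail in the current one."""
    return sorted(t for t, ok in current.items()
                  if not ok and baseline.get(t, False))

def notify(regressions, recipients, smtp_host="localhost"):
    if not regressions:
        return
    body = "Functional regressions against the last RC:\n" + "\n".join(regressions)
    msg = MIMEText(body)
    msg["Subject"] = "[CassCI] %d new test failure(s)" % len(regressions)
    msg["From"] = "cassci@example.com"  # hypothetical sender address
    msg["To"] = ", ".join(recipients)
    server = smtplib.SMTP(smtp_host)
    server.sendmail(msg["From"], recipients, msg.as_string())
    server.quit()
```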
  • Performance Tests

    • Finish collecting the stress profiles from CASSANDRA-8503
    • Nightly, run the stress profiles against the tagged RC. Tests should run on at least two different cluster types (blade-11 vs bdplab, SSD vs spinning disk)
    • Provide a webpage for devs to input their current branches to run nightly performance regressions on (assuming new commits were made). Developers can opt in to the same profiles that are run for the RC, or define their own test definition using cstar_perf.
    • Email notifications of statistical regressions (some % deviation from the baseline of the last X runs; a sketch of this check follows this list)
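
A sketch of the deviation check, assuming a throughput-style metric where higher is better; the 10% threshold and five-run window are placeholders:

```python
"""Performance regression check (sketch): flag the current run if it
falls more than `threshold` below the mean of the last `window` runs."""

def is_regression(history, current, window=5, threshold=0.10):
    """history: chronological samples (e.g. ops/s) for one stress
    profile; current: tonight's sample."""
    recent = history[-window:]
    if not recent:
        return False  # no baseline yet, nothing to compare against
    baseline = sum(recent) / float(len(recent))
    return current < baseline * (1.0 - threshold)

# e.g. is_regression([51200, 50800, 51950, 50100, 51400], 44300) -> True
```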
  • Product owner

    • Twice weekly, the product owner will check the functional and performance metrics run by CassCI overnight and notify devs of regressions. This catches the things the daily automatic notifications miss.
    • Identify flaky tests, create tickets to address them, and bring them up with the TE team in standup to discuss (a detection sketch follows this list).
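
A minimal sketch of flaky-test detection, assuming run history is available as a mapping from test name to a chronological list of pass/fail booleans (newest last):

```python
"""Flaky-test finder (sketch): a test is flagged when it both passed
and failed within the last `window` runs."""

def flaky_tests(history, window=20):
    flagged = []
    for test, results in history.items():
        recent = results[-window:]
        if True in recent and False in recent:
            flagged.append(test)
    return sorted(flagged)  # seed these into tickets / standup discussion
```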
  • User-oriented testing: test a cluster the way a user would, under load

    • Validating client - CASSANDRA-8986 (a client sketch follows this section)
      • counters
      • LWT
      • TTL
      • topology-aware CL.ONE queries for individual node assessment
    • cluster state tests
      • Fail a node
      • Replace a node with a new node on a different IP to detect stale state from old nodes
      • Restart the entire cluster
      • Repair - delete files first so repair has real work to do
      • Failed disks
      • Partitions - unidirectional, bi-directional
      • Restart individual nodes, both gracefully and not
      • Cleanup
      • Scrub
      • Prior version upgrade
      • Mixed version
      • Parse logs, report errors/warnings, and mark the test failed (a log-scan sketch follows this section)
      • Exclusions for errors considered acceptable
      • Improve logging where appropriate
      • Every test ends with a full restart and commitlog (CL) replay, followed by validation that all data is present after the replay
      • Verify cqlsh and nodetool operate against the cluster
      • Network blips
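
Below is a minimal sketch of the validating-client idea from CASSANDRA-8986 above, using the Python DataStax driver; the keyspace/table names and row format are illustrative assumptions, not the ticket's actual design:

```python
"""Validating client (sketch): write known values, then read them back
with CL.ONE pinned to a single node so each node can be assessed
individually."""
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import WhiteListRoundRobinPolicy
from cassandra.query import SimpleStatement

def validate_node(node_ip, rows):
    # Pin the client to one coordinator so a wrong answer implicates
    # that node rather than the cluster as a whole.
    cluster = Cluster([node_ip],
                      load_balancing_policy=WhiteListRoundRobinPolicy([node_ip]))
    session = cluster.connect("validation")  # hypothetical keyspace
    read = SimpleStatement("SELECT v FROM kv WHERE k = %s",
                           consistency_level=ConsistencyLevel.ONE)
    missing = []
    for k, expected in rows.items():
        fetched = list(session.execute(read, (k,)))
        if not fetched or fetched[0].v != expected:
            missing.append(k)
    cluster.shutdown()
    return missing  # a non-empty list means this node failed validation
```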
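And a minimal sketch of the log-scan idea, with a placeholder exclusion list; the real patterns would come from the team's triage of known-acceptable noise:

```python
"""Log scan (sketch): grep the Cassandra system log for ERROR/WARN
lines, drop patterns deemed acceptable, and fail the test on the rest."""
import re

# Acceptable noise; grow this list as exclusions are agreed on.
EXCLUSIONS = [
    re.compile(r"Token .* changing ownership"),  # placeholder pattern
]

def scan_log(path):
    offenders = []
    with open(path) as log:
        for line in log:
            if not re.search(r"\b(ERROR|WARN)\b", line):
                continue
            if any(p.search(line) for p in EXCLUSIONS):
                continue
            offenders.append(line.rstrip())
    return offenders  # non-empty -> mark the test failed
```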
@belliottsmith

The cluster used for the "user oriented" tests should (almost) always be initialised with a prior version of C*, with a random collection of sstable config parameters. Most users upgrade, so we should be testing that everything behaves correctly in this scenario, which is poorly covered by the regression tests.


ghost commented Mar 20, 2015

I agree; I'm guessing upgrading from a previous minor version, as opposed to a clean install, would be much more common. In fact, a set of sstables with all sorts of strange and wonderful data types and histories would be great. (That whole won't-start-after-an-upgrade scenario is not fun.)


snazy commented Mar 23, 2015

Just a nit in the spreadsheet: UFTest is another utest for UDF
