giacomocavalieri/test_ideas.md

## test_ideas.md

      
Some ideas on test and assert

I'm just trying to put this down to clear my mind and get a better feeling of how it could work.
There may be some new ideas or horrible takes, you've been warned!
What this could look like

test as a keyword could be used to define any top-level function f : fn() -> Nil that
can be auto-discovered and run by gleam test.
Assertions in the form of let assert or assert could be carried out either directly
inside the test function or inside helpers that are later called by a test function.
Example 1

// Inside the module "basic_math_tests"
// Simple test
test one_plus_one_is_two() {
  assert 2 == 1 + 2
}
Possible output:
✗ basic_math_tests > one_plus_one_is_two
  ↳ assert 2 == 3

  1 │ test one_plus_one_is_two() {
  2 │   assert 2 == 1 + 2
    │   ^^^^^^ failing assertion 
    ┆
In simple cases like this one, test and assert really shine: they give you a clear picture of what values were compared and show the piece of failing code.
Example 2

// Example from the stdlib tests
test reverse() {
  assert [] == [] |> list.reverse
  assert [1] == [1] |> list.reverse
  assert [2, 1] == [2, 1] |> list.reverse 
  assert [3, 2, 1] == [1, 2, 3] |> list.reverse

  // TCO test
  list.range(0, recursion_test_cycles)
  |> list.reverse
}
Possible output:
✗ list_test > reverse
  ↳ assert [2, 1] == [1, 2]

  1 │ test reverse() {
    ┆
  3 │   assert [2, 1] == [2, 1] |> list.reverse
    │   ^^^^^^ failing assertion
    ┊
Same as before, it's really nice to have the compared values and a clear picture of the failing assertion. The only gripe I could have is that the expected value has to sit on the left of the assertion to play nicely with the pipe.
Example 3

test single_grapheme_operators() {
  [
    #("+", token.Plus),
    #("-", token.Minus),
    #("*", token.Minus),
    #("/", token.Slash),
  ]
  |> list.each(fn(pair) { first_token_is(pair.0, pair.1) })
}

fn first_token_is(source: String, expected token: Token) {
  let lexer = lexer.new(source)
  let assert Ok(#(next_token, _)) = lexer |> lexer.next
  assert token == next_token
}
Example output:
✗ lexer_test > single_grapheme_operators
  ↳ assert token.Minus == token.Star

     ┆
  10 │ fn first_token_is(source: String, expected token: Token) {
     ┆
  14 │   assert token == next_token
     │   ^^^^^^ failing assertion
     ┊
Once again we have a clear indication of the failing test single_grapheme_operators and of where the test failed. However, things start to break down a little since we lose the context of which invocation of first_token_is failed. Manually tracking back to the source of the error by looking at the asserted values could be really tedious and time-consuming.
As Louis pointed out, assert could also print more useful information such as concrete values. Here are some rules that I think could greatly improve the test's output:

1. If there is a function call / data constructor in the assertion and it is passed some non-literal arguments, their values could be displayed
2. The concrete values of the arguments passed to the function where the assertion takes place could be displayed

So in this case the output could become:
✗ lexer_test > single_grapheme_operators
  ↳ failed calling first_token_is("*", token.Minus)
    assert token.Minus == token.Star

     ┆
  10 │ fn first_token_is(source: String, expected token: Token) {
     ┆
  14 │   assert token == next_token
     │   ^^^^^^ failing assertion
     ┊
Now we also have information about the concrete function call that resulted in a failure so it is much easier to find the culprit for the failing test.
To showcase rule 1, imagine this example:
test test_function() {
  test_helper(fn(x, y) { Bar(1) })
}

fn test_helper(f) {
  // Some elaborate setup that binds x and y
  assert Foo(1, 2, 3) == f(x, y)
  // Some elaborate teardown
}
Here the output could be:
✗ module/name > test_function
  ↳ failed calling test_helper(<function>)
    assert Foo(1, 2, 3) == Bar(1)

     ┆
   5 │ fn test_helper(f) {
     ┆
  20 │   assert Foo(1, 2, 3) == f(x, y)
     │   ^^^^^^ failing assertion
     ┊

    f was called with
    x = [1, 2]
    y = "baz"
Having the concrete values of x and y can be really helpful when debugging a failing test: it avoids adding a lot of io.debug() calls and navigating through messy output to see what went wrong.
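As a rough sketch of rule 1, the "was called with" section could be built by keeping only the arguments that are not literals, since literals are already visible in the source preview. The Argument type and the called_with function below are made up for illustration; how the compiler would actually capture these values is not something these notes settle.

```gleam
import gleam/list

/// Hypothetical description of one argument of a call inside an assertion.
pub type Argument {
  /// A literal such as "query": already visible in the source preview.
  Literal(source: String)
  /// Any other expression, together with its printed runtime value.
  Computed(name: String, value: String)
}

/// Returns the lines of the "<function> was called with" section,
/// or an empty list when every argument is a literal.
pub fn called_with(function: String, arguments: List(Argument)) -> List(String) {
  let shown =
    list.filter_map(arguments, fn(argument) {
      case argument {
        // Literals are skipped by returning an Error.
        Literal(_) -> Error(Nil)
        Computed(name, value) -> Ok(name <> " = " <> value)
      }
    })
  case shown {
    [] -> []
    _ -> [function <> " was called with", ..shown]
  }
}
```

For the helper above, called_with("f", [Computed("x", "[1, 2]"), Computed("y", "\"baz\"")]) gives the header line followed by one line per argument, while a call made only of literals produces no section at all.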
Example 4

// A test that calls some helpers that carry out the assertions
test db_queries() {
  // Reuse the same connection throughout multiple tests
  use connection <- with_dummy_connection
  test_query_1(connection)
  test_query_2(connection)
  test_query_3(connection)
}

fn with_dummy_connection(cont: fn(sql.Connection) -> Nil) -> Nil {
  let connection = ...
  cont(connection)
}

fn test_query_1(connection) {
  let assert Ok(result) = connection |> sql.run("query")
  // Any other kind of assertion on the result...
}

// the other test_query functions I couldn't be bothered sketching out
Here the output could be:
✗ sql_test > db_queries
  ↳ failed calling test_query_1(<connection printed as a string>)
    let assert Ok(result) = Error(QuerySyntaxError(...))

     ┆
  15 │ fn test_query_1(connection) {
  16 │   let assert Ok(result) = connection |> sql.run("query")
     │   ^^^^^^^^^^ failing assertion
     ┊

    sql.run was called with
    connection = <connection printed as a string>

// Here the second argument to sql.run (i.e. "query") is not shown since
// it is a literal value already shown in the source preview
The possibility to also use let assert to pattern match on a structure and still get nice error messages and diffs would be really sweet.
These examples play out kind of nicely but I still have some open questions:

What happens if the assertions are in some deeply nested functions? Just showing the last function where the assertion failed may not be helpful. In all these examples the asserting function is called directly by the test and doesn't sit deep in a nested call stack.
What happens with assertions in the user code that may fail during the test execution? Are those failures to be treated and displayed like the failed assertions shown above?

Interaction with a possible panic annotation

I really like Hayleigh's proposal for a @panic annotation to mark assertions, so here I'm just jotting down some ideas on how it could interact with tests to change the displayed output.
Let's consider once again the third example but with a little twist:
test single_grapheme_operators() {
  [
    #("+", token.Plus),
    #("-", token.Minus),
    #("*", token.Minus),
    #("/", token.Slash),
  ]
  |> list.each(fn(pair) { first_token_is(pair.0, pair.1) })
}

fn first_token_is(source: String, expected token: Token) {
  let lexer = lexer.new(source)

  @panics("The lexer should have lexed at least one token")
  let assert Ok(#(next_token, _)) = lexer |> lexer.next

  @panics("The lexer lexed a different token from what was expected")
  assert token == next_token
}
That could be used to enrich the failing test output with some explanatory message to get a better feeling of what the assertion was trying to assert:
✗ lexer_test > single_grapheme_operators
  ↳ failed calling first_token_is("*", token.Minus)
    assert token.Minus == token.Star

     ┆
  10 │ fn first_token_is(source: String, expected token: Token) {
     ┆
  14 │   assert token == next_token
     │   ^^^^^^ The lexer lexed a different token from what was expected
     ┊
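A minimal sketch of how the marker line under the failing assertion could pick its label: use the annotation's message when there is one, and fall back to the generic "failing assertion" otherwise. The marker_line function and its parameters are hypothetical, and the sketch assumes the annotation's text is available to the reporter as an Option(String).

```gleam
import gleam/option.{type Option}
import gleam/string

/// Builds the marker line printed under the failing assertion.
/// `keyword` is the highlighted source text ("assert" or "let assert") and
/// `message` is the hypothetical text taken from a @panics annotation, if any.
pub fn marker_line(keyword: String, message: Option(String)) -> String {
  // Without an annotation, fall back to the generic label used earlier.
  let label = option.unwrap(message, "failing assertion")
  // One caret per character of the highlighted keyword.
  string.repeat("^", string.length(keyword)) <> " " <> label
}
```

Keeping the fallback in one place means un-annotated assertions keep the output shown in the earlier examples.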
Concurrent/sequential test running

My idea is that any test function is assumed to be independent from any other and can be run concurrently. A way to force sequential execution would simply be to wrap many asserting function calls inside a single test function like I showed in example 4.
I don't know if more complex use cases would require something more advanced, but to me it looks like a sensible behaviour that is also not so hard to explain.
Test organization

To sum up, a complete gleam test report could be broken down by module:
✓ string_test
  ✓ count
  ✓ is_empty
  ✓ pop_grapheme
  ✓ reverse

✗ list_test
  ✓ append
  ✓ fold
  ✗ reverse
  ✓ zip


✗ list_test > reverse
  ↳ assert [2, 1] == [1, 2]

  in list_test.gleam

  1 │ test reverse() {
    ┆
  3 │   assert [2, 1] == [2, 1] |> list.reverse
    │   ^^^^^^ failing assertion
    ┊
One could also tweak how the output is shown in many different ways (e.g. only display the name of the module and number of tests if all are passing, etc.) but I believe this is just a small detail.
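The per-module summary at the top of that report could be rendered along these lines. This is only a sketch: the TestOutcome and ModuleReport types and the render_module function are invented here, and it assumes a module is marked as failing as soon as one of its tests fails.

```gleam
import gleam/list
import gleam/string

/// Hypothetical outcome of a single test function.
pub type TestOutcome {
  TestOutcome(name: String, passed: Bool)
}

/// Hypothetical outcomes for every test in one module.
pub type ModuleReport {
  ModuleReport(name: String, tests: List(TestOutcome))
}

fn mark(passed: Bool) -> String {
  case passed {
    True -> "✓"
    False -> "✗"
  }
}

/// Renders one module: a header line followed by one indented line per test.
pub fn render_module(report: ModuleReport) -> String {
  // A module passes only when every one of its tests passes.
  let all_passed =
    list.all(report.tests, fn(outcome: TestOutcome) { outcome.passed })
  let header = mark(all_passed) <> " " <> report.name
  let lines =
    list.map(report.tests, fn(outcome: TestOutcome) {
      "  " <> mark(outcome.passed) <> " " <> outcome.name
    })
  string.join([header, ..lines], with: "\n")
}
```

Collapsing a fully passing module to a single line, as suggested above, would only need a check on all_passed before the test lines are added.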
A more pressing question would be, is it necessary to have a way to further group tests? Are there cases that would benefit from further grouping tests inside a module to change how the reporting looks? Something like:
✗ sql_test
  ✗ selection_queries
    ✓ select_star
    ✓ multiple_join
    ✗ left_outer_join

  ✓ update_queries
    ✓ update_with_filter
    ✓ update_test_1

  ✓ other_test_1
  ✓ other_test_2
I feel that this may not be super important but, again, I don't think I have the proper experience to be making assumptions, so I'm really curious to hear what you think!
More open questions


What should happen if someone tries to manually invoke a test function? Does that even make sense? I'd say no, it shouldn't be allowed
What happens if any of the things that need to be printed out is a function? What would be shown in that case? (Maybe just an f = <function> like in Example 3?)
Would the behaviour shown in the previous examples be enough for most cases? Or would it be useful to also show a stack trace to precisely pinpoint in which function the assertion failed? However, one could argue that if the test function call chain is so complex that one needs a stack trace to track down where the assertion fails, it could be a sign of poor test organization and should be discouraged
What happens when printing more elaborate data structures? It could be really bad: the output would likely match the representation of the underlying target rather than a nicely pretty-printed string (I'm thinking about sets, maps, etc.). Alternatively, there could be some sensible default pretty printing for records and for data structures defined by the standard library