maueroats/commentary-typed-racket-issue-496.md

## commentary-typed-racket-issue-496.md

      
    Raw
  

              commentary-typed-racket-issue-496.md
            
          
    Types for Nonempty Lists

This is a writeup of the issues I found changing the type signatures
of max and min to require at least one input. The idiom of
applying these functions to "rest" args means a number of other type
signatures need to be enhanced to propagate assurances that list are
nonempty.
Discussion

The discussion (last active
2013)
linked from GitHub issue 496 includes the question of whether people
should be asked to change from the idiom of applying max to a list,
and also the question of to what extent Typed Racket can deduce that
the applications are safe.
I looked at this question some and this report includes some
conclusions and enough details for another person to pick up the
project.
There is one recommendation which does not fit elsewhere: standardize
a type abbreviation for a nonempty list. Perhaps (Nonempty-Listof A).
Branches


typed-racket
math
plot

Conclusions

To summarize situations in which Typed Racket
does not deduce a list is nonempty:

Early exit with an error: (when (empty? xs) (signal-error ...)
This would particularly an issue if signal-error were not
in the same compilation unit.
Lists known to be the same length, with only one explicitly checked nonempty:
this situation occurs in statistics (counts and weights) and
graphics (coordinates x/y/z).
Lists are derived from sequences, for which there is no type-level
way of specifying they are nonempty.

Details


The utility function cumulative-sum in the math/statistics package takes in a list and produces a nonempty list.
The type signature for reverse does not indicate that it preserves nonempty lists.
The helper function for foldl must have the correct
nonempty-preserving signature.

The changes required to propagate information about nonempty lists
seem extensive. I have some notes below.
Partial List

I read Section 4.10, and I listed the functions that I think would need
updated signatures if one wanted to track non-emptiness of lists.
The list is (to me) suprisingly short. I am leaving this list here as
a map for someone who wanted to be thorough.

map -- completed in PR
reverse -- completed in PR
sort: easy, two more lines
cons: one of the type signatures is (-> a b (Pairof a b)). Is TR
now smart enough to substitute b=(Listof a) and deduce that cons
produces nonempty lists from this signature alone? If so the other
is redundant.
build-list: easy
make-list: easy
add-between: easy
append: type signature already preserves nonempty lists (in first argument)
append*: this may not be realistic to preserve nonempty lists unless
every item in the input list is required to be nonempty?
flatten: type system probably cannot encode a general condition that gives
a nonempty output here. Look for use before doing.
remove-duplicates: easy
cartesian-product: easy, but look for use before doing. Any empty list in the
input should produce an empty output, so the strongest signature
would say all inputs nonempty lists produces a nonempty list.

In addition, forms of for, especially for/list, would need
annotation by the user - so a convenient way to describe the output
type would be good.
Map and Reverse

The type of map preserves nonempty lists of 1, but not nonempty
lists when more than one input is given! This is fixed in a PR.
Welcome to Racket v7.7.0.9.
> map
- : (All (c a b ...)
      (case->
       (-> (-> a c) (Pairof a (Listof a)) (Pairof c (Listof c)))
       (-> (-> a b ... b c) (Listof a) (Listof b) ... b (Listof c))))
#<procedure:map>

Less strangely, reverse does not have a type signature that
preserves nonempty lists. (Added in PR.)
More Problems I: Simple Issues Plotting

The plot-lib package contains a number of type errors once max is
changed to require at least one input argument. The first obstacle is
the cumulative-sum function, which needs to have a type signature
indicating that it will always output a nonempty list.
(: cumulative-sum (-> (Listof Real) (List* Real (Listof Real))))

As mentioned above, there are two issues. The first issue is that the
original foldl accumulator function has a signature which does not
assert it always produces a nonempty list (it does) - blame the standard cons
signature? The second issue
is that reverse does not have a signature asserting it takes
nonempty lists to nonempty lists.
Early Exit

Typed Racket is able to infer that the list xs is nonempty in the else
clause after a test:
(if (empty? xs)
    (raise-argument-error ...)
    (do-work-on-nonempty xs))

Typed Racket is not able to infer that the list xs is nonempty after a
when test that terminates the function.
(when (empty? xs) (raise-argument-error ...))
(do-work-on-nonempty xs)

When I encountered this problem, I transformed the early exit into one
branch of a cond, so the types were known to be nonempty in the
a later branch. I used cond instead of if because it was easier to
bracket the existing multi-line code with cond.
The early exit idiom is sometimes combined with the issue that
multiple lists are known to be the same length, so only one of them is
tested to be nonempty.
Multiple lists of the same length

This section explains two instances where multiple lists are known to
be of the same length, so the reader can see what would be required
for Typed Racket to deduce or use this information automatically.
Plotting package (plot-lib)


vector-field-render-fun in plot2d/point.rkt needs to know that
all of the lists involved are nonempty. The key step is a
for*/lists that creates 7 lists of the same length.  After
checking to see if one of them is empty, the remainder are used
without checking them.
vector-field-render-fun, more: for the compiler to prove that the lists
are all nonempty, it needs to know that linear-seq is
returning a nonempty list. (The samples parameter is a positive
integer here, so this is deducible from the existing linear-seq
with sufficient interprocedural analysis, but...)
linear-seq in common/sample.rkt has a signature that permits the
number of samples to be 0, in which case the empty list is
returned. On one hand I understand the reasoning --- mathematically,
this makes sense---, but that is going
to make it quite a bit harder for the compiler to understand that in
all other cases, the empty list is not produced. Solving the problem
in this particular instance is "easy": another
type signature can cover the case when the num parameter is known
to be a Positive-Integer.

Statistics package (math/statistics)

This section shows a detail where the compiler would in theory be able
to deduce that two lists have the same length, and hence are either
both empty or nonempty, but does not.
Examine the statistics-utils.rkt file from the math/statistics package.
There are two lists of values returned from
sequences->weighted-samples. They are the same length, but the
compiler does not know this.
The information to make a more detailed analysis is available to the
compiler. The penultimate step in sequences->weighted-samples is to
call a function that produces an error unless the lengths are the
same.
This becomes an issue in the function
sequences->normalized-weighted-samples, where the code tests to see
if the first returned value is nonempty and then assumes (knows) the
second is also nonempty. Applying max (with a signature requiring at
least one element in the list) to the second value (needlessly)
results in a type error.
Corrections


Multi-input map now preserves nonempty lists.
Reverse now preserves nonempty lists.
plot-lib/plot/math.rkt: rect-join and rect-meet now work
correctly given no inputs, producing (empty-rect 0) and
(unknown-rect 0) respectively.

Other Fixed Issues

This section documents the other fixes needed before Racket would
build with raco setup without errors.


plot-lib/plot/plot2d/line.rkt: (the density function) inserted
magic 1e-308 as default h when xs is empty (should be impossible)


plot-lib/plot/common/kde.rkt: (the kde function) in the else
clause, calling sort-samples with a nonempty xs produces two
nonempty lists of the same length (see notes below), so the application of max to
the weights will always be ok. Fix: I inserted a 0.0 as the default for
max-dist to appease the compiler.


plot-lib/plot/common/plot-device.rkt: The max-label-x-size
function needs to know (somehow) that the labels input is
nonempty. Fix: the max size is 0 in the empty case.


plot-lib/plot/plot3d/flv3-center: Should require at least one
vector. I think this is a mistake/carelessness in the code. Fix:
require a nonempty input list.


math/math-lib/.../matrix/matrix-operator-norm.rkt: I think the
issues with matrices stem from (1) are 0 dimensional matrices
allowed? (2) row or column-wise application of the L_1 norm: if you
allow 0 dimensional matrices, you need to say what L_1 of that
matrix is (presumably -inf.0 if that's allowed).
(matrix-op-inf-norm, matrix-op-1norm):
Fix: I added a default max of -inf.0 if there are no columns. This should
cause the assertion in the code to fail. (Is that desirable?)