marcmartino/sublist.md

## sublist.md

      
    Raw
  

              sublist.md
            
          
    An Adventure in Optimizing Functional Code Speed

Continuing through my learning/exploration of elm I was tasked with coming up with a function for determining if a given list is an equal, sublist, superlist, or unrelated to another.
My initial solution looked like this
type ListComparison
    = Equal
    | Superlist
    | Sublist
    | Unequal


isSublist : List a -> List a -> Bool
isSublist xs ys =
    let
        xLen =
            List.length xs

        yLen =
            List.length ys
    in
    if xLen == yLen then
        xs == ys

    else if xLen > yLen then
        False

    else
        isSublist xs (List.drop 1 ys) || isSublist xs (List.take (yLen - 1) ys)


sublist : List a -> List a -> ListComparison
sublist xs ys =
    let
        sublistChecks : ( Bool, Bool )
        sublistChecks =
            ( isSublist xs ys, isSublist ys xs )
    in
    case sublistChecks of
        ( True, True ) ->
            Equal

        ( True, False ) ->
            Sublist

        ( False, True ) ->
            Superlist

        ( False, False ) ->
            Unequal
At first glance this is a totally reasonable approach, recursively going through all the possible list pairings to detect a sublist.
The issue came in when running the exercise test cases:
This test failed because it threw an exception: "RangeError: Maximum call stack size exceeded"
My code was too slow. It makes sense when I really thought about what was happening-

List.length is an O(n) operation. Lists are linked lists really after all and for every iteration of isSublist I'm traversing the entirety of both lists just to grab the lengths
List.take (yLen - 1) is essentially O(n) as well
The number of stack frames getting used for the recursion in isSublist is super high. Worst case O(n²)

Speeding Up

Low hanging fruit here would be reapproaching the sublist matching to check if our list is a "prefix" of the second list and if it's not just check it again against the tail of the second list
isPrefixOf : List a -> List a -> Bool
isPrefixOf xList yList =
    case ( xList, yList ) of
        ( [], _ ) ->
            True

        ( _, [] ) ->
            False

        ( x :: xs, y :: ys ) ->
            x == y && isPrefixOf xs ys


isSublist : List a -> List a -> Bool
isSublist xList yList =
    case ( xList, yList ) of
        ( [], [] ) ->
            True

        ( _, [] ) ->
            False

        ( xs, y :: ys ) ->
            isPrefixOf xs (y :: ys) || isSublist xs ys
This approach is going to be clearly a ton more efficient- eschewing all of the List.length and List.take calls without losing exhaustiveness and we're using the short circuiting of the logical operators to our advantage.
This was not enough...

This approach was able to get around one of the Maximum call stack size exceeded errors, but not the other. This is going to take a more sizeable change to speed things up further.
The indelible edmundnoble#9148 walked me through tail call optimization (TCO) and helped to refine my existing intuition on the matter. What it actually is (in languages like .hs and .ml), what it is in a language like .js and .elm, and that kind of speed increase we would expect if we tap into TCO for my elm sublist code.
Speedy Sublist Validation

type ListComparison
    = Equal
    | Superlist
    | Sublist
    | Unequal


isPrefixOf : Bool -> List a -> List a -> Bool
isPrefixOf isPrefix xList yList =
    case ( isPrefix, xList, yList ) of
        ( False, _, _ ) ->
            False

        ( _, [], _ ) ->
            True

        ( _, _, [] ) ->
            False

        ( _, x :: xs, y :: ys ) ->
            isPrefixOf (x == y) xs ys


isSublist : List a -> List a -> Bool
isSublist =
    isSublistRec False


isSublistRec : Bool -> List a -> List a -> Bool
isSublistRec sublistFound xList yList =
    case ( sublistFound, xList, yList ) of
        ( True, _, _ ) ->
            True

        ( _, [], [] ) ->
            True

        ( _, _, [] ) ->
            False

        ( _, xs, y :: ys ) ->
            isSublistRec (isPrefixOf True xs (y :: ys)) xs ys


sublist : List a -> List a -> ListComparison
sublist xs ys =
    let
        sublistChecks : ( Bool, Bool )
        sublistChecks =
            ( isSublist xs ys, isSublist ys xs )
    in
    case sublistChecks of
        ( True, True ) ->
            Equal

        ( True, False ) ->
            Sublist

        ( False, True ) ->
            Superlist

        ( False, False ) ->
            Unequal
As you can see, we're accepting some added complexity in our logic for having proper tail calls. Were before the test cases were failing after 6.5s, the new updated code completes all cases in less than 150ms.
This was an amazing learning experience for me and I hope it can be a shred of help to someone else. Again big thanks for Edmundnoble's patience explaining lots of this to me!