Skip to content

Instantly share code, notes, and snippets.

@dbaynard
Last active December 17, 2018 22:19
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dbaynard/9736e1e7c78da94f13da3ea6ed45f96f to your computer and use it in GitHub Desktop.
Save dbaynard/9736e1e7c78da94f13da3ea6ed45f96f to your computer and use it in GitHub Desktop.
A correct-by-construction AST representation of typographic tables, using diagonalization
-- |
-- Module : diagonal-span
-- Copyright : David Baynard 2018
-- License : BSD-3-Clause OR Apache-2.0
--
-- Maintainer : haskell@baynard.me
-- Stability : experimental
-- Portability : unknown
--
-- A proposal for a correct by construction abstract syntax tree for pandoc
-- Tables.
--
-- Briefly, tables are constructed diagonally, i.e. in the following order
-- (well, the following order when pattern matching):
--
-- 1 3 6 10 14
-- 2 5 9 13 17
-- 4 8 12 16 19
-- 7 11 15 18 20
--
-- This means it is possible to indicate when a table stops expanding *by
-- construction* and thereby ensure that only valid tables can be
-- constructed.
--
-- The resulting tables all have the same type, independent of their type.
--
-- TODO
-- Truly w/o dependent types? Possibly need them when parsing? The table type
-- itself has no concept of its own size; that may make things easier?
--
-------------------------------------------------------------------------------
--
-- What is a table?
--
-- In https://www.joelonsoftware.com/2012/01/06/how-trello-is-different/
-- Joel Spolsky describes how Excel became popular:
--
-- ‘…everyone was using Excel for tables, not calculations.
--
-- Bing! A light went off in my head.’
--
-- The minimal definition of a table I could come up with was
-- a two dimensional grid, containing cells, which has a starting point at
-- one particular corner.
--
-- The current representation of tables in pandoc is too strict on what
-- might be a valid table — specifically it links the definition of a cell
-- into the structure of a table, and thereby cannot represent cells which
-- span multiple, well, cells.
--
-- If a table is just a grid, where cells can take multiple spaces (in
-- multiple shapes) then the problem of constructing unique and valid
-- tables reduces to constructing unique and valid grids, and then filling
-- them with cells. Both can be solved by diagonalisation.
--
-- Diagonalisation is a technique to process arrays of data from a corner
-- outwards, rather than along each row.
--
-- Most haskell implementations of functions which diagonalize lists of lists
-- have to handle infinite lists, and so can only rely on the structure of
-- a generic array in order to stream values. They break down the diagonal
-- construction into expansion of the top left corner, and filling the
-- rest.
--
-- This AST representation uses the structure of diagonal lists, with some
-- type level information, to ensure that only valid grids can be created.
-- Specifically, a table begins by expanding diagonally, then hits
-- a barrier of height, width, or both, continues (in the case of only one
-- barrier) to expand in the free direction, then stops expanding and fills
-- the remaining spots in the table.
--
-- A GADT ('T') ensures that this order is followed.
--
-- Adding cells and annotations to follow… but for now: a cell can be
-- represented at the first grid spot it occupies according to the diagonal
-- traversal order, and subsequent spots can be references. Each diagonal
-- row only needs to know what was in the previous row (indeed, only two
-- spots in the previous row)in order to know whether a grid spot is occupied
-- by a cell or free for a new cell.
{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}
{-# OPTIONS_GHC -fplugin GHC.TypeLits.KnownNat.Solver #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module Main where
-- Core
import GHC.TypeLits
import GHC.Exts (IsList(..))
import qualified Data.Function as F (fix)
import Data.Kind (Constraint)
import Data.Proxy (Proxy(..))
-- Parsing
main = undefined
-- | Does the element extend the data structure in two dimensions, grow the
-- height while maintaining a fixed width (or vice versa) or simply fill in
-- the remaining space in the fixed-size table?
--
-- Used as promoted types of kind 'Growing'
data Extending
= Filling -- ^ Fill in the remaining space in the table
| Height -- ^ Extend the height of the table, keeping the width fixed
| Width -- ^ Extend the width of the table, keeping the height fixed
| Diagonal -- ^ Extend both width and height, i.e. along the diagonal
deriving (Show)
-- | Ensures the tested type is present in the list supplied
type family Or (oneOf :: [k]) (test :: k) :: Constraint where
Or (x ': xs) x = ()
Or (_ ': xs) x = Or xs x
-- | Length indexed list using mb21's notation (https://github.com/jgm/pandoc/issues/1024#issuecomment-354859321)
-- but with GHC’s Nat
data List (n :: Nat) a where
(:-) :: a -> List (n - 1) a -> List n a
Nil :: List 0 a
infixr 5 :-
deriving instance Eq a => Eq (List n a)
deriving instance Show a => Show (List n a)
instance Functor (List n) where
fmap f (x :- xs) = f x :- fmap f xs
fmap f Nil = Nil
instance Foldable (List n) where
foldr f z (x :- xs) = x `f` foldr f z xs
foldr f z Nil = z
-- | A List with a length that depends on its type, but where that type is
-- unknown.
--
-- Needed for recursion on (List n)
--
-- The 'KnownNat n' constraint is needed to use the length of the list at
-- runtime. Implementations require the GHC.TypeList.KnownNat.Solver
-- typechecker plugin.
data SomeList a where
SomeList :: KnownNat n => List n a -> SomeList a
deriving instance Show a => Show (SomeList a)
deriving instance Functor SomeList
deriving instance Foldable SomeList
-- | We cannot extract the length of a 'SomeList' as a type, so any use
-- which requires the length must use 'withSomeList'.
--
-- The return type may not feature 'n'.
withSomeList :: forall a r . SomeList a -> (forall n . KnownNat n => List n a -> r) -> r
withSomeList (SomeList l) f = f l
instance IsList (SomeList a) where
type Item (SomeList a) = a
fromList = F.fix fromListF
toList = foldr (:) []
fromListF :: forall a . ([a] -> SomeList a) -> [a] -> SomeList a
fromListF _ [] = SomeList Nil
fromListF f (x:xs) = f xs `withSomeList` \(l :: List n a) ->
SomeList @(n + 1) $ x :- l
-- | We can extract the length of a 'List n a' in O(1) at runtime.
llength :: forall n a . KnownNat n => List n a -> Integer
llength _ = natVal $ Proxy @n
-- | Likewise from a 'SomeList a'
slength :: SomeList a -> Integer
slength s = s `withSomeList` llength
--------------------------------------------------------------------------------
-- | Correct by construction diagonalized tables.
--
-- This newtype synonym corresponds to whole tables.
newtype Table a = Table { unTable :: T 0 'Diagonal a }
deriving (Show)
-- | Constructing correct by construction diagonalized tables.
--
-- The type is indexed by the size of the most recent 'List' and whether the
-- table is expanding diagonally whilst walking down the resulting ADT,
-- growing in only one of width or height, or remaining the same size (as
-- the last corner is filled).
--
-- Construction of new values begins by growing the table with an empty 'List'
--
-- > Nil :+: …
--
-- Terminate with 'End'
--
-- > … ::: End
--
-- For other structures use the constructors with appropriate Lists.
--
-- A diagonally extending table may continue diagonally, stop (and fill), or
-- switch to either direction. A table extending in one direction may only
-- continue or stop and fill. A filling table may only fill, or complete.
--
-- A function to construct the unit table would look like this:
--
-- > \a -> Nil :+: a :- Nil ::: End
--
-- Note that all Operator constructors are infixr 3 so no parentheses are
-- needed with the 'List' constructors.
data T (n :: Nat) (extend :: Extending) a where
-- | Grow the table height and width by 1, by cons-ing a new diagonal List
-- of length one greater than the previous
(:+:) :: List n a -> T (n + 1) extend a -> T n 'Diagonal a
-- | Grow the table width by 1 at fixed height by cons-ing a new diagonal List
-- to the right of the previous list
(:-:) :: Or '[ 'Filling, 'Width] extend => List n a -> T n extend a -> T n 'Width a
-- | Grow the table height by 1 at fixed width by cons-ing a new diagonal List
-- below the previous list
(:|:) :: Or '[ 'Filling, 'Height] extend => List n a -> T n extend a -> T n 'Height a
-- | Fill the remaining table space by cons-ing a new diagonal list below
-- and to the right of the previous list
(:::) :: List n a -> T (n - 1) 'Filling a -> T n 'Filling a
End :: T 0 'Filling a
infixr 3 :+:
infixr 3 :-:
infixr 3 :|:
infixr 3 :::
deriving instance Show a => Show (T n extend a)
--------------------------------------------------------------------------------
data SomeTable a where
SomeTable :: T n extend a -> SomeTable a
deriving instance Show a => Show (SomeTable a)
withSomeTable :: forall a r . SomeTable a -> (forall n extend . T n extend a -> r) -> r
withSomeTable (SomeTable t) f = f t
tableFromDiagonal :: forall a . [[a]] -> Maybe (Table a)
tableFromDiagonal = undefined -- withSomeTable Table . f
where
f :: [[a]] -> SomeTable a
f [] = SomeTable End
-- f (x:xs) = f xs `withSomeTable` \(t :: T n extend a) ->
-- SomeTable
--------------------------------------------------------------------------------
-- From http://h2.jaguarpaw.co.uk/posts/polymorphic-recursion-combinator/
newtype Forall f = Forall { unForall :: forall a . f a }
fix :: forall f . ((forall a . f a) -> forall a . f a) -> forall a . f a
-- fix f = unForall (F.fix (\x -> Forall (f (unForall x))))
fix f = unForall . F.fix $ \x -> Forall $ f $ unForall x
-- fix f = unForall . F.fix $ Forall . f . unForall
--------------------------------------------------------------------------------
-- Based on Daniel Wagner’s universe package
makeCartesianWith :: (a -> b -> c) -> [a] -> [b] -> [[c]]
makeCartesianWith f xs ys = [[ f x y | x <- xs] | y <- ys]
-- | Diagonalize a list of lists
--
-- This function considers a column of rows as a list of lists.
-- The resulting array is treated as having three sections:
--
-- 1. The top left corner of processed cells,
-- 2. The cells in rows for which processing has begun, but not yet
-- completed, and
-- 3. The rows which have yet to be processed.
--
-- The first phase of the algorithm picks off the head of each row,
-- beginning with the first, then picking the second and the first, then
-- the third, second, first, and so on, until there are no more rows.
--
-- Within an iteration, values from lower rows are taken first, as the list
-- of processed rows is reversed.
--
-- The second phase then delegates to the list transpose function, which
-- does the same — removing the head from each list, then repeating.
--
-- As the first phase progresses, the divide between the rows for which
-- processing has begun, and those yet to be processed, progresses one row
-- per recursive call.
--
-- Phase 1:
--
-- @
-- ===============
-- - - - - - - - -
-- - - - - - - - -
-- - - - - - - - -
-- - - - - - - - -
--
-- 0|- - - - - - -
-- ===============
-- - - - - - - - -
-- - - - - - - - -
-- - - - - - - - -
--
-- 1|- - - - - - -
-- 0 1|- - - - - -
-- ===============
-- - - - - - - - -
-- - - - - - - - -
--
-- 2|- - - - - - -
-- 1 2|- - - - - -
-- 0 1 2|- - - - -
-- ===============
-- - - - - - - - -
--
-- 3|- - - - - - -
-- 2 3|- - - - - -
-- 1 2 3|- - - - -
-- 0 1 2 3|- - - -
-- @
--
-- Phase 2
--
-- @
-- 3 4|- - - - - -
-- 2 3 4|- - - - -
-- 1 2 3 4|- - - -
-- 0 1 2 3 4|- - -
--
-- 3 4 5|- - - - -
-- 2 3 4 5|- - - -
-- 1 2 3 4 5|- - -
-- 0 1 2 3 4 5|- -
--
-- 3 4 5 6|- - - -
-- 2 3 4 5 6|- - -
-- 1 2 3 4 5 6|- -
-- 0 1 2 3 4 5 6|-
--
-- 3 4 5 6 7|- - -
-- 2 3 4 5 6 7|- -
-- 1 2 3 4 5 6 7|-
-- 0 1 2 3 4 5 6 7
--
-- 3 4 5 6 7 8|- -
-- 2 3 4 5 6 7 8|-
-- 1 2 3 4 5 6 7 8
-- 0 1 2 3 4 5 6 7
--
-- 3 4 5 6 7 8 9|-
-- 2 3 4 5 6 7 8 9
-- 1 2 3 4 5 6 7 8
-- 0 1 2 3 4 5 6 7
--
-- 3 4 5 6 7 8 9 10
-- 2 3 4 5 6 7 8 9
-- 1 2 3 4 5 6 7 8
-- 0 1 2 3 4 5 6 7
-- @
--
-- Order is top down, then right
--
-- Based on Daniel Wagner’s universe package
diagonals :: [[a]] -> [[a]]
diagonals = tail . go []
where
-- This recursive function takes as input the list of rows above the
-- first stage barrier (technically in reverse order), and rows below
-- (in forward order)
go :: forall b . [[b]] -> [[b]] -> [[b]]
go above below =
corner : case below of
-- Drop the barrier below the current row and recurse.
--
-- Each round shrinks the length of each row above the barrier by
-- one.
--
-- Note that this reverses the order of rows above the barrier as
-- it recurses.
row:below' -> go (row : above') below'
-- Transpose the remaining lists (themselves in reverse order from
-- the original list of lists).
[] -> transpose $ above'
where
-- Generate the next diagonal in the starting corner of the array
-- by combining the head of each list above the barrier.
--
-- The list comprehension ensures that any empty lists result in
-- no value in the resulting list, rather than an error.
corner :: [b]
corner = [ h | h:_ <- above ]
-- Take the tail of each list above the barrier.
--
-- The list comprehension ensures that any empty lists result in
-- no value in the resulting list, rather than an error.
above' :: [[b]]
above' = [ t | _:t <- above ]
-- | The 'transpose' function transposes the rows and columns of its argument.
-- For example,
--
-- >>> transpose [[1,2,3],[4,5,6]]
--
-- @
-- =====
-- 1 2 3
-- 4 5 6
--
-- 1 4
-- ===
-- 2 3
-- 5 6
--
-- 1 4
-- 2 5
-- ===
-- 3
-- 6
--
-- 1 4
-- 2 5
-- 3 6
-- ===
-- @
--
-- [[1,4],[2,5],[3,6]]
--
-- If some of the rows are shorter than the following rows, their elements are skipped:
--
-- >>> transpose [[10,11],[20],[],[30,31,32]]
--
-- @
-- ========
-- 10 11
-- 20
-- -- <- Note that this is skipped in the next round
-- 30 31 32
--
-- 10 20 30
-- ========
-- 11
-- 31 32
--
-- 10 20 30
-- 11 31
-- ========
-- 32
--
-- 10 20 30
-- 11 31
-- 32
-- ========
--
-- @
--
-- [[10,20,30],[11,31],[32]]
--
-- From base package.
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose ([] : xss) = transpose xss
transpose ((x:xs) : xss) = heads : transpose tails
where
heads = x : [ h | (h:_) <- xss]
tails = xs : [ t | (_:t) <- xss]
@defanor
Copy link

defanor commented Dec 1, 2018

Diagonals indeed seem attractive in this context, but if each new one would depend on the previous one (which propagates information from previous ones too, such as longer {col,row}spans), what are the advantages over row-by-row or column-by-column definition?

@mb21
Copy link

mb21 commented Dec 1, 2018

Definitely looks interesting... maybe you can fill in the main function with some example tables that are printed? For example, how are row- and col-spans handled? And what about jgm/pandoc#1024 (comment) ?

@dbaynard
Copy link
Author

dbaynard commented Dec 4, 2018

Thanks for feedback so far. I've searched for anything published on this, but I haven't yet found anything. As I mentioned on the issue, I hope the process of implementing this will help, whether or not this implementation itself does.

Diagonals indeed seem attractive in this context, but if each new one would depend on the previous one (which propagates information from previous ones too, such as longer {col,row}spans), what are the advantages over row-by-row or column-by-column definition?

For a diagonalized table, all valid tables (no matter their size) have the same type (the Table a synonym). The row-by-row examples I've seen relying on type indexes have different types for different size tables. I'm not yet able to actually produce a diagonalized table yet from an existing list, though (I may try with [] rather than List n for the elements of the T, first).

maybe you can fill in the main function with some example tables that are printed?

OK — there are a number of examples in the issue, though it would be very helpful to have a list of test cases.

For example, how are row- and col-spans handled?

I'm not quite sure about this, though there are positions within the resulting data structures that uniquely correspond to new rows/columns (e.g. :+: introduces a new column and a new row, :-: introduces a new column, and :|: a new row.

And what about jgm/pandoc#1024 (comment)?

a b b    letters are repeated to represent
a c d    cells spanning multiple rows or columns
e e d

So this would be something along the lines of

Nil :+:
  A :- Nil :+:
    a :- B :- Nil :+:
      E :- C :- b :- Nil :::
    e :- D :- Nil :::
  d :- Nil :::
End

The upper case letters indicate the first time in the data structure traversal that a cell is encountered.

I'm not yet sure what to do about the lower case points. One thought would be a data type for Cell that permits an empty spot or a direction.

An empty spot would be useful in actual tables — often a top corner may be empty (no border, even, e.g. the top left and bottom right cells of https://thediagram.com/4_2/table_a.html where there are borders which are often skipped) — and the length indexed Cell types might include 0 anyway.

Constructing the AST, it would be possible to indicate where a cell that would be in that position is merged with one in one of the preceding positions. The * positions allow for simple rectangular cells, the # for more complicated shapes.

- - - - -
- - * -
- * x
- #
-

When traversing the table we'd need to know the span size of a cell we encounter. Ideally, this should be based on the structure of the table. Perhaps there's a way to track the excess number of spots in the grid that a cell spanning multiple rows or columns needs, and then ensure through types (in a similar way to the T type) that there are always that number available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment