Fuuzetsu/jbracker-haddock-GSoC2013.md Secret

## jbracker-haddock-GSoC2013.md

      
The following was proposed by jbracker. Since jbracker was picked for a different project, I (Fuuzetsu) was asked to take over this proposal and have since been accepted to do so.
Improve Haddock markup and capabilities

Based on the suggestion by Johan Tibell.
GSoC Questions


What is the goal of the project you propose to do?
The goal of this project is to extend Haddock with long-missing features that have been requested several times.
See the Goals section below for specific information about my goals.


In what ways will this project benefit the wider Haskell community?
Haddock is the standard tool for documenting Haskell. It is used on the central package platform
Hackage for generating all documentation. Improving this tool will benefit the Haskell community greatly
by giving it a more powerful tool to document its code.


Can you give some more detailed design of what precisely you intend to achieve?
Please refer to the Goals section, which is specific about each goal.


What deliverables do you think are reasonable targets? Can you outline an approximate schedule of milestones?
The schedule is based on the GSoC timeline:

28. May - 16. Jun: Students get to know mentors, read documentation, get up to speed to begin working on their projects.

Milestone 1: 100% feature compatible transition to a new parser (part 1 of the goal).


17. Jun - 29. Jul: Mentors give students a helping hand and guidance on their projects.

Milestone 2: Lift current but unnecessary restrictions that are backward compatible (part 2 of the goal).


29. Jul - 02. Aug: Mentors and students can begin submitting mid-term evaluations.
02. Aug - 16. Sep: Mentors give students a helping hand and guidance on their projects.

Milestone 3: Implement as many extended features as possible. Priority lies on part 3.


16. Sep - 23. Sep: Suggested 'pencils down' date. Take a week to scrub code, write tests, improve documentation, etc.


What relevant experience do you have? e.g. Have you coded anything in Haskell? Have you
contributed to any other open source software? Been studying advanced courses in a related topic?

I have had several classes in functional programming (and more that just used Haskell).
I have worked on generalising the Diagrams library.
I worked on the Sunroof project for the last 6 months.


In what ways do you envisage interacting with the wider Haskell community during your project?
e.g. How would you seek help on something your mentor wasn't able to deal with?
How will you get others interested in what you are doing?

I could tweet about improvements, giving everybody a chance to see my progress.
I will be online in the Haskell IRC channel for discussion with the Haskell community about problems and suggestions (Nick: jbracker).
The Haskell mailing list would be a great reference for bigger problems and continuous discussions.
It might also be useful to store code on GitHub, because it offers an easy way to comment on changes and discuss problems.


Why do you think you would be the best person to tackle this project?
I have been in active Haskell development over the last 8 months and gathered a lot of experience during that time.
Most of the improvements I am suggesting for Haddock are things I have often missed myself, which
highly motivates me to get them fixed once and for all. I will have 3 months of spare time that I would gladly spend
on this project.


Goals

A discussion with Simon Hengel has brought up the issue that Haddock is hard to extend and maintain because of the currently used parser generators (Happy/Alex). Their limited lookahead and backtracking make it hard to implement many of the desired features. This is why I suggest using a backtracking parser instead (preferably Attoparsec). The plan to convert and extend Haddock has the following milestones:

First, I reimplement the existing parser using the new one. This will result in a 100% feature-compatible Haddock parser.
The second step is to extend that parser by lifting current but unnecessary restrictions:

Empty line between items of a list;
Missing title for images;
Support for GADTs and Type Functions;
This keeps the parser backward compatible while broadening the accepted Haddock syntax.


Implement features that are not backward compatible, but unlikely to break existing Haddock comments, e.g. URL autolinking
Implement a flag to activate extended syntax support. This will
include implementing most of the suggestions listed in the
following sections.

Of course, there is a dependency issue when moving to another parser library. For Attoparsec we can avoid this
by adding the relevant Attoparsec sources to Haddock. The dependency on Data.Text remains a problem.
If it is added to the core libraries in the future, we do not have to worry about this. Otherwise we can just use
the ByteString-based version of Attoparsec to avoid that dependency. I have discussed this with Simon and
there should be no problems with this approach.
In general, a test-driven development approach would be good, so I will try to write tests for all new code as soon as it is written.
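To illustrate why backtracking helps, here is a minimal sketch of an inline-markup parser that falls back to plain text when a delimiter is never closed. It uses ReadP from base as a stand-in for Attoparsec so that it needs no extra dependencies. The Inline type and all function names are hypothetical and cover only a tiny subset of Haddock's markup.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP
  (ReadP, between, char, eof, get, many, munch1, readP_to_S, (<++))

-- | A tiny, hypothetical subset of Haddock's inline markup.
data Inline
  = Plain String -- ordinary text
  | Emph String  -- /emphasised/ text
  | Ident String -- a 'quoted' identifier
  deriving (Eq, Show)

-- | Parse one span enclosed in the delimiter @d@ on both sides.
delimited :: Char -> (String -> Inline) -> (Char -> Bool) -> ReadP Inline
delimited d wrap ok = wrap <$> between (char d) (char d) (munch1 ok)

-- | Try each markup form in turn. If a form fails part-way (for example an
-- apostrophe that is never closed), the left-biased choice backtracks and
-- the character is kept as plain text.
inline :: ReadP Inline
inline = emph <++ ident <++ plainChar
  where
    emph      = delimited '/' Emph (\c -> c /= '/' && c /= '\n')
    ident     = delimited '\'' Ident (\c -> isAlphaNum c || c `elem` "_.")
    plainChar = Plain . (: []) <$> get

-- | Merge adjacent plain fragments into one.
merge :: [Inline] -> [Inline]
merge (Plain a : Plain b : rest) = merge (Plain (a ++ b) : rest)
merge (x : rest)                 = x : merge rest
merge []                         = []

parseInlines :: String -> [Inline]
parseInlines s =
  case readP_to_S (many inline <* eof) s of
    [(xs, "")] -> merge xs
    _          -> [Plain s] -- not expected: plainChar accepts any character
```

With this sketch, the apostrophe in "don't" stays plain text, while 'foldr' becomes an identifier, without any special-casing in a lexer.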
Backward Compatible Enhancements

The following enhancements define the goal for step two of the approach suggested above.
Haddock lacks support for documenting widely used language extensions:


Implement support to document GADTs.
Right now the most recent version of Haddock (2.13.2) produces a parse error on this code:
-- | A tree GADT
data Tree a where
  -- | Leaf constructor
  Leaf :: a -- ^ Leaf content
       -> Tree a -- ^ The singleton tree
  -- | Fork constructor
  Fork :: Tree a -- ^ Left branch
       -> Tree a -- ^ Right branch
       -> Tree a -- ^ The tree
  -- | Special constructor
  Special :: String -- ^ It needs a string
          -> Int    -- ^ It needs an integer
          -> Tree String -- ^ It is specialised
I plan to support all of these comments.


Implement support to document type family instances.


Each module documentation should contain a list of all activated language extensions (and other important flags).
The extensions needed by a package can be listed on the package index page.


Right now, entries of a list have to be separated by an empty line. I do not see the necessity for this rule.
I think it would be more convenient to just require all contents of a list entry to be to the right of the column
in which its bullet appears.
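The indentation rule suggested here can be sketched in a few lines. This is only an illustration under simplifying assumptions: input is a list of comment lines with the comment prefix already removed, only `*` bullets are recognised, and the name listItems is hypothetical.

```haskell
import Data.Char (isSpace)

-- | Split lines into list items without requiring blank lines between them.
-- A line belongs to the current item if it is indented further than the
-- column of that item's bullet.
listItems :: [String] -> [String]
listItems = go
  where
    go [] = []
    go (l : ls)
      | Just (col, body) <- bullet l =
          let (cont, rest) = span (continues col) ls
          in unwords (body : map (dropWhile isSpace) cont) : go rest
      | otherwise = go ls -- non-list lines are ignored in this sketch

    -- Column of the bullet and the text following it, if the line has one.
    bullet l = case span isSpace l of
      (ind, '*' : ' ' : body) -> Just (length ind, body)
      _                       -> Nothing

    -- A non-empty line strictly to the right of the bullet column.
    continues col l =
      let (ind, rest) = span isSpace l
      in not (null rest) && length ind > col
```

A bullet in the same column starts a new item, and a blank or less-indented line ends the current one.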


Markup

This section discusses the extensions to Haddock that would be part of step 3 or 4.
Looking at the Haskell Cafe discussion, there are several different suggestions for markup languages:

Markdown
reStructuredText
Creole
Pandoc - This has too many dependencies for a core tool.

The major issue with Markdown is that it is ambiguous and does not have a formal and commonly agreed syntax. The major advantage of the other two languages is that they have a well-known syntax and semantics, but they seem too powerful for the context of source documentation.
Generally, it would be preferable to extend the existing Haddock language to support missing features. This will make it easy for long-time users to adjust. It will also keep Haddock more maintainable, since adding a completely new markup would mean twice the maintenance later.
Activation and Integration

Adding new syntax and features to the Haddock language may invalidate old documentation or cause unintended changes in its appearance. For that reason it is a good idea to only activate the additions if a pragma is in the sources:
{-# HADDOCK ExtendedSyntax #-}
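As a rough sketch of how such an activation pragma could be detected, the following scans a module's source text for the proposed pragma. The pragma name comes from this proposal and is not an existing Haddock feature; the function names are hypothetical, and a real implementation would more likely reuse GHC's pragma handling than match on lines.

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- | Collect the words of every line of the form {-# HADDOCK ... #-}.
haddockFlags :: String -> [String]
haddockFlags src =
  [ flag
  | line <- lines src
  , Just rest <- [stripPrefix "{-# HADDOCK " line]
  , "#-}" `isSuffixOf` rest
  , flag <- words (take (length rest - 3) rest) -- drop the closing "#-}"
  ]

-- | Whether the extended syntax was requested for this module.
extendedSyntaxEnabled :: String -> Bool
extendedSyntaxEnabled = elem "ExtendedSyntax" . haddockFlags
```

Modules without the pragma yield no flags, so their documentation would keep being parsed with the current rules.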
Automated cross-linking

In general, automatic cross-linking would be useful. The current semantics of ' and @ could be merged. Each would produce inline source code. Both would automatically insert a link if the content matches a valid identifier, type or module name, with no need for the user to mark it manually.

This approach has the downside that it might link things that are not meant to be linked.
A full-scale analysis of whether the enclosed code is actually valid Haskell is
difficult and may do more harm than good.
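A cheap lexical check, rather than a full analysis, might look like the following sketch. The list of known names stands in for whatever environment Haddock would consult, and the Code type and all function names are hypothetical.

```haskell
import Data.Char (isAlpha, isAlphaNum)

data Code
  = Linked String   -- inline code that becomes a link
  | Unlinked String -- inline code left as it is
  deriving (Eq, Show)

-- | Split on dots, e.g. "Data.String" -> ["Data", "String"].
segments :: String -> [String]
segments s = case break (== '.') s of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : segments rest

-- | A single identifier-like word.
isWord :: String -> Bool
isWord (c : cs) =
  (isAlpha c || c == '_') && all (\x -> isAlphaNum x || x `elem` "_'") cs
isWord [] = False

-- | A parenthesised operator such as "(++)".
isOperator :: String -> Bool
isOperator s = case s of
  '(' : rest@(_ : _ : _)
    | last rest == ')' -> all (`elem` "!#$%&*+./<=>?@\\^|-~:") (init rest)
  _ -> False

-- | Purely lexical test: identifier, operator or dotted module name.
isName :: String -> Bool
isName s = isOperator s || all isWord (segments s)

-- | Link only content that looks like a name and is actually known.
autoLink :: [String] -> String -> Code
autoLink known s
  | isName s && s `elem` known = Linked s
  | otherwise                  = Unlinked s
```

Because the check is lexical, it still links a known name that the author only meant as text, which is the downside mentioned above.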

Headers

There should be support for headers outside of the export list so documentation does not clutter that list.
This would make it easier to write a few paragraphs of description for sections.
As the Markdown syntax (involving #) has some issues with other features of the language/compiler, I would
suggest an alternative syntax. There are several other widely used ways to mark headings:
Section
=======

Subsection
----------

or
=======
Section
=======

----------
Subsection
----------

or
= Section
== Subsection
=== Subsubsection
==== Ridiculously deep for documentation

The first is used by Markdown and reStructuredText, the second by reStructuredText, and the third by WikiCreole. I don't see problems
with ambiguity in any of these syntaxes. I would prefer the first or second one for main headings, because it is
clear and visible while reading documentation. It also gives a visual division within the
source that is easy to spot, which helps a programmer writing documentation and at the same time breaks up
bigger blocks of code.
For integration I would like to support the first or second version without a special Haddock marker:
-- Section
-- =======

-- ----------
-- Subsection
-- ----------

Deeper levels of headings can be expressed using the third syntax. I would only allow the third syntax in normal Haddock
code blocks.
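The three heading styles can be recognised with a small line-based pass, sketched below. It assumes the comment prefix has already been stripped from each line and that a rule is at least three characters long; the Block type and the function names are hypothetical.

```haskell
data Block
  = Heading Int String -- level and title
  | Para String        -- any other non-empty line
  deriving (Eq, Show)

-- | Level of a rule line: "===" and longer is 1, "---" and longer is 2.
ruleLevel :: String -> Maybe Int
ruleLevel s
  | length s >= 3 && all (== '=') s = Just 1
  | length s >= 3 && all (== '-') s = Just 2
  | otherwise                       = Nothing

isTitle :: String -> Bool
isTitle t = not (null t) && ruleLevel t == Nothing

blocks :: [String] -> [Block]
blocks (o : t : u : rest) -- second style: rule above and below the title
  | Just n <- ruleLevel o, o == u, isTitle t = Heading n t : blocks rest
blocks (t : u : rest)     -- first style: rule below the title
  | Just n <- ruleLevel u, isTitle t = Heading n t : blocks rest
blocks (l : rest)
  -- third style: "= Title", "== Title", ...; the level is the number of '='
  | (marks@(_ : _), ' ' : t) <- span (== '=') l =
      Heading (length marks) t : blocks rest
  | null l    = blocks rest
  | otherwise = Para l : blocks rest
blocks [] = []
```

Only the prefix style can express levels deeper than two here, matching the suggestion to reserve it for deeper headings.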
Further Inline Markup

Haddock is missing the capability to typeset bold text. For this purpose I would suggest adding the syntax known from Markdown:
__Important__

Though I am not yet sure whether one or two underscores should be required.
I think the notation with * used by Markdown should not be supported, because it may cause conflicts with lists.
Proper Support for Images

There already seems to be support for embedding images: <<img>>. The paths seem to be static.
One should be able to give an image location either as a relative path or as a complete URL. The root of a relative location would be given as a flag to Haddock.
Cabal: The images can be listed as additional resources in the cabal file. This way they will be included when producing an archive through cabal sdist.
Hackage: Hackage can use the getDataDir path that cabal creates to locate images in the archive and set the flag correctly.
Optionally, Haddock should offer not just to link against images,
but also to copy them from their sources into a common location
in the generated documentation. This would avoid breaking
relative links if the documentation is moved. This has to be handled
separately for absolute and relative paths. For links to other servers,
copying may be turned on or off separately.
The existing syntax has a downside: it is not possible to provide an alt/title text. This could be fixed by introducing one of the following ways to embed images:
WikiCreole:
{{image.jpg|Title}}

Markdown:
![Title](image.jpg)

Adaptation of existing syntax combined with the syntax for links:
<<image.jpg Title>>

I would favor the adaptation, because it is close to the already existing syntax and fits in well with the syntax for links.
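A minimal sketch of parsing the adapted syntax, assuming the path contains no whitespace and everything after the first run of whitespace is the title. The Image type and parseImage are hypothetical names.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

data Image = Image
  { imagePath  :: String
  , imageTitle :: Maybe String
  } deriving (Eq, Show)

-- | Remove a suffix by stripping the reversed prefix of the reversed input.
stripSuffix :: String -> String -> Maybe String
stripSuffix suf = fmap reverse . stripPrefix (reverse suf) . reverse

-- | Parse "<<path>>" or "<<path Title text>>".
parseImage :: String -> Maybe Image
parseImage s = do
  inner <- stripPrefix "<<" s >>= stripSuffix ">>"
  case break isSpace inner of
    ("", _)      -> Nothing
    (path, rest) -> Just (Image path (title (dropWhile isSpace rest)))
  where
    title "" = Nothing
    title t  = Just t
```

The existing form without a title still parses, so documentation that already uses it would not need to change.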
Optional Additional Goals

These goals can be tackled if time permits.
Search in Documentation

A search in the documentation through JavaScript might be a nice feature.

Haddock could generate a JSON search index.

This would make documentation easily searchable through third-party tools, without the need for a central search engine like Hoogle or Hayoo.
Documentation

A major point would also be to update documentation and add undocumented features (like the image tags <<img>>).
Markdown Ideas

These are ideas I previously thought about when adding Markdown. They should not be considered part of the goals.
Activation and integration

I would add Markdown as an alternative markup language.
This way users would not have to bother learning which parts of Markdown are
supported by Haddock and which are not. The idea of activating it by pragma sounds
very nice:
{-# HADDOCK Markdown #-}
I would not allow Markdown or any other markup language on the same level as Haskell code (no literate Haskell style).
Automatic Cross-Linking

As far as I can see single-quoted strings do not need to be added.
The inline code tags are sufficient. If the marked code matches a
valid identifier, type or module name it can be linked automatically
without the user having to mark it manually:
-- | `(++)` appends two `String`s. More information 
--   can be found in the module `Data.String`. 
--   Here a small example: `"a" ++ "b"` = `"ab"`
(++) :: String -> String -> String
(++) = undefined
In this example (++), String and Data.String would be linked while the
rest would not be.

This approach has some downsides, as discussed above.

Headings

There is a problem with the C preprocessor, because Markdown headings use the same # character as preprocessor directives.
{-# LANGUAGE GADTs #-}
-- # My Header
#ifdef evil_c
-- ...
#endif
A single space can resolve this problem and if it arises there is still the alternative syntax for headings:
{-# LANGUAGE GADTs #-}
{- | 
My Header
=========
#ifdef evil_c
-- ...
#endif
-}
For every heading above level one there should be no conflicts anymore.
Related Links


Markup Languages:

Markdown
reStructuredText
Creole
Pandoc


Suggestions:

Johan Tibell's suggestion


Discussions:

Tibell's suggestion on Reddit
Tibell's suggestion on Haskell Cafe