hdgarrood/week-1.markdown

## week-1.markdown

      
### Pursuit GSOC: Week 1

I'm going to keep a blog of my progress on my Google Summer of Code project this
summer, which is various enhancements to the PureScript code search tool
Pursuit.
I have managed to get quite a bit done already, since I had been contributing
to PureScript (and Pursuit in particular) for a while before I even applied for
Google Summer of Code. So, for the first entry, I'm just going to go over
what's been done so far:


A basic filesystem-based database for storing packages. At the moment, the
only queries that need to be performed are "what versions are available for
this package, if any" and "give me the documentation for this version of this
package", and both of these are easily answerable with the current structure:
one directory per package, and each package directory contains one file for
each version of the package. The files themselves contain a JSON-serialized
Package GithubUser (see the relevant code inside the compiler), which
has all the information needed to render a package homepage, documentation,
and also Hoogle input files.
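
To make the directory layout concrete, here is a minimal sketch of the two queries in Python. It is an illustration only, not Pursuit's actual code: the function names, the `.json` file suffix, and the assumption that versions are dotted integers are all hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional


def store_package(root: Path, package: str, version: str, data: dict) -> None:
    """Write one version of a package as <root>/<package>/<version>.json."""
    package_dir = root / package
    package_dir.mkdir(parents=True, exist_ok=True)
    path = package_dir / f"{version}.json"
    path.write_text(json.dumps(data), encoding="utf-8")


def available_versions(root: Path, package: str) -> List[str]:
    """Answer "what versions are available for this package, if any"."""
    package_dir = root / package
    if not package_dir.is_dir():
        return []
    versions = [p.stem for p in package_dir.glob("*.json")]
    # Assumption: versions look like "1.2.3", so sort them numerically
    # rather than as plain strings.
    return sorted(versions, key=lambda v: tuple(int(n) for n in v.split(".")))


def load_package(root: Path, package: str, version: str) -> Optional[dict]:
    """Answer "give me the documentation for this version of this package"."""
    path = root / package / f"{version}.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Both queries map directly onto filesystem operations (a directory listing and a single file read), which is why no separate index is needed at this stage.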


HTML documentation rendering, adapted from psc-pages and moved into Pursuit
itself. I also made a few minor changes to the rendering — for example,
data constructors are now displayed separately from instances, and instances
are grouped under the relevant types or type classes.


A pull request sent to the compiler itself, to allow collection of fixity
information for operators (that is, left-, right-, or no associativity, and
precedence). This information can then be used in the generated HTML
documentation.
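
As a rough picture of what fixity information amounts to, the sketch below models it in Python as an associativity plus a precedence, and renders it using PureScript's `infixl` / `infixr` / `infix` keywords. The type and function names are hypothetical, and the rendered format is only an assumption about how the documentation might display it.

```python
from dataclasses import dataclass
from enum import Enum


class Associativity(Enum):
    # The values are the PureScript fixity keywords.
    LEFT = "infixl"
    RIGHT = "infixr"
    NONE = "infix"


@dataclass(frozen=True)
class Fixity:
    associativity: Associativity
    precedence: int


def render_fixity(operator: str, fixity: Fixity) -> str:
    """Render a fixity as a declaration-style string for display in docs."""
    return f"{fixity.associativity.value} {fixity.precedence} {operator}"
```

For example, `render_fixity("+", Fixity(Associativity.LEFT, 6))` produces the string `infixl 6 +`.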


A mechanism for uploading packages. I wanted to make this as streamlined as
possible. Taking Haskell as an example, you would run `cabal sdist` to create a
source distribution as a local file, and then you would visit the upload page
on Hackage in a browser, find your source distribution again using a file
browser, and upload it. This is a little awkward and I think it should be
possible to do better.
Given that I didn't want to manage users and passwords inside Pursuit, the
approach I decided on was to allow unauthenticated uploads, but not to
actually publish packages until they have been verified by an authenticated
GitHub user. Currently the way this works is:

1. A library author runs `psc-publish` inside their PureScript package
   directory, which produces a JSON-serialized version of the package to be
   used on Pursuit.
2. Some command line tool POSTs that JSON to the Pursuit server (no such tool
   exists just yet).
3. The Pursuit server generates a random verification URL, and replies with
   that URL.
4. The user visits that URL in their browser, and is prompted to log in via
   GitHub (with OAuth).
5. After the user is authenticated, the package is considered verified, the
   GitHub user is recorded as being the package uploader, and the HTML
   documentation for that package becomes visible.
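
The server side of this flow can be sketched roughly as follows, in Python. This is a simplified in-memory model, not the real implementation: the `PendingUploads` class, the `/verify/` URL path, and the method names are all hypothetical, and the GitHub OAuth step is assumed to have happened elsewhere and to hand back a user name (or `None` on failure).

```python
import secrets
from typing import Dict, List, Optional


class PendingUploads:
    """Holds unauthenticated uploads until a GitHub user verifies them."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._pending: Dict[str, dict] = {}
        self.published: List[dict] = []

    def submit(self, package_json: dict) -> str:
        """Accept an upload and reply with a random verification URL."""
        token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
        self._pending[token] = package_json
        return f"{self.base_url}/verify/{token}"

    def verify(self, token: str, github_user: Optional[str]) -> bool:
        """Publish the pending package if the visitor is authenticated."""
        if github_user is None or token not in self._pending:
            return False
        package = self._pending.pop(token)  # each token works only once
        self.published.append({"package": package, "uploader": github_user})
        return True
```

The point of the design is that the upload itself carries no credentials; the only secret is the verification token, and nothing is published until a logged-in GitHub user redeems it.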


Some kind of caching mechanism. The HTML documentation will very rarely
change, so we don't want to do a ton of JSON parsing and documentation
generating every time a page is requested. At the moment we have a very basic
system which saves expensive bits of HTML to disk, and deletes them
whenever they expire.
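
A disk cache along these lines might look like the sketch below, written in Python. It is only an illustration under some assumptions: the `DiskCache` name, the use of a file's modification time as its creation time, hashing the key to get a file name, and the fixed time-to-live are all hypothetical choices rather than a description of what Pursuit does.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


class DiskCache:
    """Caches rendered HTML on disk and discards entries older than a TTL."""

    def __init__(self, directory: Path, ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so that any string is a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.html"

    def _expired(self, path: Path) -> bool:
        return self.clock() - path.stat().st_mtime >= self.ttl_seconds

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        """Return cached HTML if it is still fresh, else render and save it."""
        path = self._path(key)
        if path.is_file():
            if not self._expired(path):
                return path.read_text(encoding="utf-8")
            path.unlink()
        html = render()
        path.write_text(html, encoding="utf-8")
        return html

    def purge_expired(self) -> int:
        """Delete every expired entry; returns how many were removed."""
        removed = 0
        for path in self.directory.glob("*.html"):
            if self._expired(path):
                path.unlink()
                removed += 1
        return removed
```

The expensive work (JSON parsing and documentation generation) lives inside the `render` callback, so it only runs on a cache miss or after an entry has expired.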