A complete list of my contributions over the course of the project can be found at https://github.com/google/codeworld/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Astarsandspirals+created%3A%3C2019-08-26+created%3A%3C2019-09-03+created%3A%3E2019-05-27.
The aim of my project for Google Summer of Code was to create a GHC source plugin which provides a domain-specific language for embedding exercise requirements into Haskell source files, so that they can be checked and the results automatically reported. This plugin has potential applications for users in a broad range of educational contexts, such as university professors teaching functional programming classes using Haskell, community members writing tutorials for their libraries, and authors who are writing textbooks with exercises.
Much of the groundwork for this plugin was built on work that existed as a prototype within CodeWorld. This is a web-based educational environment for K-12 students to learn the basics of programming in a functional style, as well as being a more general environment for experimenting with Haskell that anyone can use. The requirements checker within CodeWorld was an additional step run as part of the compilation process, which used syb
and haskell-src-exts
to parse the code. This allowed teachers to easily implement rules such as patterns specified via regular expressions that would be matched as templates against the Haskell code submitted by their students.
My eventual goal was to take this system and rewrite it in the form of a GHC plugin. This would have two main benefits: firstly, it would allow users outside of the CodeWorld ecosystem to make use of the requirements checker, and secondly, it would allow for more complex checks to be performed at different stages of compilation. Previously the system could only analyse the raw source code and the syntax tree parsed by haskell-src-exts
, but by making use of the GHC API it is possible to check requirements after renaming, after typechecking, and potentially even carry out runtime tests.
The code for this project is available at https://github.com/google/codeworld/tree/master/codeworld-requirements. This folder is present in the main CodeWorld repository but is also a standalone Cabal package, and when installed the source plugin can be loaded at compile time by passing the -fplugin=CodeWorld.Requirements.RequirementsChecker
flag to GHC.
Requirements can be embedded in Haskell source files using REQUIRES or XREQUIRES comments, the high-level format of which is described in the next section. If such a comment is provided, then the plugin will output a :: REQUIREMENTS ::
block of information, which is then produced as a message by the compiler. The most complex piece for the user is the language used to specify requirements within the comments, which is described at the end.
There are two ways to embed requirements into a source file: REQUIRES comments, or XREQUIRES comments.
A REQUIRES comment looks like this:
{-
REQUIRES
Description: Some user-readable description.
Rules:
- << YAML-format description of 1st rule to check >>
- << YAML-format description of 2nd rule to check >>
- << YAML-format description of 3rd rule to check >>
-}
Description is a user-readable arbitrary string, which is used in the output to provide a section header summarizing this requirement. The Rules are YAML machine-readable descriptions of the formal rules that are verified. Details on the rules are below. A program can contain multiple REQUIRES blocks, one for each top-level requirement. Each requirement can then have any number of rules that must pass before that requirement is satisfied.
An XREQUIRES comment is an obfuscated variation on REQUIRES. Instead of a single requirement, it contains a compressed and base64-encoded list of requirements. (It is possible to include multiple XREQUIRES blocks, but it's not necessary.) It looks like this:
{-
XREQUIRES
eJyNkE9LxDAQxb/KkNMKQWxtd/Wo0IW96S56sUJmm+km0CQlyeKfT2+oDQq6
sKcZeL957zEvbNs8Pm22zQ5aJkbvDh6N4CCkiyFNtHLaBQTljoME6yLsCYyT
utckL1vWWgAwGDtFoXkfqYskF7MTh7paVqvy4l8o+XIoyroolieBwOGmXl3d
1olg/HfbO+iPtovaWRB9yiIBB7LkcdCfFCAqgs4Zk+QRYyRvc1eFYafNONAD
prOwmI7/2D+j17gfkhPGyYzSK9ATSOq1JQlvOqpJyOm5Ts6ZwfuP9Sx8JxUc
cuJpqDwHuj4Hqn4g9voF6VGoEg==
-}
A typical requirements output block looks something like this:
:: REQUIREMENTS ::
Obfuscated:
XREQUIRES
eJyNkE9LxDAQxb/KkNMKQWxtd/Wo0IW96S56sUJmm+km0CQlyeKfT2+oDQq6
sKcZeL957zEvbNs8Pm22zQ5aJkbvDh6N4CCkiyFNtHLaBQTljoME6yLsCYyT
utckL1vWWgAwGDtFoXkfqYskF7MTh7paVqvy4l8o+XIoyroolieBwOGmXl3d
1olg/HfbO+iPtovaWRB9yiIBB7LkcdCfFCAqgs4Zk+QRYyRvc1eFYafNONAD
prOwmI7/2D+j17gfkhPGyYzSK9ATSOq1JQlvOqpJyOm5Ts6ZwfuP9Sx8JxUc
cuJpqDwHuj4Hqn4g9voF6VGoEg==
[Y] First user-visible description
[Y] Second user-visible description
[N] Third user-visible description
Detailed description of what went wrong.
:: END REQUIREMENTS ::
The requirements block includes everything from :: REQUIREMENTS ::
to the matching :: END REQUIREMENTS ::
.
The block begins with an obfuscated version of the requirements, which can be used to replace the plain-text requirements (REQUIRES) with the obfuscated version (XREQUIRES) as discussed above. After this, there are top-level lines beginning with one of [Y]
, [N]
, or [?]
. These indicate, for each requirement, if the requirement is satisfied, not satisfied, or if there was a problem checking the requirement. In the latter two cases, there are further lines explaining what went wrong.
These are the current checks implemented, and how to specify them as rules in requirement comments.
-
containsMatch
Example:
containsMatch: template: | __func $any = $any cardinality: atLeast: 4 explanation: The code defines fewer than 4 functions.
Checks that the code contains a declaration matching the one given. The template is a standard Haskell declaration, except that it can contain certain wildcards. Any identifier beginning and ending with underscores, like
_x_
, is a placeholder that matches any chosen name. Additionally, there's some additional extra syntax:$any
matches any pattern or expression at all.$var
matches any variable name.$con
matches any constructor name.$lit
matches any literal expression.$num
,$char
, or$str
match specific types of literals.$(tupleOf [| some template |])
matches any tuple whose elements each match the template in Oxford brackets (that is, between[|
and|]
).$(contains [| some template |])
matches any pattern or expressions with a subexpression that matches the template in Oxford brackets (that is, between[|
and|]
).$(allOf [ [| first template |], [| second template |] ])
matches any expression that matches all of the child templates. Other logical combinators includeanyOf
andnoneOf
.
-
matchesExpected
Example:
matchesExpected: variable: var hash: 999999
Checks that the definition of var, once source locations are cleared, hashes to the given value (modulo 1 million).
-
notDefined
Example:
notDefined: var
Checks that there is no definition for a variable named var. If there is, it fails.
-
usesAllParams
Example:
usesAllParams: func
Checks that
func
makes use of all of its named parameters. If any parameters are not used in an equation, it fails. -
notUsed
Example:
notUsed: var
Checks that there are no references to var in the module.
-
maxLineLength
Example:
maxLineLength: 80
Checks that there are no lines longer than 80 characters.
-
noWarningsExcept
Example:
noWarningsExcept: [Defined but not used]
Checks that there are no warnings, except for those that match any of a list of given regular expressions. This list can be empty, in which case no warnings will be allowed.
-
typeSignatures
Example:
typeSignatures: true
Checks that all top-level declarations have accompanying type signatures.
-
blacklist
Example:
blacklist: [circle]
Checks that none of the symbols appearing in the blacklist are used. This rule and the whitelist rule below are checked after renaming, so it can be used in cases where a standard library symbol needs to be blacklisted but new symbols with the same name (such as ones in a
let
orwhere
construct) should still be allowed. -
whitelist
Example:
whitelist: [ellipse]
Checks that only symbols appearing in the whitelist are used. This rule and the blacklist rule above are checked after renaming, so it can be used in cases where a standard library symbol needs to be whitelisted but new symbols with the same name (such as ones in a
let
orwhere
construct) should not be allowed.
- Integration with CodeWorld
The more powerful source plugin version of the requirements checker has not yet been integrated with CodeWorld, which is still using the original version, although progress was made towards this by migrating from haskell-src-exts
to ghc-lib-parser
over the course of the project. This is due to some difficulties in getting the plugin functioning with GHCJS. The current state of this integration can be seen at google/codeworld#1074.
- More powerful rules
Now that checks can be performed after renaming and after typechecking just as easily as they can after parsing or on the raw source code, this opens the door for much more powerful rules to be implemented within the requirements checker, which could involve type unification or other static analysis already implemented in GHC. For example, http://hackage.haskell.org/package/inspection-testing shows some very powerful uses of GHC via plugins to prove things about code, which could provide inspiration.
- Runtime testing
This would be harder to implement even with the additional power of a source plugin, but is still theoretically possible. One way would be to have the plugin inject the testing code at the beginning of main
, and at the same time move requirements reporting from a diagnostic output at compile time to a runtime action. Requirements that are checked at compile time would be hard-coded into the result, but dynamic requirements (checked at runtime) would be evaluated on each execution.