@leocassarani
Created November 30, 2011 00:08
Generating Ruby documentation using static type inference

As software frameworks and libraries grow in size, documenting code is becoming more and more important. This is especially true for dynamic languages such as Ruby, where code carries no static type annotations and developers often rely on the documentation provided by the authors of these frameworks and libraries. Nevertheless, the quality and completeness of code documentation are still very low, even for many well-known and widely used frameworks (e.g. Ruby on Rails).

The goal of this project is twofold. First, to write a byte-code evaluator that analyses Ruby code and attempts to infer the types of the parameters and return values of every method of every class in a given project. Second, to use that information, together with contextual information such as class and method names, to annotate each method with a brief summary and a type signature in the form of documentation comments.

Thus an undocumented class such as:

class Project
  def find_by_name(name)
    [...]
  end
end

will be annotated with the following comments (YARD format assumed):

class Project
  # Finds a project by name
  #
  # @param [String] name the project name
  # @return [Project] the project
  def find_by_name(name)
    [...]
  end
end

The developer may then manually expand upon and improve the generated comment, e.g. by documenting the case where a project with the given name cannot be found.
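
As a rough illustration of the second goal, the sketch below renders an inferred signature as YARD comment lines. The `InferredParam` and `InferredMethod` structs and the `yard_comment` helper are hypothetical names introduced for this example only; they are not part of YARD or Rubinius, and the summary and descriptions are supplied by hand here rather than derived from the method name.

```ruby
# Hypothetical containers for an inferred signature (assumed for this sketch).
InferredParam  = Struct.new(:name, :type, :description)
InferredMethod = Struct.new(:summary, :params, :return_type, :return_description)

# Render an inferred signature as YARD-style comment lines,
# indented to sit directly above a method definition.
def yard_comment(sig, indent: "  ")
  lines = ["# #{sig.summary}", "#"]
  sig.params.each do |param|
    lines << "# @param [#{param.type}] #{param.name} #{param.description}"
  end
  lines << "# @return [#{sig.return_type}] #{sig.return_description}"
  lines.map { |line| indent + line }.join("\n")
end

sig = InferredMethod.new(
  "Finds a project by name",
  [InferredParam.new("name", "String", "the project name")],
  "Project",
  "the project"
)
puts yard_comment(sig)
```

For the `find_by_name` example, this prints the same four comment lines shown in the annotated class above.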

The implementation will be based upon the Rubinius VM, a self-hosted open-source Ruby VM implementation, and in particular its specific byte-code format. This will make it possible to analyse and propagate type information for the generic types provided by the Ruby kernel and standard library.
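
To make the idea of propagating types through byte-code more concrete, here is a minimal sketch of an abstract interpreter that tracks classes instead of values on a stack. It assumes a toy instruction set (`:push_literal`, `:push_local`, `:send`, `:ret`) and a small hand-written table of core return types; both are illustrative assumptions and are not taken from the Rubinius byte-code format.

```ruby
# Return types for a few core methods, keyed by [receiver type, method name].
# Assumed for this sketch; a real tool would need a far larger table.
CORE_RETURN_TYPES = {
  [String, :length] => Integer,
  [String, :upcase] => String,
  [Integer, :+]     => Integer,
  [Integer, :to_s]  => String
}.freeze

# Abstractly interpret a toy stack-based instruction list, pushing and
# popping classes rather than values. Unknown cases widen to Object.
def infer_return_type(instructions, local_types)
  stack = []
  instructions.each do |op, *args|
    case op
    when :push_literal
      stack.push(args.first.class)
    when :push_local
      stack.push(local_types.fetch(args.first, Object))
    when :send
      name, argc = args
      stack.pop(argc) # argument types are ignored in this sketch
      receiver = stack.pop
      stack.push(CORE_RETURN_TYPES.fetch([receiver, name], Object))
    when :ret
      return stack.pop
    end
  end
  Object
end

# Toy encoding of: def f(name); name.length + 1; end
code = [
  [:push_local, :name],
  [:send, :length, 0],
  [:push_literal, 1],
  [:send, :+, 1],
  [:ret]
]
puts infer_return_type(code, { name: String })
```

With `name` assumed to be a `String`, the sketch infers `Integer` as the return type; with no information about `name`, it widens to `Object`.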

While the proposed approach may not work in the presence of code that makes heavy use of dynamic features (e.g. self-modifying code), the aim is to narrow down type information on a "best effort" basis by employing various heuristics (e.g. locality-aware frequency analysis of type usage) and falling back to a more generic type where precise type information cannot be inferred.
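
One possible reading of this "best effort" strategy is sketched below: if a single type dominates the observed usages it is reported, otherwise the result widens to the most specific superclass shared by all observed types. The helper names and the 0.8 threshold are assumptions made for illustration, and the locality-aware part of the heuristic is not modelled.

```ruby
# Most specific class shared by all the given classes (modules are skipped).
def common_superclass(classes)
  classes.map { |klass| klass.ancestors.grep(Class) }.reduce(:&).first
end

# Report the dominant observed type if it accounts for at least `threshold`
# of the observations; otherwise fall back to the common superclass.
def best_effort_type(observed, threshold: 0.8)
  return Object if observed.empty?

  counts = observed.each_with_object(Hash.new(0)) { |klass, acc| acc[klass] += 1 }
  type, count = counts.max_by { |_, n| n }
  return type if count.fdiv(observed.size) >= threshold

  common_superclass(observed.uniq)
end

puts best_effort_type([String] * 9 + [Symbol])         # dominant type wins
puts best_effort_type([Integer, Float, Integer, Float]) # widens to a shared superclass
puts best_effort_type([String, Integer])                # nothing in common but Object
```

The three calls print `String`, `Numeric` and `Object` respectively. Reporting a dominant type is deliberately unsound, which matches the proposal's intent that the developer reviews the generated comment afterwards.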

The project will be co-supervised by Petr Hosek.
