
@freekh
Last active August 29, 2015 14:06
Another take...

This is a different take on the open questions set out by Adam, but it may be a bit too different from what we have already set out to do, so I wanted to keep it separate. Maybe there are some ideas here that can be incorporated as we move along.

Intro

The general idea is to use an execution engine that computes which actions/tasks to run, and how, based on which properties the user has set and the task the user invokes.

Tasks and properties can depend on other tasks and properties, and it is whether these dependencies have been set, and to what, that determines whether and how a task should run. For the execution engine, only 3 concepts exist: tasks, properties and scopes (see definitions below). An important attribute of properties and tasks is that they are lazy (see later). For convenience there are a few more concepts, all built on top of these: component, platform and plugin. Extending such a build system means defining the things that can be done (tasks) and which properties are available along with their defaults - the execution engine takes care of the rest.

The intent of limiting the number of concepts is to make the build system easier to understand (for plugin/advanced authors) and safer and easier to extend (fewer moving parts).

The goal is that users describe the build (rather than how to build it), while being pure (for cacheability), i.e. no side effects, immutable (after the build has loaded; for parallelizability) and typesafe (for ease of integration). For example: I declare that my build is an Android application, that there is both Java and NDK source code, and that it should target 4.0, 4.1, 4.2, 4.3 and 4.4. In this setting, the task 'assemble' would verify that the requirements can be fulfilled, then execute all dependencies of the compilers, then the compilers (possibly in parallel), then package up, and so on.

A secondary goal is to create a ubiquitous set of rules that scales efficiently, from translating the intent of a user describing their build all the way down to internal components like incremental compile (imagine being able to add incremental compile to any language by only describing how source files are related and how compiling one file works).

Definitions

Be warned: I have mixed up the naming scheme slightly from https://github.com/gradle/gradle/blob/master/design-docs/configuration-model-open-questions.md to something that I thought fit better. See the section below for the translation. Hopefully this grammar is just as (or more) intuitive - my apologies if it is not. The basic architecture comes from sbt, but has been heavily modified to be easier to use (than sbt) and to achieve our goals (users describe what a build is, instead of imperatively how to execute a build).

  • tasks: Defines a function which will execute each time it is referenced. May depend on other properties and tasks. A task is associated with a key/id, its dependencies and a type. A task may or may not have a cache method that computes its cache value (and persists it in a cache). If the cache value is the same as what the execution engine stored, the task will not execute. When a task is executed, its dependencies will be evaluated first. There might be (?) multiple tasks with the same key in the same scope but with different (types of) dependencies. The execution engine will try to find the tasks with matching keys, scopes and dependencies that can be evaluated. If a task has a dependency that exists in multiple sub-scopes, the task will execute for each one of the permutations of the scopes. You cannot change a property or another task when executing a task (pure).
  • properties: An idempotent value that is evaluated once at build startup and then stored. May depend on other properties, but not on tasks. Just as with tasks, properties have a key/id, dependencies and scopes. They also have a cache, but the value will only change and be loaded during an initialization phase of the build. Properties are therefore faster and should be used where a task is not required.
  • scopes (could also be called variance): A property or task is always scoped, meaning the key will only exist in that particular context. Examples are the scope of a platform, the scope of a component, the scope of a build. Custom scopes can also be defined. There is a precedence of scopes: only one key may exist in a given scope, but it might be overridden by a platform scope (then component, then build). The precedence may also possibly be tweaked. The rationale for unique key/ids is that there is minimal ambiguity about which properties/tasks are defined and where they are defined.
  • key/id (not sure what the best name is): Every property, task, scope and platform/plugin has a unique key per scope. As a build is loaded (its plugins/platforms are loaded), all available keys are put in a map, with a reference to the code that computes the value or executes something. The keys are what enable us to lazily execute build tasks/properties. The simple set of rules and the lazy loading should also make it possible to load build scripts in parallel. Thus, the only overhead of a simple build (only one build script and 1 component) compared to a complex build (lots of build scripts, properties, tasks, components, ...) should be the time it takes to calculate its keys, which should be fast. The goal is that a small change (1 line in 1 file) to a simple build should take approximately the same amount of time as the same change in a complex build (containing the same file).
  • plugin: A key which is defined in metadata of gradle, in a separate file or in a buildScript, and which downloads a jar and puts it on the classpath. For convenience it is possible to associate platforms with a plugin key, i.e. using the plugin key means referencing "the platform". In essence, a plugin is nothing but a reference to another build script defined somewhere else (in gradle, in some jars somewhere).
  • platform: A platform is a container of a set of component types and the components' common tasks and properties. It is also a scope.
  • component (alternative name: variant - you have a variant of the jvm which is a java library; this plays nicely if the scopes defined here were called 'variance', because you would define the variance of your variant): A component is an instance of a platform component type (or a composite of multiple component types), which defines a set of property and task keys scoped to the component. Like a platform, it is also a scope, sub-scoped by the parent platform. A component may depend on (an)other component(s), which in essence simply means that the tasks/properties of the child component are included into it under the child component's scope. A component also replaces the notion of a project (at least, for a user it translates to the same "thing", i.e. the first thing that must be defined). The rationale behind the name is that a "component" (library, executable, application, ...) is a part of a larger whole: the platform, and the fact that a user can "use" a component, but she/he cannot "use" a project (a project is something a user "has").
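To make the scope-precedence idea above concrete, here is a hypothetical sketch of scoped key lookup. The names (ScopeLookup, resolve, PRECEDENCE) and the exact precedence order (platform, then component, then build) are assumptions for illustration, not a fixed design:

```java
import java.util.*;

// Hypothetical sketch: a key may have at most one value per scope, and the
// highest-precedence scope that defines the key wins.
public class ScopeLookup {
    static final List<String> PRECEDENCE = List.of("platform", "component", "build");

    // store maps key -> (scope -> value); only one value per key per scope
    static String resolve(Map<String, Map<String, String>> store, String key) {
        Map<String, String> byScope = store.getOrDefault(key, Map.of());
        for (String scope : PRECEDENCE)
            if (byScope.containsKey(scope)) return byScope.get(scope);
        // mirrors the "property ... is not defined" error described later
        throw new NoSuchElementException("property '" + key + "' is not defined");
    }
}
```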

Execution algorithm

When gradle is started it will scan for build scripts and load them up. After that:

  1. First step will be to load the plugins (as we do now) and add them on the classpath.
  2. Then we do another pass of the build script and load all the keys, by creating an instance of the component(s). At this point, there are no values in the system. What we do have are the keys of the properties and tasks, including their dependencies and scopes. At this point we can hash the build scripts and create one hash from the combined hashes of the individual build scripts.
  3. Scope precedence rules are then applied. If a scope of a key is "higher" than another it will overwrite the key and link the overwritten key to use the new scope. If a task/property depends on a property that is defined in N scopes (that are not overwritten), we end up with N tasks/properties. (We will make it easy to "disable" tasks, so one can filter out tasks that we absolutely do not want to run).
  4. At this point we have the actual relationship of keys. A user will have executed a task (we can imagine there being a default 'task' which is 'help') and Gradle translates the task to a tree of tasks/properties that must be evaluated.
  5. The execution engine will then calculate whether all the dependencies can be fulfilled by the properties/tasks that are defined (this part can be skipped if the cache entry of the build scripts is the same). If a required property/task is not defined, the user gets an error message, e.g. "property jvm.home is not defined." or "task sources for myJavaLibrary is not defined". Note that a task defined in one scope can depend on a task/property in another scope (or its own scope by default).
  6. At this point, the execution engine evaluates all properties (properties cannot depend on a task) that are required by the task. Properties may depend on other properties, so they will be recursively evaluated. Note that properties can be cached if they define a hash method. Properties are likely to be quite cacheable (they do not change very often), so this step should be quite fast.
  7. Then the execution engine starts on the task the user specified by executing all the tasks that it depends on recursively (starting with the topmost task in the task dependency tree, then continuing down to the task which was requested). The inputs of a task are hashed and cached as well, and the task can be skipped if there is a cache hit. The build completes when all the work has been completed.
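The cache-aware evaluation in steps 6-7 can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation; the names (Engine, Node) and the use of List.hashCode as a stand-in for a task's cache method are assumptions:

```java
import java.util.*;
import java.util.function.Function;

// Minimal sketch of recursive, cache-aware task evaluation: dependencies are
// evaluated first, and a task is skipped if its input hash matches the stored one.
public class Engine {
    static class Node {
        final String key;
        final List<Node> deps;
        final Function<List<Object>, Object> action;
        Node(String key, List<Node> deps, Function<List<Object>, Object> action) {
            this.key = key; this.deps = deps; this.action = action;
        }
    }

    final Map<String, Integer> hashCache = new HashMap<>();
    final Map<String, Object> resultCache = new HashMap<>();

    Object evaluate(Node node) {
        List<Object> inputs = new ArrayList<>();
        for (Node dep : node.deps) inputs.add(evaluate(dep)); // dependencies first
        int inputHash = inputs.hashCode(); // stand-in for the task's cache method
        if (resultCache.containsKey(node.key) && hashCache.get(node.key) == inputHash)
            return resultCache.get(node.key); // cache hit: skip execution
        Object result = node.action.apply(inputs);
        hashCache.put(node.key, inputHash);
        resultCache.put(node.key, result);
        return result;
    }
}
```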

The arguable elegance of this approach really comes from how the dependencies of properties and tasks are computed. The algorithm is simple because there can only be one property/task key per scope with a given set of dependencies, but we can imagine this being quite flexible as well. By providing different dependencies in different scopes for properties and tasks, we, in effect, create a rule engine. As a simple example, imagine we want to compile using different toolChains for our windows and unix builds. To implement this, we have a toolChain task which requires an OS property, and a compile task which requires a toolChain task. If a user specifies 2 different OS properties in 2 different OS scopes, we will end up with 2 different toolChain tasks and 2 different compile tasks. Continuing our example further, the OS property can depend on an architecture property in a limited set of architecture scopes: if the architecture is set in a scope which the OS does not define (OS: sunos and architecture: arm), the execution will fail, prompting the user with an error message saying that there is no property 'os' = sunos for property 'architecture' = arm.
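The fan-out-and-validate behaviour described above can be sketched as follows. The validity table OS_BY_ARCH and all names are invented for illustration, and the 'sunos'/'arm' pair mirrors the failure case in the text:

```java
import java.util.*;

// Sketch of the rule-engine behaviour: a property defined in several scopes
// fans a dependent task out into one instance per scope, and an invalid
// property combination fails with an error naming both properties.
public class Fanout {
    static final Map<String, Set<String>> OS_BY_ARCH = Map.of(
        "x86", Set.of("windows", "unix"),
        "arm", Set.of("unix"));

    // one toolChain task per OS scope that is valid for the given architecture
    static List<String> toolChains(String arch, List<String> osScopes) {
        List<String> tasks = new ArrayList<>();
        for (String os : osScopes) {
            if (!OS_BY_ARCH.getOrDefault(arch, Set.of()).contains(os))
                throw new IllegalStateException(
                    "no property 'os' = " + os + " for property 'architecture' = " + arch);
            tasks.add("toolChain(" + os + ")");
        }
        return tasks;
    }
}
```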

As a more comprehensive example, imagine I want to assemble my project consisting of multiple components (see the code below): a java (backend) library, targeting java8, and 2 cpp projects (frontend and middleware) both targeting x86 for windows and unix. The middleware has the same code for all platforms, but the frontend has slightly different sources depending on which platform is built. The java backend uses the middleware through JNI to listen for requests from the frontend, and the frontend uses the middleware to send messages.

NOTE: I am not sure how the scoping should look exactly in the DSL.

When a user runs 'compile' on the build script below, the 3 'compile' tasks (one for each component) will be evaluated. Since frontend and backend depend on middleware, the middleware component must be evaluated before them. The 'compile' task in the respective components requires that a sources task is defined, which therefore will be evaluated first. For the cpp projects, 'compile' also requires an OS to be defined, because the 'toolChain' task, defined in the cpp components, depends on an 'os' property. If the toolChain tasks can evaluate, i.e. return a toolChain for both of these OSs, then we can execute 2 'compile' tasks as soon as all of the dependencies of the tasks have been evaluated. We can also imagine there being a task which the toolChain uses to set itself up, which might be set to empty if a toolChain cannot be set up on a given platform. Once the middleware has completed compile, we will start compiling the frontend and the backend in parallel. In the backend we can also execute 2 java compiles in parallel, because we have 2 binaryVersions. The thing to note is that the mechanics of parallel compile in the 2 components (frontend, backend), the 2 OSs (for middleware) and the 2 binaryVersions (for backend) are the exact same mechanic.

//cpp is a plugin/platform (and also a scope) and loads some classes onto the classpath (defined in some settings)
//if it is hard to do we could also have: cpp = plugin("cpp") before this line, or something like that.
middleware = cpp.Library
frontend = cpp.Executable

//composite component:
//alternatively: composite([...]) or [...].composite or aggregate([...])
//dependending on semantics, we could also imagine having something like this: java.JNI(java.Library) 
backend = [java.Library, java.JNI]

//first up a very simple property:
resources = ["common/resources"] //all components have the same resources

//short hand for setting architecture to x86 (a type of architecture) in the x86 scope
//any task/property that requires architecture will now be fulfilled
architecture(x86)

//set os only for cpp components (cpp is also a platform thus also a scope)
cpp {
  os(windows, unix) //os in scope windows requires architecture in x86, if architecture was mips we would fail because there is no windows with architecture mips.
}


frontend { //scope is now 'frontend'
    //for sources we can imagine there being a sources.srcDir as well, but let us keep it simple for now
    //also note: sources is lazily evaluated (through some AST transform magic)
    sources = file(root).listFiles.filter(f -> f.endsWith("c")) //root is the root directory property and we declare a dependency on it here
    sources(test) ++= [root + "/cpp/frontend/test"] //test is a scope, defined under frontend.sources, and here we just append to it. could also define it like this: sources.test = sources ++ ["cpp/frontend/test"]
    dependencies += middleware //depend on middleware component
}

backend {
    sources(java) = ["java/main/.../"] //we have to scope to java here, because we have 2 sources defined in the backend in 2 scopes, with the same precedence. 
    resources ++= ["java/resources/.."] //to our backend we add some more resources (we are scoped to backend)
    
    sources(java, test) = java.sources ++ ["java/test/.../"] //test scope, we might want to change the position of a scope
    sources(native) = ["jni/c/main"]
    sources(native, test) = ["jni/c/test"]
    
    binaryVersion(java6, java7) //also set binary version, aka output class file version
    
    dependencies(java) += {
        "foo:bar:1.0"
        test { //java, test nested scopes (not sure about this, but it might be interesting)
          "loo:boo:2.0"
        }
    }
    dependencies(native) += {
        middleware //native stuff also depends on the middleware
    }
}

middleware {
    sources = ["/common/"]
    unix { //scope everything below to 'unix'
        sources ++= [".."] //extra unix sources
        sources(test) ++= ["..."] //test resources
    }
    sources(windows) ++= ["///"] //windows sources
    sources(test, windows) ++= ["/test//" ] //windows test sources
}

It is also possible to imagine implementing a generic incremental compile "framework" (sometime in the future, because there might be some pieces still missing) based on tasks, properties and scopes. The essence of an incremental compile is how source relationships are analyzed, and then simply specifying how to build the sources. The source analytics task depends on a task that takes a source file/set of source files and outputs a relationship. In our imaginary example, each language defines a source relation task and a compile task. The compiler task depends on each of the source analytics tasks, and uses scopes to map the relationship. Thus, if a source file changes, the appropriate source analytics task will re-execute, then the compiler task, and so on...
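The incremental-compile idea above can be sketched in a hedged way: assuming we only know, per file, which other files depend on it (the output of the source analytics task), we recompile a changed file plus everything that transitively depends on it. All names here are invented for illustration:

```java
import java.util.*;

// Sketch: breadth-first walk over the reverse dependency graph produced by
// the (hypothetical) source analytics task.
public class Incremental {
    // dependents: file -> files that directly depend on it
    static Set<String> toRecompile(String changed, Map<String, Set<String>> dependents) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            String file = queue.poll();
            if (result.add(file)) // first visit: enqueue its dependents
                queue.addAll(dependents.getOrDefault(file, Set.of()));
        }
        return result;
    }
}
```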

Key differences

Key differences from https://github.com/gradle/gradle/blob/master/design-docs/configuration-model-open-questions.md. There is more or less a 1-1 mapping in terms of concepts, though I have changed their names and added some extra semantics to them (in order to be typesafe and immutable).

  • Tasks defined here combine the "rules" and tasks from the current model, but they cannot create new tasks, and they cannot change properties (objects) or other tasks. They are pure and without side effects. This makes them easier to reason about and makes it possible to execute them in parallel. A user can only execute tasks (like before) based on the keys/names of the tasks. The properties which are defined in the build script and the scope the user selects determine which implementation of the task will execute. Each task can have an associated description.
  • Properties are used quite differently from a model object in that they have a much simpler structure (in this doc, it is the sum of the properties and tasks that defines the model). In fact tasks (as defined in this doc) and properties are almost the same thing, the main difference being how often they are evaluated (tasks each time, properties once on build load).
  • Key/ids in this doc are more or less the same as the "model identity". Note that you can also scope your key/ids.
  • I believe scope in this doc is related to "views", though I am not 100% sure what the views concept defines in the other doc.
  • In the current model, AFAIUI, a platform is a composite value describing the variance of a "platform" (for example: the jdk platform can vary on the target bytecode level, bootstrap classpath, and classpath), which a component has (a JavaBinarySpec (java library) has a platform). In this document a platform is something completely different, i.e. a set of available components (java for example includes the components (java) Application, (java) Library, (java) WebApplication) and the common tasks/properties shared among the components (compile task, toolchain task, target bytecode level, ...)
  • In this document a component is created "on" its platform (the code that defines a component is defined in the platform). The component defines a set of properties and tasks. Different tasks might be possible to execute depending on which properties are set. This is a generic way of defining the capabilities of a component.

Platform/plugin examples:

//file: org/gradle/plugins/jvm/JvmPlatform.java
public class JvmPlatform extends Platform {
    //when loading this class, we execute all methods that returns a Task/Property (could also actually use a constructor to do this): public JvmPlatform(KeyStore keyStore) { keyStore.addAll(binaryVersion(), ...); }

    /**
      * JavaDoc is also used by the execution engine to present information to the user about this property (binaryVersion)
      * 
      */
    //binaryVersion is the "key" for this property, its type is JavaVersion
    //NOTE: properties/tasks are lazy
    //one alternative is to describe them like this, or we could do reflection to avoid this type of deep nesting. Logically it would look like this though. A user could use java 8 lambdas to do the same in which case it would be much simpler to write/read.
    public final Property<JavaVersion> binaryVersion =
      new Property<JavaVersion>() {
        @Override public JavaVersion get() {
          return JavaVersion.current();
        }
      };
    //equivalent in ~java8:
    //property.create( () -> JavaVersion.current() );
    //We could also make these fields be methods (in the beginning I did it like that), but there is really no point in it so...
}
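The comment at the top of JvmPlatform hints at how keys could be gathered when a platform class loads. A hedged sketch of such reflection-based collection (not the real loading code; Property here is a local stand-in for the Property type used above, and using the field name as the key is an assumption):

```java
import java.lang.reflect.Field;
import java.util.*;

// Sketch: scan a platform instance's public fields whose type is Property and
// register field name -> lazy value, building the key map described earlier.
public class KeyScanner {
    public interface Property<T> { T get(); }

    public static Map<String, Property<?>> collectKeys(Object platform) {
        Map<String, Property<?>> keys = new LinkedHashMap<>();
        for (Field f : platform.getClass().getFields()) {
            if (Property.class.isAssignableFrom(f.getType())) {
                try {
                    keys.put(f.getName(), (Property<?>) f.get(platform)); // key = field name
                } catch (IllegalAccessException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return keys;
    }
}
```

Note that nothing is evaluated at collection time; the Property values stay lazy until the engine asks for them.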


//file: org/gradle/plugins/java/JavaPlatform.java
public class JavaPlatform extends JvmPlatform { //JavaPlatform inherits all properties/tasks from JvmPlatform
    public JavaLibraryComponent Library() {
        return new JavaLibraryComponent();
    }
    
    public class JavaLibraryComponent extends JvmPlatform.Component { //not sure if we can write it like this, but you get the picture
        //add more specific tasks (publish, ...) or properties to the tasks/properties shared in the current platform: JavaPlatform
    }

    public final Task<CompileResult> compile = 
      //realistically we would have Task1<A>, Task2<A,B>, etc etc. examples below are abbreviated for clarity (or use method overloading to reduce the pain a bit)
      new Task<CompileResult>(binaryVersion, sources) { //need classesDir, etc etc as well, but you get the picture...
        @Override public CompileResult execute(JavaVersion binaryVersion, List<File> sources) { //a CompileResult can be returned only because it is hashable
            return doCompile(new JavaOptions(binaryVersion), sources);
        }

        @Override public Hash inputHashBasedHash() { //compute hash based on input hashes (most common use case)
            return new Hash() {
                public Long get(Long binaryVersionHash, Long sourcesHash) {
                    return binaryVersionHash + sourcesHash; //whatever...
                }
            };
        }
    };
    

    private Task<List<File>> sources(Scope scope, String currentDir) { //private so will not be executed (and the signature includes parameters)
        return new Task<List<File>>(baseDirectory) {
            @Override public List<File> execute(File baseDirectory) {
                return getAllJavaFiles(currentDir, baseDirectory);
            }

            @Override public Hash resultValueBasedHash() { //we need the result of this task to calculate its hash, normally this method returns the inputHashBasedHash, but in this case we cannot create a hash based on the input
                return new Hash() {
                    @Override public Long get(List<File> files) {
                        return hashFiles(files);
                    }
                };
            }

            //for files we will eventually need to store separate hashes for each file
            //the reason is that some tasks should be able to listen to file system changes and update itself based on that
            //if it is not possible we can simply have register itself as a listener on a directory or a set of file and then run the entire hash when there is a change
        };
    }
    
    public final Scope test = scope(this, "test"); //create a sub scope here

    public final ScopedTask<List<File>> sources = ScopedTasks.create( //want the same key/id for multiple scoped tasks. ScopedTask is subtype of Task (of course)
      sources(this, "main"), //'this' == the platform which is also a scope
      sources(test, "test")  //scoped to test
    );

    protected static CompileResult doCompile(JavaOptions options, List<File> files) { //this is protected or private and in any case does not return a task: will not be executed by the execution engine
        //we could split a platform up in a definition where tasks and settings are created and logic which only contains static methods. because of single inheritance of java, we would have JavaPlatformLogic extends JvmPlatform then JavaPlatform extends JavaPlatformLogic...
        //do work...
        return result;
    }
}