odersky/standalone-scala.md

## standalone-scala.md

      
Standalone Scala

Or: Simpler Scala Tooling

(Draft, 20-Feb-2020)

Today, almost all Scala tooling depends on a build tool. This has several undesirable consequences:

The beginning Scala programmer has to learn the language and a build tool.
There are several possible build tools (e.g. sbt, Mill, Maven, Gradle, Bazel), which leads to confusion and fragmentation.
The historically most widely used build tool, sbt, has a reputation for being very complex, which contradicts our positioning of Scala as a great tool for beginners.
The meaning of a Scala codebase is not self-contained; it depends on what the build tool makes of it.
The standard commands such as scalac and scala are so primitive that they give a bad initial impression. In particular, having to reload the JVM and compiler on each run gives an undeserved reputation of slow compile times.

Compare this with other well-liked ecosystems, such as Go or Python, where there is usually no split between the language proper and a build tool, at least not for beginners and programmers who just want to do simple things. Or compare it with Rust, whose build tool cargo is both very simple and ubiquitous (in fact, cargo and go play similar roles in their respective ecosystems).

To address these problems, we propose two measures:


Provide a way to express external dependencies directly in Scala programs, without going through a build tool. So far that feature exists only for Ammonite scripts and it is greatly appreciated there.


Unify the scala, scalac, and scaladoc commands into a single scala command that serves for compiling (including incrementally), testing, documenting, and running code and scripts. As before, scala by itself starts a REPL.


These proposals are described in detail in the next two sections.

Dependency Handling

So far, Scala programs do not have a completely defined meaning by themselves, since they depend on components (such as class files) given elsewhere. Which class files a program depends on is determined by the current contents of the classpath, if it is compiled and run standalone, or by the library dependencies given in an external build tool. Ammonite scripts are an exception: they can contain imports referring to Maven artifacts. We propose to generalize this by admitting external imports in the language. An example of an external import is:
package myApp
import "org.scalameta :: munit : 0.4.3" as munit
This import defines the munit package to consist of all the class files and sub-packages that are defined under this package name in the Maven artifact org.scalameta :: munit : 0.4.3. An external import contains a string literal describing the external artifact and a qualified name defining the package. The defined name must match one of the packages in the artifact. For instance, in the import above, the name munit must match one of the toplevel packages defined in the Maven artifact.
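As a rough illustration of how a tool might read such a descriptor string, here is a minimal sketch. The proposal does not define the descriptor grammar, so the shape `org :: name : version`, the reading of `::` as "append the Scala binary version" (as in Ammonite), and the names `Artifact` and `parse` are all assumptions made for this example.

```scala
// Hypothetical model of an external artifact descriptor; not part of the proposal.
final case class Artifact(org: String, name: String, version: String, crossVersioned: Boolean)

object Artifact:
  // Assumed grammar: org (":" | "::") name ":" version, with optional spaces around separators.
  private val Descriptor = """\s*([^:\s]+)\s*(::?)\s*([^:\s]+)\s*:\s*([^:\s]+)\s*""".r

  // Returns the parsed artifact, or an error message if the string does not fit the assumed grammar.
  def parse(descriptor: String): Either[String, Artifact] =
    descriptor match
      case Descriptor(org, sep, name, version) =>
        Right(Artifact(org, name, version, crossVersioned = sep == "::"))
      case _ =>
        Left(s"malformed artifact descriptor: '$descriptor'")
```

With these definitions, `Artifact.parse("org.scalameta :: munit : 0.4.3")` yields `Right(Artifact("org.scalameta", "munit", "0.4.3", true))`. Descriptors of other kinds (URLs, file names, subproject directories) would need separate handling.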
External imports can be defined only at the toplevel. An external import is visible in the whole package in which it is defined, or, if it appears without a package clause, in the whole project. For instance, the program fragment above would declare the package name munit everywhere in the myApp package (but not outside it).
A sensible convention would be to collect all external imports of a project in a root file such as dependencies.scala. This, however, is not required. One could also have external imports that are only visible in a subpackage by placing them under that subpackage. However, a package name can only be associated with a single external dependency in a project. That is, if there are multiple import clauses for that package name, they must all refer to the same external dependency.
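The single-dependency rule can be checked mechanically. The sketch below assumes that two imports "refer to the same external dependency" when their descriptor strings are textually equal after trimming; the proposal does not say how equality is decided, and `ExternalImport` and `conflictingPackages` are hypothetical names.

```scala
// Hypothetical record of one external import: the descriptor string and the package it defines.
final case class ExternalImport(descriptor: String, pkg: String)

// Returns every package name that is bound to more than one distinct descriptor,
// together with the conflicting descriptors. An empty result means the project is consistent.
def conflictingPackages(imports: Seq[ExternalImport]): Map[String, Set[String]] =
  imports
    .groupBy(_.pkg)
    .view
    .mapValues(_.map(_.descriptor.trim).toSet)
    .filter { case (_, descriptors) => descriptors.size > 1 }
    .toMap
```

Repeating the same import in several files is accepted; importing munit once at version 0.4.3 and once at a different version would be reported as a conflict.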
With these conventions, package names are now resolved just like normal identifiers: An external import is a defining occurrence for a package name. Its visibility is package private to the package that contains the import. In other words, for the purposes of name resolution, the munit import above would be treated somewhat analogously to an object definition like
package myApp
private[myApp] object munit:
  ...
Note, however, a subtle difference: when imported, munit is a toplevel identifier. It cannot be referred to as myApp.munit, whereas the local object can.
The imported name can also be qualified. Example:
import "org.typelevel :: cats-effect : 2.3.1" as cats.effect
This defines the package cats.effect, as coming from the given artifact. It would not conflict with another separately imported subpackage of cats.

Variations


An artifact can contain more than one toplevel package. Multiple packages can be defined by multiple external imports that have the same artifact descriptor. Alternatively, we could also allow an external import to define more than one package, like this:
import "mycompany :: multi-package : 1.2.3" as {packageA, packageB}

The external descriptor is just specified as a string literal. Its meaning is platform dependent. On the JVM, Maven artifacts are standard. Elsewhere, or in addition, one could also interpret the string as a URL or as a file name, or as a directory name of a subproject.

The scala Command

The general form of a scala command is
scala <runtime-options> [subcommand]

Runtime options go to the runner (e.g. on the JVM they go to the java command).
Subcommands include compile, test, run, doc, clean. Subcommands can be followed
by further arguments and options. Options are generally left out in the examples that follow. If there is no subcommand, we start the interactive shell. I.e.
> scala

starts the REPL. Most subcommands can also be written directly in the REPL, omitting the scala prefix.
The first use of scala with a subcommand will start a nailgun server that will serve all subsequent requests in a warm JVM instance. Subsequent scala commands will use the same server, with the runtime options as given initially.
If a subsequent scala command comes with new runtime options which are different from the ones given the last time, the nailgun server will be restarted with the new options.
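The server lifecycle described above can be summarized as a small decision function. This is only a sketch of one reading of the text: it assumes that a command given without runtime options reuses the running server, and that a restart happens only when options are given and differ. `ServerAction` and `serverAction` are hypothetical names.

```scala
// Hypothetical decision model for the server lifecycle; not an actual Nailgun API.
enum ServerAction:
  case Start, Reuse, Restart

// `running` holds the runtime options of the live server, or None if no server is running.
// `requested` holds the runtime options given with the current scala command.
def serverAction(running: Option[Seq[String]], requested: Seq[String]): ServerAction =
  running match
    case None => ServerAction.Start
    // Assumption: no options on the command line means "keep the initial options".
    case Some(current) if requested.isEmpty || current == requested => ServerAction.Reuse
    case Some(_) => ServerAction.Restart
```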
> scala compile <directory>

compiles all .scala files in the given directory and its subdirectories. It can use incremental compilation. Incremental builds are performed automatically before test and run commands.
> scala compile

is equivalent to scala compile . (that is, it compiles the current directory).
> scala compile file₁.scala ... fileᵢ.scala

compiles exactly the given scala files and nothing else.
> scala doc
> scala doc <directory>
> scala doc file₁.scala ... fileᵢ.scala

Generates documentation for .scala files in the current or given directory and its subdirectories, or, if individual files are given, for just those files.
> scala test
> scala test <directory>
> scala test file₁.scala ... fileᵢ.scala

Tests all methods annotated with @Test in the current or given directory and its subdirectories (and in the current REPL script when in the REPL), or, if individual files are given, tests just those files.
An option --only <regexp> restricts tests to methods with names matching the regular expression <regexp>.
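A minimal sketch of the --only filter follows. The proposal does not say whether the regular expression must match the whole method name or only part of it; this example assumes a whole-name match. `selectTests` is a hypothetical helper.

```scala
// Keeps only the test method names selected by an `--only <regexp>` option.
// Assumption: the regexp must match the entire method name.
def selectTests(testNames: Seq[String], only: Option[String]): Seq[String] =
  only match
    case None => testNames // no --only option: run every test
    case Some(pattern) =>
      val regex = pattern.r
      testNames.filter(name => regex.matches(name))
```

For example, `selectTests(Seq("parseOk", "parseFail", "render"), Some("parse.*"))` returns `Seq("parseOk", "parseFail")`.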
> scala run a.b.Main <arguments>

Runs the static main method in the class with fully qualified name a.b.Main. Typically, that class is generated by the Scala compiler from a method annotated with @main. This command cannot be invoked in the REPL.
> scala run file.scala <arguments>
> scala run <directory> <arguments>

Runs the unique matching main method annotated in the given file, or in the given directory and its subdirectories. A method is a matching main method if it is annotated with @main and it can be passed the given arguments. If there is no matching main method, or there is more than one, an error is reported. In the REPL, the last matching main method in the REPL script takes precedence.
This command cannot be invoked in the REPL.
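The "unique matching main method" rule can be sketched as follows. As a simplification, "can be passed the given arguments" is approximated here by comparing the number of parameters with the number of arguments; a real implementation would also have to consider argument types and varargs. `MainMethod` and `resolveMain` are hypothetical names.

```scala
// Hypothetical summary of a method annotated with @main: its name and parameter count.
final case class MainMethod(name: String, arity: Int)

// Picks the unique main method that accepts the given arguments, or reports an error.
// Simplification: a method matches if its parameter count equals the argument count.
def resolveMain(candidates: Seq[MainMethod], args: Seq[String]): Either[String, MainMethod] =
  candidates.filter(_.arity == args.length) match
    case Seq(unique) => Right(unique)
    case Seq() => Left("no matching main method")
    case many => Left(s"ambiguous main methods: ${many.map(_.name).mkString(", ")}")
```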
> scala a.b.Main <arguments>

Equivalent to scala run a.b.Main <arguments>. This corresponds to the current scala runner.
> scala file.scala <arguments>

Equivalent to scala run file.scala <arguments>. This is the simplest way to run Scala scripts.
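The two shorthand forms above amount to a small normalization step in the runner. The sketch below assumes that runtime options have already been stripped from the argument list and that anything other than a known subcommand is treated as something to run; `subcommands` and `normalize` are hypothetical names.

```scala
// Subcommands named in this proposal.
val subcommands = Set("compile", "test", "run", "doc", "clean", "new", "subproject",
  "build", "assemble", "help", "shutdown")

// Rewrites the arguments that follow `scala` into explicit subcommand form.
// An empty list means "start the REPL".
def normalize(args: List[String]): List[String] =
  args match
    case Nil => Nil
    case head :: _ if subcommands(head) => args
    case _ => "run" :: args // `scala a.b.Main ...` and `scala file.scala ...`
```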
> scala new
> scala new <directory>

Creates a Scala project at the current location or at the given directory. If the directory does not exist, one is created. A Scala project defines a target directory for build commands. By default, all build commands put their generated artifacts (such as classfiles, tasty files, or doc pages) into the closest enclosing target directory. That directory also holds the state needed to do incremental compilation. The target can be overridden as usual by the -d setting. If neither a target directory is found nor a -d option is given, the user is prompted whether a new project should be created.
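Finding the "closest enclosing target directory" is a walk from the current directory towards the file system root. The sketch below assumes that the target directory is literally named `target`; the proposal does not fix the name. `closestTarget` is a hypothetical helper.

```scala
import java.nio.file.{Files, Path}

// Walks from `start` towards the file system root and returns the first existing
// directory named "target" (an assumed name) found in `start` or one of its ancestors.
def closestTarget(start: Path): Option[Path] =
  Iterator
    .iterate(start.toAbsolutePath.normalize)(_.getParent)
    .takeWhile(_ != null) // getParent returns null once the root has been passed
    .map(_.resolve("target"))
    .find(Files.isDirectory(_))
```

A result of `None` corresponds to the case in which, absent a -d option, the user would be asked whether a new project should be created.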
> scala subproject
> scala subproject <directory>

Creates a subproject at the current location or at the given directory. If the directory does not exist, one is created. Subprojects have their own target directories. When compiling/testing/building
a project, sources in subprojects are not included. Subprojects can be referred to in dependencies
by pointing to their directory name. E.g. if a project has subprojects base and web, and base defines package pkg.core, the web project could refer to pkg.core by adding a dependency
import "../base" as pkg.core

Subproject paths in external imports are computed relative to the root of the (sub-)project containing the import. Incremental compilation also builds subprojects as needed if their files have changed. (Subprojects could be left out in an initial version of the scala command.)
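Resolving a subproject path against the root of the containing (sub-)project could look like the sketch below. `resolveSubproject` is a hypothetical helper, and the directory layout in the usage note is made up for illustration.

```scala
import java.nio.file.{Path, Paths}

// Resolves a subproject path from an external import against the root of the
// (sub-)project that contains the import, collapsing "." and ".." segments.
def resolveSubproject(containingProjectRoot: Path, importPath: String): Path =
  containingProjectRoot.resolve(importPath).normalize
```

For a web subproject rooted at /work/app/web, the import path "../base" resolves to /work/app/base.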
> scala build
> scala build <directory>

Creates a jar file target.jar next to the enclosing target subdirectory that contains the compiled versions of all .scala files in the current or given directory. The directory name can be used to drop tests from the build. For instance,
if our project directory has subdirectories main and test, the command scala build main would build a jar that does not include the tests.
> scala assemble fileName

Converts a previously built target.jar into an executable with the given fileName.
> scala clean

Deletes all artifacts in the target directory, so that a clean compile is forced the next time.
> scala help

Prints info on available commands as well as what the current settings and target directory are.
> scala shutdown

Shuts down the nailgun server. The server will also shut down automatically after an extended period of inactivity.

Special Options

--watch

Applies the command continuously each time its inputs have changed, until another command is given.

Configurability

How to deal with configuration is left open for the moment. One could imagine a config file in the project root that defines default settings for various commands. Settings given for individual commands on the command line could override config settings. Or we could delegate config completely to the OS via shell variables. In either case, the algorithm to decide how to pass settings to a command should be as follows:

The default settings for a command are as given by config or shell variable. If none is given, they are empty.
The given settings are as given on the command line itself.
Given settings for the same option override default settings. For instance, if -source 3.1 is in the default settings for the compile command, but -source 3.2 is given on the command line, then -source 3.2 applies.
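The three rules above reduce to a right-biased map merge. The sketch below models settings as a map from option name to value, which is an assumption that ignores flags without values and options that may repeat; `Settings` and `effectiveSettings` are hypothetical names.

```scala
// Settings modelled as option name -> value, e.g. "-source" -> "3.1".
type Settings = Map[String, String]

// Command-line settings override defaults for the same option; all other defaults are kept.
// Missing config is represented by an empty `defaults` map.
def effectiveSettings(defaults: Settings, commandLine: Settings): Settings =
  defaults ++ commandLine
```

With defaults `Map("-source" -> "3.1")` and a command line of `Map("-source" -> "3.2")`, the effective settings are `Map("-source" -> "3.2")`.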

Extensibility

Some of the described commands can be implemented with plugins to the Scala runner. For instance, we could forward the scala test command to an installed scala-test.jar plugin. New commands can be added in the same way. Generally, a command scala xyz would forward to an installed scala-xyz.jar plugin that has to conform to a standard interface. Plugins are run in separate classloader instances in the nailgun server.

Notes


One can generalize the fixed annotations @main and @Test to also support other user-defined annotations that are identified by a meta annotation as a main entry point or as a test.


A possible extension would also handle .java files.


The same functionality should also be available programmatically, including via a serial protocol such as BSP.


The proposed command unifies Scala source files and scripts. Any Scala file that contains a main method can be run as a script. To make this work well, we should drop the few remaining restrictions of source files vs scripts. In particular, source files should be allowed to have toplevel commands other than definitions.


How Is This Different from Yet Another Build Tool?

A build tool is by its nature a program that executes a program (the build script) that in turn generates a program. This double meta-programming aspect makes build tools very powerful but also presents a hurdle to get started.
The mental picture of non-expert programmers is quite different: They just want to deal with one program: the one they write in source. They want to write it, get feedback on errors, test parts of it and finally run it. The fact that there are different computer generated artifacts for that program (e.g. class files, jars, or tasty files) is of no concern and should be made invisible to them.
For more complicated tasks, the generated artifacts do become interesting as objects themselves, and that's when programmers are ready to use a build tool.
Besides executing programs that generate programs, build tools traditionally also handle dependency management, config, and incrementality. These tasks can also be expressed as generation tasks. But they don't need to be, and arguably it's better not to entangle these aspects with each other.


Dependency management is essentially namespace management. We describe how to link an internal name to an external artifact. The traditional division of labor, in which a build tool analyzes dependencies and then dumps the resulting artifacts in a global classpath variable that gets picked up by the compiler, looks wrong, since it is stateful (order in the classpath matters) and untyped. It's much preferable to let the source handle this directly. That's what everybody (including Java) seems to be moving to, anyway.


Config could be part of a build script, but we believe that at least basic config tasks are better handled separately. For instance, this makes it much easier to handle config interactively such as by editing a settings panel in an IDE.


Fine-grained incremental compilation requires very detailed knowledge of program structure that needs to be defined by the language and provided by the compiler. It's best encapsulated in a module that is made available to build tools, so that they don't need to reinvent the wheel here. The Zinc "compiler" is an example. The idea would be that something like it is used as a common basis for build tools and the scala command.


To summarize, the proposed scala command will have to integrate some functionality that's so far implemented in build tools, e.g. incremental compilation, file watching, nailgun integration.
The scala command runner could provide these services to build tools via its programmatic interface.
In that sense it is quite close to Bloop. Unlike Bloop, which is primarily intended as a complement to a build tool, the scala command is primarily intended to be used standalone. We should investigate whether we can evolve Bloop to be the build server that handles all scala requests.