J-Vernay/native_build_proposal.md

    (DRAFT) Proposal for <native_build> in C++23

Abstract

This document proposes an extension of the C++ standard library.
It is an informal draft: its aim is to gather feedback from other C++ users and implementers.
In short, this proposal provides a starting point for using C++ itself as the basis for C++ build systems.
To that end, it provides a minimal C++ header <native_build>, implemented by the compiler maintainers.
The header defines new types (essentially strongly typed wrappers around std::filesystem::path),
a new function template build(...) for building the project, and run(...) for running an executable.
It also comes with a permissive requirement: programs using <native_build> are only required to work on the machine
that compiled them. As a result, there is no need to embed parts of the compiler in the program.
I. Motivation

There are multiple ways to build a C++ project.
Depending on the complexity of the project, a shell script may suffice.
Low-level generic build systems such as make or ninja can also be used.
These simple solutions tend to be platform-specific and to require specific tools
and/or a lot of configuration (defining system variables, for instance).
There are also higher-level build systems such as Meson, CMake, build2, premake...
However, each of these solutions requires learning a new language, different from C++.
With some higher-level build systems, the user interacts with the build system from within the IDE.
This is the case for MSVC, for example. This solution lacks interoperability.
This document proposes to use C++ as the base language for build systems.
This allows C++ coding styles and good practices to apply to project building.
The scope is only to provide a minimal core, so that build systems would then be C++ libraries instead of tools.
This may improve interoperability with other tools which need to cooperate with the build system,
such as a package manager or a code generator.
II. Impact on the Standard

The entire proposal is found in a new header <native_build> and in the namespace std::native_build.
As such, it does not alter any existing standard library API.
III. Design

Naming

The name native_build was chosen to point out that the resulting libraries and programs are meant to be used on the machine that built them. This is the opposite of cross-compilation.
Cross-compilation (when the compiled files cannot be used on the compiler's machine) is out of scope for this library.
It is considered a job for a separate library (which may or may not be standardized later).
Scope

This proposal does not provide a way to replace all build systems out of the box.
Its aim is only to provide a solid foundation on which libraries can build more complex behaviour.
As such, there is for example no support for "smart building" (skipping the recompilation of files whose sources have not changed).
Such support would require caching the timestamps of dependencies and defining a representation of a dependency graph.
This is out of scope for this proposal, so that compiler maintainers have a minimal API to implement.
These functionalities may be added to the standard library later.
Implementation requirements

The <native_build> header is not provided by standard library maintainers.
Instead, it is provided by compiler maintainers.
A program which uses <native_build> is required to work only on the machine which compiled it.
Such a program is not meant to be shared across machines in compiled form (nor in precompiled form).
Only the original source file can be shared.
This is common among build systems: for instance, the intermediate files generated by CMake contain absolute paths, which are likely to be wrong once copied to another machine.
This permissive requirement allows a compiler to implement <native_build> with the knowledge that this compiler is installed on the machine. There is therefore no need to embed a whole compiler in the resulting program.
An implementation can simply wrap the command-line API of the compiler (using std::system as a basis for instance).
Another implementation may rely on the presence of a shared library inside the compiler installation directory.
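As an illustration of the first strategy, here is a minimal sketch of how an implementation might assemble the command line it passes to std::system. The helper name make_compile_command, the driver name "c++" and the -I / -c / -o flags (GCC/Clang convention) are assumptions made for this example; they are not part of the proposal.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: builds the command line for compiling one source file
// into one object file. The driver name "c++" and the flags follow the
// GCC/Clang convention and are assumptions, not part of the proposal.
inline std::string make_compile_command(
    const std::filesystem::path& source,
    const std::filesystem::path& output,
    const std::vector<std::filesystem::path>& include_paths)
{
    std::string cmd = "c++";
    for (const auto& dir : include_paths)
        cmd += " -I\"" + dir.generic_string() + "\"";   // one -I per include path
    cmd += " -c \"" + source.generic_string() + "\"";   // compile only, no link
    cmd += " -o \"" + output.generic_string() + "\"";   // output object file
    return cmd;
}
```

An implementation along these lines would then call std::system(cmd.c_str()) and translate a failure into one of the exceptions described for build(...).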
Impact on projects

Projects are usually composed of source files, which translate into libraries and programs, and instruction files (configuration or scripts) that describe how to build these libraries and programs.
With this proposal, there is at least one new program which must be compiled, whose purpose is to build the rest of the project.
This program is built from a single translation unit, to avoid the chicken-or-egg problem.
This means that libraries taking advantage of <native_build> must be header-only.
If a more complex program is needed, a possible solution is for the single-translation-unit program to build an intermediate, more complex program whose purpose is to build the rest of the project.
Another solution may be to standardize dynamic loading from shared libraries.
This is out of scope for this proposal.
Build process overview

This proposal defines a more detailed, linear sequence of events when a project is built:

Level 0: Source file (the file written by the programmer)
Level 1: Translation unit (the source file after being pre-processed)
Level 2: Object (the translation unit after being compiled)
Level 3: Archive (one to several objects bound together, possibly with other archives)
Level 4: Library (one to several archives which have been linked, possibly with other libraries)
Level 5: Program (a library with the main() entry point)

NOTE: terminology may be changed based on feedback.
API

Includes

<native_build> has the following structure (not including header guards):
#include <filesystem>
#include <initializer_list>
#include <string_view>
#include <variant>

namespace std::native_build {
    using std::filesystem::path;

    /* rest of the API */
}
Types

First there is the tag type no_renaming_t, whose purpose is explained later:
struct no_renaming_t {};
inline constexpr no_renaming_t no_renaming{};
This proposal contains 6 similar types: source, translation_unit, object, archive, library and program.
They are all defined by the following:
// "____" is one of "source", "translation_unit", "object", "archive", "library", "program"
class ____ {
public:
    ____(path p);
    ____(path p, no_renaming_t);
    auto get_path() const -> path const&;
};
By default, an implementation should rename the filename of the path p to match the conventions of the specific platform.
For instance, object{"tmp/hello"}.get_path() would return tmp/hello.o on GNU/Linux and tmp/hello.obj on Windows.
library{"build/hello"}.get_path() would return build/libhello.so on GNU/Linux and build/hello.dll on Windows.
So the actual filename is implementation-defined.
The overload taking no_renaming_t prevents this renaming. So library{"build/hello.custom", no_renaming}.get_path() would return build/hello.custom on all platforms and implementations.
NOTE: maybe an implicit or explicit conversion to std::filesystem::path should be added?
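The renaming rule can be sketched as follows, using only the two conventions quoted above (.o / .obj for objects, lib….so / .dll for libraries). The enumerations platform and kind and the function platform_path are hypothetical names introduced for this example; an actual implementation would know its own platform and would cover all six types.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical enumerations, used only for this illustration.
enum class platform { gnu_linux, windows };
enum class kind { object, library };

// Returns p with its filename adapted to the conventions quoted in the text:
// hello -> hello.o / hello.obj for objects, libhello.so / hello.dll for libraries.
inline fs::path platform_path(kind k, fs::path p, platform target)
{
    std::string name = p.filename().string();
    if (k == kind::object)
        name += (target == platform::windows) ? ".obj" : ".o";
    else if (target == platform::windows)
        name += ".dll";
    else
        name = "lib" + name + ".so";
    p.replace_filename(name);   // the directory part is left untouched
    return p;
}
```

The no_renaming overload would simply skip this step and store the path as given.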
build_file is a variant which can contain source, translation_unit, object, archive, library or program.
using build_file = std::variant<source, translation_unit, object, archive, library, program>;
Function build(...)

This proposal exposes one templated function:
template<typename Output,
         typename BuildFileList  = initializer_list<build_file>,
         typename PathList       = initializer_list<path>,
         typename StringViewList = initializer_list<string_view>>
void build(Output          output,
           BuildFileList   inputs,
           PathList        include_paths = {},
           StringViewList  options = {} );
With:

Output is one of translation_unit, object, archive, library or program.
BuildFileList is a range of build_file
PathList is a range of path
StringViewList is a range of string_view

It produces the output file from the inputs, searching for included files in include_paths and applying the options in options.
It can throw two kinds of exceptions:

Derived from std::invalid_argument when an error is due to the building program.
Derived from std::runtime_error when an error is due to the inputs being not processable.

The default types being initializer_list allows the following call:
build(program{"hello"}, { source{"hello.cpp"}, source{"main.cpp"} }, {}, { "c++17" } );
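The mechanism behind this can be shown in isolation. A braced list cannot be used to deduce a bare template parameter, so the default template argument is used instead. The function count_options below is a stand-in written for this example (it is not part of the proposal); it has the same parameter shape as the options parameter of build(...).

```cpp
#include <cstddef>
#include <initializer_list>
#include <string_view>
#include <vector>

// Stand-in for build(...): only counts the options it receives.
// When the argument is a braced list, StringViewList cannot be deduced,
// so the default std::initializer_list<std::string_view> is used.
template<typename StringViewList = std::initializer_list<std::string_view>>
std::size_t count_options(StringViewList options = {})
{
    std::size_t n = 0;
    for (std::string_view option : options) {
        (void)option;   // a real implementation would interpret the option here
        ++n;
    }
    return n;
}
```

With this declaration, count_options({ "c++17", "no-rtti" }), count_options() and a call with a std::vector<std::string_view> are all accepted, which is the flexibility the signature of build(...) is after.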
Logical behaviour

The following paragraphs describe the logical behaviour of build(...) depending on output and inputs.
They describe logical behaviour only: implementations may take shortcuts, for instance to avoid generating intermediate files.
When Output is translation_unit

Pre-conditions:

inputs must be a range of exactly one source

Logical behaviour:

Preprocess the provided file in inputs into output.

When Output is object

Pre-conditions:

inputs must be a range of exactly one element, this element being of type source or translation_unit.

Logical behaviour:

If inputs has a source, preprocess it into a temporary file, else do nothing.
Compile the preprocessed file into output.

When Output is archive

Pre-conditions:

inputs must be a range of at least one element, each element being of type source, translation_unit, object or archive.

Logical behaviour:

If some elements are source, preprocess them into one translation_unit for each source.
If some elements are translation_unit, compile them into one object for each translation_unit.
Group all objects in an archive.
Group all archives into output.

When Output is library

Pre-conditions:

inputs must be a range of at least one element, each element being of type source, translation_unit, object, archive or library.

Logical behaviour:

If some elements are source, preprocess them into one translation_unit for each source.
If some elements are translation_unit, compile them into one object for each translation_unit.
Group all objects in an archive.
Link all archives in a library.
Link all libraries into output.

When Output is program

Pre-conditions:

inputs must be a range of at least one element, each element being of type source, translation_unit, object, archive or library.

Logical behaviour:

If some elements are source, preprocess them into one translation_unit for each source.
If some elements are translation_unit, compile them into one object for each translation_unit.
Group all objects in an archive.
Link all archives in a library.
Link all libraries into output, making sure there is a main() entry point.
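The pre-conditions of the five cases above can be summarized in one check. The sketch below is an illustration only: the enumeration level and the function inputs_are_valid are hypothetical names, and the inputs are represented by their level rather than by build_file values to keep the example self-contained.

```cpp
#include <vector>

// Build levels from the "Build process overview" list, in order.
enum class level { source, translation_unit, object, archive, library, program };

// Returns true when the inputs satisfy the pre-conditions stated above
// for the given output level.
inline bool inputs_are_valid(level output, const std::vector<level>& inputs)
{
    if (output == level::source || inputs.empty())
        return false;   // a source is never an output; inputs are never empty

    // translation_unit and object accept exactly one input.
    const bool single = output == level::translation_unit || output == level::object;
    if (single && inputs.size() != 1)
        return false;

    // Highest input level accepted for each kind of output.
    level highest = level::source;
    switch (output) {
        case level::translation_unit: highest = level::source;           break;
        case level::object:           highest = level::translation_unit; break;
        case level::archive:          highest = level::archive;          break;
        case level::library:          highest = level::library;          break;
        case level::program:          highest = level::library;          break;
        case level::source:           break;   // already rejected above
    }
    for (level in : inputs)
        if (in > highest)
            return false;
    return true;
}
```

Presumably build(...) would throw an exception derived from std::invalid_argument when such a check fails, since the error then lies in the build program rather than in the input files; the draft does not state this explicitly.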

Include paths

The function build(...) accepts a list of include paths, in which #include directives will search for files.
All of these paths must point towards directories.
These include paths are by definition useful only for the preprocessor.
If the logical behaviour of build(...) does not involve the preprocessor, then include_paths is silently ignored.
Options

Options are described by ASCII string views. The interpretation of bytes outside the ASCII table is implementation-defined.
They are combined either by passing multiple string_view values to build(...), or by concatenating them in a single string_view, separated by at least one whitespace byte (tab \t, space, carriage return \r or newline \n).
Here is an example of equivalent calls:
build(..., ..., ..., { "option1", "option2", "option3" });
build(..., ..., ..., { "option1 option2", "option3" });
build(..., ..., ..., { "option1 option2 option3" });
If contradicting options appear, the last option is taken and the others are discarded.
This allows for providing default options and option overloading.
Options are like C++ attributes: ignoring them does not prevent correct compilation.
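The equivalence of the three calls above amounts to flattening the option strings on whitespace. A minimal sketch, assuming a helper named split_options (a hypothetical name, not part of the proposal):

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits every option string on the four whitespace bytes named in the text
// (tab, space, carriage return, newline) and returns the flat list of options.
// The returned views refer to the storage of the input views.
inline std::vector<std::string_view> split_options(
    const std::vector<std::string_view>& options)
{
    constexpr std::string_view spaces = "\t \r\n";
    std::vector<std::string_view> result;
    for (std::string_view text : options) {
        std::size_t begin = text.find_first_not_of(spaces);
        while (begin != std::string_view::npos) {
            std::size_t end = text.find_first_of(spaces, begin);
            if (end == std::string_view::npos)
                end = text.size();
            result.push_back(text.substr(begin, end - begin));
            begin = text.find_first_not_of(spaces, end);
        }
    }
    return result;
}
```

Resolving contradicting options ("the last one wins") would then be a pass over this flat list, once the draft defines which options contradict each other.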
NOTE: Options are currently not well-defined in this draft.
More feedback is needed on how flags are used today, and compiler vendors would need to cooperate
to agree on standard names for flags (notably warnings, features and optimizations).
Examples:

Language standard. By default, the latest stable C++ standard supported by the compiler should be used.
However, the following language options are defined to respect a precise standard:
c89 c99 c11 c++98 c++11 c++14 c++17 c++20.
The C standards are provided because most C++ compilers can also compile C, and it is common to mix C and C++ projects.
Features. By default, all supported features of C++ are enabled.
However, it is currently common to disable some features, either for performance, safety or coding style.
These options are all opt-out: no-exception no-rtti no-filesystem no-regex, etc.
Warnings. Stricter diagnostics of C++ code may be wanted.
warning-0 warning-1 warning-2
Optimizations. opti-space opti-size opti-middle opti-none

There may also be implementation-defined options, which would start with a special token. These options should be safely ignorable.
NOTE: maybe - like gcc and clang? or / like msvc? or another convention?
Function run(...)

The run(...) function is a wrapper around std::system.
template<typename Arguments = initializer_list<string_view>>
auto run(program p, Arguments args = {}, path working_dir = "") -> int;
With Arguments being a range of values convertible to string_view.
This function executes the program at the location p using args, from the working directory working_dir.
Arguments are passed to the program exactly as given.
This means that the implementation must escape characters correctly, so that the program receives exactly args.size() arguments (its main's argc then equals args.size() + 1 on platforms where argv[0] holds the program name).
The path for the working directory is relative to the current working directory, not the directory of the program being executed.
The returned value must be the exit code of the program, not an error code from the shell.
For instance, if the program is not found or if it crashes with a segfault, an exception should be thrown instead of an error code being returned.
NOTE: is it possible to distinguish return values from the program and from the operating system?
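To illustrate the escaping requirement, here is a sketch of how one argument could be quoted when the implementation goes through a POSIX shell. That is an assumption of this example: an implementation on another platform, or one that spawns the process directly, would do something different. The function name quote_posix is hypothetical.

```cpp
#include <string>
#include <string_view>

// Quotes one argument for a POSIX shell: the argument is wrapped in single
// quotes, and each embedded single quote is rewritten as '\''
// (close the quotes, emit an escaped quote, reopen the quotes).
inline std::string quote_posix(std::string_view arg)
{
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'')
            out += "'\\''";
        else
            out += c;
    }
    out += '\'';
    return out;
}
```

Quoting every element of args this way and joining them with spaces keeps arguments containing spaces or shell metacharacters intact, so the executed program sees one argument per element of args.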
Example of calls:
int retval = run(program{"hello"});
retval = run(program{"hello"}, { "Mister" });
Example of use case in a project building file:
#include <native_build>
using namespace std::native_build;

constexpr auto VERSION = "2.5.0";

int main() {
    build(program{"build/codegen"}, { source{"codegen/main.cpp"} });
    run(program{"build/codegen"}, { "src/version.hpp.in", "build/include/version.hpp" });
    build(program{"build/hello"}, { source{"src/hello.cpp"} }, { "build/include/" });
    return 0;
}
IV. Examples

Simple hello world project

# Project layout
src/
  hello.cpp
  hello.hpp
  main.cpp
build.cpp
build/
  <empty>

// build.cpp
#include <native_build>
using namespace std::native_build;

int main() {
    build(program{"build/hello"}, { source{"src/hello.cpp"}, source{"src/main.cpp"} });
    return 0;
}
Project with unit tests

# Project layout
src/
  mylib/
    myclass.hpp
    myclass.cpp
  myexe/
    main.cpp
tests/
  catch.hpp
  main.cpp
  myclass.cpp
build.cpp
build/
  <empty>

// build.cpp
#include <native_build>
using namespace std::native_build;

int main(int argc, char** argv) {
    bool should_run_tests = false;
    if (argc >= 2 and std::string_view("--run-tests") == argv[1])
        should_run_tests = true;
    
    build(library{"build/mylib"}, { source{"src/mylib/myclass.cpp"} }, { "src/" });
    build(program{"build/myexe"}, { library{"build/mylib"}, source{"src/myexe/main.cpp"} });
    build(program{"build/tests"}, { library{"build/mylib"}, source{"tests/main.cpp"}, source{"tests/myclass.cpp"} });
    
    if (should_run_tests)
        run(program{"build/tests"});
    
    return 0;
}
Project with code generation

# Project layout
src/
  main-project/
    main.cpp
    version.hpp.in
  code-gen/
    main.cpp
build.cpp
build/
  include/
    <empty>
  <empty>

// build.cpp
#include <native_build>
using namespace std::native_build;

int main() {
    build(program{"build/code-gen"}, { source{"src/code-gen/main.cpp"} });
    run(program{"build/code-gen"}, { "src/main-project/version.hpp.in", "build/include/version.hpp" });
    build(program{"build/main"}, { source{"src/main-project/main.cpp"} }, { "build/include" });
    return 0;
}
Project with package manager

# Project layout
src/
  main.cpp
build-deps/
  my-package-manager.hpp
build.cpp
build/
  <empty>

// build.cpp
#include <native_build>
#include <vector>
#include "build-deps/my-package-manager.hpp"
using namespace std::native_build;
using std::vector;

int main() {
    auto SDL = my_package_manager::require_library("SDL2", "2.0.0");
    
    vector<build_file> build_files = SDL.build_files();
    build_files.push_back(source{"src/main.cpp"});
    vector<path> include_paths = SDL.include_paths();
    
    build(program{"build/my-app"}, build_files, include_paths);
    return 0;
}
V. Future thoughts

Dynamic loading

To offset the requirement that the build program be a single translation unit,
it may be possible to add a specification for dynamic loading.
This was not done because of the author's lack of experience with dynamic loading.
Such a specification could declare the function:
template<typename Signature>
auto load_function(library, string_view which_language, string_view name) -> Signature*;

// Example of use:
auto func = load_function<int(float, float)>(library{"hello"}, "C", "my_func");
int result = func(0.0f, 2.0f);
An additional benefit is that which_language defines the extern specification (the language linkage) of the function.
For instance:
extern "C" void c_func();
extern "C++" void cpp_func();
cpp_func could be loaded with which_language == "C++".
Name mangling may be solvable because the signature of the function is passed as a template argument.
So a build system would simply do:
build(library{"git"}, { source{"build_src/git.cpp"} });
auto git_clone = load_function<void(string_view, path)>(library{"git"}, "C++", "gitwrapper::clone");
git_clone("https://.../mydependency", "dependencies/");