This document proposes an extension of the C++ standard library.
This is an informal draft for a proposal: its aim is to get feedback from other C++ users and implementers.
In short, this proposal provides a starting point for using C++ as the basis of C++ build systems.
To that end, it provides a minimal C++ header, <native_build>, which is implemented by the compiler maintainers.
It defines new types (essentially strongly typed aliases of std::filesystem::path), a new function template build(...) for building the project, and a function run(...) for running an executable.
It also has a permissive requirement: programs using <native_build> are only required to work on the machine that compiled them. Thanks to that, a program does not need to embed parts of the compiler.
There are multiple ways to build a C++ project. Depending on the complexity of the project, a shell script may suffice.
Low-level generic build systems such as make or ninja can also be used. These simple solutions are often platform-specific and require specific tools and/or a lot of configuration (defining system variables, for instance).
There are higher-level build systems such as Meson, CMake, build2, premake... However, each of these solutions requires learning a new language, different from C++. With some higher-level build systems, the user interacts with the build system from within the IDE. This is the case for MSVC, for example. This solution lacks interoperability.
This document proposes to use C++ as the base language for build systems. This allows C++ coding styles and good practices to apply to project building. The scope is only to provide a minimal core, so that build systems would then be C++ libraries instead of tools. This may improve interoperability with other tools that need to cooperate with the build system, such as a package manager or a code generator.
The entire proposal lives in a new header, <native_build>, and in the namespace std::native_build.
As such, it does not alter any existing standard library API.
The name native_build was chosen to point out that the resulting libraries and programs are meant to be used on the user's machine. This is the opposite of cross-compilation.
Cross-compilation (when the compiled files cannot be used on the compiler's machine) is out of scope for this library.
It is thought that this is a job for a separate library (which may or may not be standardized later).
This proposal does not provide a way to replace all build systems out of the box. Its aim is only to provide a solid foundation on which libraries can build more complex behaviour. As such, there is for example no support for "smart building" (not recompiling files whose sources have not changed). Such support would require caching the timestamps of dependencies and defining a representation of a dependency graph. This is out of scope for this proposal, so that compiler maintainers have a minimal API to implement. These functionalities may later be standardized in the standard library.
The <native_build> header is not provided by standard library maintainers.
Instead, it is provided by compiler maintainers.
A program which uses <native_build> is required to work only on the machine which compiled it.
Such a program is not meant to be shared across machines in compiled form (nor in precompiled form).
Only the original source file can be shared.
This is usual in other build systems: for instance, the intermediate files of CMake use absolute paths, which are probably wrong once copied to another machine.
This permissive requirement allows a compiler to implement <native_build> knowing that the compiler itself is installed on the machine. So there is no need to embed a whole compiler in the resulting program.
An implementation can simply wrap the command-line interface of the compiler (using std::system as a basis, for instance).
Another implementation may rely on the presence of a shared library inside the compiler installation directory.
Projects are usually composed of source files, which translate into libraries and programs, and instruction files (whether configuration or script) to build these libraries and programs.
With this proposal, there is at least one new program which must be compiled, whose purpose is to build the rest of the project.
This program is built from a single translation unit, to avoid the chicken-or-egg problem.
This means that libraries taking advantage of <native_build> must be header-only.
If a more complex program is needed, a possible solution would be for the single-translation-unit program to build an intermediate, more complex program whose purpose is to build the rest of the project. Another solution may be to standardize dynamic loading from shared libraries. This is out of scope for this proposal.
This proposal defines a more detailed, linear sequence of events when a project is built:
- Level 0: Source file (the file written by the programmer)
- Level 1: Translation unit (the source file after being pre-processed)
- Level 2: Object (the translation unit after being compiled)
- Level 3: Archive (one to several objects bound together, possibly with other archives)
- Level 4: Library (one to several archives which have been linked, possibly with other libraries)
- Level 5: Program (a library with the main() entry point)
NOTE: terminology may be changed based on feedback.
<native_build> has the following structure (not including header guards):
#include <filesystem>
#include <initializer_list>
#include <string_view>
#include <variant>
namespace std::native_build {
using std::filesystem::path;
/* rest of the API */
}
First, there is the tag type no_renaming_t, whose purpose is explained later:
struct no_renaming_t {};
inline constexpr no_renaming_t no_renaming{};
This proposal contains six similar types: source, translation_unit, object, archive, library and program.
They are all defined as follows:
// "____" is one of "source", "translation_unit", "object", "archive", "library", "program"
class ____ {
public:
    ____(path p);
    ____(path p, no_renaming_t);
    auto get_path() const -> path const&;
};
By default, an implementation should rename the filename of the path p to match the conventions of the specific platform.
For instance, object{"tmp/hello"}.get_path() would return tmp/hello.o on GNU/Linux and tmp/hello.obj on Windows.
library{"build/hello"}.get_path() would return build/libhello.so on GNU/Linux and build/hello.dll on Windows.
So the actual filename is implementation-defined.
The overload with no_renaming_t prevents this renaming. So library{"build/hello.custom", no_renaming}.get_path() would return build/hello.custom on all platforms and implementations.
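The renaming rule can be illustrated with a minimal sketch. It is not the proposed implementation: the namespace sketch, the platform enumeration and the extra constructor parameter are hypothetical and exist only so that both conventions can be shown on one machine. A real implementation would know its own platform and keep the one-argument constructor.

```cpp
#include <filesystem>
#include <string>
#include <utility>

namespace sketch {

using std::filesystem::path;

// Hypothetical platform tag, used here only to make both conventions visible.
enum class platform { gnu_linux, windows };

struct no_renaming_t {};
inline constexpr no_renaming_t no_renaming{};

// Stand-in for the proposed `library` type.
class library {
public:
    // Renames the filename to follow the convention of `target`.
    library(path p, platform target) : path_(decorate(std::move(p), target)) {}
    // Keeps the path exactly as given.
    library(path p, no_renaming_t) : path_(std::move(p)) {}

    auto get_path() const -> path const& { return path_; }

private:
    static path decorate(path p, platform target) {
        const std::string stem = p.filename().string();
        if (target == platform::gnu_linux)
            p.replace_filename("lib" + stem + ".so"); // hello -> libhello.so
        else
            p.replace_filename(stem + ".dll");        // hello -> hello.dll
        return p;
    }

    path path_;
};

} // namespace sketch
```

Only the last path component is rewritten, so the directory part given by the user is preserved.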
NOTE: maybe an implicit or explicit conversion to std::filesystem::path should be added?
build_file is a variant which can contain source, translation_unit, object, archive, library or program.
using build_file = std::variant<source, translation_unit, object, archive, library, program>;
This proposal exposes one function template:
template<typename Output,
         typename BuildFileList = initializer_list<build_file>,
         typename PathList = initializer_list<path>,
         typename StringViewList = initializer_list<string_view>>
void build(Output output,
           BuildFileList inputs,
           PathList include_paths = {},
           StringViewList options = {});
With:
- Output is one of translation_unit, object, archive, library or program.
- BuildFileList is a range of build_file
- PathList is a range of path
- StringViewList is a range of string_view
It produces the output file from the inputs, searching for includes in include_paths and applying the options in options.
It raises two kinds of exceptions:
- Derived from std::invalid_argument when the error is due to the building program.
- Derived from std::runtime_error when the error is due to the inputs not being processable.
Because the default types are initializer_list, the following call is allowed:
build(program{"hello"}, { source{"hello.cpp"}, source{"main.cpp"} }, {}, { "c++17" } );
The following paragraphs describe the logical behaviour of build(...) depending on output and inputs.
This behaviour is only logical: implementations may take shortcuts, for instance to avoid generating intermediary files.
When output is a translation_unit
Pre-conditions: inputs must be a range of exactly one source.
Logical behaviour:
- Preprocess the provided file in inputs into output.
When output is an object
Pre-conditions: inputs must be a range of exactly one element, of type source or translation_unit.
Logical behaviour:
- If inputs holds a source, preprocess it into a temporary file; otherwise do nothing.
- Compile the preprocessed file into output.
When output is an archive
Pre-conditions: inputs must be a range of at least one element, each element being of type source, translation_unit, object or archive.
Logical behaviour:
- If some elements are source, preprocess each of them into one translation_unit.
- If some elements are translation_unit, compile each of them into one object.
- Group all objects in an archive.
- Group all archives into output.
When output is a library
Pre-conditions: inputs must be a range of at least one element, each element being of type source, translation_unit, object, archive or library.
Logical behaviour:
- If some elements are source, preprocess each of them into one translation_unit.
- If some elements are translation_unit, compile each of them into one object.
- Group all objects in an archive.
- Link all archives in a library.
- Link all libraries into output.
When output is a program
Pre-conditions: inputs must be a range of at least one element, each element being of type source, translation_unit, object, archive or library.
Logical behaviour:
- If some elements are source, preprocess each of them into one translation_unit.
- If some elements are translation_unit, compile each of them into one object.
- Group all objects in an archive.
- Link all archives in a library.
- Link all libraries into output, making sure there is a main() entry point.
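The pre-conditions listed for each kind of output can be summarized as a small check. This is only a sketch of how an implementation might validate its arguments before throwing std::invalid_argument: the enumeration level and the function inputs_allowed are hypothetical names, and the real API works on build_file values rather than on an enumeration.

```cpp
#include <algorithm>
#include <initializer_list>

namespace sketch {

// The six levels of the proposal, in build order.
enum class level { source, translation_unit, object, archive, library, program };

// Returns true if `inputs` satisfies the pre-conditions stated for `output`.
inline bool inputs_allowed(level output, std::initializer_list<level> inputs) {
    level highest_input = level::source; // most advanced level accepted as input
    bool exactly_one = false;            // whether exactly one input is required
    switch (output) {
        case level::translation_unit:
            highest_input = level::source;
            exactly_one = true;
            break;
        case level::object:
            highest_input = level::translation_unit;
            exactly_one = true;
            break;
        case level::archive:
            highest_input = level::archive;
            break;
        case level::library:
        case level::program:
            highest_input = level::library;
            break;
        default:
            return false; // a source is never an output of build(...)
    }
    if (inputs.size() == 0 || (exactly_one && inputs.size() != 1))
        return false;
    return std::all_of(inputs.begin(), inputs.end(),
                       [=](level input) { return input <= highest_input; });
}

} // namespace sketch
```

Ordering the enumerators by level makes every rule a simple comparison: an input is acceptable when its level does not exceed the highest level allowed for that output.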
The function build(...) accepts include paths, in which #include will search for files.
All of these paths must point to directories.
These include paths are by definition useful only to the preprocessor.
If the logical behaviour of build(...) does not involve the preprocessor, then include_paths is silently ignored.
Options are described in ASCII string views. The interpretation of bytes outside the ASCII table is implementation-defined.
They are combined either by passing multiple string_view values to build(...), or by concatenating them in one string_view, separated by at least one whitespace byte (\t, space, \r or \n).
Here is an example of equivalent calls:
build(..., ..., ..., { "option1", "option2", "option3" });
build(..., ..., ..., { "option1 option2", "option3" });
build(..., ..., ..., { "option1 option2 option3" });
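One way to make these three calls equivalent is to flatten the option list before interpreting it. The sketch below is an assumption about how an implementation could do this. The name split_options is hypothetical, and the function takes an initializer_list for brevity where the proposal allows any range.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>
#include <vector>

// Splits every string view on the four whitespace bytes (\t, space, \r, \n)
// and returns all the individual options in their original order.
inline std::vector<std::string> split_options(
        std::initializer_list<std::string_view> options) {
    constexpr std::string_view spaces = "\t \r\n";
    std::vector<std::string> result;
    for (std::string_view chunk : options) {
        std::size_t pos = 0;
        // Skip leading whitespace, then cut at the next whitespace byte.
        while ((pos = chunk.find_first_not_of(spaces, pos)) != std::string_view::npos) {
            const std::size_t end = chunk.find_first_of(spaces, pos);
            result.emplace_back(chunk.substr(pos, end - pos));
            if (end == std::string_view::npos)
                break;
            pos = end;
        }
    }
    return result;
}
```

Keeping the original order matters for the rule on contradicting options, since the last occurrence is the one that wins.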
If contradicting options appear, the last one is taken and the others are discarded. This allows for providing default options and overriding them.
Options are like C++ attributes: ignoring them does not prevent correct compilation.
NOTE: Options are currently not well-defined in this draft. More feedback is awaited about the current usage of flags, and it would require compiler cooperation to define together what the standard names for flags would be (notably warnings, features and optimizations).
Examples:
- Language standard. By default, the latest stable C++ standard supported by the compiler should be used. However, the following language options are defined to select a precise standard: c89 c99 c11 c++98 c++11 c++14 c++17 c++20. The C standards are provided because most C++ compilers can also compile C, and it is common to mix C and C++ in projects.
- Features. By default, all supported features of C++ are enabled. However, it is currently common to disable some features, for performance, safety or coding-style reasons. These options are all opt-out: no-exception no-rtti no-filesystem no-regex, etc.
- Warnings. Stricter diagnostics of C++ code may be wanted: warning-0 warning-1 warning-2
- Optimizations: opti-space opti-size opti-middle opti-none
There may also be implementation-defined options, which would start with a special token. These options should be safely ignorable.
NOTE: maybe - like gcc and clang? or / like msvc? or another convention?
The run(...) function is a wrapper around std::system.
template<typename Arguments = initializer_list<string_view>>
auto run(program p, Arguments args = {}, path working_dir = "") -> int;
Arguments is a range of values convertible to string_view.
This function executes the program at the location p with the arguments args, from the working directory working_dir.
The arguments are passed to the program exactly as given.
This means that the implementation must escape characters correctly, so that the argc received by the program's main is equal to args.size() + 1 (argv[0] being the program name).
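Since run(...) is described as a wrapper around std::system, the escaping has to target the command interpreter. The following sketch shows one common approach for a POSIX shell only: each argument is wrapped in single quotes and embedded single quotes are escaped. The names quote_posix and command_line are hypothetical, and other platforms would need different rules.

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

// Quotes one argument for a POSIX shell: everything between single quotes is
// taken literally, so only the single quote itself needs special handling.
inline std::string quote_posix(std::string_view arg) {
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'')
            quoted += "'\\''"; // close the quote, emit \', reopen the quote
        else
            quoted += c;
    }
    quoted += '\'';
    return quoted;
}

// Builds the string that would be handed to std::system.
inline std::string command_line(std::string_view program,
                                std::initializer_list<std::string_view> args) {
    std::string line = quote_posix(program);
    for (std::string_view arg : args) {
        line += ' ';
        line += quote_posix(arg);
    }
    return line;
}
```

With this quoting, an argument containing spaces reaches the program as a single argv entry instead of being split by the shell.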
The path of the working directory is relative to the current working directory, not to the directory of the program being executed.
The return value must be the exit code of the program, not an error code from the shell. For instance, if the program is not found or if it crashed with a segfault, an exception should be thrown instead of an error code being returned.
NOTE: is it possible to distinguish return values from the program and from the operating system?
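On POSIX systems, part of this question can be answered with the wait-status macros from <sys/wait.h>. The sketch below is POSIX-only and the name run_command is hypothetical. It also has a known limit: std::system goes through a shell, so a missing program is reported as an ordinary exit code (conventionally 127) and cannot be told apart from a program that really exits with that code.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <sys/wait.h> // POSIX only: WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG

// Runs `command` through std::system and returns the exit code of the child.
// Throws std::runtime_error when no exit code is available.
inline int run_command(const std::string& command) {
    const int status = std::system(command.c_str());
    if (status == -1)
        throw std::runtime_error("could not start a child process");
    if (WIFSIGNALED(status))
        throw std::runtime_error("terminated by signal " +
                                 std::to_string(WTERMSIG(status)));
    if (!WIFEXITED(status))
        throw std::runtime_error("child did not exit normally");
    return WEXITSTATUS(status);
}
```

An implementation that starts the process directly, without a shell in between, would be in a better position to report "program not found" as an exception.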
Example of calls:
int retval = run(program{"hello"});
retval = run(program{"hello"}, { "Mister" });
Example of use in a project's build file:
#include <native_build>
using namespace std::native_build;
constexpr auto VERSION = "2.5.0";
int main() {
    // inputs must be a range, hence the braces around each single source
    build(program{"build/codegen"}, { source{"codegen/main.cpp"} });
    run(program{"build/codegen"}, { "src/version.hpp.in", "build/include/version.hpp" });
    build(program{"build/hello"}, { source{"src/hello.cpp"} }, { "build/include/" });
    return 0;
}
# Project layout
src/
    hello.cpp
    hello.hpp
    main.cpp
build.cpp
build/
    <empty>
// build.cpp
#include <native_build>
using namespace std::native_build;
int main() {
build(program{"build/hello"}, { source{"src/hello.cpp"}, source{"src/main.cpp"} });
return 0;
}
# Project layout
src/
    mylib/
        myclass.hpp
        myclass.cpp
    myexe/
        main.cpp
tests/
    catch.hpp
    main.cpp
    myclass.cpp
build.cpp
build/
    <empty>
// build.cpp
#include <native_build>
using namespace std::native_build;
int main(int argc, char** argv) {
bool should_run_tests = false;
if (argc >= 2 and std::string_view("--run-tests") == argv[1])
should_run_tests = true;
build(library{"build/mylib"}, { source{"src/mylib/myclass.cpp"} }, { "src/" });
build(program{"build/myexe"}, { library{"build/mylib"}, source{"src/myexe/main.cpp"} });
build(program{"build/tests"}, { library{"build/mylib"}, source{"tests/main.cpp"}, source{"tests/myclass.cpp"} });
if (should_run_tests)
run(program{"build/tests"});
return 0;
}
# Project layout
src/
    main-project/
        main.cpp
        version.hpp.in
    code-gen/
        main.cpp
build.cpp
build/
    include/
        <empty>
// build.cpp
#include <native_build>
using namespace std::native_build;
int main() {
build(program{"build/code-gen"}, { source{"src/code-gen/main.cpp"} });
run(program{"build/code-gen"}, { "src/main-project/version.hpp.in", "build/include/version.hpp" });
build(program{"build/main"}, { source{"src/main-project/main.cpp"} }, { "build/include" });
return 0;
}
# Project layout
src/
    main.cpp
build-deps/
    my-package-manager.hpp
build.cpp
build/
    <empty>
// build.cpp
#include <native_build>
#include <vector>
#include "build-deps/my-package-manager.hpp"
using namespace std::native_build;
using std::vector;
int main() {
auto SDL = my_package_manager::require_library("SDL2", "2.0.0");
vector<build_file> build_files = SDL.build_files();
build_files.push_back(source{"src/main.cpp"});
vector<path> include_paths = SDL.include_paths();
build(program{"build/my-app"}, build_files, include_paths);
return 0;
}
To counterbalance the requirement that the building program be a single translation unit, it may be possible to add a specification for dynamic loading. This was not done because of the author's lack of experience with dynamic loading.
Such a specification could declare the function:
template<typename Signature>
auto load_function(library, string_view which_language, string_view name) -> Signature*;
// Example of use:
auto func = load_function<int(float, float)>(library{"hello"}, "C", "my_func");
int result = func(0.0f, 2.0f);
With the additional benefit that which_language defines the extern specification of the function.
For instance:
extern "C" void c_func();
extern "C++" void cpp_func();
cpp_func could be loaded with which_language == "C++".
Mangling may be solved because the signature of the function is passed as a template argument.
So a build system would simply do:
build(library{"git"}, { source{"build_src/git.cpp"} });
auto git_clone = load_function<void(string_view, path)>(library{"git"}, "C++", "gitwrapper::clone");
git_clone("https://.../mydependency", "dependencies/");
I don't think it is useful to push so much functionality into the header. I believe almost all functionality should live in build-deps.
<native_build> should only provide access to functionality the compiler provides anyway.
My proposal for <native_build> is that it contains the following interface:
// I am avoiding std::string_view so that the whole thing stays C++11 compliant.
class Compiler {
    // implementation-defined
public:
    std::string name();    // compiler name
    std::string version(); // compiler version (semver) (how do we handle LLVM versions of clang?)
    std::string vendor();
    bool is_option_supported(const std::string&);
    std::vector<std::string> get_supported_options();
    std::string help_text();
    // The return value of process_files is the return value the compilation process would normally have.
    int process_files(
        std::ostream& out, // what the compiler would send to stdout
        std::ostream& err, // what the compiler would send to stderr
        const std::filesystem::path& output_file, // <native_build> does not deal with
        std::vector<char*> options // very close and very easily convertible to char*[]
        // Options are identical to the options passed on the command line.
        // For compiler-independent options, a wrapper class has to be provided in build-deps.
    );
    // This overload is mainly intended for querying information about files from the compiler, e.g. the gcc -M option.
    int process_files(
        std::ostream& out,
        std::ostream& err,
        std::ostream& output_file,
        std::vector<char*>& options
    );
};
I believe this can be implemented relatively easily in terms of libclang for clang. Also, I don't believe that this should go into the C++ standard.
Similar to <immintrin.h> or #pragma once, it should rather be an effort to standardize it across compilers, independently of the standard.
For the build system, all additional functionality then has to be implemented in build-deps. I would call that an advantage, as the build system can be updated independently from the compiler.
What do you think?