@edi33416
Created August 23, 2019 21:29
Work Product Report for Google Summer of Code 2019

Introduction

In recent years, the D programming language has gained more and more attention and existing C and C++ codebases are starting to incrementally integrate D components.

In order to use D components, a C or C++ interface to them must be provided; in C and C++, this is done through header files. Currently, this process is entirely manual, with the responsibility of writing a header file falling on the shoulders of the programmer. The larger the D portion of a codebase is, the more tedious the task becomes. The best example is the DMD frontend, which amounts to roughly 310000 lines of code and whose C++ header files, used by other backend implementations (GDC, LDC), are manually managed. This is a repetitive, time-consuming, and rather boring task: the perfect job for a machine.

Project goal

The deliverable of the project is a tool that automatically generates C and C++ header files from D module files. This can be achieved either by a library solution using DMD as a Library, or by adding this feature in the DMD frontend through a compiler switch.

The advantage of using DMD as a Library is that it wouldn't increase the complexity of the compiler frontend codebase. The disadvantage is that the user would be required to install a third-party tool. In contrast, adding the feature to the frontend would result in a smoother integration with all the backends that use the DMD frontend.

We have decided to go with the compiler switch approach.

One major milestone (and success marker) for the project is to automatically generate the DMD frontend headers required by GDC/LDC.

Implementation strategy

The feature requires the implementation of a Visitor class that traverses the AST produced by the parsing phase of the D code. For each top-level Dsymbol (variable, function, struct, class, etc.) the corresponding C++ declaration is written to the header file.

The visitor will override the visiting methods of two types of nodes:

  • Traversal nodes - these nodes simply implement the AST traversal logic: ModuleDeclaration, ScopeDeclaration, etc.
  • Output nodes - these nodes will implement the actual header generation logic: FuncDeclaration, StructDeclaration, VarDeclaration, etc.

The header file will consist of the C and C++ counterparts of the public extern (C++) and public extern (C) declarations/definitions found in the D modules.
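The split between traversal nodes and output nodes can be sketched in C++ as follows. This is only an illustration under simplifying assumptions: `Node`, `Module`, `VarDecl`, `HeaderGen`, and `generateDemoHeader` are hypothetical names invented for this sketch and are not DMD's actual AST classes or visitor API.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST; these are not DMD's real classes.
struct Visitor;

struct Node {
    virtual ~Node() {}
    virtual void accept(Visitor& v) const = 0;
};

struct VarDecl : Node {
    std::string type, name;
    bool externCpp;  // true when the D declaration is `extern (C++)`
    VarDecl(std::string t, std::string n, bool e)
        : type(std::move(t)), name(std::move(n)), externCpp(e) {}
    void accept(Visitor& v) const override;
};

struct Module : Node {
    std::vector<std::unique_ptr<Node>> members;
    void accept(Visitor& v) const override;
};

struct Visitor {
    virtual ~Visitor() {}
    virtual void visit(const Module&) = 0;
    virtual void visit(const VarDecl&) = 0;
};

void VarDecl::accept(Visitor& v) const { v.visit(*this); }
void Module::accept(Visitor& v) const { v.visit(*this); }

struct HeaderGen : Visitor {
    std::ostringstream out;

    // Traversal node: emits nothing itself, only walks its children.
    void visit(const Module& m) override {
        for (const auto& member : m.members) member->accept(*this);
    }

    // Output node: writes a declaration, but only for extern (C++) symbols.
    void visit(const VarDecl& d) override {
        if (!d.externCpp) return;
        out << "extern " << d.type << " " << d.name << ";\n";
    }
};

// Builds a two-member module and returns the header text generated for it.
std::string generateDemoHeader() {
    Module m;
    m.members.push_back(std::unique_ptr<Node>(new VarDecl("int", "counter", true)));
    m.members.push_back(std::unique_ptr<Node>(new VarDecl("int", "hidden", false)));
    HeaderGen gen;
    m.accept(gen);
    return gen.out.str();
}
```

In this sketch only `counter` reaches the output, because `hidden` is not marked as extern (C++).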

Work done

I've started work with the revival of DMD's PR 8591, rebasing it and converting it into a compiler switch.

The next step was to add tests for the existing code.

Test description | Link to commit
Test enum declarations | link
Test free functions | link
Test variable declarations | link
Test alias declarations | link
Test struct declarations | link
Test class declarations | link
Test template declarations | link

The tests revealed the following issues:

  • StructDeclaration:

    • align different than 1 does nothing; we should support align(n), where n in [1, 2, 4, 8, 16] - fixes 1, 2
    • align(n): inside struct definition doesn’t add alignment, but breaks generation of default ctors
    • if a struct has a void initializer (member = void), the code segfaults - fix
  • ClassDeclaration:

    • align(n) does nothing. You can use align on classes in C++, though it is generally regarded as bad practice and should be avoided
  • FuncDeclaration:

    • default arguments can be any valid D code, including a lambda function or a complex expression; we don't want to go down the path of generating C or C++ code, so for now default arguments get ignored.
  • TemplateDeclaration:

    • templates imply code generation, so for now we don't support them

After writing the tests, understanding what the issues were, and fixing the blocking ones, I got more comfortable with the codebase and moved on to the next step: generating the DMD frontend header files from DMD's *.d frontend modules.

This took quite some time and sweat to get going: the major pain point here is given by templates. There is dmd/root/array.d which has a templated Array(T) that is used throughout the codebase. Since we don't support templates, we decided to keep the manual management of the dmd/root/*.h headers, but things aren't that simple.

The issue is that while we don't explicitly pass in any of the dmd/root/*.d modules, some of them are processed during the semantic analysis phase, which will generate the definition of some structs and enums from dmd/root/*.d into the generated frontend header. When the generated header is used in conjunction with the manually managed header files from dmd/root/*.h a struct/enum re-definition error will be thrown by the compiler.

I kept scratching my head at how to avoid this, and in the end I went with explicitly ignoring anything that comes from a dmd/root/*.d module. Ideally, this special casing shouldn't be needed, and it should go away if we can add support for some simple D -> C++ templates.

At this point (roughly 8 weeks after GSoC had started) we were pretty confident in the project structure and behaviour, and we decided to tackle the final milestone: use the header generator to generate the frontend headers required by the GNU D Compiler (GDC), in order to replace the manually managed header files with an auto-generated one. While working on this I encountered some challenges, which I detail below.

Challenges

1 - Enum base type

After scratching my head for a couple of days at a bug, I realised that the header generator was not taking into account the base type of an enum. So given the following example code

enum TOK : ubyte { /* ... */ }

class Expression
{
  TOK op;
  /* ... */
}

The enum TOK above gets generated as

enum TOK { /* ... */ }

According to the C standard, compilers are free to pick any integer type that can fit the enum's values, and most of them pick int as the base type; thus sizeof(TOK) -> 4UL. This is a problem: the D object files treat TOK as one byte wide, while the C and C++ object files treat it as four bytes wide, so the layout of Expression differs between the two sides.
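The size mismatch can be observed directly in C++. The following is a minimal sketch; `TokPlain`, `TokByte`, and the two `Expr...` structs are hypothetical stand-ins, not declarations from the DMD headers. The size of the plain enum is left to the compiler, so the sketch only relies on the one-byte case being fixed.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the generated declarations.
enum TokPlain { TokPlainA, TokPlainB };  // underlying type chosen by the compiler
typedef unsigned char TokByte;           // one byte, like D's `enum TOK : ubyte`

// Two views of the same D class layout, differing only in the type of `op`.
struct ExprWithPlainEnum { TokPlain op; unsigned char next; };
struct ExprWithByte      { TokByte  op; unsigned char next; };

std::size_t plainEnumSize()    { return sizeof(TokPlain); }
std::size_t byteSize()         { return sizeof(TokByte); }

// Offset of the field that follows `op`: this is where the two sides disagree
// whenever the compiler picks an enum type wider than one byte.
std::size_t offsetAfterPlain() { return offsetof(ExprWithPlainEnum, next); }
std::size_t offsetAfterByte()  { return offsetof(ExprWithByte, next); }
```

On compilers that pick int for `TokPlain`, `offsetAfterPlain()` and `offsetAfterByte()` differ, which means every field after `op` would be read from the wrong address.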

The manual header implementation did this clever trick to solve the problem

typedef unsigned char TOK;
enum
{
  TOKmem1,
  /* ... */
};

class Expression
{
  TOK op;
  /* ... */
};

The above takes advantage of the fact that the enumerators of an unscoped enum are visible in the enclosing (here, global) scope, so the values still exist; since they all fit in an unsigned char, storing them in a TOK field works through the typedef.

All of the above is required because C++98/03 doesn't have support for enum base types.

At first I thought that the fix should use the same trick to solve the problem, but then a community member suggested I use the -extern-std compiler flag to drive the header generation. So, if a user passes -extern-std=c++98, code similar to the one above is generated; if a user passes c++11 or later, the C++ enum class feature is used. This is done through the commits from here and here.
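The two output shapes can be compared side by side. This is a sketch only: the namespaces, enumerator names, and helper functions are illustrative assumptions, and the exact text emitted by the generator is not shown in this report. It needs a C++11 compiler because of the second form.

```cpp
// Shape for -extern-std=c++98: typedef plus anonymous enum.
namespace cpp98_style {
    typedef unsigned char TOK;
    enum { TOKreserved, TOKleftParen };  // enumerators land in the enclosing scope
    struct Expression { TOK op; };
}

// Shape for c++11 and later: scoped enum with an explicit base type.
namespace cpp11_style {
    enum class TOK : unsigned char { reserved, leftParen };
    struct Expression { TOK op; };
}

// Both forms store the token in exactly one byte, matching D's ubyte.
static_assert(sizeof(cpp98_style::TOK) == 1, "typedef form must be one byte");
static_assert(sizeof(cpp11_style::TOK) == 1, "enum class form must be one byte");

// Store an enumerator in the field and read it back as a plain integer.
unsigned roundTrip98() {
    cpp98_style::Expression e;
    e.op = cpp98_style::TOKleftParen;
    return e.op;
}

unsigned roundTrip11() {
    cpp11_style::Expression e;
    e.op = cpp11_style::TOK::leftParen;
    return static_cast<unsigned>(e.op);
}
```

The C++11 form keeps type safety (a `TOK` cannot be silently mixed with other integers), while the C++98 form trades that away for compatibility with older compilers.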

2 - Identify missing extern (C++) declarations

By design, the C/C++ header generator takes into consideration only structures and classes that are declared extern (C) and extern (C++). Throughout the DMD codebase, there were methods that were declared as extern (C++), and thus part of the manually managed header files, but the enclosing struct or class wasn't declared as extern (C++). This merged commit adds the missing extern (C++) declarations.

2.1 - Public symbols named as C++ keywords

In the DMD codebase, the Dsymbol class had a member named namespace. Because Dsymbol.namespace is a public extern (C++) symbol, the C++ header generator will generate the following code

class Dsymbol : public ASTNode
{
public:
    /* ... */
    CPPNamespaceDeclaration* namespace;
    /* ... */
};

As you know, namespace is a C++ keyword, so this generated code won't compile. This issue was fixed and merged in this commit.
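A generator could guard against this class of problem by checking identifiers against the C++ keyword list before emitting them. The sketch below is not taken from the linked commit; `isCppKeyword` is a hypothetical helper, and its keyword set is deliberately incomplete.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: returns true when `id` cannot be used as a C++ identifier.
// The list is a small illustrative subset of the C++ keywords, not the full set.
bool isCppKeyword(const std::string& id) {
    static const std::unordered_set<std::string> keywords = {
        "class", "delete", "friend", "namespace", "new",
        "operator", "template", "this", "typename", "virtual"
    };
    return keywords.count(id) != 0;
}
```

With such a check, a symbol like `Dsymbol.namespace` could be reported at generation time instead of surfacing later as a C++ compile error.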

C++ does not support function covariance

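The issue here was that D has function covariance on method qualifiers, but C++ does not: D accepts a const method as an override of a non-const base method, while in C++ a method that differs in its const qualifier is a separate function that hides the base one. What this means is that if I have the following D code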

extern (C++) class A
{
  void foo() {}
}

extern (C++) class B :  A
{
  override void foo() const {}
}

the generated header will look like

class A
{
  virtual void foo();
};

class B : public A
{
  void foo() const;
};

Note that B has void foo() const. Compiling this code with g++ -Woverloaded-virtual (as gdc does) will result in the following warning

warning: 'virtual void A::foo()' was hidden
by `void B::foo() const`

In the DMD codebase, there were two class hierarchies where this issue was present:

  • RootObject
  • Type

For the RootObject hierarchy I made the methods toChars and equals const, and this solved the issue. This is the merged commit for this work.

I attempted to do the same for the Type hierarchy with this commit. Here the situation wasn't as simple because there are a lot of isAAA and getAAA methods that modify the internal state of the object (for lazy initialization and caching), which means that we can't make all the methods in the hierarchy const. Because of this, I attempted in another commit to remove the const qualifier from the method declarations, but I wasn't really happy with this idea (nor were the members of the community, as can be seen from the commit comments).

The solution to this debacle came from the suggestion of a community member: make the header generator emit the prototype of the function as declared in the introducing base class. This was done with this commit.
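The difference between hiding and overriding, and the effect of emitting the base class prototype, can be observed in plain C++. This is a sketch under assumptions: the methods return int instead of void and have C++ bodies only so that the dispatch is visible; in the real headers the bodies live in D object files. The namespaces and helper functions are hypothetical.

```cpp
// Header as first generated: B::foo differs in const-ness, so it hides A::foo.
namespace hidden {
    struct A { virtual ~A() {} virtual int foo() { return 1; } };
    struct B : A { int foo() const { return 2; } };  // a new function, not an override
}

// Header after the fix: B repeats the prototype from the introducing class A.
namespace fixed {
    struct A { virtual ~A() {} virtual int foo() { return 1; } };
    struct B : A { int foo() { return 2; } };        // same signature: overrides A::foo
}

// Call foo() on a B object through a reference to its base class.
int callThroughBaseHidden() {
    hidden::B b;
    hidden::A& a = b;
    return a.foo();  // dispatches to A::foo, because B::foo is unrelated to the vtable slot
}

int callThroughBaseFixed() {
    fixed::B b;
    fixed::A& a = b;
    return a.foo();  // dispatches to B::foo through the vtable
}
```

Here `callThroughBaseHidden()` returns 1 and `callThroughBaseFixed()` returns 2; compiling the `hidden` variant with g++ -Woverloaded-virtual is also what produces the warning quoted earlier.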

Current status

The header generation tool is still an open PR, but it should be ready to use. Until it gets merged into master (probably sometime next month), one can use it by checking out the PR branch and building the compiler.

The dmd compiler, through a compiler switch, generates a C++ header file from a list of .d modules passed at compile time. The simplest form of using the CLI switch is dmd -HC a.d b.d

This will visit the ASTs of modules a and b and write a single header file to stdout.

With the -HCf=<file-name> switch, the same output is written to the specified file instead. Adding -HCd=<path> places that file in the specified directory.

So, running dmd -HCf=ab.h -HCd=mypath/ a.d b.d will write the generated header in mypath/ab.h, relative to the current directory.

Work to be done

First, we want to finish the integration of the auto-generated header with GDC, as this serves as a test on a big, production ready, project. This will probably take two more weeks.

After the integration with GDC is done, the PR should go through a final round of code review and then it is ready to be merged.

Closing notes

I want to publicly thank my mentors and the community for their help and guidance, which made it possible for me to deliver this project.
