@SHA-4096
Last active December 23, 2024 10:23
GSoC2024-UnifiedIDL

Project: Unified IDL for Apache Dubbo-Go

Project Description

This project aims to provide Apache Dubbo with a Protobuf IDL translation tool that unifies interface definitions across IDL and Non-IDL modes. The toolset generates code that invokes Dubbo's Server and Client APIs regardless of the protocol mode (IDL or Non-IDL). In addition, a tool for translating Java interface definitions to Protobuf IDL is being developed to simplify coordinating microservice development across programming languages. The project could grow into a more unified microservice development workflow by letting users generate code directly from existing interface definitions written in various programming languages.

Project Progress Overview

Work Done:

  • Unified IDL Extension files for hessian type mapping and transport protocol specification

  • Enable developers to define Hessian2-serialized service in Protobuf IDL

  • Support the translation from Java interfaces to Protobuf service definition

  • Fix bugs in interoperation between Dubbo-Java and Dubbo-Go

Work Yet to Be Done:

  • Support more complex Java types in Java2Proto plugin

  • Support direct interface migration from Dubbo-Java to Dubbo-Go

  • Enable dubbo-cli to manage protoc plugins and code generation options

Project Usage & Code Insight:

The Unified-IDL project provides developers with the following features:

  1. Generate invocation functions using Hessian2 serialization from user-defined Protobuf IDL files

  2. Generate Protobuf IDL files from Java interface definitions

  3. Specify the transport protocol (e.g. DUBBO/TRIPLE) in Protobuf IDL files

Unified IDL for Hessian2

Usage

The following two links are examples of developing Hessian2-serialized services that use Protobuf IDL for interface definition.

dubbo-go-samples/java_interop/non-protobuf-dubbo

dubbo-go-samples/java_interop/non-protobuf-triple

In these samples, the invocation files have already been generated in the proto folder. In non-protobuf-dubbo, they are generated by running the following command:

protoc -I ./ \
  --go-hessian2_out=./ --go-hessian2_opt=paths=source_relative \
  --go-dubbo_out=./ --go-dubbo_opt=paths=source_relative \
  ./greet.proto

This command uses the (previously installed) protoc plugins protoc-gen-go-dubbo and protoc-gen-go-hessian2 for code generation. The former generates the invocation part, and the latter handles type registration and Hessian2 serialization.

Code Insight

How does the code work?

In protoc-gen-go-hessian2, the parsed .proto file is passed to the ProcessProtoFile function, which extracts the contents of the file variable into a Hessian2Go struct:

func ProcessProtoFile(g *protogen.GeneratedFile, file *protogen.File) (*Hessian2Go, error) {
    hessian2Go := &Hessian2Go{
        File:         file,
        Source:       file.Proto.GetName(),
        ProtoPackage: file.Proto.GetPackage(),
    }

    for _, enum := range file.Enums {
        hessian2Go.Enums = append(hessian2Go.Enums, processProtoEnum(g, enum))
    }

    for _, message := range file.Messages {
        hessian2Go.Messages = append(hessian2Go.Messages, processProtoMessage(g, file, message))
    }

    return hessian2Go, nil
}

The Hessian2Go struct is then passed to the generator.GenHessian2 function for code generation:

func GenHessian2(g *protogen.GeneratedFile, hessian2Go *Hessian2Go) {
    genPreamble(g, hessian2Go)
    genPackage(g, hessian2Go)

    g.QualifiedGoIdent(protogen.GoIdent{
        GoName:       "dubbo-go-hessian2",
        GoImportPath: "github.com/apache/dubbo-go-hessian2",
    })

    for _, enum := range hessian2Go.Enums {
        genEnum(g, enum)
    }

    for _, message := range hessian2Go.Messages {
        genMessage(g, message)
    }

    genRegisterInitFunc(g, hessian2Go)
}

The genXxx functions call g.P to append lines to the generated file's in-memory buffer; when the plugin finishes, the buffered output is written to stdout and handed back to the protoc binary. For example, genPreamble uses g.P to emit the preamble of the generated code:

func genPreamble(g *protogen.GeneratedFile, hessian2Go *Hessian2Go) {
    g.P("// Code generated by protoc-gen-go-dubbo. DO NOT EDIT.")
    g.P()
    g.P("// Source: ", hessian2Go.Source)
    g.P("// Package: ", strings.ReplaceAll(hessian2Go.ProtoPackage, ".", "_"))
    g.P()
}
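To make the g.P pattern concrete without pulling in the protobuf module, here is a minimal sketch that uses a stand-in buffer. The lineBuffer type and the render helper are hypothetical and are not part of protoc-gen-go-hessian2. The sketch only assumes that P concatenates its arguments and ends the line, which is how protogen.GeneratedFile.P treats plain strings.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// lineBuffer is a hypothetical stand-in for protogen.GeneratedFile:
// it collects generated lines in memory instead of printing them.
type lineBuffer struct {
	buf bytes.Buffer
}

// P appends its arguments, concatenated without separators, then a newline.
func (b *lineBuffer) P(v ...any) {
	for _, x := range v {
		fmt.Fprint(&b.buf, x)
	}
	b.buf.WriteByte('\n')
}

// genPreamble mirrors the shape of the function above, using the stand-in.
func genPreamble(g *lineBuffer, source, protoPackage string) {
	g.P("// Code generated by a sketch generator. DO NOT EDIT.")
	g.P()
	g.P("// Source: ", source)
	g.P("// Package: ", strings.ReplaceAll(protoPackage, ".", "_"))
	g.P()
}

// render returns the buffered preamble for the given inputs.
func render(source, protoPackage string) string {
	g := &lineBuffer{}
	genPreamble(g, source, protoPackage)
	return g.buf.String()
}

func main() {
	fmt.Print(render("greet.proto", "org.example.greet"))
}
```

Nothing reaches the terminal until main prints the buffer, so a generator can compose a file from many small genXxx helpers and emit it in one piece.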

To use Hessian2 serialization, the generated code relies on the dubbo-go-hessian2 repository:

func genRegisterInitFunc(g *protogen.GeneratedFile, hessian2Go *Hessian2Go) {
    g.P("func init() {")
    for _, message := range hessian2Go.Messages {
        if message.Desc.IsMapEntry() || message.ExtendArgs {
            continue
        }
        g.P("dubbo_go_hessian2.RegisterPOJO(new(", message.GoIdent.GoName, "))")
        for _, inner := range message.Messages {
            if inner.Desc.IsMapEntry() {
                continue
            }
            g.P("dubbo_go_hessian2.RegisterPOJO(new(", inner.GoIdent.GoName, "))")
        }
    }

    for _, e := range hessian2Go.Enums {
        g.P()
        if e.JavaClassName != "" {
            g.P("for v := range ", e.GoIdent.GoName, "_name {")
            for range e.Values {
                g.P("dubbo_go_hessian2.RegisterJavaEnum(", e.GoIdent.GoName, "(v))")
            }
            g.P("}")
        }
    }
    g.P("}")
    g.P()
}
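The filtering in genRegisterInitFunc is easier to follow with a small, dependency-free sketch. The message struct below is a hypothetical simplification of the plugin's message type, and the sketch covers only the POJO registration loop, not the enum loop. It shows which messages end up in the emitted init function: map-entry messages and messages marked ExtendArgs are skipped.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical, simplified stand-in for the plugin's message type.
type message struct {
	GoName     string
	IsMapEntry bool
	ExtendArgs bool
	Inner      []message
}

// genRegisterInit returns the text of an init function that registers each
// eligible message, and its eligible nested messages, as a Hessian2 POJO.
func genRegisterInit(messages []message) string {
	var sb strings.Builder
	sb.WriteString("func init() {\n")
	for _, m := range messages {
		if m.IsMapEntry || m.ExtendArgs {
			continue // skipped, as in the original loop
		}
		fmt.Fprintf(&sb, "\tdubbo_go_hessian2.RegisterPOJO(new(%s))\n", m.GoName)
		for _, in := range m.Inner {
			if in.IsMapEntry {
				continue
			}
			fmt.Fprintf(&sb, "\tdubbo_go_hessian2.RegisterPOJO(new(%s))\n", in.GoName)
		}
	}
	sb.WriteString("}\n")
	return sb.String()
}

func main() {
	fmt.Print(genRegisterInit([]message{
		{GoName: "GreetRequest"},
		{GoName: "GreetResponse"},
		{GoName: "LabelsEntry", IsMapEntry: true}, // hypothetical map-entry message
	}))
}
```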

In protoc-gen-go-dubbo, an extra option for specifying the transport protocol is introduced. Take the service-level transport protocol specification as an example:

service GreetingsService  {
  option (unified_idl_extend.service_protocol) = {
    protocol_name: "DUBBO";
  };
  ...
}

This option is used in the ProcessProtoFile method in protoc-gen-go-dubbo:

for _, service := range file.Services {
    serviceProtocolOpt, ok := proto.GetExtension(service.Desc.Options(), unified_idl_extend.E_ServiceProtocol).(*unified_idl_extend.ServiceProtocolTypeOption)
    if serviceProtocolSpecFlag && ok {
        if serviceProtocolOpt.GetProtocolName() != unified_idl_extend.ProtocolType_DUBBO.String() || serviceProtocolOpt == nil {
            // skip the service which is not dubbo protocol or does not have a service option
            continue
        }
    }
    ...
}

serviceProtocolSpecFlag is a CLI argument set by the user; its default value is false. When it is set to true, the generator checks whether each service has the option unified_idl_extend.service_protocol. If the option is absent or its protocol name is not "DUBBO", the invocation code for that service is not generated.
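The selection rule can be sketched in isolation. The service and protocolOption types below are hypothetical simplifications, not the plugin's real types, and the sketch checks for a missing option before reading its name.

```go
package main

import "fmt"

// protocolOption is a hypothetical stand-in for the service-level option.
type protocolOption struct {
	ProtocolName string
}

// service is a hypothetical, simplified service description.
type service struct {
	Name     string
	Protocol *protocolOption // nil when the option is absent
}

// selectDubboServices returns the names of the services that would be kept.
// When specFlag is false, every service is kept.
func selectDubboServices(services []service, specFlag bool) []string {
	var kept []string
	for _, s := range services {
		if specFlag && (s.Protocol == nil || s.Protocol.ProtocolName != "DUBBO") {
			continue // option missing or not DUBBO: nothing is generated
		}
		kept = append(kept, s.Name)
	}
	return kept
}

func main() {
	services := []service{
		{Name: "GreetingsService", Protocol: &protocolOption{ProtocolName: "DUBBO"}},
		{Name: "TripleService", Protocol: &protocolOption{ProtocolName: "TRIPLE"}},
		{Name: "PlainService"},
	}
	fmt.Println(selectDubboServices(services, true))
	fmt.Println(selectDubboServices(services, false))
}
```

With the flag on, only GreetingsService survives; with the flag off, all three services are kept.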

Java2Proto

Java2Proto is a tool that generates Protobuf IDL files from Java interface definitions. The project is still at an early stage, but it can already generate methods that take simple Java types as arguments.

Usage

Suppose we have a set of Java files that define an interface:

//GreetingsService.java
package org.apache.dubbo.tri.hessian2.api;

public interface GreetingsService {
    GreetResponse greet(GreetRequest req);
}

//GreetRequest.java
package org.apache.dubbo.tri.hessian2.api;

public class GreetRequest implements java.io.Serializable {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

//GreetResponse.java
package org.apache.dubbo.tri.hessian2.api;

public class GreetResponse implements java.io.Serializable {
    private String greeting;

    public String getGreeting() {
        return greeting;
    }

    public void setGreeting(String greeting) {
        this.greeting = greeting;
    }
}

We can run the following commands in a terminal:

 go run github.com/dubbogo/dubbo-java2proto --file ./GreetingsService.java
 go run github.com/dubbogo/dubbo-java2proto --file ./GreetRequest.java
 go run github.com/dubbogo/dubbo-java2proto --file ./GreetResponse.java

This will generate the following files:

//GreetingsService.proto
// Code generated by dubbo-java2proto. DO NOT EDIT.

syntax = "proto3";

package api;

option go_package = "org/apache/dubbo/tri/hessian2/api;api";

import "unified_idl_extend/unified_idl_extend.proto";

service GreetingsService {
  option (unified_idl_extend.service_extend) = {
    interface_name: "org.apache.dubbo.tri.hessian2.api.GreetingsService";
  };
  rpc greet(GreetRequest) returns (GreetResponse) {
    option (unified_idl_extend.method_extend) = {
      method_name: "greet";
    };
  }
}


//GreetRequest.proto
// Code generated by dubbo-java2proto. DO NOT EDIT.

syntax = "proto3";

package api;

option go_package = "org/apache/dubbo/tri/hessian2/api;api";

import "unified_idl_extend/unified_idl_extend.proto";

message GreetRequest {
  string name = 1;
  option (unified_idl_extend.message_extend) = {
    java_class_name: "org.apache.dubbo.tri.hessian2.api.GreetRequest";
  };
}

//GreetResponse.proto
// Code generated by dubbo-java2proto. DO NOT EDIT.

syntax = "proto3";

package api;

option go_package = "org/apache/dubbo/tri/hessian2/api;api";

import "unified_idl_extend/unified_idl_extend.proto";

message GreetResponse {
  string greeting = 1;
  option (unified_idl_extend.message_extend) = {
    java_class_name: "org.apache.dubbo.tri.hessian2.api.GreetResponse";
  };
}

Then these .proto files can be used by a variety of protoc plugins to generate code in other languages.
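As an illustration of how a parsed Java class could be turned into a message like the ones above, here is a minimal sketch. The javaClass and javaField types and the scalarTypes table are assumptions made for this example: the table lists a few common Java-to-proto3 scalar pairings and is not taken from Java2Proto, whose real mapping may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// javaField and javaClass are hypothetical, simplified views of a parsed class.
type javaField struct {
	Type string
	Name string
}

type javaClass struct {
	Package string
	Name    string
	Fields  []javaField
}

// scalarTypes is an assumed mapping for a few simple Java types;
// the tool's real table may differ.
var scalarTypes = map[string]string{
	"String":  "string",
	"int":     "int32",
	"long":    "int64",
	"boolean": "bool",
	"float":   "float",
	"double":  "double",
}

// renderMessage renders a proto3 message in the shape shown above.
// It returns an error for any field type outside the assumed mapping.
func renderMessage(c javaClass) (string, error) {
	var sb strings.Builder
	fmt.Fprintf(&sb, "message %s {\n", c.Name)
	for i, f := range c.Fields {
		pt, ok := scalarTypes[f.Type]
		if !ok {
			return "", fmt.Errorf("unsupported Java type %q for field %s", f.Type, f.Name)
		}
		fmt.Fprintf(&sb, "  %s %s = %d;\n", pt, f.Name, i+1) // field numbers start at 1
	}
	sb.WriteString("  option (unified_idl_extend.message_extend) = {\n")
	fmt.Fprintf(&sb, "    java_class_name: \"%s.%s\";\n", c.Package, c.Name)
	sb.WriteString("  };\n}\n")
	return sb.String(), nil
}

func main() {
	out, err := renderMessage(javaClass{
		Package: "org.apache.dubbo.tri.hessian2.api",
		Name:    "GreetRequest",
		Fields:  []javaField{{Type: "String", Name: "name"}},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Returning an error for unmapped types keeps the sketch honest about its limits instead of silently emitting an invalid field.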

Code Insight

The central problem is obtaining a syntax tree from Java source code. We chose tree-sitter for the job: with go-tree-sitter we can parse Java source code into a syntax tree. In the parser package, we define a JavaParser struct and parse the interface definition (and related class definitions) bit by bit.


type JavaParser struct {
    content []byte
    ast     *sitter.Node

    PackageName string
    Interfaces  []*JavaInterface
    Classes     []*JavaClass
}

The NewParser function reads the source file, obtains the syntax tree with tree-sitter, and wraps both in a JavaParser:

func NewParser(filepath string) (*JavaParser, error) {
    bytes, err := os.ReadFile(filepath)
    if err != nil {
        return nil, err
    }

    parser := sitter.NewParser()
    parser.SetLanguage(java.GetLanguage())

    tree, err := parser.ParseCtx(context.Background(), nil, bytes)
    if err != nil {
        return nil, err
    }

    jp := &JavaParser{
        content: bytes,
        ast:     tree.RootNode(),
    }

    return jp, nil
}

ParseFile is then called to iterate over the child nodes of the AST root and parse each according to its type. The parseXxx functions call one another recursively and eventually store the results in the fields of the JavaParser:

func (jp *JavaParser) ParseFile() {
    for i := 0; i < int(jp.ast.ChildCount()); i++ {
        child := jp.ast.Child(i)
        typ := child.Type()

        switch typ {
        case PackageDecl:
            jp.parsePackage(child)
        case InterfaceDecl:
            jp.parseInterface(child)
        case ClassDecl:
            jp.parseClass(child)
        }
    }
}

func (jp *JavaParser) parseInterface(node *sitter.Node) {
    ji := new(JavaInterface)
    for i := 0; i < int(node.ChildCount()); i++ {
        child := node.Child(i)
        typ := child.Type()

        // check if the interface is public
        // don't generate code for non-public interface
        switch typ {
        case Modifiers:
            if child.Content(jp.content) != "public" {
                break
            }
        case Identifier:
            identifier := child.Content(jp.content)
            ji.Name = identifier
        case InterfaceBody:
            jp.parseInterfaceBody(child, ji)
            jp.Interfaces = append(jp.Interfaces, ji)
        }
    }
}
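One detail worth keeping in mind when reading parseInterface: in Go, a break inside a switch case leaves only the switch, not the enclosing for loop. The sketch below shows one way to skip non-public interfaces by returning early instead. The node type is a hypothetical stand-in for a tree-sitter node, and the node type strings are assumptions for this example, not values confirmed from the Java2Proto source.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a tree-sitter node.
type node struct {
	Type     string
	Text     string
	Children []*node
}

// Node type names assumed for this sketch.
const (
	modifiersType  = "modifiers"
	identifierType = "identifier"
)

// hasPublic reports whether a modifier list such as "public abstract"
// contains the public keyword.
func hasPublic(modifiers string) bool {
	for _, m := range strings.Fields(modifiers) {
		if m == "public" {
			return true
		}
	}
	return false
}

// parseInterface returns the interface name, or ok=false when the
// interface is not public (including when it has no modifiers at all).
func parseInterface(n *node) (name string, ok bool) {
	public := false
	for _, child := range n.Children {
		switch child.Type {
		case modifiersType:
			if !hasPublic(child.Text) {
				return "", false // leave the function, not just the switch
			}
			public = true
		case identifierType:
			name = child.Text
		}
	}
	if !public {
		return "", false
	}
	return name, true
}

func main() {
	iface := &node{Children: []*node{
		{Type: modifiersType, Text: "public"},
		{Type: identifierType, Text: "GreetingsService"},
	}}
	fmt.Println(parseInterface(iface))
}
```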

Related Repositories

dubbo-go

dubbo-go-samples

protoc-gen-go-dubbo

protoc-gen-go-hessian2

Java2Proto

@ParikhShreya

How long is this project? Is it a 3-month project or a 6-month one?

@SHA-4096

SHA-4096 commented Dec 1, 2024

How long is this project? Is it a 3-month project or a 6-month one?

Hi, it's a 3-month project.

@divyankmalik

Hi, I would like to join this project. What is the process for joining?