@SHA-4096
Last active December 23, 2024 10:23

Revisions

  1. SHA-4096 revised this gist Aug 25, 2024. 1 changed file with 379 additions and 21 deletions.
    400 changes: 379 additions & 21 deletions GSoC2024-UnifiedIDL.md
    @@ -1,40 +1,398 @@
    GSoC 2024 Report

    Project: Unified IDL for Apache Dubbo-Go

    # Project Description

    This project aims to provide Apache Dubbo with a Protobuf IDL translation tool that unifies interface definitions across IDL and Non-IDL modes. The toolset generates code that invokes the Server and Client APIs of Dubbo regardless of the protocol mode (IDL or Non-IDL). In addition, a tool for translating Java interface definitions to Protobuf IDL is developed to simplify the coordination of microservice development across programming languages. The project has the potential to grow into a more unified microservice development workflow by letting users generate code directly from existing interface definitions written in various programming languages.

    # Project Progress Overview

    Work Done:

    - Unified IDL Extension files for hessian type mapping and transport protocol specification

    - Enable developers to define Hessian2-serialized service in Protobuf IDL

    - Support the translation from Java interfaces to Protobuf service definition

    - Fix bugs in interoperation between Dubbo-Java and Dubbo-Go


    Work Yet to Be Done:


    - Support more complex Java types in Java2Proto plugin

    - Support direct interface migration from Dubbo-Java to Dubbo-Go

    - Enable dubbo-cli to manage protoc plugins and code generation option


    # Project Usage & Code Insight:

    The Unified-IDL project provides developers with the following features:

    1. Generate invocation functions using Hessian2 serialization from user-defined Protobuf IDL files

    2. Generate Protobuf IDL files from Java interface definition

    3. Specify the transport protocol (e.g. DUBBO/TRIPLE) in Protobuf IDL files


    ## Unified IDL for Hessian2

    ### Usage

    The following two links are examples of developing Hessian2-serialized services with Protobuf IDL as the interface definition:

    [dubbo-go-samples/java_interop/non-protobuf-dubbo](https://github.com/apache/dubbo-go-samples/tree/main/java_interop/non-protobuf-dubbo)

    [dubbo-go-samples/java_interop/non-protobuf-triple](https://github.com/apache/dubbo-go-samples/tree/main/java_interop/non-protobuf-triple)

    In these samples, the invocation files are already generated in the proto folder. In non-protobuf-dubbo, they were generated by running the following command:

    ```
    protoc -I ./ \
    --go-hessian2_out=./ --go-hessian2_opt=paths=source_relative \
    --go-dubbo_out=./ --go-dubbo_opt=paths=source_relative \
    ./greet.proto
    ```

    This command uses the (previously installed) protoc plugins [protoc-gen-go-dubbo](https://github.com/dubbogo/protoc-gen-go-dubbo) and [protoc-gen-go-hessian2](https://github.com/dubbogo/protoc-gen-go-hessian2) for code generation. The former generates the invocation code, and the latter handles type registration and Hessian2 serialization.

    ### Code Insight

    How does the code work?

    In protoc-gen-go-hessian2, the parsed .proto file is passed to the `ProcessProtoFile` function, which extracts the enums and messages from the `file` argument into a `Hessian2Go` struct:

    ```
    func ProcessProtoFile(g *protogen.GeneratedFile, file *protogen.File) (*Hessian2Go, error) {
        hessian2Go := &Hessian2Go{
            File:         file,
            Source:       file.Proto.GetName(),
            ProtoPackage: file.Proto.GetPackage(),
        }
        for _, enum := range file.Enums {
            hessian2Go.Enums = append(hessian2Go.Enums, processProtoEnum(g, enum))
        }
        for _, message := range file.Messages {
            hessian2Go.Messages = append(hessian2Go.Messages, processProtoMessage(g, file, message))
        }
        return hessian2Go, nil
    }
    ```

    The `Hessian2Go` struct is then passed to the `generator.GenHessian2` function for code generation:

    ```
    func GenHessian2(g *protogen.GeneratedFile, hessian2Go *Hessian2Go) {
        genPreamble(g, hessian2Go)
        genPackage(g, hessian2Go)
        g.QualifiedGoIdent(protogen.GoIdent{
            GoName:       "dubbo-go-hessian2",
            GoImportPath: "github.com/apache/dubbo-go-hessian2",
        })
        for _, enum := range hessian2Go.Enums {
            genEnum(g, enum)
        }
        for _, message := range hessian2Go.Messages {
            genMessage(g, message)
        }
        genRegisterInitFunc(g, hessian2Go)
    }
    ```

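    The genXxx functions use g.P to append lines to the in-memory buffer of the generated file. When the plugin finishes, the buffered content is written to the plugin's stdout as a response for the protoc binary, which then creates the output files. For example, genPreamble uses g.P to generate the preamble of the code: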

    ```
    func genPreamble(g *protogen.GeneratedFile, hessian2Go *Hessian2Go) {
        g.P("// Code generated by protoc-gen-go-dubbo. DO NOT EDIT.")
        g.P()
        g.P("// Source: ", hessian2Go.Source)
        g.P("// Package: ", strings.ReplaceAll(hessian2Go.ProtoPackage, ".", "_"))
        g.P()
    }
    ```
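
    To make the role of g.P more concrete, the following is a minimal sketch that does not depend on protogen. The type lineBuffer and its method P are hypothetical names; they only mimic how a generated-file buffer concatenates the arguments of one call into one output line. The preamble text is also illustrative.

    ```go
    package main

    import (
        "bytes"
        "fmt"
        "strings"
    )

    // lineBuffer is a hypothetical stand-in for protogen.GeneratedFile.
    // It collects generated source lines in memory.
    type lineBuffer struct {
        buf bytes.Buffer
    }

    // P prints its arguments back to back and ends the line.
    func (b *lineBuffer) P(v ...any) {
        for _, x := range v {
            fmt.Fprint(&b.buf, x)
        }
        b.buf.WriteByte('\n')
    }

    // genPreamble has the same shape as the plugin function shown above,
    // but takes plain strings instead of the Hessian2Go struct.
    func genPreamble(b *lineBuffer, source, protoPackage string) {
        b.P("// Code generated by a sketch. DO NOT EDIT.")
        b.P()
        b.P("// Source: ", source)
        b.P("// Package: ", strings.ReplaceAll(protoPackage, ".", "_"))
        b.P()
    }

    func main() {
        var b lineBuffer
        genPreamble(&b, "greet.proto", "greet.v1")
        // Nothing is printed until the whole buffer is flushed at the end.
        fmt.Print(b.buf.String())
    }
    ```

    Buffering the lines first is what allows a generator to assemble a complete file before anything is handed back to protoc.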

    To support Hessian2 serialization, the generated code relies on the [dubbo-go-hessian2](https://github.com/apache/dubbo-go-hessian2) library. The generated init function registers the messages and the Java-mapped enums with it:

    ```
    func genRegisterInitFunc(g *protogen.GeneratedFile, hessian2Go *Hessian2Go) {
        g.P("func init() {")
        for _, message := range hessian2Go.Messages {
            if message.Desc.IsMapEntry() || message.ExtendArgs {
                continue
            }
            g.P("dubbo_go_hessian2.RegisterPOJO(new(", message.GoIdent.GoName, "))")
            for _, inner := range message.Messages {
                if inner.Desc.IsMapEntry() {
                    continue
                }
                g.P("dubbo_go_hessian2.RegisterPOJO(new(", inner.GoIdent.GoName, "))")
            }
        }
        for _, e := range hessian2Go.Enums {
            g.P()
            if e.JavaClassName != "" {
                g.P("for v := range ", e.GoIdent.GoName, "_name {")
                for range e.Values {
                    g.P("dubbo_go_hessian2.RegisterJavaEnum(", e.GoIdent.GoName, "(v))")
                }
                g.P("}")
            }
        }
        g.P("}")
        g.P()
    }
    ```
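
    As a rough illustration of what such a generated init function accomplishes, the sketch below registers a message type under its Java class name. It is self-contained on purpose: pojo, registry, and registerPOJO are hypothetical stand-ins for the library's POJO registry, and the real RegisterPOJO in dubbo-go-hessian2 does considerably more work.

    ```go
    package main

    import "fmt"

    // pojo captures the idea that a Go type reports the Java class it maps to.
    type pojo interface {
        JavaClassName() string
    }

    // registry and registerPOJO are hypothetical stand-ins for the
    // type registry of the serialization library.
    var registry = map[string]pojo{}

    func registerPOJO(p pojo) {
        registry[p.JavaClassName()] = p
    }

    // GreetRequest resembles a message type the plugin could emit.
    type GreetRequest struct {
        Name string
    }

    // JavaClassName links the Go type to its Java counterpart.
    func (*GreetRequest) JavaClassName() string {
        return "org.apache.dubbo.tri.hessian2.api.GreetRequest"
    }

    // init plays the role of the function written by genRegisterInitFunc:
    // importing the package is enough to register its types.
    func init() {
        registerPOJO(new(GreetRequest))
    }

    func main() {
        _, ok := registry["org.apache.dubbo.tri.hessian2.api.GreetRequest"]
        fmt.Println("registered:", ok)
    }
    ```

    Doing the registration in init means users of the generated package do not have to call any setup function before serializing these types.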

    protoc-gen-go-dubbo introduces an extra option for specifying the transport protocol. Take the service-level specification as an example:

    ```
    service GreetingsService {
        option (unified_idl_extend.service_protocol) = {
            protocol_name: "DUBBO";
        };
        ...
    }
    ```

    This option is used in the ProcessProtoFile method in protoc-gen-go-dubbo:

    ```
    for _, service := range file.Services {
        serviceProtocolOpt, ok := proto.GetExtension(service.Desc.Options(), unified_idl_extend.E_ServiceProtocol).(*unified_idl_extend.ServiceProtocolTypeOption)
        if serviceProtocolSpecFlag && ok {
            if serviceProtocolOpt.GetProtocolName() != unified_idl_extend.ProtocolType_DUBBO.String() || serviceProtocolOpt == nil {
                // skip the service which is not dubbo protocol or does not have a service option
                continue
            }
        }
        ...
    }
    ```

    serviceProtocolSpecFlag is a CLI argument set by the user; its default value is false. When it is set to true, the generator checks whether each service carries the option unified_idl_extend.service_protocol. If the option is missing or its protocol_name is not "DUBBO", the invocation code for that service is not generated.
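
    The decision described above can be summarized as a small predicate. This is only a sketch of the rule, not the plugin's code: serviceProtocolOption and shouldGenerate are hypothetical names, and a nil pointer is assumed to stand for a service without the option.

    ```go
    package main

    import "fmt"

    // serviceProtocolOption is a hypothetical stand-in for the option
    // message; a nil pointer means the service has no such option.
    type serviceProtocolOption struct {
        ProtocolName string
    }

    // shouldGenerate reports whether invocation code is emitted for a service.
    func shouldGenerate(specFlag bool, opt *serviceProtocolOption) bool {
        if !specFlag {
            // Filtering is disabled, so every service is generated.
            return true
        }
        // With filtering enabled, only services marked as DUBBO pass.
        return opt != nil && opt.ProtocolName == "DUBBO"
    }

    func main() {
        dubbo := &serviceProtocolOption{ProtocolName: "DUBBO"}
        triple := &serviceProtocolOption{ProtocolName: "TRIPLE"}
        fmt.Println(shouldGenerate(false, nil))   // true
        fmt.Println(shouldGenerate(true, dubbo))  // true
        fmt.Println(shouldGenerate(true, triple)) // false
        fmt.Println(shouldGenerate(true, nil))    // false
    }
    ```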

    ## Java2Proto

    [Java2Proto](https://github.com/dubbogo/dubbo-java2proto) is a tool that generates Protobuf IDL files from Java interface definitions. The project is still at an early stage, but it can already translate methods that take simple Java types as arguments.

    ### Usage

    Suppose we have a set of Java files that define an interface:

    ```
    // GreetingsService.java
    package org.apache.dubbo.tri.hessian2.api;

    public interface GreetingsService {
        GreetResponse greet(GreetRequest req);
    }

    // GreetRequest.java
    package org.apache.dubbo.tri.hessian2.api;

    public class GreetRequest implements java.io.Serializable {
        private String name;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

    // GreetResponse.java
    package org.apache.dubbo.tri.hessian2.api;

    public class GreetResponse implements java.io.Serializable {
        private String greeting;

        public String getGreeting() {
            return greeting;
        }

        public void setGreeting(String greeting) {
            this.greeting = greeting;
        }
    }
    ```

    We can run the following commands in a terminal:

    ```
    go run github.com/dubbogo/dubbo-java2proto --file ./GreetingsService.java
    go run github.com/dubbogo/dubbo-java2proto --file ./GreetRequest.java
    go run github.com/dubbogo/dubbo-java2proto --file ./GreetResponse.java
    ```

    This will generate the following files:

    ```
    //GreetingsService.proto
    // Code generated by dubbo-java2proto. DO NOT EDIT.
    syntax = "proto3";
    package api;
    option go_package = "org/apache/dubbo/tri/hessian2/api;api";
    import "unified_idl_extend/unified_idl_extend.proto";

    service GreetingsService {
        option (unified_idl_extend.service_extend) = {
            interface_name: "org.apache.dubbo.tri.hessian2.api.GreetingsService";
        };
        rpc greet(GreetRequest) returns (GreetResponse) {
            option (unified_idl_extend.method_extend) = {
                method_name: "greet";
            };
        }
    }

    //GreetRequest.proto
    // Code generated by dubbo-java2proto. DO NOT EDIT.
    syntax = "proto3";
    package api;
    option go_package = "org/apache/dubbo/tri/hessian2/api;api";
    import "unified_idl_extend/unified_idl_extend.proto";

    message GreetRequest {
        string name = 1;
        option (unified_idl_extend.message_extend) = {
            java_class_name: "org.apache.dubbo.tri.hessian2.api.GreetRequest";
        };
    }

    //GreetResponse.proto
    // Code generated by dubbo-java2proto. DO NOT EDIT.
    syntax = "proto3";
    package api;
    option go_package = "org/apache/dubbo/tri/hessian2/api;api";
    import "unified_idl_extend/unified_idl_extend.proto";

    message GreetResponse {
        string greeting = 1;
        option (unified_idl_extend.message_extend) = {
            java_class_name: "org.apache.dubbo.tri.hessian2.api.GreetResponse";
        };
    }
    ```

    Then these .proto files can be used by a variety of protoc plugins to generate code in other languages.
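
    To illustrate how a Java class with simple field types could be turned into a message definition shaped like the ones above, here is a minimal sketch. It is not the tool's implementation: javaToProto, field, and renderMessage are hypothetical names, and the type mapping is an assumption that covers only a few simple Java types.

    ```go
    package main

    import (
        "fmt"
        "strings"
    )

    // javaToProto is an assumed mapping for a few simple Java types;
    // the real tool may map them differently.
    var javaToProto = map[string]string{
        "String":  "string",
        "int":     "int32",
        "long":    "int64",
        "boolean": "bool",
        "float":   "float",
        "double":  "double",
    }

    // field is a hypothetical description of one Java class field.
    type field struct {
        JavaType string
        Name     string
    }

    // renderMessage writes a message definition with a java_class_name
    // option. It fails for Java types that are missing from the mapping.
    func renderMessage(name, javaClass string, fields []field) (string, error) {
        var sb strings.Builder
        fmt.Fprintf(&sb, "message %s {\n", name)
        for i, f := range fields {
            protoType, ok := javaToProto[f.JavaType]
            if !ok {
                return "", fmt.Errorf("unsupported Java type %q", f.JavaType)
            }
            // Field numbers start at 1 and follow declaration order.
            fmt.Fprintf(&sb, "    %s %s = %d;\n", protoType, f.Name, i+1)
        }
        sb.WriteString("    option (unified_idl_extend.message_extend) = {\n")
        fmt.Fprintf(&sb, "        java_class_name: %q;\n", javaClass)
        sb.WriteString("    };\n}\n")
        return sb.String(), nil
    }

    func main() {
        out, err := renderMessage(
            "GreetRequest",
            "org.apache.dubbo.tri.hessian2.api.GreetRequest",
            []field{{JavaType: "String", Name: "name"}},
        )
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Print(out)
    }
    ```

    Returning an error for unmapped types keeps the sketch honest about its limits, which mirrors the current restriction to simple Java types.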

    ### Code Insight

    The main problem to solve is how to obtain the syntax tree of the Java code. We chose [tree-sitter](https://tree-sitter.github.io/tree-sitter/) for this job. With [go-tree-sitter](http://github.com/smacker/go-tree-sitter), we can parse Java source code into a syntax tree. In the parser package, we define a JavaParser struct and parse the interface definition (and the related class definitions) step by step.

    ```
    type JavaParser struct {
        content     []byte
        ast         *sitter.Node
        PackageName string
        Interfaces  []*JavaInterface
        Classes     []*JavaClass
    }
    ```

    The NewParser function reads the source file and builds the syntax tree with tree-sitter, then wraps this information into a JavaParser:

    ```
    func NewParser(filepath string) (*JavaParser, error) {
        bytes, err := os.ReadFile(filepath)
        if err != nil {
            return nil, err
        }
        parser := sitter.NewParser()
        parser.SetLanguage(java.GetLanguage())
        tree, err := parser.ParseCtx(context.Background(), nil, bytes)
        if err != nil {
            return nil, err
        }
        jp := &JavaParser{
            content: bytes,
            ast:     tree.RootNode(),
        }
        return jp, nil
    }
    ```

    ParseFile then iterates over the child nodes of the root AST node and parses each one according to its type. The parseXxx functions call one another recursively and eventually store the results in the fields of the JavaParser:

    ```
    func (jp *JavaParser) ParseFile() {
        for i := 0; i < int(jp.ast.ChildCount()); i++ {
            child := jp.ast.Child(i)
            typ := child.Type()
            switch typ {
            case PackageDecl:
                jp.parsePackage(child)
            case InterfaceDecl:
                jp.parseInterface(child)
            case ClassDecl:
                jp.parseClass(child)
            }
        }
    }

    func (jp *JavaParser) parseInterface(node *sitter.Node) {
        ji := new(JavaInterface)
        for i := 0; i < int(node.ChildCount()); i++ {
            child := node.Child(i)
            typ := child.Type()
            switch typ {
            case Modifiers:
                // Don't generate code for a non-public interface.
                // A bare break would only leave the switch statement,
                // so return to skip the whole interface instead.
                if child.Content(jp.content) != "public" {
                    return
                }
            case Identifier:
                identifier := child.Content(jp.content)
                ji.Name = identifier
            case InterfaceBody:
                jp.parseInterfaceBody(child, ji)
                jp.Interfaces = append(jp.Interfaces, ji)
            }
        }
    }
    ```
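
    The dispatch-on-node-type pattern can be tried out without tree-sitter. The sketch below uses a hypothetical node type in place of a tree-sitter node; the type strings are assumed to match the node kinds of the Java grammar, and interfaceNames and publicInterfaceName are illustrative helpers rather than functions of the project.

    ```go
    package main

    import "fmt"

    // node is a hypothetical stand-in for a tree-sitter syntax node.
    type node struct {
        Type     string
        Text     string
        Children []*node
    }

    // interfaceNames walks the top-level declarations of a file and
    // returns the names of the public interfaces it finds.
    func interfaceNames(root *node) []string {
        var names []string
        for _, child := range root.Children {
            if child.Type != "interface_declaration" {
                continue
            }
            if name, ok := publicInterfaceName(child); ok {
                names = append(names, name)
            }
        }
        return names
    }

    // publicInterfaceName inspects the children of one declaration and
    // reports its name only when the declaration is marked public.
    func publicInterfaceName(decl *node) (string, bool) {
        public := false
        name := ""
        for _, child := range decl.Children {
            switch child.Type {
            case "modifiers":
                public = child.Text == "public"
            case "identifier":
                name = child.Text
            }
        }
        return name, public && name != ""
    }

    func main() {
        root := &node{Type: "program", Children: []*node{
            {Type: "package_declaration", Text: "package api;"},
            {Type: "interface_declaration", Children: []*node{
                {Type: "modifiers", Text: "public"},
                {Type: "identifier", Text: "GreetingsService"},
            }},
            {Type: "interface_declaration", Children: []*node{
                {Type: "identifier", Text: "Hidden"},
            }},
        }}
        fmt.Println(interfaceNames(root))
    }
    ```

    Collecting the visibility flag and the name before deciding keeps the result independent of the order in which the child nodes appear.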

    # Related Repositories

    [dubbo-go](https://github.com/apache/dubbo-go)

    [dubbo-go-samples](https://github.com/apache/dubbo-go-samples)

    [protoc-gen-go-dubbo](https://github.com/dubbogo/protoc-gen-go-dubbo)

    [protoc-gen-go-hessian2](https://github.com/dubbogo/protoc-gen-go-hessian2)

    [Java2Proto](https://github.com/dubbogo/dubbo-java2proto)
  2. SHA-4096 created this gist Aug 22, 2024.
    40 changes: 40 additions & 0 deletions GSoC2024-UnifiedIDL.md