@ksss
Last active September 11, 2022 07:10

[PoC] Proposal of RBS generation method using Rack architecture

RBS generation problem

RBS is an easy language to generate code for because of its simple syntax and lack of dependencies between files. In addition, a large number of type definitions are still missing, so code generation for RBS is highly important.

Various attempts have been made to generate RBS code, including generation from JSON files and static and dynamic analysis of Ruby code. However, each method has its advantages and disadvantages, and the following problems remain unsolved:

  • Large number of undefined gems
  • Determination of generics
  • Dynamic method generation by eval-type methods
  • Dynamic module include/extend/prepend using Module.new or included
  • Various extension requests

Proposed Methodology

I propose combining small classes in a plugin-like mechanism modeled on Rack middleware. We refer to this mechanism as RBSG (RBS Generator).

As with a Rack application, you write a small amount of code and run it.

Generator example:

loader = -> (_env) {
  require 'foo'
  class Bar
    include Foo
  end
}
RBSG::Builder.new do
  use RBSG::Logger
  use RBSG::CreateFileByName,
    base_dir: "sig/out",
    header: "# !!! GENERATED CODE !!!"
  use RBSG::ObjectSpaceDiff
  use RBSG::IncludeExtendPrepend
  use RBSG::Result
  run loader
end

outputs:

sig/out/foo.rbs

# !!! GENERATED CODE !!!

module Foo
end

sig/out/bar.rbs

# !!! GENERATED CODE !!!

class Bar
  include Foo
end
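
How `use` and `run` compose the stack can be sketched as follows. This is a minimal sketch modeled on Rack::Builder, not the actual RBSG::Builder: `SketchBuilder`, `to_loader`, and the `Trace` middleware are hypothetical names introduced here for illustration. Only the body of `use` follows the implementation the author quotes in the comments.

```ruby
# Hypothetical sketch of a Rack::Builder-like composer; not the real RBSG::Builder.
class SketchBuilder
  def initialize(&block)
    @use = []
    @run = nil
    instance_eval(&block) if block
  end

  # Remember how to construct the middleware once its inner loader is known.
  def use(middleware, *args, **key, &block)
    @use << proc { |loader| middleware.new(loader, *args, **key, &block) }
  end

  def run(loader)
    @run = loader
  end

  # Wrap the loader from the inside out, so the first `use` ends up outermost.
  def to_loader
    @use.reverse.inject(@run) { |inner, factory| factory.call(inner) }
  end

  def call(env = {})
    to_loader.call(env)
  end
end

# Hypothetical middleware that records when it is entered and left.
class Trace
  def initialize(loader, label:, log:)
    @loader = loader
    @label = label
    @log = log
  end

  def call(env)
    @log << "enter #{@label}"
    result = @loader.call(env)
    @log << "leave #{@label}"
    result
  end
end

log = []
SketchBuilder.new do
  use Trace, label: "outer", log: log
  use Trace, label: "inner", log: log
  run ->(_env) { log << "load"; {} }
end.call

p log
```

The log shows the outermost middleware entered first and left last, which is why a middleware that writes files has to be listed before the ones that collect information.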

Middleware example:

module RBSG
  class ObjectSpaceDiff
    def initialize(loader, if: nil)
      @loader = loader
      @if = binding.local_variable_get(:if)
    end

    def call(env)
      modules_before = ObjectSpace.each_object(Module).to_a

      result = @loader.call(env)

      modules_after = ObjectSpace.each_object(Module).to_a
      (modules_after - modules_before).each do |mod|
        next unless @if.nil? || @if.call(mod)
        result[mod.to_s] # accessing the key creates the entry via the Hash default value
      end

      result
    end
  end
end
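
The snapshot-and-diff technique used by this middleware can be tried on its own. The following is a minimal sketch, assuming a loader that defines one module and one class when called; `SketchFoo` and `SketchBar` are hypothetical names used only for this illustration.

```ruby
# Snapshot every Module (classes included) known to the VM before loading.
modules_before = ObjectSpace.each_object(Module).to_a

# Stand-in for a loader: defines one module and one class at call time.
loader = -> {
  Object.const_set(:SketchFoo, Module.new)
  Object.const_set(:SketchBar, Class.new { include SketchFoo })
}
loader.call

# Anything present now but not before was defined by the loader.
modules_after = ObjectSpace.each_object(Module).to_a
new_names = (modules_after - modules_before).map(&:name).compact.sort

p new_names
```

Anonymous modules and singleton classes have no name, so `compact` drops them here. A real middleware would need a policy for them, which is what the `if:` option is for.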

Pros

  • Middleware can be stacked by function, and middleware can be easily added or removed.
  • Each middleware is a simple class, and because it is simple it can be reused in a variety of situations.
  • Because code loading is a separate phase, middleware can run code both before and after loading, and can easily wrap the load in a block.
  • The Rack architecture is widely familiar to Rubyists, so the learning cost should be low.
  • Unnecessary output can be controlled by middleware.
  • File output can also be written as middleware, so any output format can be supported.

Cons

  • The developer must write the loader and middleware stack like an application.
  • Middleware using TracePoint will not work as intended if the load timing is off.
  • High extensibility and flexibility come at the cost of being harder to understand.

Figure

sequenceDiagram
Middleware1 ->> Middleware2: .call
Middleware2 ->> Result: .call
Result ->>+ loader: .call
loader ->>- Result: no result
Result ->> Middleware2: result
Middleware2 ->> Middleware1: result

Middleware Example

  • Class/module definition using ObjectSpace
  • Static and dynamic addition of include/extend/prepend modules
  • Output data filtering
  • Debugging display of output
  • File output per class/module
  • Constant definition and type guessing
  • Logger configuration
  • Rails extensions to support class_attribute and mattr_accessor
  • Automatic support for method delegation

Use case

The proposed method is expected to be applied to a variety of use cases because of its simple and powerful mechanism.

gem_rbs_collection

When generating definitions for ActiveRecord:

loader = -> (_env) {
  # code loading
  require 'active_record'
  ActiveRecord.eager_load!
}
RBSG::Builder.new do
  use RBSG::Logger
  use RBSG::CreateFileByName, # output
    base_dir: "sig/out",
    header: "# !!! GENERATED CODE !!!"
  use RBSG::Clean, if: -> (name, bodies) {
    if RBSG.rbs_defined?(name, library: "stdlib")
      bodies.empty? # skip empty definition
    else
      !(name.start_with?("ActiveRecord")) # skip out of scope
    end
  }
  use RBSG::ObjectSpaceDiff # class definition
  use RBSG::IncludeExtendPrepend # imported modules
  use RBSG::Rails::ClassAttribute # extension for Rails
  use RBSG::Result
  run loader
end

Methods that Rails adds through its own extensions can be supported by writing dedicated middleware for them and adding it to the stack. Furthermore, by switching the branch of the code being loaded, definitions can easily be generated for each version.

Rails Application

env = {}
loader = -> (_env) {
  Rails.application.eager_load!
}
RBSG::Builder.new do
  use RBSG::Logger
  use RBSG::CreateFileByName,
    base_dir: Rails.root.join("sig/out"),
    header: "# !!! GENERATED CODE !!!"
  use RBSG::Clean, if: -> (name, _bodies) {
    RBSG.rbs_defined?(name, collection: true) # skip exist definition
  }
  use RBSG::ObjectSpaceDiff
  use RBSG::IncludeExtendPrepend
  use RBSG::Rails::ClassAttribute
  use CustomGenerator::Rolify # user customized
  use RBSG::Result
  run loader
end.call(env)

As in the gem_rbs_collection example, the same middleware can be used for application code by changing only the loader. In addition, users can write and try out their own extensions, and it is easy to package them as gems once they have proven useful.

Common specifications

Loader

The loader only needs to load the code and does not need to worry about the return value.

Middleware

Create it with a class that has a #call method, like Rack middleware.

Example of simple middleware

class SampleMiddleware
  def initialize(loader)
    @loader = loader
  end

  def call(env)
    @loader.call(env)
  end
end

The interface is minimal, but what a middleware does is not restricted: it can generate code, filter output, change the output format, print debug information, read documentation, configure the Logger, and so on.
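
For example, output filtering fits in the same interface. The following is a minimal sketch, assuming the result is a Hash of name to body as described under Result and that the `if:` callable returns true for entries to skip, as in the RBSG::Clean usage earlier. `SketchClean` is a hypothetical name, not the real RBSG::Clean.

```ruby
# Hypothetical filter middleware: drops entries for which the `if:` callable returns true.
class SketchClean
  def initialize(loader, if:)
    @loader = loader
    # `if` is a reserved word, so the keyword argument is read through the binding.
    @if = binding.local_variable_get(:if)
  end

  def call(env)
    @loader.call(env).reject { |name, bodies| @if.call(name, bodies) }
  end
end

# Stand-in for the inner stack: returns a result with two entries.
inner = ->(_env) {
  { "ActiveRecord::Base" => ["def save: () -> bool"], "String" => [] }
}

# Skip everything outside the ActiveRecord namespace.
out_of_scope = ->(name, _bodies) { !name.start_with?("ActiveRecord") }
cleaned = SketchClean.new(inner, if: out_of_scope).call({})

p cleaned.keys
```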

Result

The return value is basically a Hash object with the class/module name as the key and the content of that class/module as the value. RBS is built up by appending generated code to this result. Because a single result can hold multiple classes/modules, it works for libraries that extend core classes, such as active_support, and of course for Rails applications.
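
A minimal sketch of such a result, assuming each value is an Array of member lines and that missing keys are created on first access (which is what lets a middleware register a class by merely touching its key). `SketchResult` is a hypothetical name, not the real RBSG::Result.

```ruby
# Hypothetical innermost middleware: runs the loader and hands back an empty result store.
class SketchResult
  def initialize(loader)
    @loader = loader
  end

  def call(env)
    @loader.call(env) # the loader's return value is deliberately ignored
    # Missing keys are created on first access.
    Hash.new { |hash, name| hash[name] = [] }
  end
end

result = SketchResult.new(->(_env) { :ignored }).call({})
result["Foo"]                  # registers an empty definition for Foo
result["Bar"] << "include Foo" # appends a member line to Bar

p result
```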

Output

Output is produced from the result. Output can also be implemented as middleware, so a variety of output requirements can be handled, for example "write one file per class/module name" or "write everything to standard output".
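
An output middleware that writes everything to a single IO could look like the sketch below. It assumes the result maps names to Arrays of member lines and that the named constants are loaded, so the class/module keyword can be looked up. `SketchWriter` and its `io:` and `header:` options are hypothetical, not part of RBSG.

```ruby
require "stringio"

# Hypothetical output middleware: renders each entry as an RBS declaration on `io`.
class SketchWriter
  def initialize(loader, io: $stdout, header: nil)
    @loader = loader
    @io = io
    @header = header
  end

  def call(env)
    result = @loader.call(env)
    @io.puts @header, "" if @header
    result.each do |name, lines|
      # Look up the loaded constant to choose between `class` and `module`.
      keyword = Object.const_get(name).is_a?(Class) ? "class" : "module"
      @io.puts "#{keyword} #{name}"
      lines.each { |line| @io.puts "  #{line}" }
      @io.puts "end", ""
    end
    result
  end
end

module Foo; end
class Bar
  include Foo
end

io = StringIO.new
inner = ->(_env) { { "Foo" => [], "Bar" => ["include Foo"] } }
SketchWriter.new(inner, io: io, header: "# !!! GENERATED CODE !!!").call({})

puts io.string
```

Because the writer only consumes the finished result, it has to wrap every middleware that contributes to it, which is the ordering concern raised in the comments.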

@pocke

pocke commented May 6, 2022

Thanks for writing the documentation and sorry for the late feedback.
I've read and considered RBSG.

The phase of writing RBSs

I'm wondering if the writing phase should be separated from the other phases.

Currently a middleware writes .rbs files in its #call method. So that middleware must be the outermost one, because the result has to be complete before it is written out as .rbs. Users need to care about the position of the writing middleware, which looks a bit difficult.

  • Pros of writing RBSs in #call are:
    • Simple interface. The user only needs to define #call.
  • Pros of separating the writing phase are:
    • The user doesn't need to care about the order of writing middleware.

I think both approaches are good. I'd like to hear your opinion.

I have two ideas instead of #call. For example:

# callback style
class Middleware1
  def before_load() = ...
  def after_load() = ...
  def write() = ...
end
RBSG::Builder.new do
  use Middleware1
end

# Or Rails callback style
RBSG::Builder.new do
  # The `Middleware*` implements #call
  before_load Middleware1
  after_load Middleware2
  around_load Middleware3
  write Middleware4
end

Recommended configuration

I think we need to recommend a default configuration, because currently users need to construct the whole builder themselves.

I have two solution ideas.

  • Provide a boilerplate
    • For example, running an rbs make-boilerplate-for-builder command generates a boilerplate.
  • Provide the default ruleset
    • RBSG::Builder.new automatically uses the middlewares that are always necessary.

The loader is enough?

Currently RBS Rails and rbs prototype rb parse Ruby code to generate RBS. But RBSG doesn't provide parsing Ruby code. Is it enough?

Loading existing RBSs

We need to load existing RBSs while generating RBSs in order to avoid generating duplicated methods and so on.

I guess RBSG can provide a loader for existing RBS, but RBS::EnvironmentLoader may be enough for this purpose. We need to investigate EnvironmentLoader.

How to filter output classes

The users need to filter output classes. For example, the user may need RBS only for app/**/*.rb , or one class, or the whole Ruby process.

I think RBSG should provide middleware for this purpose.

I'm not sure it is possible to filter extensions. For example:

class Object
  def blank?() = false
end

If an app or a gem has the above .rb code, RBSG should generate Object#blank?. But it may be difficult with only the runtime information.

Questions

  • What is the env argument of #call?
  • What is the result which is the return value of a loader?
  • Where does a middleware receive keyword arguments of use?

@ksss
Author

ksss commented May 6, 2022

@pocke

Thank you for the great review.
I often refer to your code for implementation.

The phase of writing RBSs

Yes, ordering is a problem.
In Rack, a server such as puma is responsible for output, so Rack was not a useful reference for this problem.
For now, RBSG provides no special support for output, in order to keep the architecture simple.
I also think the Recommended configuration discussed next may address part of this problem.

Recommended configuration

I think this is a very good idea, and a necessary one.
The trade-off of RBSG's flexibility is that it has to be customized for each situation.
I initially thought providing an example directory to copy and paste from would be enough, but as you say, it looks like a bit more support is needed.

The loader is enough?

You are right.
RBSG is centered on functionality, so it will not be good at handling files.
The interface I have in mind to deal with this is the following.

  use FileBaseMiddlewareSample,
    paths: Dir.glob("foo/**/*.rb")

Loading existing RBSs

I also felt the need for this functionality, and after much effort I was able to implement it.
Existing RBS is parsed and loaded into a temporary RBS::Environment, then converted back to a string with RBS::Writer and added to the existing result.
Duplicates are eliminated just before output.
The accumulated string is parsed as RBS, #uniq! is called on the member declarations keyed by class and name, and the result is stringified again with RBS::Writer for output.
It will be quickest to look at the implementation.

How to filter output classes

I am impressed that you noticed this problem.

I was initially thinking of a filter built around Object.const_source_location, but the needs turn out to be very complex.

  1. Nonexistent class names
    • Unnamed modules added by Module.new etc., such as User::AttributeMethods::GeneratedAttributeMethods
  2. Modules that exist but are generated by a library
    • User::GeneratedAssociateMethods, etc.
  3. Cases such as User::DeleteJob
    • I want to keep app/models but exclude app/jobs.

To deal with 1., it looks like I only need to look at the path of User, but that does not exclude definitions like 3.
If I try to deal with 3. as well, I end up excluding 1. and 2. too.

This issue is unresolved, but it seems to me that as long as there is middleware that can filter either way, the rest can be covered by providing users with example implementations.

Questions

What is the env argument of #call?

In the end I removed the env argument from the spec.

At first I simply carried it over from the Rack spec.
I thought it could be used for configuration items shared between middleware, but the need never arose.
Reading the Rack specification, I understood env as the thing that carries the HTTP request.
RBSG has no concept of input comparable to an HTTP request.
So I decided to remove it.

What is the result which is the return value of a loader?

I decided to ignore the return value of the loader.
The loader is written by the user, and I figured the experience would be better if the user did not need to know a return value specification.

Where does a middleware receive keyword arguments of use?

The implementation of use is essentially as follows:

    def use(middleware, *args, **key, &block)
      @use << proc { |loader| middleware.new(loader, *args, **key, &block) }
    end

@ksss
Author

ksss commented May 9, 2022

@pocke

pocke commented May 13, 2022

Thanks for your comment and for publishing the gems! I'm reading the gems with your comment.

@pocke

pocke commented May 20, 2022

I'm reading the orthoses code and the middleware approach looks like it works well 👍

I imagine the following roadmap for the next steps.

  1. Make a prototype
    • It's done as orthoses and this gist
  2. Move orthoses code to ruby/rbs to provide RBS generator library by rbs gem
    • rbs gem should have the core structure and basic middlewares.
    • Advanced middlewares, such as for Rails, should be outside of ruby/rbs. I have not considered where is the best place to put them yet.
  3. Replace all rbs prototypes with orthoses
    • rbs prototype's implementations should be just a set of middleware.
    • The users can use rbs prototype for a shorthand of generator.
    • If they want to customize rbs prototype, they can copy rbs prototype's implementation and inject middlewares.
  4. Replace RBS Rails with the new generator

What do you think? If you feel mismatches with the roadmap, please tell me 🙏


By the way, I have not read the whole implementation of orthoses yet, but I've just read the core architecture and considered that it can replace rbs prototype.
The middleware approach is good for a customizable generator, but I'm not sure whether the concrete implementation is the best way. So I'd like to review the implementation in some way. I can review it in a PR to ruby/rbs or in a video meeting. Which way (or something else) would you like?

@ksss
Author

ksss commented May 20, 2022

@pocke
Thank you for your proactive proposal.

I did not think of replacing the rbs prototype. I was rather thinking of offering it as a piece of middleware since rbs prototype is a very cool script.
But replacing it is an interesting idea.

Here is the roadmap I had in mind.

  1. Automate rails-related RBS in gem_rbs_collection as much as possible by orthoses to reduce administration cost.
  2. Implement middleware for Rails applications and provide users with recommended combinations.
  3. Continue to create extensions for devise, etc.

Video meetings are welcome.

@pocke

pocke commented May 26, 2022

Thanks for sharing your roadmap!

Personally I think it will confuse users if we provide multiple ways to generate RBS. So I'd like to merge the generator to the core.

I'd like to talk about the implementation details to merge it into the RBS core.
So, could you give me time to read the orthoses code and try using it? I'll try it and summarize the results in a comment on this gist. Then I'd like to decide how to talk, by video or in gist comments.

@ksss
Author

ksss commented May 27, 2022

Thank you very much for discussing this proposal.

I too think the user experience should be taken into consideration.

You are always welcome. The code may be hard to read, but I would be happy if you read it.
