Garbage Collection for Ractors

The fork of Ruby in which I developed this project can be found here: https://github.com/rm155/ruby/tree/Ractor-Local-GC-version-1

Background

Ruby 3.0 introduced Ractors, a feature that enables true parallelism in the language. By allowing different pieces of code to run at the same time, the execution time of a program can be greatly reduced. However, when multiple execution contexts work on the same data concurrently, data races can arise, causing the program to behave unpredictably. Because of this, Ruby heavily restricts the sharing of data between Ractors.

In my 2021 GSoC project, I helped modify existing Ruby libraries to be compliant with Ractor rules. The purpose of this was to make the functionality of those libraries usable within Ractors. After the project, I continued to work with Ractors, and I found that Ractors were not providing as much of a speedup as one might have expected. I asked my mentor, Koichi Sasada, about this, and he explained to me that it was likely because Ruby’s Garbage Collector (GC) was not yet parallelized.

Ruby’s GC runs using the “stop-the-world” technique: when the GC is being executed, all other parts of the program come to a halt. When Ractors are used, if one Ractor calls the GC, stop-the-world requires that all other Ractors be paused for this time. Because GC time can take up a large fraction of execution time, this limits the potential speedup that Ractors can provide.

Objective

The goal of the project is to make Ruby’s GC as parallelized as possible, thus increasing the benefits that Ractors can provide. In order to accomplish this, the proposed strategy is to create Ractor-Local GCs. These Ractor-Local GCs would avoid operating on data being used by other Ractors, making it safe for GCs to run without forcing other Ractors to pause.

Algorithm Design

Much of the initial phase of the project was spent on planning the algorithm for Ractor-Local GCs.

The existing GC algorithm in Ruby is a “mark-and-sweep” algorithm. It starts by marking the “root” objects, which are generally directly accessible from the top level of the program. Since these objects can link to other objects, those objects are marked as well. The GC continues down this object tree until there are no objects left to mark. At the end of this process, any object that is not marked must not be in the tree at all, and is therefore unreachable. These objects are all “swept” away, since their data is no longer needed. This strategy has been optimized in various ways to improve performance, but this is the core of the existing algorithm.
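
To make the shape of this algorithm concrete, here is a minimal, self-contained sketch of mark-and-sweep in C. The types are toy stand-ins, not Ruby’s actual gc.c structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object: up to two outgoing references, a mark bit, and a "live" bit
 * indicating that the heap slot is in use. */
typedef struct obj {
    struct obj *refs[2];
    bool marked;
    bool live;
} obj;

static void mark(obj *o) {
    if (o == NULL || !o->live || o->marked) return;
    o->marked = true;                     /* mark the object itself... */
    for (int i = 0; i < 2; i++)
        mark(o->refs[i]);                 /* ...then everything it links to */
}

/* heap: a flat array of slots; roots: the directly accessible objects. */
static void mark_and_sweep(obj *heap, size_t nslots, obj **roots, size_t nroots) {
    for (size_t i = 0; i < nslots; i++)
        heap[i].marked = false;           /* clear marks from the last cycle */
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);                   /* mark phase */
    for (size_t i = 0; i < nslots; i++)   /* sweep phase: anything unmarked */
        if (heap[i].live && !heap[i].marked)
            heap[i].live = false;         /* is unreachable, so reclaim it */
}
```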

Because this algorithm is applied to all objects in the program, it must be modified for Ractor-Local GCs. I worked with Koichi Sasada to design this modified algorithm. The most important principle to consider in designing the algorithm was that a reachable object must never be swept. In other words, it must be ensured that an object in use is never recycled by the GC. An accompanying principle is that the GC should sweep as many unreachable objects as possible.

After several discussions about the various possible strategies, we settled on a design that we called “GC 1”. Like the existing GC strategy, GC 1 begins by marking root objects. However, GC 1 only marks the roots that are accessible from the current Ractor. Furthermore, all shareable objects belonging to the Ractor are considered roots; this is because there is a possibility that these objects are in use by other Ractors, and thus should not be swept. Once the roots have been marked, GC 1 continues marking the object tree in a way similar to the existing GC. However, whenever GC 1 arrives at a shareable object, it checks to make sure that the object belongs to its local Ractor, and it only marks the object if that is true. Doing this ensures that the GC never strays outside of the local Ractor’s territory on the object tree.
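
The marking rule at the heart of GC 1 can be sketched as follows. This is a hypothetical model in C; the field and function names are illustrative, not the fork’s actual identifiers:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ractor ractor;

typedef struct obj {
    struct obj **refs;
    int nrefs;
    bool shareable;
    ractor *owner;     /* the Ractor whose object-space holds this object */
    bool marked;
} obj;

/* GC 1 never marks (or traverses past) a shareable object owned by another
 * Ractor; collecting that object is the other Ractor's responsibility. */
static void gc1_mark(obj *o, ractor *current) {
    if (o == NULL || o->marked) return;
    if (o->shareable && o->owner != current) return;  /* stop at the border */
    o->marked = true;
    for (int i = 0; i < o->nrefs; i++)
        gc1_mark(o->refs[i], current);
}

/* Root marking: besides the roots reachable from `current`, every shareable
 * object owned by `current` is a root, since other Ractors may be using it. */
static void gc1_mark_roots(obj **local_roots, size_t nroots,
                           obj **owned_shareables, size_t nshared,
                           ractor *current) {
    for (size_t i = 0; i < nroots; i++) gc1_mark(local_roots[i], current);
    for (size_t i = 0; i < nshared; i++) gc1_mark(owned_shareables[i], current);
}
```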

This algorithm alone is insufficient, because it never collects shareable objects. To make up for this, a global GC must occasionally be run as well. This global GC works much like the existing GC, operating on the whole object tree without giving special treatment to shareable objects. However, because it needs to run only occasionally, the total amount of time spent in stop-the-world garbage collection should be reduced.

This strategy is an effective algorithm for a parallelized GC, but it is not optimal. “GC 2,” as suggested by Koichi Sasada, would be more selective about the shareable objects that are given special treatment. In particular, the program would track exactly which Ractors share each object. When an object can only be accessed by one Ractor, then even if it is technically shareable, it does not need to be considered a root in the mark-and-sweep algorithm.

These optimizations in the GC 2 strategy should lessen the frequency with which the global GC is needed, but they are more complex to implement. For this reason, we decided to focus on implementing GC 1. Once GC 1 has been successfully incorporated, it can then be developed into GC 2.

Implemented Steps

The following are the major steps I took in order to implement GC 1; each step was tested to ensure that it behaved as expected.

Creating the Shareable-Table

One of the first steps toward implementing GC 1 was to ensure that shareable objects are treated as roots. I did this by creating a table owned by the program’s object-space. The first column of the table holds the shareable objects themselves, while the second is currently just a column of zeroes. (This second column exists to make the later conversion to GC 2 easier, where the number of Ractors using each object will need to be tracked.) Ruby checks the shareability of objects that are used in Ractors; I modified this check so that an object is placed in the shareable-table whenever it turns out to be shareable. I also added a section to Ruby’s root-marking function so that every object in the table is marked.
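
A minimal sketch of such a table in C, assuming a simple growable array rather than the hash table Ruby would actually use; all names are illustrative:

```c
#include <stddef.h>
#include <stdlib.h>

/* One entry pairs a shareable object with a counter. Under GC 1 the counter
 * stays 0; GC 2 would use it to track how many Ractors share the object. */
typedef struct { void *obj; int count; } shareable_entry;
typedef struct { shareable_entry *entries; size_t len, cap; } shareable_table;

/* Called from the shareability check: once an object is found to be
 * shareable, record it so the local GC will treat it as a root.
 * (Error handling omitted for brevity.) */
static void shareable_table_register(shareable_table *t, void *obj) {
    for (size_t i = 0; i < t->len; i++)
        if (t->entries[i].obj == obj) return;        /* already registered */
    if (t->len == t->cap) {
        t->cap = t->cap ? t->cap * 2 : 16;
        t->entries = realloc(t->entries, t->cap * sizeof(*t->entries));
    }
    t->entries[t->len].obj = obj;
    t->entries[t->len].count = 0;    /* placeholder column for GC 2 */
    t->len++;
}

/* Hooked into root marking: every tabled object gets marked. `mark_fn`
 * stands in for the GC's marking routine. */
static void shareable_table_mark_roots(shareable_table *t,
                                       void (*mark_fn)(void *)) {
    for (size_t i = 0; i < t->len; i++)
        mark_fn(t->entries[i].obj);
}
```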

Relevant Commit(s):

Dividing the Object-Space

After this, a critical step in implementing GC 1 was to split the object-space by Ractor. Each Ractor would have its own object-space, which would generally contain objects created during that Ractor’s run. This also gave each Ractor its own shareable-table, as well as other attributes contained by the original object-space. Since the object-space is a critical component of Ruby’s GC file, splitting it by Ractor required many adjustments to be made to the code. The following changes were some of the most significant:

  • Whenever a Ractor is created, a new object-space must be set up for it.
  • Previously, when the GC needed to access the object-space, it would get it directly from the VM structure. Now, it first finds the Ractor that is running the code, and then obtains the object-space from that Ractor (see the sketch after this list).
  • Ruby’s GC tracks object IDs in tables stored in the object-space. Now that each Ractor has its own object-space, looking up an object ID first involves identifying the object’s Ractor.
  • When the GC tries to mark a shareable object, it checks whether the object belongs to the current Ractor, and skips marking it if not.
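
The second and fourth changes above might look roughly like the following sketch in C; the structures and names are assumptions for illustration, not the fork’s actual code:

```c
#include <stdbool.h>

typedef struct objspace objspace;

typedef struct ractor {
    objspace *local_objspace;   /* created together with the Ractor */
} ractor;

typedef struct {
    ractor *current_ractor;     /* the Ractor currently executing */
} vm;

/* Before the split, the GC read a single object-space straight out of the
 * VM structure; now it goes through the running Ractor instead. */
static objspace *current_objspace(vm *v) {
    return v->current_ractor->local_objspace;
}

typedef struct {
    bool shareable;
    ractor *owner;
} obj_header;

/* Marking hook: a shareable object belonging to another Ractor is skipped,
 * keeping each local GC inside its own Ractor's territory. */
static bool should_mark(const obj_header *o, const vm *v) {
    return !(o->shareable && o->owner != v->current_ractor);
}
```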

Prior to splitting the object-space, a change had to be made to the way in which Ractor-related objects are marked. There are several kinds of objects that are referenced by Ractor objects themselves, and the existing GC encounters them once it marks their Ractor. (This includes the local variables that exist in the code a Ractor executes.) However, in the multiple object-space system, a Ractor object is stored in the object-space of the Ractor that created it, while most of the objects that it references are stored in its own object-space. Without modification, the GC would not reach these objects, as they have no parent in their own object-space. In order to remedy this, I created a new set of roots that contains the Ractor’s relevant objects, allowing them to be marked at the start of the GC.

Relevant Commit(s):

Updating _id2ref

Once the multiple object-spaces were established, I decided to update the “_id2ref” function. This function takes a given ID and returns the corresponding object. The present version of Ruby disables this function when Ractors are in use, as it could otherwise be used to obtain another Ractor’s unshareable objects. However, the new object-space system makes it clear which Ractor each object belongs to. Because of this, I was able to re-enable the function, allowing it to return an object as long as it belongs to the current Ractor or is shareable. This modification also let me use the function in my test files to verify that objects were behaving as expected.
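
The access rule for the re-enabled function reduces to a simple check. A hypothetical sketch in C, with illustrative names:

```c
#include <stdbool.h>

typedef struct ractor ractor;

typedef struct {
    bool shareable;
    ractor *owner;    /* the Ractor whose object-space holds the object */
} obj;

/* _id2ref may return an object only if it is shareable or belongs to the
 * calling Ractor; otherwise it would leak another Ractor's unshareable
 * object across the boundary. */
static bool id2ref_allowed(const obj *o, const ractor *caller) {
    return o->shareable || o->owner == caller;
}
```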

Relevant Commit(s):

Settling Finalizers

At this point, a decision had to be made about finalizers. A finalizer is a procedure that runs when a particular object is deleted. The existing GC stores all finalizers in its sole object-space. Once the object-space was divided, there either had to be a finalizer list in each Ractor’s object-space, or a single global finalizer list for the program. I discussed this with my mentor, and we decided that the Ractor-Local strategy was better. Implementing it was straightforward, as it was a natural extension of creating multiple object-spaces. However, I had to add a new restriction preventing the user from defining a finalizer on an object from another Ractor, as this would cause finalizers to be stored in the wrong object-space.
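
The new restriction amounts to an ownership check at finalizer-definition time. A minimal sketch in C, assuming a per-object-space linked list and illustrative names:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct finalizer {
    void *target;
    void (*fn)(void *);
    struct finalizer *next;
} finalizer;

typedef struct objspace { finalizer *finalizers; } objspace;
typedef struct ractor { objspace *local_objspace; } ractor;
typedef struct { ractor *owner; } obj;

/* Defining a finalizer on another Ractor's object is rejected; otherwise
 * the finalizer would be stored in the wrong object-space's list. */
static bool define_finalizer(obj *target, void (*fn)(void *),
                             ractor *current, finalizer *storage) {
    if (target->owner != current) return false;   /* new restriction */
    storage->target = target;
    storage->fn = fn;
    storage->next = current->local_objspace->finalizers;
    current->local_objspace->finalizers = storage; /* Ractor-local list */
    return true;
}
```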

Relevant Commit(s):

Adding the Ractor-Teardown GC

In order to prevent memory buildup as more Ractors are created, we decided to have each Ractor run its local GC when it terminates. Most of the objects in a Ractor are completely unusable after the Ractor closes; the exception is shareable objects, which are spared by the local GC anyway. The only difference in this cleanup GC is that it skips marking the Ractor’s local variables, which would normally be roots in the new local GC.
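
A sketch of the teardown pass, under the assumption that the local GC takes a flag controlling its root set (names are illustrative):

```c
#include <stdbool.h>

/* Options for one local GC run. */
typedef struct {
    bool skip_local_variable_roots;   /* set only for the teardown pass */
} gc_run_opts;

/* `run_local_gc` stands in for the local mark-and-sweep sketched earlier.
 * At termination the Ractor's local variables are unreachable to everyone,
 * so they are dropped from the root set and reclaimed; shareable objects
 * survive because the local GC never sweeps them anyway. */
static void ractor_teardown_gc(void (*run_local_gc)(gc_run_opts)) {
    gc_run_opts opts = { .skip_local_variable_roots = true };
    run_local_gc(opts);
}
```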

Relevant Commit(s):

Organizing Threads

I then noticed that the primary threads of Ractors appeared to be referencing a particular object in the main Ractor. It turned out that this object was the main Ractor’s default ThreadGroup. My mentor suggested giving each Ractor its own local default ThreadGroup. This was fairly simple to implement.

It then seemed appropriate to have a Ractor’s thread be stored in the Ractor’s own object-space, even though the parent Ractor is the one that creates it. This would be the first time an object is stored in the object-space of a Ractor that did not create it. At first, I tried to make a special exception for the primary threads of Ractors, giving them a special allocation procedure. However, I found that these threads had references that needed to be in the same object-space as their corresponding thread. Because of this, a number of objects created in that section of the code needed to be allocated in the new Ractor’s object-space. To do this, I gave object-spaces the ability to redirect allocations to another object-space. In this way, when the Ractor’s thread is being created, the old object-space temporarily sets the new Ractor’s object-space as the target, and then returns to normal once all relevant allocations are over.
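
The redirection mechanism might look like this sketch in C; `redirect_to` and the other names are assumptions for illustration:

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative object-space with a redirect pointer. While `redirect_to`
 * is set, allocations requested against this object-space are served by
 * the target object-space instead. */
typedef struct objspace {
    struct objspace *redirect_to;   /* NULL in the normal case */
} objspace;

static void *objspace_alloc(objspace *os, size_t size) {
    while (os->redirect_to != NULL)
        os = os->redirect_to;       /* follow the redirect */
    return malloc(size);            /* stand-in for allocating from os's heap */
}

/* During new-Ractor setup, the creating Ractor temporarily points its
 * object-space at the new Ractor's, so the primary thread and the objects
 * it references all land in the new object-space, then restores it. */
static void with_redirect(objspace *from, objspace *to,
                          void (*setup)(objspace *)) {
    from->redirect_to = to;
    setup(from);
    from->redirect_to = NULL;
}
```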

Relevant Commit(s):

Handling Shareability Exceptions

The local GC’s algorithm relies on the principle that no unshareable object will be used by multiple Ractors at once. In practice, there were some exceptions to this. Once I had finished most of the local GC algorithm, I decided to find and handle these exceptions. Unfortunately, there were many more of these cases than I had expected. Some of the details will be discussed later in the “Issues Encountered” section.

Establishing the Global GC

Once the local GC could finally work without error, I moved on to implementing the global GC. Before creating the full functionality of the global GC, I decided to first create a simplified version that essentially runs the local mark-and-sweep algorithm on each object-space sequentially from one single Ractor. This was a little more complicated than just running the algorithm in a loop, as each of the object-spaces needs to be prepared before the mark-and-sweep can begin. Furthermore, because this was the first time that a Ractor could mark objects in another object-space, it required a modification to the way in which the current object-space is identified. When a Ractor runs the GC in another object-space, it now makes a note of which object-space it is running in; until this note is removed, that object-space is treated as the current object-space.
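
The note-taking can be modeled as an override pointer that temporarily changes which object-space counts as current. A hypothetical sketch with illustrative names:

```c
#include <stddef.h>

typedef struct objspace objspace;

/* Per-Ractor GC state: while the Ractor runs GC in a foreign object-space,
 * `gc_target` records it; otherwise the local object-space is current. */
typedef struct {
    objspace *local_objspace;
    objspace *gc_target;            /* non-NULL while working elsewhere */
} ractor_gc_state;

static objspace *gc_current_objspace(ractor_gc_state *r) {
    return r->gc_target != NULL ? r->gc_target : r->local_objspace;
}

/* Rudimentary global GC: one Ractor prepares each object-space and runs
 * the local mark-and-sweep in it, one object-space at a time. */
static void simple_global_gc(ractor_gc_state *self,
                             objspace **spaces, size_t n,
                             void (*prepare)(objspace *),
                             void (*mark_and_sweep)(objspace *)) {
    for (size_t i = 0; i < n; i++) {
        self->gc_target = spaces[i];    /* note: now operating in spaces[i] */
        prepare(spaces[i]);
        mark_and_sweep(spaces[i]);
        self->gc_target = NULL;         /* back to the local object-space */
    }
}
```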

Relevant Commit(s):

Developing the Global GC

After the rudimentary global GC was established, the next step was to give the global GC the permissions that the local GCs do not have: in particular, it needed to be able to follow the object tree between object-spaces, as well as to be able to ignore the shareable-table when marking roots. Removing these restrictions for the global GC was fairly simple.

However, doing so did not finish the task. Normally, when the GC runs on a single object-space, it removes existing marks, adds the new marks, and then sweeps. In order for the global GC to work correctly, each step (remove marks, add marks, sweep) must be done in all object-spaces before moving on to the next step. Otherwise, marks may be inappropriately removed, and objects may be swept prematurely. For this reason, I separated each of these steps into its own loop.
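
In sketch form, the phase separation looks like this (hook names are illustrative):

```c
#include <stddef.h>

typedef struct objspace objspace;

typedef struct {
    void (*clear_marks)(objspace *);   /* phase 1: remove old marks */
    void (*mark)(objspace *);          /* phase 2: roots + object tree */
    void (*sweep)(objspace *);         /* phase 3: reclaim unmarked */
} gc_phases;

/* Each phase runs to completion across every object-space before the next
 * begins. If the phases were instead interleaved per object-space, marking
 * in a later space could reach objects in an earlier space that had already
 * been swept, or whose marks had not yet been cleared. */
static void global_gc(objspace **spaces, size_t n, const gc_phases *p) {
    for (size_t i = 0; i < n; i++) p->clear_marks(spaces[i]);
    for (size_t i = 0; i < n; i++) p->mark(spaces[i]);
    for (size_t i = 0; i < n; i++) p->sweep(spaces[i]);
}
```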

Relevant Commit(s):

Issues Encountered

The main issues I encountered while working on this project have almost all been related to the GC sweeping an object early. These problems usually surfaced as unexpected behavior, segmentation faults, or an error message indicating that the GC was attempting to mark a previously swept object. There were a few categories of objects that were particularly prone to causing this type of problem.

T_IMEMO objects

One of these categories is the T_IMEMO objects. These objects are for Ruby’s internal use, and rarely appear in Ruby-level coding. Because of this, their shareability was not yet defined. As I developed the local GC, I found that these objects were being swept even while they were still needed. My mentor and I reviewed the purpose of each kind of T_IMEMO object, and we were able to classify them by shareability. Once the T_IMEMO objects had their shareability status confirmed, many of the issues they had been causing were resolved. The “env” type T_IMEMO object still appears to cause some issues, but I have made it an exception to the GC’s current algorithm for now.

Relevant Commit(s):

Non-Tabled Shareable Objects

The problem of being mistakenly swept was also present in objects that should have been shareable, but had not yet made it onto the shareable-table. For most objects, shareability only becomes relevant when they (or their ancestors in the object tree) are explicitly used in a Ractor, invoking a shareability check. For this reason, it appeared to be sufficient to update the shareable-table only when this check takes place. However, it turned out that some objects are relevant to Ractors even before their shareability is checked. Each time I encountered one of these cases, I had to ensure that the shareable-table would be updated whenever that case occurred.

Relevant Commit(s):

Objects Inconsistent with Shareability Rules

The last major cause of incorrect sweeping was violations of the rule that unshareable objects can only be used by a single Ractor. These objects often appeared to be connected to only one Ractor on the surface, but were subtly linked to objects in other Ractors as well. These needed to be dealt with on a case-by-case basis. Some of them could simply be declared to be shareable. Others needed to be updated so that each Ractor would have its own copy. In a few cases that were harder to classify, I simply made the objects exceptions for the GC to handle, but these will be dealt with again later.

Relevant Commit(s):

Present Status and Remaining Tasks

Currently, the local GC algorithm and the global GC algorithm both appear to be able to run successfully. However, I still have several tasks to finish before GC 1 is complete. The implementation strategies of these steps need to be considered carefully, but they should all be doable with a little more time.

The remaining tasks are as follows:

  • The stop-the-world effect must be removed from the local GCs. This is necessary to allow other Ractors to continue running while the local GC is ongoing, which is the goal of this project.
  • The conditions for the global GC’s activation need to be clearly defined. At the current point in the project, the global GC only runs when it is called manually. Finding the appropriate conditions for the global GC will likely involve some trial and error.
  • Moved objects need to be properly transferred to their new object-space. At the moment, moved objects are treated as exceptions and handled specially by the GC. Ultimately, the moved objects must actually switch object-spaces.
    • The moving algorithm for T_DATA objects must be completed. Right now, moving a T_DATA object does not cause the object’s references to be moved, which makes those references vulnerable to being swept early.
  • One possible way to prevent object-spaces from accumulating would be to merge the object-space of a closed Ractor into the object-space of its parent Ractor. This keeps the number of object-spaces to a minimum, which reduces the number of loops needed by the global GC.
  • The final version of the Ractor-Local GC will need to treat the main Ractor a little differently. This is because classes defined in non-main Ractors can reference unshareable objects in the main Ractor’s object-space. Since those unshareable objects have no parent in their object-space, the current system will sweep them early.
  • If possible, the new system should be optimized so that non-Ractor programs are minimally affected.

Future Work

Once GC 1 is completed, the next step would be to begin developing GC 2. Because GC 2 would track sharedness more precisely, it should be even more efficient than GC 1.

Another feature to develop is support for Ruby’s compaction algorithm, which is an extension of Ruby’s GC algorithm. At the current point in the project, compaction can be done locally by using the Ractor-Local GC. Developing a global compaction algorithm would allow for more efficient use of memory.
