
@brixen
Created August 8, 2011 18:13
Proposal to change .rbc files to use only the rbc.db mechanism
Rubinius uses .rbc files to cache the compiled bytecode for a Ruby source file
on disk. Typically, these cache files exist alongside the corresponding .rb
file; however, it is possible to collect all the cache files into a single
directory (and subdirectories) by using a hash of the full path to the Ruby
source file as the key that locates its cache file in the cache directory.
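As a rough illustration of the hashed-path lookup, the sketch below maps a source file to a cache file under a single cache directory. The `cache_file_for` name, the choice of SHA-1, and the two-character subdirectory split are assumptions made for illustration; this is not the actual Rubinius implementation.

```ruby
require "digest/sha1"

# Hypothetical sketch: map a Ruby source file to a cache file inside a
# single cache directory by hashing the source file's full path.
# SHA-1 and the "ab/cdef..." layout are illustrative assumptions.
def cache_file_for(source_path, cache_dir)
  # Hash the absolute path so "lib/set.rb" and "/full/path/lib/set.rb"
  # resolve to the same cache entry.
  full_path = File.expand_path(source_path)
  digest = Digest::SHA1.hexdigest(full_path)
  # Use the first two hex digits as a subdirectory so that no single
  # directory accumulates too many entries.
  File.join(cache_dir, digest[0, 2], digest[2..-1])
end
```

For example, `cache_file_for("lib/set.rb", "/tmp/rbc.db")` yields a path of the form `/tmp/rbc.db/xx/<38 hex digits>`, and the same source file always maps to the same cache file.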
In Rubinius 2.0, we have multiple language modes. The bytecode for 1.8
language mode differs from the bytecode for 1.9 language mode. The .rbc file
format was extended to include the language version. This ensures that running
the same Ruby file in different modes will not use the wrong version of bytecode.
When Rubinius is installed, we pre-compile all the Ruby files in the lib/
directory. This ensures that even if Rubinius is installed to a directory where
a user does not have write access, the cache files will already exist and can
be used to speed up loading of standard library code.
If the .rbc files are placed alongside the .rb files, the existing arrangement
must be changed to provide different .rbc files depending on language mode. In
other words, versioning the .rbc file is no longer sufficient: the single .rbc
file pre-compiled for each lib/**/*.rb file can hold bytecode for only one of
the two modes. The same situation exists for the pre-installed gems, which are
not split into different gem directories for 1.8 and 1.9 mode.
An additional problem with creating .rbc files alongside the .rb files is that
people object to the cache files cluttering their source directories.
There have repeatedly been requests for distributing Ruby applications without
Ruby source code. The existing .rbc files can be used for this, but they are
quite primitive and do not make it easy to abstract other storage
configurations (e.g. encryption).
Finally, there are potentially numerous advantages to storing the compilation
cache in a proper database, which would permit storing a great deal of
additional metadata for building Ruby tools. Moving the cache out of the
individual .rbc files and into the directory of files used by the -Xrbc.db
option is a good first step.
To summarize the problems with the existing .rbc mechanism:
1. Multiple different files are required to permit .rbc files in different
language modes to exist alongside a single .rb file, as is the case with
pre-compiling the standard library files on install.
2. People object to the files cluttering their source code.
3. The files cannot easily be extended to store additional, valuable
metadata.
4. Related to 3, the files don't provide a suitably powerful mechanism for
distributing Ruby applications without source code.
The existing -Xrbc.db option is a direct replacement for storing the .rbc
files alongside the .rb files and immediately solves problem #1 above. One
issue with the rbc.db option is choosing its default value. This is my
proposal:
1. If the user explicitly provides a path with -Xrbc.db, cache all files in
that path.
2. If the user does not provide a path, use two separate paths as follows:
a. on boot, record the current working directory (referred to as CWD below).
b. if the file being loaded has CWD as a prefix, store the cache for the
file in CWD/.rbx/<wherever>
c. if the file being loaded does not have CWD as a prefix, store the cache
for the file in ~/.rbx/<wherever>
3. When hashing the file path to determine the cache file, include the
language mode in the hashed key so that 1.8 and 1.9 files are kept separate.
This does not replace the language version information embedded in the .rbc
format, but it avoids recompilation thrashing when, e.g., running the specs
under 1.8 mode and then under 1.9 mode.
4. Only read from and write to the cache if the cache directory is owned by
the user. This closes a potential security hole where a superuser could run
bytecode that was maliciously placed in the cache, and it prevents the
superuser from creating files that the user would not be able to overwrite.
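The default-location rules above could be sketched roughly as follows. `RbcCachePolicy`, its method names, and the use of a SHA-1 digest of `"mode:path"` as the key are hypothetical. The proposal leaves the `<wherever>` layout under `.rbx` unspecified, so this sketch simply stores the hashed key directly under the chosen root.

```ruby
require "digest/sha1"

# Hypothetical sketch of the proposed default cache-location rules.
# Names and key layout are illustrative assumptions, not Rubinius APIs.
module RbcCachePolicy
  # Rules 1 and 2: choose the cache root for a source file.
  # explicit_db is the -Xrbc.db value (nil if not given);
  # cwd is the working directory recorded at boot.
  def self.cache_root(source_path, cwd, explicit_db = nil, home = Dir.home)
    return explicit_db if explicit_db

    cwd_full = File.expand_path(cwd)
    full = File.expand_path(source_path, cwd_full)
    # Compare against "CWD/" so that /app2/x.rb is not treated as under /app.
    prefix = cwd_full.end_with?("/") ? cwd_full : cwd_full + "/"

    if full.start_with?(prefix)
      File.join(cwd_full, ".rbx")
    else
      File.join(home, ".rbx")
    end
  end

  # Rule 3: fold the language mode into the hashed key so 1.8 and 1.9
  # bytecode for the same file never share a cache entry.
  def self.cache_key(full_path, language_mode)
    Digest::SHA1.hexdigest("#{language_mode}:#{full_path}")
  end

  # Rule 4: only use a cache directory owned by the effective user.
  def self.usable?(dir)
    File.directory?(dir) && File.owned?(dir)
  end

  # Combine the rules into a cache file path for one source file.
  def self.cache_file(source_path, language_mode, cwd,
                      explicit_db = nil, home = Dir.home)
    full = File.expand_path(source_path, File.expand_path(cwd))
    root = cache_root(full, cwd, explicit_db, home)
    File.join(root, cache_key(full, language_mode))
  end
end

# Example: an application file vs. a standard library file.
puts RbcCachePolicy.cache_root("lib/app.rb", "/home/dev/app", nil, "/home/dev")
# prints /home/dev/app/.rbx
puts RbcCachePolicy.cache_root("/opt/rbx/lib/set.rb", "/home/dev/app", nil, "/home/dev")
# prints /home/dev/.rbx
```

Checking the prefix against `CWD/` rather than the bare `CWD` string is a design choice. It keeps a sibling directory such as `/home/dev/app2` from being mistaken for part of the application. `File.owned?` compares the directory's owner against the process's effective user ID, which matches the intent of rule 4.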
With the changes above, we have a reasonable default for all files. The
standard library cache would live in ~/.rbx/, which is reasonable for files
installed with Rubinius that are not going to change. Application files would
by default be cached with the application directory, but would not litter the
directories containing the source code. If the user explicitly requests an
rbc.db directory, all files are written there, but are still segregated by
language version.
As a related but separate change, since we have full Ruby concurrency in
Rubinius 2.0, I propose making the Writer stage of the bytecode compiler use a
separate thread. Once the CompiledMethod is created, it is enqueued for
writing to the cache and immediately returned. The program can start executing
the method while the separate cache thread figures out where to put it and
marshals the contents to disk.