zeroomega/elfabi_ifs_merge.md Secret

## elfabi_ifs_merge.md

      
llvm-elfabi/ifs merge plan

Introduction

llvm-elfabi and llvm-ifs are tools that emit both text and binary stubs for shared objects. They provide similar functionality but have incompatible input formats. This document compares the two tools and gives an initial plan for merging them.
Background/Motivation

A number of projects have implemented their own versions of shared object stubbing (the process of separating a linkable shared object’s interface from its implementation) for reasons related to improving the overall linking experience. This functionality was absent from LLVM despite how close the practice is to LLVM’s domain. llvm-elfabi was created by the Fuchsia toolchain team to give LLVM the ability to emit shared object stubs (both textual and binary) for the ELF file format. llvm-ifs shares similar motivations. However, unlike llvm-elfabi, which focuses only on the ELF file format, llvm-ifs also aims to support MachO/COFF (not yet implemented) and Apple TAPI formats. Because the two tools have very similar text stub formats and both can generate ELF stubs from text stubs, it is better to merge them than to keep investing effort in two tools with the same functionality.
Comparison

Input/Output


|  | llvm-elfabi | llvm-ifs |
| --- | --- | --- |
| Input | ELF | Text stub (IFS YAML, multiple) |
|  | Text stub (TBE YAML) |  |
| Output | ELF | Text stub (IFS YAML) |
|  | Text stub (TBE YAML) | Apple TBD file |
|  |  | ELF Stub |

The biggest difference here is that llvm-ifs does not read ELF files. The IFS YAML file, llvm-ifs's text stub format, is its only supported input format. The reason is that IFS files are meant to be generated by the clang frontend rather than from linked ELF files. llvm-ifs can also merge multiple IFS YAML files into a single output, since clang can only generate one IFS file per translation unit.
llvm-ifs can also write Apple TBD files. It is not clear why llvm-ifs classifies TBD as a binary format (it requires --action=write-bin rather than --action=write-ifs), since a TBD file is a text representation of a dylib. Apple TBD is not supported by llvm-elfabi as it is out of scope.
Text Stub format

llvm-elfabi

--- !tapi-tbe
TbeVersion: 1.0
Arch: x86_64    /* Or AArch64, Unknown */
SOName: libtest.so   /* Optional */
NeededLibs:
  - libc.so.6
Symbols:
  sym0: { Type: Notype }
  sym1: { Type: Object, Size: 0 }
  sym2: { Type: Func, Weak: false }
  sym3: { Type: TLS }
  sym4: { Type: Unknown, Warning: foo }
...

llvm-ifs

--- !experimental-ifs-v2
IfsVersion: 2.0
Triple: x86_64-unknown-linux-gnu
ObjectFileFormat: ELF
SOName: libtest.so   /* Optional */
NeededLibs:
  - libc.so.6
Symbols:
  - { Name: sym0, Type: NoType }
  - { Name: sym1, Type: Object, Size: 0 }
  - { Name: sym2, Type: Func, Weak: false }
  - { Name: sym3, Type: Unknown, Warning: foo }
...

The text stub formats of the two tools are quite similar, and both use YAML. There are 3 major differences:

1. elfabi uses e_machine and command line arguments to identify the architecture, bit width and endianness, while ifs uses an LLVM target triple.
2. elfabi uses a custom map for the Symbols table, keyed by symbol name, while ifs uses a generic array in which the symbol name is just another attribute.
3. elfabi supports the TLS symbol type, which ifs does not.

In both cases, the symbol tables are sorted by name, but the LLVM YAML engine does not guarantee that order when reading the YAML file.
I think the main issue is that elfabi and ifs handle architecture differently. ifs uses an LLVM triple because its files are generated by the clang frontend, where the triple is easy to get. ifs also has no use case for generating ELF stubs for multiple platforms. I think the best approach is to accommodate both parties' use cases: have a data structure in the text YAML file that records arch, bit width and endianness, while allowing these to be overridden by command line options.
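
The difference between the two Symbols layouts described above is mechanical, which a short sketch can make concrete. The Python below is only an illustration, not code from either tool: it works on already-parsed YAML data (plain dicts and lists), and the function names `tbe_symbols_to_ifs` and `ifs_symbols_to_tbe` are hypothetical.

```python
def tbe_symbols_to_ifs(symbols_map):
    """Turn an elfabi-style map (name -> attributes) into an ifs-style list.

    The result is sorted by symbol name so the output order is deterministic.
    """
    return [{"Name": name, **attrs} for name, attrs in sorted(symbols_map.items())]


def ifs_symbols_to_tbe(symbols_list):
    """Turn an ifs-style list back into a name-keyed map, rejecting duplicates."""
    result = {}
    for entry in symbols_list:
        attrs = dict(entry)          # copy so the caller's data is not modified
        name = attrs.pop("Name")
        if name in result:
            raise ValueError(f"duplicate symbol: {name}")
        result[name] = attrs
    return result


# Example data modelled on the stubs shown earlier (deliberately unsorted).
tbe = {
    "sym2": {"Type": "Func", "Weak": False},
    "sym0": {"Type": "NoType"},
    "sym1": {"Type": "Object", "Size": 0},
}
ifs = tbe_symbols_to_ifs(tbe)
```

Because the list form does not enforce unique names the way a map does, the reverse conversion has to check for duplicates explicitly.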
Command line options

llvm-elfabi

llvm-elfabi [--emit-tbe=<path>] [--output-target=elf32-little|elf32-big|elf64-little|elf64-big] [--soname=<name>] [--tbe] [--elf] [--write-if-changed]

llvm-ifs

llvm-ifs --action=<write-ifs|write-bin> [--force-format=ELF|TBD] -o=<path> <ifs-files>

llvm-elfabi requires the user to explicitly choose the output format, using either the --emit-tbe flag or --output-target=. The input format for llvm-elfabi is inferred automatically but can be forced with the --tbe or --elf flags. In contrast, llvm-ifs only accepts IFS YAML files, so the input file type cannot be changed. Its output is controlled by the --action (mandatory) and --force-format (optional) flags: --action selects whether the output is a merged IFS YAML file or an ELF/TBD binary, while --force-format overrides the format field in the IFS YAML, which can be set to TBD.
Merge plan

Unified YAML format

The major difference between the two tools is the YAML format, and that is the first thing that needs to be unified. Within the YAML format, the tools differ in how they record target platform information. Given that a text stub is a human-readable representation of a shared object stub, we think it is better to record the architecture information that exists in the object file rather than a target triple, which is only available at build time. Besides, the target triple cannot be reliably derived from an already linked ELF file. For example, there is no reliable way to distinguish ELF files built for x86_64-unknown-linux-gnu from those built for x86_64-fuchsia; their ELF headers simply look the same. However, the LLVM target triple is widely used in other LLVM tools and it would be unwise not to support it. Therefore, we propose allowing either a target triple field or ELF-derived target fields in the text YAML format.
The proposed format is illustrated as follows:
--- !ifs-v1
IFSVersion: 1.0
Target: x86_64-unknown-linux-gnu   /* Optional, format 1, same format as llvm target triple */ **OR**
Target: { ObjectFormat: ELF, Arch: x86_64, Endianness: little, BitWidth: 64 } /* Optional, format 2 */
Name: libtest.so   /* Optional */
Needed:
  - libc.so.6
Symbols:
  - { Name: sym0, Type: Notype }
  - { Name: sym1, Type: Object, Size: 0 }
  - { Name: sym2, Type: Func, Weak: false }
  - { Name: sym3, Type: TLS }
  - { Name: sym4, Type: Unknown, Warning: foo }
...

The Target field can be either an LLVM triple string or an object with Arch, Endianness and BitWidth fields, but not both. Arch can be any platform supported by the e_machine field of the ELF format. Endianness can only be big or little, and BitWidth can be 32 or 64. All of these are optional, which allows a single text stub to be used for generating object stubs for multiple platforms.
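
To illustrate the two accepted shapes of Target, here is a minimal Python sketch of how a reader might normalize and validate the field after YAML parsing. It is not taken from llvm-ifs: the function name `parse_target`, the internal `Triple` key, and the `BitWidth` key spelling are assumptions made for this example, and Arch is not checked against the e_machine list.

```python
ALLOWED_KEYS = {"ObjectFormat", "Arch", "Endianness", "BitWidth"}


def parse_target(target):
    """Normalize a parsed Target value into a dict.

    Accepts None (field omitted), a triple string (format 1),
    or a mapping of ELF-derived fields (format 2).
    """
    if target is None:
        return {}                      # Target is optional
    if isinstance(target, str):
        return {"Triple": target}      # format 1: LLVM target triple
    if isinstance(target, dict):       # format 2: individual fields
        unknown = set(target) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unknown Target keys: {sorted(unknown)}")
        if target.get("Endianness", "little") not in ("little", "big"):
            raise ValueError("Endianness must be 'little' or 'big'")
        if target.get("BitWidth", 64) not in (32, 64):
            raise ValueError("BitWidth must be 32 or 64")
        return dict(target)
    raise TypeError("Target must be a triple string or a mapping")
```

Since a YAML value is either a scalar or a mapping, the "cannot be both" rule falls out of the type check rather than needing a separate test.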
Unified command line options

To satisfy the use cases from both llvm-elfabi and llvm-ifs, we propose the following command line options format:
llvm-ifs [options] --output=<path> <input files>

Options:


| Option | Description |
| --- | --- |
| `--input-format=[IFS\|ELF\|OtherObjectFileFormats]` | Specify the input file format. Optional. |
| `--output-format=[IFS\|ELF\|OtherObjectFileFormats]` | Specify the output file format. Required. |
| `--arch=[x86_64\|AArch64\|...]` | Should only be used when reading an IFS file that does not define the Arch field. Defines the architecture of the output file and can be any string supported by the ELF `e_machine` field. If the value conflicts with the IFS file, an error is reported and the program stops. Optional. |
| `--endianness=[little\|big]` | Should only be used when reading an IFS file that does not define the Endianness field. Defines the endianness of the output file. If the value conflicts with the IFS file, an error is reported and the program stops. Optional. |
| `--bitwidth=[32\|64]` | Should only be used when reading an IFS file that does not define the BitWidth field. Defines the bit width of the output file. If the value conflicts with the input IFS file, an error is reported and the program stops. Optional. |
| `--target=[x86_64-unknown-linux-gnu\|...]` | Should only be used when reading an IFS file that does not define any target information. Defines the architecture, endianness and bit width of the output file using an LLVM target triple. Optional; cannot be used together with the other target-related flags. |
| `--hint-ifs-target=[x86_64-unknown-linux-gnu\|...]` | When the input is an object file and the output format is IFS, llvm-ifs by default uses the Arch, Endianness and BitWidth fields to reflect the target information from the input object file. This flag tells llvm-ifs the expected target triple for the output IFS file. If the value matches the target information from the object file, it is used in the `Target:` field of the generated IFS. If it conflicts with the input object file, an error is reported and the program stops. |
| `--strip-ifs-arch` | Strip the Arch field when the output is IFS. |
| `--strip-ifs-endianness` | Strip the Endianness field when the output is IFS. |
| `--strip-ifs-bitwidth` | Strip the BitWidth field when the output is IFS. |
| `--strip-ifs-target` | Strip the architecture, endianness and bit width information from the output IFS text stub. |

The user must explicitly specify the output format, but the input format is optional since it can easily be inferred from the input file(s). Four target-related flags (--arch, --endianness, --bitwidth and --target) let the user control the output's architecture, endianness and bit width when the output is an object file such as ELF. These flags are optional if the input file(s) already carry this information, and required if the input is a stripped IFS text stub. The --strip-ifs-* flags strip the corresponding target information from the output text stub.
When llvm-ifs reads an object file such as ELF and the output format is IFS, by default it extracts the target information and writes it in the form Target: { Arch: x86_64, Endianness: little, BitWidth: 64 } in the generated IFS file, because the LLVM target triple cannot be reliably derived from an already linked binary. For example, ELF files built for x86_64-unknown-linux-gnu and x86_64-fuchsia have the same e_machine, e_ident[EI_DATA] and e_ident[EI_CLASS] values in the ELF header. Users can pass the --hint-ifs-target= flag to tell llvm-ifs the expected target triple for the input object file. If the supplied triple matches the information from the input object file, it is written into the IFS file; otherwise an error is reported indicating a conflict between the supplied target triple and the input file.
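
The following Python sketch shows which ELF header bytes carry the target information and why a triple hint can only be checked, not derived. It is a rough model under stated assumptions: `read_elf_target` and `check_hint` are hypothetical names, `TRIPLE_ARCH` is a deliberately tiny lookup table invented for this example (a real tool would rely on LLVM's triple handling), and the header is a hand-built 20-byte prefix rather than a real file.

```python
import struct

# e_machine values from the ELF specification (EM_X86_64, EM_AARCH64).
EM_NAMES = {62: "x86_64", 183: "AArch64"}

# Assumption: minimal triple-arch table used only for this illustration.
TRIPLE_ARCH = {
    "x86_64": ("x86_64", "little", 64),
    "aarch64": ("AArch64", "little", 64),
}


def read_elf_target(header):
    """Extract Arch, Endianness and BitWidth from the start of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bitwidth = {1: 32, 2: 64}.get(header[4])               # e_ident[EI_CLASS]
    endianness = {1: "little", 2: "big"}.get(header[5])    # e_ident[EI_DATA]
    if bitwidth is None or endianness is None:
        raise ValueError("invalid e_ident")
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    fmt = "<H" if endianness == "little" else ">H"
    (e_machine,) = struct.unpack_from(fmt, header, 18)
    return {
        "Arch": EM_NAMES.get(e_machine, "Unknown"),
        "Endianness": endianness,
        "BitWidth": bitwidth,
    }


def check_hint(triple, elf_target):
    """Return the hinted triple if it is consistent with the ELF header."""
    expected = TRIPLE_ARCH.get(triple.split("-", 1)[0])
    if expected is None:
        raise ValueError(f"unsupported triple in this sketch: {triple}")
    actual = (elf_target["Arch"], elf_target["Endianness"], elf_target["BitWidth"])
    if expected != actual:
        raise ValueError(f"{triple} conflicts with the input object file")
    return triple


# e_ident (64-bit, little-endian), then e_type = ET_DYN, e_machine = EM_X86_64.
header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
target = read_elf_target(header)
```

Both `x86_64-unknown-linux-gnu` and `x86_64-fuchsia` pass `check_hint` for this header, which is exactly why the triple has to come from the user rather than from the file.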
Implementation planning

Because both the input format and the command line options need to change, the merge will be a breaking change for both llvm-elfabi and llvm-ifs. Fuchsia is the only known user of llvm-elfabi, so breaking changes are not an issue for us. I believe the situation is similar for llvm-ifs. Still, we prefer and propose a multi-step transition:

1. Implement the unified YAML format and command line options in llvm-elfabi. Add a warning on stdout to notify the user that the tool is going to be merged with llvm-ifs in the near future.
2. Add an option to the clang IFSO driver to allow generating the unified YAML format.
3. Add support in llvm-ifs for reading the unified YAML format and for the unified command line options.
4. Remove llvm-elfabi.
5. Remove the outdated command line options and the old input formats.

This transition should guarantee that the old interfaces and formats remain supported until the new format and interface are stable. While we won't provide any backward compatibility after the merge is complete, this shouldn't cause too many issues, as new IFS files can easily be regenerated with either clang IFSO or the llvm-ifs tool.