@moteus
Forked from DhavalKapil/ELF-symbol-resolution.md
Created November 15, 2019 09:07

ELF

ELF Header

The first portion of any ELF file is the ELF header. This generally provides offsets to other headers (program headers and section headers) within an ELF.

typedef struct {
  unsigned char e_ident[EI_NIDENT];
  uint16_t e_type;
  uint16_t e_machine;
  uint32_t e_version;
  ElfN_Addr e_entry;
  ElfN_Off e_phoff; /* The program header offset */
  ElfN_Off e_shoff; /* The section header offset */
  uint32_t e_flags;
  uint16_t e_ehsize;
  uint16_t e_phentsize; /* Size of a program header entry */
  uint16_t e_phnum; /* Number of entries in the program header table */
  uint16_t e_shentsize; /* Size of a section header entry */
  uint16_t e_shnum; /* Number of entries in the section header table */
  uint16_t e_shstrndx;
} ElfN_Ehdr;
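
As a minimal sketch of how a tool might begin reading this header, the function below checks the magic bytes at the start of e_ident and reports the file class. The function name elf_class_bits is hypothetical, and the constants are redefined locally (with their standard values) so that the sketch does not depend on <elf.h>.

```c
#include <stddef.h>
#include <string.h>

/* Indices into e_ident and class values, as given by the ELF specification. */
enum { EI_CLASS = 4, EI_NIDENT = 16 };
enum { ELFCLASS32 = 1, ELFCLASS64 = 2 };

/* Returns 32 or 64 for a plausible ELF header, or 0 otherwise.
 * buf is assumed to hold the first len bytes of the file. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < EI_NIDENT || memcmp(buf, magic, sizeof magic) != 0)
        return 0;

    switch (buf[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;
    }
}
```

The class decides whether the rest of the header should be read with the 32-bit or the 64-bit variant of ElfN_Ehdr.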

ELF Types

  1. ET_REL: Relocatable file. The code and data have not yet been linked into an executable, so their final addresses are not fixed. Also called object files.

  2. ET_EXEC: Executable file. The file has an entry point and can be executed. Various object (relocatable) files are linked together at link time to produce an executable. Also called a program.

  3. ET_DYN: Shared object. This is a dynamically linked object file. These can be directly loaded and linked to a program's process at runtime. Also called shared library.

  4. ET_CORE: Core file. This is a dump of the process's state at the time of a program crash.

  5. ET_NONE: Unknown type.
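
The list above maps directly onto the e_type field of the header. A small sketch of that mapping follows; elf_type_name is a hypothetical helper, and the numeric values are the standard ones, repeated locally so the block compiles without <elf.h>.

```c
#include <stdint.h>

/* e_type values from the ELF specification. */
enum { ET_NONE = 0, ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4 };

/* Returns a short description of an e_type value. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_NONE: return "ET_NONE: unknown type";
    case ET_REL:  return "ET_REL: relocatable file";
    case ET_EXEC: return "ET_EXEC: executable file";
    case ET_DYN:  return "ET_DYN: shared object";
    case ET_CORE: return "ET_CORE: core file";
    default:      return "other (OS- or processor-specific)";
    }
}
```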

ELF Program Headers

These headers describe the segments within a binary. They are needed at load time to form the memory layout. Hence, relocatable ELF files do not need program headers (Linux loadable kernel modules are an exception among relocatable files in that they are nevertheless loaded into memory).

typedef struct {
  uint32_t p_type; /* segment type */
  Elf32_Off p_offset; /* segment offset */
  Elf32_Addr p_vaddr; /* segment virtual address */
  Elf32_Addr p_paddr; /* segment physical address */
  uint32_t p_filesz; /* size of segment in the file */
  uint32_t p_memsz; /* size of segment in memory */
  uint32_t p_flags; /* segment flags, e.g. execute|read|write */
  uint32_t p_align; /* segment alignment in memory */
} Elf32_Phdr; /* A single entry in the program header table */

These are the common types of segments:

  1. PT_LOAD: Loadable segment. These segments are the ones that are to be loaded or mapped into memory. Code and data reside in such segments. Executables generally have at least two of them: one for code and one for data.

  2. PT_DYNAMIC: Dynamic segment. This segment is present in ELF files that are dynamically linked. It contains information necessary for the dynamic linker. Examples of what it describes include the symbol hash table, string table, symbol table, relocation table, initialization and termination functions, the name of a shared object, the address of the GOT (Global Offset Table), etc.

  3. PT_NOTE: This segment generally contains vendor/system related auxiliary information.

  4. PT_INTERP: This segment only contains a string representing the path of the program interpreter (generally the dynamic linker, e.g. 'ld-linux.so.2'); its program header gives the location and size of that string.

  5. PT_PHDR: This segment contains the address (and size) of the program header table itself.
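
To make the program header fields more concrete, here is a small sketch that inspects a table of entries. struct phdr32 is a local mirror of Elf32_Phdr, and the three helper names are hypothetical; the PT_LOAD and PF_* values are the standard ones.

```c
#include <stddef.h>
#include <stdint.h>

enum { PT_LOAD = 1 };                        /* p_type value */
enum { PF_X = 0x1, PF_W = 0x2, PF_R = 0x4 }; /* p_flags bits */

/* Local mirror of Elf32_Phdr, with the same field order. */
struct phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* Writes a permission string such as "r-x" into out (3 chars + NUL). */
void phdr_flags_str(uint32_t flags, char out[4])
{
    out[0] = (flags & PF_R) ? 'r' : '-';
    out[1] = (flags & PF_W) ? 'w' : '-';
    out[2] = (flags & PF_X) ? 'x' : '-';
    out[3] = '\0';
}

/* Counts the PT_LOAD entries in a program header table of n entries. */
size_t count_load_segments(const struct phdr32 *tab, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (tab[i].p_type == PT_LOAD)
            count++;
    return count;
}

/* Bytes that exist in memory but not in the file; the loader zero-fills them. */
uint32_t zero_fill_bytes(const struct phdr32 *ph)
{
    return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}
```

The difference between p_memsz and p_filesz is how a data segment can reserve zero-initialized memory without taking up space on disk.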

ELF Section Headers

Each segment is divided into various sections. The section header table describes these sections. This information is primarily needed for linking and debugging purposes. The program can be run even after 'stripping' the section headers. Each section contains either code or data (including metadata).

typedef struct {
  uint32_t sh_name; /* offset into shdr string table for shdr name */
  uint32_t sh_type; /* shdr type, e.g. SHT_PROGBITS */
  uint32_t sh_flags; /* shdr flags, e.g. SHF_WRITE|SHF_ALLOC */
  Elf32_Addr sh_addr; /* address of where section begins */
  Elf32_Off sh_offset; /* offset of section from beginning of file */
  uint32_t sh_size; /* size that section takes up on disk */
  uint32_t sh_link; /* points to another section */
  uint32_t sh_info; /* interpretation depends on section type */
  uint32_t sh_addralign; /* alignment for address of section */
  uint32_t sh_entsize; /* size of each entry, for sections that hold a table of fixed-size entries */
} Elf32_Shdr;

Note that only the section header table is unnecessary for execution; the sections themselves are still needed.
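
The sh_name field is an offset rather than a string, so a section's name has to be looked up in the section header string table. A minimal sketch of that lookup, with bounds checks, is shown below; section_name is a hypothetical helper, and the table is assumed to be already loaded into memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the name stored at offset sh_name in the string table, or NULL if
 * the offset is out of range or the string is not NUL-terminated in bounds. */
const char *section_name(const char *shstrtab, size_t size, uint32_t sh_name)
{
    if (sh_name >= size)
        return NULL;
    if (memchr(shstrtab + sh_name, '\0', size - sh_name) == NULL)
        return NULL;
    return shstrtab + sh_name;
}
```

The same offset-into-a-string-table scheme is used by st_name in symbol tables.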

Some common sections within the text segment:

  1. .text: Code section.
  2. .rodata: Read only data section. This includes strings, etc.
  3. .plt: Procedure linkage table section. Contains the stub code used, together with the dynamic linker, to call functions imported from shared libraries.
  4. .dynsym: Dynamic symbol information. These are the symbols that take part in dynamic linking, such as the ones imported from shared libraries.
  5. .dynstr: The string table for dynamic symbols. Contains the name of each symbol as a series of null-terminated strings.
  6. .hash / .gnu.hash: This section contains a hash table for symbol lookup.
  7. .rel.*: The relocation section contains information about the changes to be made to an existing ELF at link time or at runtime (such as adjusting addresses).

Some common sections within the data segment:

  1. .data: This data section contains data such as initialized global variables.
  2. .bss: This section contains uninitialized global data. It takes up no storage on disk. During loading, all of this data is initialized with '0' bytes.
  3. .got.plt: This contains the global offset table, used in conjunction with the PLT by the dynamic linker to import functions from shared libraries.

Some other common sections:

  1. .symtab: The symbol table.
  2. .strtab: The symbol string table.
  3. .shstrtab: The section header string table. Contains strings such as ".text", etc.

ELF Symbols

Symbols are simple symbolic references to any function, variable, etc. There are two symbol tables: .dynsym and .symtab. The former contains references to external sources (shared libraries). The latter contains both external and local references. Only .dynsym is needed for program execution, as the dynamic linker needs to resolve these references at run time. .symtab only helps in debugging and linking.

typedef struct {
  uint32_t st_name; /* Offset into the corresponding table's string table */
  unsigned char st_info; /* Symbol type and binding attributes */
  unsigned char st_other; /* Symbol visibility */
  uint16_t st_shndx; /* Each symbol is defined for a particular section */
  Elf64_Addr st_value; /* Either an address or offset */
  uint64_t st_size; /* Size of the actual reference */
} Elf64_Sym; /* A single entry in any of the symbol table */

These are the different symbol types:

  1. STT_NOTYPE: Undefined type.
  2. STT_FUNC: Function or executable code.
  3. STT_OBJECT: Data object.

These are the different symbol bindings:

  1. STB_LOCAL: Local symbols (static variables and functions).
  2. STB_GLOBAL: Symbols visible to all object files.
  3. STB_WEAK: Same as global, with less precedence. Can be overridden by another symbol having the same name but not marked as STB_WEAK.
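
Both the type and the binding are packed into the single st_info byte: the binding is in the upper four bits and the type in the lower four. The sketch below unpacks them; the helper names are hypothetical, and the constants carry their standard values, redefined locally.

```c
/* Symbol bindings and types, with their values from the ELF specification. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2 };

/* Binding: upper four bits of st_info. */
unsigned sym_bind(unsigned char st_info) { return st_info >> 4; }

/* Type: lower four bits of st_info. */
unsigned sym_type(unsigned char st_info) { return st_info & 0xf; }

/* Human-readable name of the binding stored in st_info. */
const char *sym_bind_name(unsigned char st_info)
{
    switch (sym_bind(st_info)) {
    case STB_LOCAL:  return "STB_LOCAL";
    case STB_GLOBAL: return "STB_GLOBAL";
    case STB_WEAK:   return "STB_WEAK";
    default:         return "other";
    }
}
```

For example, an st_info of 0x12 describes a global function: binding 1 (STB_GLOBAL) and type 2 (STT_FUNC).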

ELF Relocations

Relocation is the process of connecting symbolic references with symbolic definitions. Relocation entries are meant for this purpose. Relocations involve binary patching.

There are two types of relocation entries:

/* Implicit addend: the addend is stored at the location being relocated */
typedef struct {
  Elf64_Addr r_offset;
  uint64_t r_info;
} Elf64_Rel;

typedef struct {
  Elf64_Addr r_offset; /* Points to the location that requires relocation */
  uint64_t r_info; /* Symbol table index and type of relocation */
  int64_t r_addend; /* Specifies a constant addend */
} Elf64_Rela;
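
To illustrate what "binary patching" means for an Elf64_Rela entry, the sketch below splits r_info into its symbol index and relocation type, and applies the simplest kind of relocation: an absolute 64-bit one that stores S + A (symbol value plus addend), which is the calculation used by R_X86_64_64, for example. The helper names and the in-memory image buffer are assumptions made for illustration; real relocation types each define their own calculation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Upper 32 bits of r_info: index into the symbol table. */
uint32_t rela_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }

/* Lower 32 bits of r_info: relocation type. */
uint32_t rela_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }

/* Patches 8 bytes at image + r_offset with sym_value + addend.
 * Returns 0 on success, -1 if the patch would fall outside the image. */
int apply_abs64(unsigned char *image, size_t size, uint64_t r_offset,
                uint64_t sym_value, int64_t addend)
{
    uint64_t value = sym_value + (uint64_t)addend;

    if (r_offset > size || size - r_offset < sizeof value)
        return -1;
    /* memcpy avoids alignment problems at arbitrary offsets. */
    memcpy(image + r_offset, &value, sizeof value);
    return 0;
}
```

With an Elf64_Rel entry the addend would instead be read from the 8 bytes already present at the target location before they are overwritten.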

ELF Dynamic Linking

Shared libraries are compiled as position independent code. Whenever an executable program is loaded into memory to be run, the kernel sets up the stack. The bottom of the stack (highest address) is filled with an auxiliary vector.

typedef struct
{
  uint64_t a_type; /* Entry type */
  union
  {
    uint64_t a_val; /* Integer value */
  } a_un;
} Elf64_auxv_t;

The entry types include the address of the program headers (along with their entry size and count), the entry point, the page size, etc. The dynamic linker uses details from this auxiliary vector to resolve symbols.
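
The auxiliary vector is a plain array of (type, value) pairs terminated by an AT_NULL entry, so searching it is a simple loop. The sketch below works on a caller-supplied array; struct auxv64 is a flattened local mirror of Elf64_auxv_t, auxv_lookup is a hypothetical helper, and the AT_* values are the ones Linux uses. For the running process, glibc offers getauxval() in <sys/auxv.h> for the same purpose.

```c
#include <stdint.h>

/* A few auxiliary vector entry types (Linux values). */
enum { AT_NULL = 0, AT_PHDR = 3, AT_PHENT = 4, AT_PHNUM = 5,
       AT_PAGESZ = 6, AT_ENTRY = 9 };

/* Flattened mirror of Elf64_auxv_t: the a_un union holds one 64-bit value. */
struct auxv64 {
    uint64_t a_type;
    uint64_t a_val;
};

/* Searches an AT_NULL-terminated vector for the given type.
 * Returns 1 and stores the value in *out if found, 0 otherwise. */
int auxv_lookup(const struct auxv64 *av, uint64_t type, uint64_t *out)
{
    for (; av->a_type != AT_NULL; av++) {
        if (av->a_type == type) {
            *out = av->a_val;
            return 1;
        }
    }
    return 0;
}
```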

PLT and GOT

A call to a function present in some external shared library is resolved using the PLT and GOT. A call to func is actually a call to func@plt. The PLT stub jumps to an address stored in the GOT, pushes an index on the stack and then jumps to a common PLT entry. The index pushed is an index into the .rel.plt section, which contains the relocation entry for that symbol (referencing that function).

The address in GOT is the relocation offset for func. Initially that address points to the push instruction in PLT.

The first three GOT entries are reserved:

  • GOT[0] - points to the dynamic segment of the executable
  • GOT[1] - points to the link_map structure
  • GOT[2] - points to _dl_runtime_resolve()

The common PLT function calls GOT[2] with arguments GOT[1] and the index pushed.
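
The effect of this lazy binding can be imitated in plain C with a function pointer that starts out pointing at a resolver and is patched on first use. This is only an analogy under simplifying assumptions: the one-slot "GOT", the stub and the pretend library function are all hypothetical stand-ins, not the real PLT machinery.

```c
typedef int (*func_t)(int);

/* The "real" function, standing in for code in a shared library. */
static int lib_double(int x) { return 2 * x; }

static int resolve_count;     /* how many times the resolver has run */
static int lazy_stub(int x);  /* forward declaration */

/* A one-slot stand-in for the GOT. It initially points at the stub, much as
 * a real GOT entry initially points back into the PLT. */
static func_t got_slot = lazy_stub;

/* Stand-in for the resolver: finds the target, patches the slot, then
 * forwards the original call. */
static int lazy_stub(int x)
{
    resolve_count++;
    got_slot = lib_double;    /* patch the "GOT" entry */
    return got_slot(x);
}

/* Stand-in for func@plt: always calls through the slot. */
int call_via_plt(int x) { return got_slot(x); }

/* Exposes the resolver counter so the behaviour can be observed. */
int resolver_runs(void) { return resolve_count; }
```

The first call through call_via_plt goes through the resolver; every later call reaches lib_double directly, which is why only the first call to an imported function pays the resolution cost.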

Dynamic Segment

The '.dynamic' section has a section header as well as a program header so that it can be used by the dynamic linker. This section contains an array of:

typedef struct {
  Elf32_Sword d_tag;
  union {
    Elf32_Word d_val;
    Elf32_Addr d_ptr;
  } d_un;
} Elf32_Dyn;

The different types of d_tags are:

  1. DT_NEEDED: holds the string table offset to the name of a needed shared library.

  2. DT_SYMTAB: contains the address of the dynamic symbol table .dynsym.

  3. DT_HASH: contains the address of the symbol hash table .hash (DT_GNU_HASH is the corresponding tag for .gnu.hash).

  4. DT_STRTAB: contains the address of the symbol string table .dynstr.

  5. DT_PLTGOT: contains the address of the global offset table.
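
The dynamic array ends with a DT_NULL entry, so finding a given tag is a linear scan. A minimal sketch follows; struct dyn32 is a flattened local mirror of Elf32_Dyn, the helper names are hypothetical, and the DT_* values are the standard ones.

```c
#include <stddef.h>
#include <stdint.h>

/* A few dynamic tags, with their values from the ELF specification. */
enum { DT_NULL = 0, DT_NEEDED = 1, DT_PLTGOT = 3, DT_HASH = 4,
       DT_STRTAB = 5, DT_SYMTAB = 6 };

/* Flattened mirror of Elf32_Dyn: d_val and d_ptr share the same 32 bits. */
struct dyn32 {
    int32_t  d_tag;
    uint32_t d_val;
};

/* Returns the first entry with the given tag, or NULL if there is none. */
const struct dyn32 *dyn_find(const struct dyn32 *dyn, int32_t tag)
{
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn;
    return NULL;
}

/* Counts entries with the given tag; DT_NEEDED may appear several times. */
size_t dyn_count(const struct dyn32 *dyn, int32_t tag)
{
    size_t n = 0;

    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            n++;
    return n;
}
```

Each DT_NEEDED value found this way is an offset into the string table located through DT_STRTAB.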

Clearly, some of the information lost when the section headers are stripped from a binary can be recovered using the .dynamic section.

When the dynamic linker is mapped into memory, it first handles its own relocations. Then, it looks into the .dynamic section and searches for DT_NEEDED tags to locate the different shared libraries to be loaded. It then brings each shared library into memory, looks into its .dynamic section and adds the library's symbol table to a chain of symbol tables it maintains. It also creates an entry for every shared library:

struct link_map
{
  ElfW(Addr) l_addr; /* Base address shared object is loaded at. */
  char *l_name; /* Absolute file name object was found in. */
  ElfW(Dyn) *l_ld; /* Dynamic section of the shared object. */
  struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
  ElfW(Dyn) *l_info[DT_NUM + DT_THISPROCNUM + DT_VERSIONTAGNUM + DT_EXTRANUM + DT_VALNUM + DT_ADDRNUM]; /* Holds pointers to symbol table (l_info[DT_SYMTAB]) and relocation table (l_info[DT_JMPREL]) */
};

Note that the first entry in link_map is of the executable binary itself.

During the process of linking external functions, a call is made to _dl_runtime_resolve with parameters: the link_map struct and the index into the relocation table for that function. The relocation entry gives the index in the symbol table for that function and also the address in GOT to be patched. The symbol is then searched in shared libraries using the link_map struct. The search involves the following steps:

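  1. Generate a hash of the symbol name to be searched for.
  2. Use that hash to select a bucket in the hash table, which gives an index into the symbol table, and look up the symbol table entry at that index.
  3. Look up the name of that symbol in the string table and compare.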

If found, the symbol's value is added to the corresponding shared library's base address to obtain its run-time address.
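
The search steps above can be sketched for the older System V .hash layout, in which a bucket array gives the first candidate symbol index and a chain array links further candidates until index 0 is reached. elf_hash is the classic System V hash function; struct sym is a trimmed-down stand-in for the symbol entry, and lookup_symbol is a hypothetical helper. The .gnu.hash format uses a different hash function and layout and is not covered here.

```c
#include <stdint.h>
#include <string.h>

/* Trimmed-down symbol entry: only the fields this sketch needs. */
struct sym {
    uint32_t st_name;   /* offset of the name in the string table */
    uint64_t st_value;  /* address relative to the load base */
};

/* The System V ELF hash function used by .hash sections. */
uint32_t elf_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;

    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Looks name up in a .hash-style table. Returns load_base + st_value, or 0
 * if the symbol is not present (index 0 terminates every chain). */
uint64_t lookup_symbol(const char *name,
                       const uint32_t *bucket, uint32_t nbucket,
                       const uint32_t *chain,
                       const struct sym *symtab, const char *strtab,
                       uint64_t load_base)
{
    uint32_t i;

    if (nbucket == 0)
        return 0;
    for (i = bucket[elf_hash(name) % nbucket]; i != 0; i = chain[i])
        if (strcmp(strtab + symtab[i].st_name, name) == 0)
            return load_base + symtab[i].st_value;
    return 0;
}
```

The string comparison is still required after the hash step because different names can land in the same bucket.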
