sftblw/공유 라이브러리의 로드 시점 재배치(relocation).md

## 공유 라이브러리의 로드 시점 재배치(relocation).md

      
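Load-time relocation of shared libraries

I translated this article myself in order to understand the original, even though a translation already exists. For readability and ease of understanding I paraphrased liberally, and because it reflects my own understanding it may well contain mistranslations. Corrections are welcome.
I have not obtained permission to translate it yet, and I am not sure I will ask. I probably won't bother.


I use two Korean words for "load" (로드 and 적재) interchangeably. This probably needs tidying up. Too much bother, so I won't...
"Section" and "segment" are easily confused with each other and I judged them to be technical terms, so I kept them as-is.
I did not translate bonus sections 1 and 2. I have other things to do...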


The goal of this article is to explain how a modern operating system makes it possible to use shared libraries with load-time relocation. It focuses on the Linux OS running on 32-bit x86, but the general principles apply to other OSes and CPUs as well.
Note that shared libraries have many names: shared libraries, shared objects, dynamic shared objects (DSOs), dynamically linked libraries (DLLs, if you're coming from a Windows background). For the sake of consistency, I will try to use only the name "shared library" throughout this article.
Loading executables

Linux, like other OSes with virtual memory support, loads executables to a fixed memory address. If we examine the ELF header of some random executable, we'll see an Entry point address:
$ readelf -h /usr/bin/uptime
ELF Header:
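  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 # Translator's note: the magic number is a byte sequence at the start of a file that identifies its format.
  Class:                             ELF32
  [...] some header fields
  Entry point address:               0x8048470             # Translator's note: this is the one.
  [...] some header fields
This value is placed by the linker to tell the OS where to start executing the executable's code [1]. And indeed, if we load the executable with GDB and examine the address 0x8048470, we'll see the first instructions of the executable's .text segment (note: the code region of the ELF format) there.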
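As a rough illustration of where readelf finds this value, here is a minimal sketch that pulls the entry point out of the first bytes of a little-endian ELF32 file. The function name elf32_entry_point and its return convention are made up for this example; the offsets follow the ELF32 header layout (16 bytes of e_ident, then e_type, e_machine, e_version, and e_entry at byte offset 24).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: extract e_entry from the start of an ELF32 file.
 * Returns 0 on success, -1 if the buffer is too short or is not a
 * little-endian ELF32 image.
 */
int elf32_entry_point(const unsigned char *hdr, size_t len, uint32_t *entry)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 28 || memcmp(hdr, magic, 4) != 0)
        return -1;
    if (hdr[4] != 1 || hdr[5] != 1)  /* EI_CLASS: 32-bit, EI_DATA: little-endian */
        return -1;
    *entry = read_le32(hdr + 24);    /* e_entry lives at byte offset 24 in ELF32 */
    return 0;
}
```

Fed the first 52 bytes of /usr/bin/uptime from the readelf session above, this would be expected to report the same address that readelf prints.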
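What this means is that the linker, when linking the executable, can fully resolve all internal symbol references (to functions and data) to fixed and final locations. The linker does some relocation of its own [2], but eventually the output it produces contains no additional relocations.
Or does it? Note that I emphasized the word internal in the previous paragraph. As long as the executable needs no shared libraries [3], it needs no relocations. But if it does use shared libraries (as the vast majority of Linux applications do), symbols taken from these shared libraries need to be relocated, because of how shared libraries are loaded.
Loading shared libraries

Unlike executables, when shared libraries are being built, the linker can't assume a known load address for their code. The reason is simple. Each program can use any number of shared libraries, and there's simply no way to know in advance where any given shared library will be loaded in the process's virtual memory. Many solutions were invented for this problem over the years, but in this article I will focus only on the ones currently used by Linux (translator's note: well, not anymore...).
But first, let's briefly examine the problem. Here's some sample C code [4] which I compile into a shared library: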
int myglob = 42;

int ml_func(int a, int b)
{
    myglob += a;
    return b + myglob;
}
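Note how ml_func() references myglob. When translated to x86 assembly, this will involve a mov instruction to pull the value of myglob from its location in memory into a register. mov requires an absolute address, so how does the linker know which address to place in it? The answer is: it doesn't. As I mentioned above, shared libraries have no pre-defined load address; it will be decided at runtime.
In Linux, the dynamic loader [5] is a piece of code responsible for preparing programs for running. One of its tasks is to load shared libraries from disk into memory when the running executable requests them. When a shared library is loaded into memory, it is then adjusted for its newly determined load location. It is the job of the dynamic loader to solve the problem presented in the previous paragraph.
There are two main approaches to solving this problem in Linux ELF shared libraries:

Load-time relocation
Position independent code (PIC)

Although PIC is the more common and nowadays recommended solution, in this article I will focus on load-time relocation. Eventually I plan to cover both approaches and write a separate article on PIC, and I think starting with load-time relocation will make PIC easier to explain later. (Update 03.11.2011: the article about PIC was published / translator's note: which I haven't translated yet...)
Linking the shared library for load-time relocation

To create a shared library that has to be relocated at load time, I'll compile it without the -fPIC flag (which would otherwise generate PIC):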
gcc -g -c ml_main.c -o ml_mainreloc.o
gcc -shared -o libmlreloc.so ml_mainreloc.o
The first interesting thing to look at is the entry point of libmlreloc.so:
$ readelf -h libmlreloc.so
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  [...] some header fields
  Entry point address:               0x3b0
  [...] some header fields
For simplicity, the linker just links the shared object for address 0x0 (so the .text section starts at 0x3b0), knowing that the loader will move it anyway. Keep this fact in mind; it will be useful later in the article.
Now let's look at the disassembly of the shared library, focusing on ml_func:
$ objdump -d -Mintel libmlreloc.so

libmlreloc.so:     file format elf32-i386

[...] skipping stuff

0000046c <ml_func>:
 46c: 55                      push   ebp                       # Translator's note: prologue
 46d: 89 e5                   mov    ebp,esp
 46f: a1 00 00 00 00          mov    eax,ds:0x0                # Translator's note: myglob += a
 474: 03 45 08                add    eax,DWORD PTR [ebp+0x8]
 477: a3 00 00 00 00          mov    ds:0x0,eax
 47c: a1 00 00 00 00          mov    eax,ds:0x0                # Translator's note: return b + myglob
 481: 03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 484: 5d                      pop    ebp
 485: c3                      ret

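[...] skipping stuff
After the first two instructions, which are part of the prologue [6], we see the compiled version of myglob += a [7]. The value of myglob is taken from memory into eax, incremented by a (which is at ebp+0x8), and then placed back into memory.
But wait, the mov takes the value of myglob? It appears that the actual operand of mov is just 0x0 [8]. What gives? This is how relocations work. The linker places some provisional pre-defined value (0x0 in this case) into the instruction stream, and then creates a special relocation entry pointing to this place. Let's examine the relocation entries for this shared library: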
$ readelf -r libmlreloc.so

Relocation section '.rel.dyn' at offset 0x2fc contains 7 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00002008  00000008 R_386_RELATIVE
00000470  00000401 R_386_32          0000200C   myglob
00000478  00000401 R_386_32          0000200C   myglob
0000047d  00000401 R_386_32          0000200C   myglob
[...] skipping stuff
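The Info column in the output above packs two things into one 32-bit word: the symbol table index in the upper 24 bits and the relocation type in the lower 8 bits. A small sketch of the decoding follows. The helper names and the RELOC_ prefix are my own (chosen so they don't collide with the macros in <elf.h>); the type numbers are the ones defined by the i386 ELF ABI.

```c
#include <stdint.h>

/* i386 relocation type numbers (only the ones that appear in this article). */
enum {
    RELOC_386_32       = 1,
    RELOC_386_PC32     = 2,
    RELOC_386_COPY     = 5,
    RELOC_386_GLOB_DAT = 6,
    RELOC_386_RELATIVE = 8
};

/* Upper 24 bits of the Info word: index into the dynamic symbol table. */
uint32_t reloc_sym_index(uint32_t info)
{
    return info >> 8;
}

/* Lower 8 bits of the Info word: relocation type. */
uint32_t reloc_type(uint32_t info)
{
    return info & 0xffu;
}

/* Map a type number to the name readelf prints in the Type column. */
const char *reloc_type_name(uint32_t type)
{
    switch (type) {
    case RELOC_386_32:       return "R_386_32";
    case RELOC_386_PC32:     return "R_386_PC32";
    case RELOC_386_COPY:     return "R_386_COPY";
    case RELOC_386_GLOB_DAT: return "R_386_GLOB_DAT";
    case RELOC_386_RELATIVE: return "R_386_RELATIVE";
    default:                 return "unknown";
    }
}
```

For example, the Info value 00000401 of the myglob entries decodes to symbol index 4 and type 1, which is R_386_32.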
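The .rel.dyn section of ELF is reserved for dynamic (load-time) relocations, to be consumed by the dynamic loader. There are 3 relocation entries for myglob in the section shown above, since there are 3 references to myglob in the disassembly. Let's decipher the first one.
It says: go to offset 0x470 in this object (the shared library), and apply a relocation of type R_386_32 to it for the symbol myglob. If we consult the ELF spec we see that relocation type R_386_32 means: take the value at the offset specified in the entry (translator's note: ds:0x0, i.e. 0), add the address of the symbol to it (translator's note: determined at load time), and place it back into the offset.
What do we have at offset 0x470 in the object? Recall this instruction from the disassembly of ml_func: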
46f:  a1 00 00 00 00          mov    eax,ds:0x0
# Translator's note:
#   bytes:    a1  00  00  00  00
#   address:  46f 470 471 472 473
#             --- ---------------
#             mov     ds:0x0
#   * The a1 opcode itself implies that the destination is eax: http://c9x.me/x86/html/file_module_x86_id_176.html
a1 encodes the mov instruction, so its operand starts at the next address, which is 0x470. This is the 0x0 we see in the disassembly. So, back to the relocation entry, we now see what it says: add the address of myglob to the operand of that mov instruction. In other words, it tells the dynamic loader: once you perform the actual address assignment, put the real address of myglob into 0x470, thus replacing the operand of mov with the correct symbol value. Neat, huh?
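The patching step just described can be sketched in a few lines of C. This is only an illustration of the R_386_32 rule (word = word + symbol address), not the dynamic loader's actual code; apply_r386_32 and the two little-endian helpers are names invented for this example.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from memory. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit word to memory. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Apply an R_386_32 relocation.
 *   field    - the 4-byte slot the entry's offset refers to (already mapped)
 *   sym_addr - run-time address of the symbol
 * The value stored in the slot (the addend, 0 in our case) is added to
 * the symbol address and written back.
 */
void apply_r386_32(uint8_t *field, uint32_t sym_addr)
{
    store_le32(field, load_le32(field) + sym_addr);
}
```

Applied to the four operand bytes following a1, with whatever address myglob ends up at, this overwrites the 00 00 00 00 placeholder with that address in little-endian byte order.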
Note also the "Sym. value" column in the relocation section, which contains 0x200C for myglob. This is the offset of myglob in the virtual memory image of the shared library (which, recall, the linker assumes is just loaded at 0x0). This value can also be examined by looking at the symbol table of the library, for example with nm:
$ nm libmlreloc.so
[...] skipping stuff
0000200c D myglob
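This output also provides the offset of myglob inside the library. D means the symbol is defined in the initialized data section (.data).
Load-time relocation in action

To see load-time relocation in action, I will use our shared library from a simple driver executable. When running this executable, the OS will load the shared library and relocate it appropriately.
Curiously, due to the address space layout randomization feature which is enabled in Linux, relocation is relatively difficult to follow, because every time I run the executable, the libmlreloc.so shared library gets placed at a different virtual memory address [9].
This is a rather weak deterrent, however. There is a way to make sense of it all. But first, let's talk about the segments our shared library consists of: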
$ readelf --segments libmlreloc.so

Elf file type is DYN (Shared object file)
Entry point 0x3b0
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x004e8 0x004e8 R E 0x1000
  LOAD           0x000f04 0x00001f04 0x00001f04 0x0010c 0x00114 RW  0x1000
  DYNAMIC        0x000f18 0x00001f18 0x00001f18 0x000d0 0x000d0 RW  0x4
  NOTE           0x0000f4 0x000000f4 0x000000f4 0x00024 0x00024 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x000f04 0x00001f04 0x00001f04 0x000fc 0x000fc R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .eh_frame
   01     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .note.gnu.build-id
   04
   05     .ctors .dtors .jcr .dynamic .got
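To follow the myglob symbol, we're interested in the second segment listed here. Note a couple of things:

In the section to segment mapping at the bottom, segment 01 is said to contain the .data section, which is the home of myglob.
The VirtAddr column specifies that the second segment starts at 0x1f04, and the FileSiz column gives its size as 0x10c, meaning that the segment extends until 0x2010 (translator's note: 0x1f04 + 0x10c) and thus contains myglob, which is at 0x200C.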
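The range check in the second point can be written down as a tiny helper. This is just the arithmetic above expressed in C; segment_contains is a hypothetical name, not something from readelf or libc.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Does the address addr fall inside a segment that starts at vaddr and
 * is size bytes long? The subtraction form avoids overflow in vaddr + size.
 */
bool segment_contains(uint32_t vaddr, uint32_t size, uint32_t addr)
{
    return addr >= vaddr && addr - vaddr < size;
}
```

With the numbers from the readelf output, the second LOAD segment (0x1f04, 0x10c) contains 0x200c, while the first one (0x0, 0x4e8) does not.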

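To follow the load-time linking process, I'll use a nice facility Linux provides: the dl_iterate_phdr function, which allows an application to inquire at runtime which shared libraries it has loaded and, more importantly, to take a peek at their program headers.
So I'm going to write the following code into driver.c: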
#define _GNU_SOURCE
#include <link.h>
#include <stdlib.h>
#include <stdio.h>

static int header_handler(struct dl_phdr_info* info, size_t size, void* data)
{
    printf("name=%s (%d segments) address=%p\n",
            info->dlpi_name, info->dlpi_phnum, (void*)info->dlpi_addr);
    for (int j = 0; j < info->dlpi_phnum; j++) {
         printf("\t\t header %2d: address=%10p\n", j,
             (void*) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr));
         printf("\t\t\t type=%u, flags=0x%X\n",
                 info->dlpi_phdr[j].p_type, info->dlpi_phdr[j].p_flags);
    }
    printf("\n");
    return 0;
}

extern int ml_func(int, int);

int main(int argc, const char* argv[])
{
    dl_iterate_phdr(header_handler, NULL);

    int t = ml_func(argc, argc);
    return t;
}
header_handler implements the callback for dl_iterate_phdr. It will get called for all libraries and report their names and load addresses, along with all their segments. The program also invokes ml_func, which is taken from the libmlreloc.so shared library.
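The callback prints p_type and p_flags as raw numbers, which are harder to read than the names readelf shows. As an aid for reading the driver's output, here is a small sketch that maps those numbers back to readelf-style names. The function names are invented for this example; the numeric values are the standard ELF ones (PT_LOAD = 1, PT_DYNAMIC = 2, PT_NOTE = 4, PT_GNU_STACK = 0x6474e551, PT_GNU_RELRO = 0x6474e552, and PF_X = 1, PF_W = 2, PF_R = 4).

```c
#include <stdint.h>

/* Translate a p_type value into the name readelf shows in the Type column. */
const char *segment_type_name(uint32_t p_type)
{
    switch (p_type) {
    case 1:           return "LOAD";
    case 2:           return "DYNAMIC";
    case 4:           return "NOTE";
    case 0x6474e551u: return "GNU_STACK";
    case 0x6474e552u: return "GNU_RELRO";
    default:          return "other";
    }
}

/*
 * Render p_flags the way readelf's Flg column does: 'R', 'W', 'E' in
 * fixed positions, with a space for every permission that is absent.
 * out must have room for 4 bytes (3 characters plus the terminator).
 */
void segment_flags_string(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & 4u) ? 'R' : ' ';  /* PF_R: readable   */
    out[1] = (p_flags & 2u) ? 'W' : ' ';  /* PF_W: writable   */
    out[2] = (p_flags & 1u) ? 'E' : ' ';  /* PF_X: executable */
    out[3] = '\0';
}
```

So a line such as type=1, flags=0x5 describes a readable and executable LOAD segment, and a type of 1685382481 (0x6474e551 in hex) is GNU_STACK.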
To compile and link this driver with our shared library, run:
gcc -g -c driver.c -o driver.o
gcc -o driver driver.o -L. -lmlreloc
Running the driver standalone we get the information, but the addresses are different on each run. So what I'm going to do is run it under gdb [10], see what it says, and then use gdb to further query the process's memory space:
 $ gdb -q driver
 Reading symbols from driver...done.
 (gdb) b driver.c:31
 Breakpoint 1 at 0x804869e: file driver.c, line 31.
 (gdb) r
 Starting program: driver
 [...] skipping output
 name=./libmlreloc.so (6 segments) address=0x12e000
                header  0: address=  0x12e000
                        type=1, flags=0x5
                header  1: address=  0x12ff04
                        type=1, flags=0x6
                header  2: address=  0x12ff18
                        type=2, flags=0x6
                header  3: address=  0x12e0f4
                        type=4, flags=0x4
                header  4: address=  0x12e000
                        type=1685382481, flags=0x6
                header  5: address=  0x12ff04
                        type=1685382482, flags=0x4

[...] skipping output
 Breakpoint 1, main (argc=1, argv=0xbffff3d4) at driver.c:31
 31    }
 (gdb)
Since the driver reports all the libraries it loads (even implicitly, like libc or the dynamic loader itself), the output is lengthy and I will just focus on the report about libmlreloc.so.
Let's do some math. The output says libmlreloc.so was placed at virtual address 0x12e000. We're interested in the second segment which, as we've seen in readelf, is at offset 0x1f04. Indeed, we see in the output that it was loaded at address 0x12ff04. And since myglob is at offset 0x200c in the file, we'd expect it to now be at address 0x13000c (translator's note: 0x13000c = 0x12e000 + 0x200c).
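Because the library was linked for address 0x0, every address recorded in the file is really an offset from wherever the library ends up. The arithmetic above, as a minimal sketch (runtime_address is a name made up for this example):

```c
#include <stdint.h>

/*
 * Run-time address of something inside a shared library:
 *   load_base       - where the loader mapped the library (dlpi_addr in driver.c)
 *   link_time_vaddr - the address the linker assigned, assuming a base of 0x0
 */
uint32_t runtime_address(uint32_t load_base, uint32_t link_time_vaddr)
{
    return load_base + link_time_vaddr;
}
```

This is the same sum driver.c computes when it prints info->dlpi_addr + info->dlpi_phdr[j].p_vaddr for each segment.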
So, let's ask GDB:
(gdb) p &myglob
$1 = (int *) 0x13000c
Excellent! But what about the code of ml_func which refers to myglob? Let's ask GDB again:
(gdb) set disassembly-flavor intel
(gdb) disas ml_func
Dump of assembler code for function ml_func:
   0x0012e46c <+0>:   push   ebp
   0x0012e46d <+1>:   mov    ebp,esp
   0x0012e46f <+3>:   mov    eax,ds:0x13000c
   0x0012e474 <+8>:   add    eax,DWORD PTR [ebp+0x8]
   0x0012e477 <+11>:  mov    ds:0x13000c,eax
   0x0012e47c <+16>:  mov    eax,ds:0x13000c
   0x0012e481 <+21>:  add    eax,DWORD PTR [ebp+0xc]
   0x0012e484 <+24>:  pop    ebp
   0x0012e485 <+25>:  ret
End of assembler dump.
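As expected, the real address of myglob was placed in all the mov instructions referring to it, just as the relocation entries specified.
Relocating function calls

So far this article has demonstrated relocation of data references, using the global variable myglob as an example. Another thing that needs to be relocated is code references, in other words, function calls. This section is a brief guide on how this gets done. The pace is much faster than in the rest of this article, since I can now assume the reader understands what relocation is all about.
Without further ado, let's get to it. I've modified the code of the shared library to be the following: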
int myglob = 42;

int ml_util_func(int a)
{
    return a + 1;
}

int ml_func(int a, int b)
{
    int c = b + ml_util_func(a);
    myglob += c;
    return b + myglob;
}
ml_util_func() was added and it's being used by ml_func(). Here's the disassembly of ml_func() in the linked shared library:
000004a7 <ml_func>:
 4a7:   55                      push   ebp                      # Translator's note: function prologue
 4a8:   89 e5                   mov    ebp,esp
 4aa:   83 ec 14                sub    esp,0x14
 4ad:   8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
 4b0:   89 04 24                mov    DWORD PTR [esp],eax
 4b3:   e8 fc ff ff ff          call   4b4 <ml_func+0xd>        # Translator's note: ml_util_func()
 4b8:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 4bb:   89 45 fc                mov    DWORD PTR [ebp-0x4],eax
 4be:   a1 00 00 00 00          mov    eax,ds:0x0
 4c3:   03 45 fc                add    eax,DWORD PTR [ebp-0x4]
 4c6:   a3 00 00 00 00          mov    ds:0x0,eax
 4cb:   a1 00 00 00 00          mov    eax,ds:0x0
 4d0:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 4d3:   c9                      leave
 4d4:   c3                      ret
What's interesting here is the instruction at address 0x4b3: it's the call to ml_util_func(). Let's dissect it:
e8 is the opcode for call. The argument of this call is the offset relative to the next instruction. In the disassembly above, this argument is 0xfffffffc, or simply -4. So the call currently points to itself. This is clearly not right, but let's not forget about relocation. Here's what the relocation section of the shared library looks like now:
$ readelf -r libmlreloc.so

Relocation section '.rel.dyn' at offset 0x324 contains 8 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00002008  00000008 R_386_RELATIVE
000004b4  00000502 R_386_PC32        0000049c   ml_util_func
000004bf  00000401 R_386_32          0000200c   myglob
000004c7  00000401 R_386_32          0000200c   myglob
000004cc  00000401 R_386_32          0000200c   myglob
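[...] skipping stuff
If we compare it to the previous invocation of readelf -r, we'll notice a new entry added for ml_util_func. This entry points at address 0x4b4, which is the argument of the call instruction, and its type is R_386_PC32. This relocation type is more complicated than R_386_32, but not by much.
R_386_PC32 means the following: take the value at the offset specified in the entry (translator's note: -4, which accounts for the length of the call operand), add the address of the symbol to it (translator's note: determined at load time, the real address of ml_util_func), subtract the address of the offset itself (translator's note: determined at load time, the run-time address of the call argument being relocated), and place it back into the word at the offset. Recall that this relocation is done at load time, when the final load addresses of the symbol and of the relocated offset itself are already known. These final addresses participate in the computation.
What does this do? Basically, it's a relative relocation, taking its own location into account and thus suitable for arguments of instructions with relative addressing (which the e8 call is). I promise it will become clearer once we get to the real numbers.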
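In code form, the rule is one line of unsigned 32-bit arithmetic. The sketch below only illustrates the R_386_PC32 formula as stated above (word = word + symbol address - address of the word itself); it is not how any particular loader is written, and apply_r386_pc32 and its helpers are hypothetical names.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from memory. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit word to memory. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Apply an R_386_PC32 relocation.
 *   field      - the 4-byte slot being patched (host memory)
 *   sym_addr   - run-time address of the target symbol (S)
 *   field_addr - run-time address of the slot itself (P)
 * The stored addend (A) is combined as A + S - P. Unsigned arithmetic
 * wraps modulo 2^32, which gives the two's complement result.
 */
void apply_r386_pc32(uint8_t *field, uint32_t sym_addr, uint32_t field_addr)
{
    store_le32(field, load_le32(field) + sym_addr - field_addr);
}
```

The field pointer and field_addr are kept separate here because the sketch patches a buffer in its own memory; in a real process they would refer to the same location.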
Now I'm going to build the driver code and run it under GDB again, to see this relocation in action. Here's the GDB session, followed by explanations:
 $ gdb -q driver
 Reading symbols from driver...done.
 (gdb) b driver.c:31
 Breakpoint 1 at 0x804869e: file driver.c, line 31.
 (gdb) r
 Starting program: driver
 [...] skipping output
 name=./libmlreloc.so (6 segments) address=0x12e000
               header  0: address=  0x12e000       # Translator's note: (1)
                       type=1, flags=0x5
               header  1: address=  0x12ff04
                       type=1, flags=0x6
               header  2: address=  0x12ff18
                       type=2, flags=0x6
               header  3: address=  0x12e0f4
                       type=4, flags=0x4
               header  4: address=  0x12e000
                       type=1685382481, flags=0x6
               header  5: address=  0x12ff04
                       type=1685382482, flags=0x4

[...] skipping output
Breakpoint 1, main (argc=1, argv=0xbffff3d4) at driver.c:31
31    }
(gdb)  set disassembly-flavor intel
(gdb) disas ml_util_func
Dump of assembler code for function ml_util_func:
   0x0012e49c <+0>:   push   ebp        # Translator's note: (2)
   0x0012e49d <+1>:   mov    ebp,esp
   0x0012e49f <+3>:   mov    eax,DWORD PTR [ebp+0x8]
   0x0012e4a2 <+6>:   add    eax,0x1
   0x0012e4a5 <+9>:   pop    ebp
   0x0012e4a6 <+10>:  ret
End of assembler dump.
(gdb) disas /r ml_func
Dump of assembler code for function ml_func:
   0x0012e4a7 <+0>:    55     push   ebp
   0x0012e4a8 <+1>:    89 e5  mov    ebp,esp
   0x0012e4aa <+3>:    83 ec 14       sub    esp,0x14
   0x0012e4ad <+6>:    8b 45 08       mov    eax,DWORD PTR [ebp+0x8]
   0x0012e4b0 <+9>:    89 04 24       mov    DWORD PTR [esp],eax
   0x0012e4b3 <+12>:   e8 e4 ff ff ff call   0x12e49c <ml_util_func> # Translator's note: (3) the byte after 0x0012e4b3 is at 0x12e4b4; that is the e4 ff ff ff part
   0x0012e4b8 <+17>:   03 45 0c       add    eax,DWORD PTR [ebp+0xc]
   0x0012e4bb <+20>:   89 45 fc       mov    DWORD PTR [ebp-0x4],eax
   0x0012e4be <+23>:   a1 0c 00 13 00 mov    eax,ds:0x13000c
   0x0012e4c3 <+28>:   03 45 fc       add    eax,DWORD PTR [ebp-0x4]
   0x0012e4c6 <+31>:   a3 0c 00 13 00 mov    ds:0x13000c,eax
   0x0012e4cb <+36>:   a1 0c 00 13 00 mov    eax,ds:0x13000c
   0x0012e4d0 <+41>:   03 45 0c       add    eax,DWORD PTR [ebp+0xc]
   0x0012e4d3 <+44>:   c9     leave
   0x0012e4d4 <+45>:   c3     ret
End of assembler dump.
(gdb)
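The important parts here are:

1. In the printout from driver we see that the first segment (the code segment) of libmlreloc.so has been mapped to 0x12e000 [11].
2. ml_util_func was loaded at address 0x0012e49c.
3. The address of the relocated offset is 0x0012e4b4.
4. The call in ml_func to ml_util_func was patched to place 0xffffffe4 in the argument (I disassembled ml_func with the /r flag to show raw hex in addition to the disassembly), which is interpreted as the correct relative offset to ml_util_func.

Obviously we're most interested in how (4) was done. Again, it's time for some math. Interpreting the R_386_PC32 relocation entry as described above, we have:
Take the value at the offset specified in the entry (0xfffffffc; translator's note: the original argument of call), add the address of the symbol to it (0x0012e49c; translator's note: (2)), subtract the address of the offset itself (0x0012e4b4; translator's note: (3)), and place it back into the word at the offset. Everything is done assuming 32-bit two's complement, of course. The result is 0xffffffe4, as expected.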
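As a cross-check on that result, we can decode the patched instruction the way the CPU would. For the 5-byte e8 form of call, the target is the address of the next instruction plus the signed 32-bit displacement. A minimal sketch (call_target is a name made up for this example):

```c
#include <stdint.h>

/*
 * Target of an x86 "call rel32" (opcode e8).
 *   call_addr - address of the e8 opcode byte
 *   rel32     - the 4-byte operand, as an unsigned 32-bit pattern
 * The instruction is 5 bytes long and the displacement is relative to
 * the instruction that follows it. Unsigned wrap-around modulo 2^32
 * yields the same result as signed two's complement addition.
 */
uint32_t call_target(uint32_t call_addr, uint32_t rel32)
{
    return call_addr + 5u + rel32;
}
```

Before relocation, the call at 0x4b3 with operand 0xfffffffc resolves to 0x4b4, which is what objdump showed as <ml_func+0xd>. After relocation, the call at 0x12e4b3 with operand 0xffffffe4 resolves to 0x12e49c, the address of ml_util_func.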
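Extra credit: Why was the call relocation needed?

This is a "bonus" section that discusses some peculiarities of the implementation of shared library loading in Linux. If all you wanted was to understand how relocations are done, you can safely skip it. (Translator's note: I skipped it.)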
When trying to understand the call relocation of ml_util_func, I must admit I scratched my head for some time. Recall that the argument of call is a relative offset. Surely the offset between the call and ml_util_func itself doesn't change when the library is loaded - they both are in the code segment which gets moved as one whole chunk. So why is the relocation needed at all?
Here's a small experiment to try: go back to the code of the shared library, add static to the declaration of ml_util_func. Re-compile and look at the output of readelf -r again.
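For reference, the modified library source for this experiment would look like the listing below. The only change from the earlier version is the static keyword on ml_util_func; building it with the same two gcc commands as before is assumed.

```c
int myglob = 42;

/* static gives the function internal linkage, so it is no longer exported. */
static int ml_util_func(int a)
{
    return a + 1;
}

int ml_func(int a, int b)
{
    int c = b + ml_util_func(a);
    myglob += c;
    return b + myglob;
}
```

The behaviour of ml_func is unchanged; only the visibility of ml_util_func differs.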
Done? Anyway, I will reveal the outcome - the relocation is gone! Examine the disassembly of ml_func - there's now a correct offset placed as the argument of call - no relocation required. What's going on?
When tying global symbol references to their actual definitions, the dynamic loader has some rules about the order in which shared libraries are searched. The user can also influence this order by setting the LD_PRELOAD environment variable.
There are too many details to cover here, so if you're really interested you'll have to take a look at the ELF standard, the dynamic loader man page and do some Googling. In short, however, when ml_util_func is global, it may be overridden in the executable or another shared library, so when linking our shared library, the linker can't just assume the offset is known and hard-code it 12. It makes all references to global symbols relocatable in order to allow the dynamic loader to decide how to resolve them. This is why declaring the function static makes a difference - since it's no longer global or exported, the linker can hard-code its offset in the code.
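Extra credit #2: Referencing shared library data from the executable

Again, this is a bonus section that discusses an advanced topic. It can be skipped safely if you're tired of this stuff. (Translator's note: I skipped it.)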
In the example above, myglob was only used internally in the shared library. What happens if we reference it from the program (driver.c)? After all, myglob is a global variable and thus visible externally.
Let's modify driver.c to the following (note I've removed the segment iteration code):
#include <stdio.h>

extern int ml_func(int, int);
extern int myglob;

int main(int argc, const char* argv[])
{
    printf("addr myglob = %p\n", (void*)&myglob);
    int t = ml_func(argc, argc);
    return t;
}

It now prints the address of myglob. The output is:
addr myglob = 0x804a018
Wait, something doesn't compute here. Isn't myglob in the shared library's address space? 0x804xxxx looks like the program's address space. What's going on?
Recall that the program/executable is not relocatable, and thus its data addresses have to be bound at link time. Therefore, the linker has to create a copy of the variable in the program's address space, and the dynamic loader will use that as the relocation address. This is similar to the discussion in the previous section - in a sense, myglob in the main program overrides the one in the shared library, and according to the global symbol lookup rules, it's being used instead. If we examine ml_func in GDB, we'll see the correct reference made to myglob:
0x0012e48e <+23>:      a1 18 a0 04 08 mov    eax,ds:0x804a018

This makes sense because a R_386_32 relocation for myglob still exists in libmlreloc.so, and the dynamic loader makes it point to the correct place where myglob now lives.
This is all great, but something is missing. myglob is initialized in the shared library (to 42) - how does this initialization value get to the address space of the program? It turns out there's a special relocation entry that the linker builds into the program (so far we've only been examining relocation entries in the shared library):
$ readelf -r driver

Relocation section '.rel.dyn' at offset 0x3c0 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049ff0  00000206 R_386_GLOB_DAT    00000000   __gmon_start__
0804a018  00000605 R_386_COPY        0804a018   myglob
[...] skipping stuff

Note the R_386_COPY relocation for myglob. It simply means: copy the value from the symbol's address into this offset. The dynamic loader performs this when it loads the shared library. How does it know how much to copy? The symbol table section contains the size of each symbol; for example the size for myglob in the .symtab section of libmlreloc.so is 4.
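Conceptually, an R_386_COPY relocation boils down to a memcpy whose length comes from the symbol table. The sketch below is an illustration under that reading, not the loader's real code; apply_r386_copy is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Apply an R_386_COPY relocation.
 *   dest     - the slot reserved in the executable (the entry's offset)
 *   sym_src  - where the shared library keeps the initialized symbol
 *   sym_size - size of the symbol as recorded in the symbol table
 */
void apply_r386_copy(void *dest, const void *sym_src, size_t sym_size)
{
    memcpy(dest, sym_src, sym_size);
}
```

For myglob this would copy 4 bytes, carrying the initial value 42 from the library's .data into the executable's copy at 0x804a018.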
I think this is a pretty cool example that shows how the process of executable linking and loading is orchestrated together. The linker puts special instructions in the output for the dynamic loader to consume and execute.
Conclusion
Load-time relocation is one of the methods used in Linux (and other OSes) to resolve internal data and code references in shared libraries when loading them into memory. These days, position independent code (PIC) is a more popular approach, and some modern systems (such as x86-64) no longer support load-time relocation.
Still, I decided to write an article on load-time relocation for two reasons. First, load-time relocation has a couple of advantages over PIC on some systems, especially in terms of performance. Second, load-time relocation is IMHO simpler to understand without prior knowledge, which will make PIC easier to explain in the future. (Update 03.11.2011: the article about PIC was published)
Regardless of the motivation, I hope this article has helped to shed some light on the magic going behind the scenes of linking and loading shared libraries in a modern OS.


[1] For some more information about this entry point, see the section "Digression – process addresses and entry point" of this article.
[2] Link-time relocation happens in the process of combining multiple object files into an executable (or shared library). It involves quite a lot of relocations to resolve symbol references between the object files. Link-time relocation is a more complex topic than load-time relocation, and I won't cover it in this article.
[3] This can be made possible by compiling all your libraries into static libraries (with ar combining object files instead of gcc -shared), and providing the -static flag to gcc when linking the executable - to avoid linkage with the shared version of libc.
[4] ml simply stands for "my library". Also, the code itself is absolutely non-sensical and only used for purposes of demonstration.
[5] Also called "dynamic linker". It's a shared object itself (though it can also run as an executable), residing at /lib/ld-linux.so.2 (the last number is the SO version and may be different).
[6] If you're not familiar with how x86 structures its stack frames, this would be a good time to read this article.
[7] You can provide the -S flag to objdump to intermix C source lines with the disassembly (or -l to label it with file names and line numbers), making it clearer what gets compiled to what. I've omitted it here to make the output shorter.
[8] I'm looking at the left-hand side of the output of objdump, where the raw memory bytes are. a1 00 00 00 00 means mov to eax with operand 0x0, which is interpreted by the disassembler as ds:0x0.
[9] So ldd invoked on the executable will report a different load address for the shared library each time it's run.
[10] Experienced readers will probably note that I could ask GDB about i shared to get the load-address of the shared library. However, i shared only mentions the load location of the whole library (or, even more accurately, its entry point), and I was interested in the segments.
[11] What, 0x12e000 again? Didn't I just talk about load-address randomization? It turns out the dynamic loader can be manipulated to turn this off, for purposes of debugging. This is exactly what GDB is doing.
[12] Unless it's passed the -Bsymbolic flag. Read all about it in the man page of ld.