STAGE 1: Preprocessing

Command: gcc -E hello.c -o hello.i

What happens:

#include <stdio.h> is replaced by the entire contents of stdio.h and of every header it includes in turn: hundreds of lines of function declarations, type definitions, and macros. printf is only declared at this point, not defined. The preprocessor works on text alone; it has no idea what printf does, or even that it is a function.

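Output: hello.i, pure C code with every #include and #define already expanded. The only lines that still start with # are linemarkers such as # 1 "hello.c", which record where each chunk of text came from. Roughly 800 lines for this tiny program; the exact count depends on the libc version.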

// ... hundreds of lines from stdio.h ...
extern int printf(const char *__restrict __fmt, ...);
// ...
int main() {
    printf("hello world");
    return 0;
}
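To make the "just expanded text" point concrete, here is a minimal sketch of macros being substituted as text before the compiler proper ever runs. The macro and function names (GREETING_BUFSZ, SQUARE, STRINGIFY, and the three functions) are made up for illustration and are not part of hello.c.

```c
#include <assert.h>
#include <string.h>

/* Object-like macro: every use is replaced by the literal text 12. */
#define GREETING_BUFSZ 12

/* Function-like macro: the argument is pasted in as text, so the
 * parentheses are needed to keep operator precedence intact. */
#define SQUARE(x) ((x) * (x))

/* Stringification: #x turns the argument's source text into a string. */
#define STRINGIFY(x) #x

/* After preprocessing, the body reads: return ((n + 1) * (n + 1)); */
int square_of_successor(int n) {
    return SQUARE(n + 1);
}

/* After preprocessing, the first argument of strcmp is "1 + 2".
 * The expression is quoted, never evaluated. */
int quoted_matches(const char *expected) {
    assert(expected != NULL);
    return strcmp(STRINGIFY(1 + 2), expected) == 0;
}

/* The compiler only ever sees: char buf[12]; */
size_t greeting_buffer_size(void) {
    char buf[GREETING_BUFSZ];
    return sizeof buf;
}
```

Running gcc -E on a file containing this sketch should show the function bodies with the macros already replaced, which is the same transformation applied to hello.c.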

STAGE 2: Compilation (C → Assembly)

Command: gcc -S hello.i -o hello.s

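What happens inside the compiler (the IR in Step 2d is LLVM IR, which is what clang produces with clang -S -emit-llvm; GCC goes through the same phases but uses its own GIMPLE and RTL representations):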

Step 2a: Lexing (Tokenization)

Source text gets broken into tokens:

int  main   (  )  {  printf  (  "hello world"  )  ;  return  0    ;  }
KW   IDENT  (  )  {  IDENT   (  STRING         )  ;  KW      NUM  ;  }

Step 2b: Parsing (Tokens → AST)

Tokens become an Abstract Syntax Tree:

FunctionDecl: main
└── CompoundStmt
    ├── CallExpr: printf
    │   └── StringLiteral: "hello world"
    └── ReturnStmt
        └── IntLiteral: 0
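One way to picture the tree in memory is as linked structs. The sketch below builds the six nodes by hand rather than parsing anything; the Node layout and the function names are assumptions made for illustration, and real compilers use much richer node types.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    NODE_FUNCTION_DECL, NODE_COMPOUND_STMT, NODE_CALL_EXPR,
    NODE_STRING_LITERAL, NODE_RETURN_STMT, NODE_INT_LITERAL
} NodeKind;

#define MAX_CHILDREN 4

typedef struct Node {
    NodeKind kind;
    const char *text;                         /* name or literal spelling */
    const struct Node *children[MAX_CHILDREN];
    size_t child_count;
} Node;

/* The tree for hello.c, written bottom-up as static data. */
static const Node str_lit = { NODE_STRING_LITERAL, "hello world", { NULL }, 0 };
static const Node call    = { NODE_CALL_EXPR, "printf", { &str_lit }, 1 };
static const Node int_lit = { NODE_INT_LITERAL, "0", { NULL }, 0 };
static const Node ret     = { NODE_RETURN_STMT, NULL, { &int_lit }, 1 };
static const Node body    = { NODE_COMPOUND_STMT, NULL, { &call, &ret }, 2 };
static const Node main_fn = { NODE_FUNCTION_DECL, "main", { &body }, 1 };

const Node *hello_ast(void) { return &main_fn; }

/* Counts this node plus everything below it. */
size_t count_nodes(const Node *n) {
    assert(n != NULL);
    size_t total = 1;
    for (size_t i = 0; i < n->child_count; i++)
        total += count_nodes(n->children[i]);
    return total;
}

/* Length of the longest path from this node down to a leaf. */
size_t tree_depth(const Node *n) {
    assert(n != NULL);
    size_t deepest = 0;
    for (size_t i = 0; i < n->child_count; i++) {
        size_t d = tree_depth(n->children[i]);
        if (d > deepest) deepest = d;
    }
    return deepest + 1;
}
```

Later phases (semantic analysis, IR generation) are essentially recursive walks over a structure like this, in the same style as count_nodes.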

Step 2c: Semantic Analysis

  • Is printf declared? Yes, from stdio.h
  • Are argument types correct? Yes, the string literal (a char[12]) decays to char*, which converts to the const char* parameter
  • Is return type correct? Yes, int main returns 0

Step 2d: IR Generation (AST → LLVM IR)

@str = private constant [12 x i8] c"hello world\00"

define i32 @main() {
entry:
  %call = call i32 @printf(ptr @str)
  ret i32 0
}

declare i32 @printf(ptr, ...)

Notice:

  • "hello world" becomes a global constant in .rodata
  • printf is declared, not defined; the linker will resolve it later
  • main returns i32 0

Step 2e: Optimization Passes

On the IR, LLVM runs passes:

  • Dead code elimination
  • Constant propagation
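  • Inlining of small functions whose bodies are visible. printf itself cannot be inlined, since its definition lives in libc, but the compiler may swap in a cheaper library call: printf("hello\n") is commonly rewritten to puts("hello")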

Step 2f: Backend — IR → Assembly

    .section .rodata
.LC0:
    .string "hello world"

    .text
    .globl  main

main:
    pushq   %rbp              # save frame pointer
    movq    %rsp, %rbp        # set up stack frame
    leaq    .LC0(%rip), %rdi  # arg1 = address of "hello world"
    movl    $0, %eax          # variadic call: no vector registers used
    call    printf            # call printf
    movl    $0, %eax          # return value = 0
    popq    %rbp              # restore frame pointer
    ret                       # return

Output: hello.s, human-readable assembly text

STAGE 3: Assembly (Assembly → Object File)

Command: as hello.s -o hello.o

What happens:

Each assembly instruction is translated to its binary encoding. x86-64 instructions are variable length, from 1 to 15 bytes, not fixed 32- or 64-bit words. For call printf, though, the assembler doesn’t know where printf lives. So it:

  • Emits placeholder bytes for the address
  • Creates a relocation entry: “at this offset, patch in the address of printf”
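The patching that a relocation entry asks for can be sketched in a few lines. This assumes a 32-bit PC-relative relocation, the kind x86-64 assemblers typically emit for a call, computed as S + A - P (symbol address, addend, address of the patched bytes). The Reloc struct is a pared-down hypothetical stand-in for a real ELF relocation record, and the addresses in the demo are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record. */
typedef struct {
    size_t  offset;   /* where in the section the 4 placeholder bytes sit */
    int64_t addend;   /* constant folded into the computation */
} Reloc;

/* Applies a 32-bit PC-relative relocation: value = S + A - P.
 * Returns 0 on success, -1 if the patch does not fit. */
int apply_pc32(uint8_t *section, size_t section_size, uint64_t section_addr,
               Reloc r, uint64_t symbol_addr) {
    assert(section != NULL);
    if (r.offset > section_size || section_size - r.offset < 4) return -1;
    uint64_t place = section_addr + r.offset;                    /* P */
    int64_t value = (int64_t)(symbol_addr - place) + r.addend;   /* S - P + A */
    if (value < INT32_MIN || value > INT32_MAX) return -1;
    uint32_t bits = (uint32_t)(int32_t)value;
    for (int i = 0; i < 4; i++)                  /* store little-endian */
        section[r.offset + i] = (uint8_t)(bits >> (8 * i));
    return 0;
}

/* Demo: "call printf" is the opcode e8 followed by 4 placeholder bytes.
 * Assumes .text lands at 0x401000 and printf's stub at 0x401030. */
int32_t patched_call_displacement(void) {
    uint8_t text[5] = { 0xe8, 0, 0, 0, 0 };
    Reloc r = { 1, -4 };   /* patch starts 1 byte in; -4 accounts for the field's own width */
    if (apply_pc32(text, sizeof text, 0x401000, r, 0x401030) != 0) return 0;
    uint32_t bits = (uint32_t)text[1] | ((uint32_t)text[2] << 8) |
                    ((uint32_t)text[3] << 16) | ((uint32_t)text[4] << 24);
    return (int32_t)bits;
}
```

The CPU adds the displacement to the address of the instruction after the call (0x401005 in the demo), so the addend of -4 is what makes the patched value land exactly on the target.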

Output: hello.o, ELF object file

readelf -S hello.o

Sections inside:

.text      ← your compiled main() binary
.rodata    ← "hello world\0" string
.rela.text ← relocation: "patch printf address here"
.symtab    ← symbols: main defined, printf undefined

STAGE 4: Linking

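Command: gcc hello.o -o hello (gcc runs ld for you; a bare ld hello.o -lc -o hello leaves out the startup objects and the dynamic linker path, so its output would typically not run)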

What happens:

Linker takes:

  • hello.o, your code
  • libc.so or libc.a, C standard library (contains printf definition)
  • crt1.o, crti.o, crtn.o (from libc) and crtbegin.o, crtend.o (from GCC), C runtime startup code; crt0.o is the name used on older systems

Resolves symbols:

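  • printf found in libc. With static linking (libc.a) the linker patches the relocation with the real address. With dynamic linking (libc.so, the default) it routes the call through a PLT stub and leaves a GOT slot for the dynamic linker to fill in at run time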

Produces final executable layout:

Sections, grouped into loadable segments by the program headers:
.text    ← all code merged
.rodata  ← all read-only data
.data    ← initialized globals
.bss     ← uninitialized globals
.dynamic ← dynamic linking info (if shared)
.got     ← Global Offset Table
.plt     ← Procedure Linkage Table

Output: hello, an executable ELF binary

STAGE 5: OS Loading, You Type ./hello

Step 5a: Shell forks

shell calls fork() → creates child process
child calls execve("./hello", ...) → OS takes over
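The fork-then-execve pattern looks roughly like this in C. It is a minimal sketch of what a shell does, leaving out job control, PATH lookup, and redirection; run_program and run_shell_command are hypothetical helper names, and /bin/sh is assumed to exist.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers for POSIX declarations */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;            /* the current environment, passed through */

/* Forks a child, replaces the child's image with the program at path,
 * and waits for it in the parent. Returns the child's exit status,
 * or -1 if anything failed or the child did not exit normally. */
int run_program(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;               /* fork failed */
    if (pid == 0) {
        execve(path, argv, environ);      /* only returns on failure */
        _exit(127);                       /* conventional "could not exec" status */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Convenience wrapper: runs /bin/sh -c command. */
int run_shell_command(const char *command) {
    assert(command != NULL);
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    return run_program("/bin/sh", argv);
}
```

Calling run_program("./hello", argv) with a suitable argv would follow the same path as typing ./hello at the prompt: the parent sleeps in waitpid while the kernel loads the new image into the child.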

Step 5b: OS reads ELF header

Magic: 7f 45 4c 46  ← 0x7f followed by "ELF"
Class: 64-bit
Entry point: 0x401050  ← address of _start
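Reading those three fields takes only a few byte comparisons. The sketch below assumes the standard ELF64 layout (class byte at offset 4, data-encoding byte at offset 5, entry point as 8 bytes at offset 24) and handles little-endian files only. The function names and the sample_header bytes are invented for illustration and mirror the values shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ELF_INVALID, ELF_32BIT, ELF_64BIT } ElfClass;

/* Checks the 4 magic bytes, then reads the class byte at offset 4
 * (1 = 32-bit, 2 = 64-bit). */
ElfClass elf_class(const uint8_t *buf, size_t len) {
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(buf != NULL);
    if (len < 5 || memcmp(buf, magic, 4) != 0) return ELF_INVALID;
    switch (buf[4]) {
    case 1:  return ELF_32BIT;
    case 2:  return ELF_64BIT;
    default: return ELF_INVALID;
    }
}

/* Reads the entry point from a 64-bit little-endian header.
 * Returns 0 if the buffer is not such a header. */
uint64_t elf64_entry_point(const uint8_t *buf, size_t len) {
    if (elf_class(buf, len) != ELF_64BIT || len < 32 || buf[5] != 1) return 0;
    uint64_t entry = 0;
    for (int i = 7; i >= 0; i--)          /* assemble from most significant byte */
        entry = (entry << 8) | buf[24 + i];
    return entry;
}

/* First 32 bytes of a made-up header matching the values in the text. */
static const uint8_t sample_header[32] = {
    0x7f, 'E', 'L', 'F', 2, 1, 1, 0,  0, 0, 0, 0, 0, 0, 0, 0,  /* e_ident */
    2, 0,                                                      /* type: executable */
    0x3e, 0,                                                   /* machine: x86-64 */
    1, 0, 0, 0,                                                /* version */
    0x50, 0x10, 0x40, 0, 0, 0, 0, 0                            /* entry: 0x401050 */
};
```

For a real file, the same functions could be pointed at the first bytes read from ./hello; readelf -h prints the same fields.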

Step 5c: OS maps segments into virtual memory

mmap .text   → virtual address 0x401000, read+execute
mmap .rodata → virtual address 0x402000, read only
mmap .data   → virtual address 0x403000, read+write
mmap stack   → top of address space, read+write, grows down
heap         → above .data and .bss, read+write, grows up via brk (empty now)
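User code can create mappings with the same kinds of permissions through mmap and mprotect. The sketch below is only an analogy: the kernel's ELF loader maps pages backed by the executable file, whereas this uses an anonymous page as a stand-in. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one anonymous page read+write, copies len bytes of data into it,
 * then drops write permission so the page behaves like read-only data.
 * Returns the mapping, or NULL on failure. */
void *map_read_only_copy(const void *data, size_t len) {
    assert(data != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page) return NULL;

    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return NULL;

    memcpy(mem, data, len);
    if (mprotect(mem, (size_t)page, PROT_READ) != 0) {   /* now read only */
        munmap(mem, (size_t)page);
        return NULL;
    }
    return mem;
}

/* Releases a page obtained from map_read_only_copy. Returns 0 on success. */
int unmap_page(void *mem) {
    long page = sysconf(_SC_PAGESIZE);
    if (mem == NULL || page <= 0) return -1;
    return munmap(mem, (size_t)page);
}
```

Writing through the returned pointer after mprotect would raise SIGSEGV, which is the same protection that stops a program from modifying its own .rodata.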

Step 5d: Dynamic linker runs (ld.so)

Since printf is in libc.so (shared library):

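  • Kernel sees the interpreter path in the executable's PT_INTERP header and maps ld.so, which runs first
  • ld.so reads the .dynamic section
  • Finds dependency: libc.so.6
  • Maps libc into process address space
  • Resolves the printf symbol and patches its GOT entry (with lazy binding, this happens on the first call through the PLT)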

Step 5e: _start runs (CRT — C Runtime)

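This is code you never wrote. On Linux with glibc it comes from crt1.o (crt0.o on older systems). Simplified:

  _start:
    xor  %ebp, %ebp          # mark outermost frame
    pop  %rsi                # argc
    mov  %rsp, %rdx          # argv
    lea  main(%rip), %rdi    # address of main
    call __libc_start_main   # sets up environment,
                             # calls main(),
                             # calls exit() with main's return value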

Step 5f: main() runs

Stack frame created:

│ return addr  │  ← address in __libc_start_main to return to
├──────────────┤
│ saved RBP    │  ← RBP (frame pointer); RSP is here too, since there are no locals
├──────────────┤
│  (locals)    │  ← none in this program
└──────────────┘

leaq .LC0(%rip), %rdi loads the address of “hello world” into the first argument register.

call printf:

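  • CPU pushes return address onto stack
  • Jumps to printf in libc (through the PLT when dynamically linked)
  • printf copies "hello world" into the stdout buffer and returns
  • When that buffer is flushed, libc issues the write(1, "hello world", 11) syscall. With no trailing newline on a terminal, this typically happens at exit rather than inside printf
  • CPU switches to kernel mode
  • Kernel writes the bytes to the stdout file descriptor
  • Returns to userspace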
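The write system call that finally carries the bytes into the kernel can also be called directly, bypassing stdio. This is a minimal sketch, not what libc does internally; write_all and pipe_round_trip are hypothetical names, and the pipe is there only so the result can be checked without touching the terminal.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers for POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Writes all len bytes to fd with write(2), retrying after short writes
 * and interrupted calls. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len) {
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);   /* traps into the kernel */
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted: try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends msg through a pipe and reads it back. Returns the number of
 * bytes that made the round trip intact, or -1 on any failure. */
int pipe_round_trip(const char *msg) {
    assert(msg != NULL);
    int fds[2];
    char back[64];
    size_t len = strlen(msg);
    if (len > sizeof back || pipe(fds) != 0) return -1;

    int rc = write_all(fds[1], msg, len);
    close(fds[1]);
    ssize_t got = (rc == 0) ? read(fds[0], back, sizeof back) : -1;
    close(fds[0]);

    if (got != (ssize_t)len || memcmp(back, msg, len) != 0) return -1;
    return (int)got;
}
```

Calling write_all(1, "hello world", 11) sends the same 11 bytes to stdout with no buffering in between.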

ret pops the return address and jumps back into __libc_start_main, the function that called main.

Step 5g: exit(0) called by __libc_start_main

  • Calls atexit handlers
  • Flushes stdio buffers, writing out any output still pending
  • exit_group(0) syscall
  • OS reclaims all memory, closes file descriptors
  • Process is gone