Understanding the different phases in C Program Compilation

In this article we will look at the different phases a C program goes through before we get the final executable file as output.

Test Environment

Fedora 37 workstation
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)

What is C Programming Language

The C programming language is a compiler-based programming language. In this type of language, the source code, which is usually written as plain text (typically ASCII characters) in a text editor, must be transformed into a format that the machine can understand. That format consists of machine-language instructions, which are packaged into a binary executable file.

We will use the basic C program shown below, which prints the string “Hello World!!” when it is executed.

[admin@fedser]$ cat hello.c 
#include <stdio.h>

int main(void)
{
    printf("Hello World!!\n");
    return 0;
}

The C compilation system consists of the phases listed below, through which the source code is processed and transformed into machine-language instructions that a computer system can understand.

Image credit: Computer Systems: A Programmer’s Perspective (book)
C Compilation System Phases
Pre-processor
Compiler
Assembler
Linker

Let us explore these phases one at a time in the step-by-step procedure outlined below.

Procedure

Step1: Preprocessing Phase

In this phase the preprocessor modifies the original C program according to the directives that begin with the # character. For an #include directive, it inserts the contents of the named header file directly into the program text. In our case it inserts the contents of the stdio.h header, which is named on the first line of our source code. Let us use gcc to preprocess our source code and save the output in a file named hello.i, as shown below. The gcc option that stops after preprocessing is “-E”.

[admin@fedser]$ gcc --help | grep -i "preprocess only"
  -E                       Preprocess only; do not compile, assemble or link.

[admin@fedser]$ gcc -E hello.c -o hello.i

Here is a sample of the contents of hello.i, with most of the stdio.h declarations elided.

[admin@fedser]$ cat hello.i 
# 0 "hello.c"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
...
extern char *ctermid (char *__s) __attribute__ ((__nothrow__ , __leaf__))
  __attribute__ ((__access__ (__write_only__, 1)));
# 867 "/usr/include/stdio.h" 3 4
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));



extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;


extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 885 "/usr/include/stdio.h" 3 4
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 909 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2


# 3 "hello.c"
int main(void)
{
    printf("Hello World!!\n");
    return 0;
}
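
The hello.i output above shows header inclusion only. The preprocessor also expands macros and evaluates conditional directives. As a minimal sketch (the file name macros.c and the names GREETING, SQUARE, VERBOSE, LOG_LEVEL and the three functions are made up for illustration and are not part of the original program), a file like the following can be passed through gcc -E to watch these substitutions happen:

```c
/* macros.c: hypothetical example file, not part of hello.c */

/* Object-like macro: every use of GREETING is replaced by the string. */
#define GREETING "Hello World!!"

/* Function-like macro: the argument text is substituted, not evaluated. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of the two definitions survives. */
#ifdef VERBOSE
#define LOG_LEVEL 2
#else
#define LOG_LEVEL 1
#endif

/* After preprocessing, this body reads: return "Hello World!!"; */
const char *greeting(void)
{
    return GREETING;
}

/* After preprocessing, this body reads: return ((n) * (n)) + 1; */
int square_plus_one(int n)
{
    return SQUARE(n) + 1;
}

/* Returns 1 unless VERBOSE was defined when the file was preprocessed. */
int log_level(void)
{
    return LOG_LEVEL;
}
```

Running gcc -E macros.c should print the three functions with the macro names already replaced by their definitions. Adding -DVERBOSE to the command line defines VERBOSE and selects the other branch of the #ifdef. The file has no main function, so it is meant to be stopped at the preprocessing or assembling stage rather than linked on its own.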

Step2: Compilation Phase

In this phase the preprocessed source code from the previous step is translated into an assembly-language program. Compilers for different high-level languages can emit the same assembly language for a given target machine, which makes assembly a useful common output format. Let us use gcc to compile our preprocessed code and save the output in a file named hello.s, as shown below. The gcc option that stops after compilation, without assembling or linking, is “-S”.

[admin@fedser]$ gcc --help | grep -i "compile only"
  -S                       Compile only; do not assemble or link.

[admin@fedser]$ gcc -S hello.i -o hello.s

Here are the contents of the hello.s file.

[admin@fedser]$ cat hello.s
	.file	"hello.c"
	.text
	.section	.rodata
.LC0:
	.string	"Hello World!!"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	call	puts
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 12.2.1 20221121 (Red Hat 12.2.1-4)"
	.section	.note.GNU-stack,"",@progbits
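
Notice that the assembly above calls puts even though hello.c calls printf, and that the string constant no longer ends with a newline. GCC commonly rewrites a printf call whose result is unused and whose format is a plain string ending in \n into the equivalent puts call, because puts appends the newline itself. The sketch below (the function names hello_printf and hello_puts are illustrative only) places the two forms side by side; both print the same line.

```c
#include <stdio.h>

/* The form written in hello.c. printf returns the number of
 * characters written, including the trailing newline. */
int hello_printf(void)
{
    return printf("Hello World!!\n");
}

/* The form that appears in hello.s. puts writes the string and then
 * appends a newline, so the literal itself has no "\n".
 * It returns a non-negative value on success. */
int hello_puts(void)
{
    return puts("Hello World!!");
}
```

Whether the substitution happens depends on the compiler and its version, so the generated hello.s may differ on other setups.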

Step3: Assembly Phase

In this phase the assembler translates the assembly-language code stored in hello.s into machine-language instructions. These instructions are packaged in a form known as a relocatable object program, which is a binary file. Let us use gcc to translate our assembly-language source into machine-language instructions and save the output in a file named hello.o, as shown below. The gcc option that stops after assembling, without linking, is “-c”.

[admin@fedser]$ gcc --help | grep -i "compile and assemble"
  -c                       Compile and assemble, but do not link.

[admin@fedser]$ gcc -c hello.s -o hello.o
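
Unlike hello.i and hello.s, hello.o is not a text file, so cat will not show anything readable. On Linux systems such as the Fedora test environment, object files use the ELF format, which begins with the four bytes 0x7f 'E' 'L' 'F'. As a rough sketch (the helper names has_elf_magic and file_has_elf_magic are invented for this example), the check below reads the first bytes of a file and tests for that signature:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF signature, else 0. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Returns 1 if the file starts with the ELF signature, 0 if it does
 * not, and -1 if the file cannot be opened. */
int file_has_elf_magic(const char *path)
{
    unsigned char head[4];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    n = fread(head, 1, sizeof head, fp);   /* may read fewer than 4 bytes */
    fclose(fp);

    return has_elf_magic(head, n);
}
```

On the setup described in this article, calling file_has_elf_magic("hello.o") would be expected to return 1, while the same call on hello.c or hello.s would return 0 because those are plain text files.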

Step4: Linker Phase

In our C program we use the printf function, which is provided by the standard C library. Conceptually, the printf function resides in a separate precompiled object file called printf.o, which must somehow be merged with our hello.o program. On a typical Linux system this code is actually packaged inside the C library (libc) rather than shipped as a standalone printf.o file, but the idea is the same. This task is carried out by the linker, which merges the different object files (i.e. files with the .o extension) and builds an executable object file that can be loaded into main memory and executed by the computer system.

To link multiple object files, we pass all of their names to gcc and name a separate output file (i.e. the executable file), as shown below.

We do not mention printf.o on the command line because it is part of the C standard library; the linker automatically finds the library in its standard search paths and links it with our object file.

[admin@fedser]$ gcc hello.o -o hello
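
One way to see the division of labour between the compiler and the linker is to declare a library function by hand instead of including its header. In the sketch below (greet is a made-up function name), the compiler only sees a declaration of puts; the machine code for puts itself is supplied from the C library when the final executable is linked.

```c
/* Declaration only, written by hand instead of #include <stdio.h>.
 * It tells the compiler how to call puts, but provides no code. */
int puts(const char *s);

/* The call below becomes an unresolved reference to the symbol "puts"
 * in the object file. The linker resolves it against the C library. */
int greet(void)
{
    return puts("Hello World!!");
}
```

Compiling this file with gcc -c succeeds even though no definition of puts is visible, because the reference stays unresolved in the object file until the link step.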

Step5: Execute the Program

Now our executable file is ready to be loaded into memory by the shell and executed, as shown below.

[admin@fedser]$ ./hello 
Hello World!!

Hope you enjoyed reading this article. Thank you.