Unix Tutorial



Compiling Programs


Compiling C Programs:

  • Compiling single source C Programs

The easiest case of compilation is when you have all your source code in a single file. Let us illustrate with an example program, test.c:

#include <stdio.h>

int main(int argc, char* argv[]) {
    printf("hello world\n");
    return 0;
}

There is a file named 'test.c' that we want to compile. We will do so using a command line similar to this:

% gcc test.c

We are using the GNU compiler, so the command is 'gcc'; on systems with a different compiler you may need to write 'cc' instead. If your program compiles without errors, you should get a file named 'a.out' as a result.

Suppose that you want the resulting program to be called "test_output". In that case, you could use the following line to compile it:

% gcc test.c -o test_output

Here the '-o' flag gives the name of the resulting executable file (test_output).

  • Running the resulting program

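Once we have created the program, we wish to run it. This is usually done by simply typing its name, as in: test_output. If the current directory is not in your PATH, the shell will not find the program; in that case, type ./test_output instead.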

Running the program also requires permission to execute the file. When a file is created in the system, it is immediately given some access permission flags. These flags tell the system who should be given access to the file, and what kind of access they will be given. Unix systems use 3 kinds of entities to which they grant (or deny) access: the user who owns the file, the group that owns the file, and everybody else. Each of these entities may be given access to read the file ('r'), write to the file ('w') and execute the file ('x').

When the compiler created the program file for us, we became the owners of the file. Normally, the compiler makes sure that we get all permissions to the file: read, write and execute. So we should be able to run our program.

Note too that you cannot just move the file to a different computer and expect it to run - it has to be a computer with a matching operating system (to understand the executable file format), and a matching CPU architecture (to understand the machine-language code that the executable file contains). Finally, the run-time environment has to match. For example, if we compiled the program on an operating system with one version of the standard C library, and we try to run it on a system with an incompatible version of the standard C library, the program might crash, or complain that it cannot find the relevant C library. This is especially true for systems that evolve quickly (e.g. Linux with libc5 vs. Linux with libc6).

  • Creating Debug-Ready Code

Normally, when we write a program, we want to be able to debug it - that is, test it using a debugger that allows running it step by step, setting a break point before a given command is executed, looking at the contents of variables during program execution, and so on. In order for the debugger to be able to relate the executable program to the original source code, we need to tell the compiler to insert information into the resulting executable program that will help the debugger. This information is called "debug information".

In order to add that to our program, let's compile it differently:

% gcc -g test.c -o test_output

The '-g' flag tells the compiler to include debug information, and is recognized by almost any compiler out there. You will note that the resulting file is much larger than the one created without the '-g' flag. The difference in size is due to the debug information.

We may still remove this debug information using the strip command, like this:

% strip test_output

You'll note that the size of the file is now even smaller than if we hadn't used the '-g' flag in the first place. This is because even a program compiled without the '-g' flag contains some symbol information (function names, for instance), which the strip command removes.

  • Compiling multi-source C Programs

Having all the source in a single file is rather limiting, for several reasons. As the file grows, compilation time tends to grow, and for each little change, the whole program has to be re-compiled. It is very hard, if not impossible, for several people to work on the same project together in this manner. Managing your code becomes harder. Backing out erroneous changes becomes nearly impossible. The solution is to split the source code into multiple files, each containing a set of closely-related functions (or, in C++, all the source code for a single class).

There are two possible ways to compile a multi-source C program.

  1. The first is to use a single command line to compile all the files. Suppose that we have a program whose source is found in files "main.c", "a.c" and "b.c".

We could compile it this way:

% gcc main.c a.c b.c -o hello_world

This will cause the compiler to compile each of the given files separately, and then link them all together into one executable file named "hello_world". Two comments about this command. First, if we define a function (or a variable) in one file and try to access it from a second file, we need to declare it as an external symbol in that second file. This is done using the C "extern" keyword (for functions, a plain prototype is enough, since function declarations are external by default). Second, the order of the source files on the command line may be altered. The compiler (actually, the linker) will know how to take the relevant code from each file into the final program, even if the first source file tries to use a function defined in the second or third source file.
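To make the "extern" remark concrete, here is a minimal sketch of what the three source files might contain. Everything in it is an assumption made for illustration: the names call_count, func_a, func_b and combine, and what they compute, are invented here. For compactness the three files are shown as one listing, separated by comments; in a real project each part would live in its own file.

```c
/* ---- a.c: defines a variable and a function ---- */
int call_count = 0;              /* the definition: storage is allocated here */

int func_a(int x)
{
    call_count++;                /* count how many times we were called */
    return x * 2;
}

/* ---- b.c: defines another function ---- */
int func_b(int x)
{
    return x + 1;
}

/* ---- main.c: uses symbols defined in a.c and b.c ---- */
extern int call_count;           /* declaration only: defined in a.c */
extern int func_a(int x);        /* "extern" is optional for functions */
extern int func_b(int x);

/* Calls into both other files. In a real main.c, main() would call
 * this function and print the result; main() is omitted here. */
int combine(int x)
{
    return func_b(func_a(x));
}
```

The declarations at the top of the main.c part only tell the compiler that these symbols exist somewhere; it is the linker that later connects each use to its definition in the other files.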

The problem with this way of compilation is that even if we only make a change in one of the source files, all of them will be re-compiled when we run the compiler again.

The second way overcomes this limitation by dividing the compilation process into two phases - compiling, and linking. Let's first see how this is done, and then explain:

% gcc -c main.c
% gcc -c a.c
% gcc -c b.c
% gcc main.o a.o b.o -o hello_world

The first 3 commands have each taken one source file and compiled it into something called an "object file", with the same name but with a ".o" suffix. It is the "-c" flag that tells the compiler only to create an object file, and not to generate a final executable file just yet. The object file contains the code for the source file in machine language, but with some unresolved symbols. For example, the "main.o" file refers to a symbol named "func_a", which is a function defined in file "a.c". Surely we cannot run the code like that.

Thus, after creating the 3 object files, we use the 4th command to link the 3 object files into one program. The linker (which is invoked by the compiler now) takes all the symbols from the 3 object files and links them together - it makes sure that when "func_a" is invoked from the code in object file "main.o", the function code in object file "a.o" gets executed. Furthermore, the linker also links the standard C library into the program, in this case to resolve the "printf" symbol properly.

To see why this complexity actually helps us, we should note that normally the link phase is much faster than the compilation phase. This is especially true when doing optimizations, since that step is done before linking. Now, let's assume we change the source file "a.c", and we want to re-compile the program. We now need only two commands:

% gcc -c a.c
% gcc main.o a.o b.o -o hello_world

In our small example it's hard to notice the speed-up, but with a few tens of files, each containing a few hundred lines of source code, the time saving is significant; not to mention even larger projects.

Getting a Deeper Understanding - Compilation steps

Now that we've learned that compilation is not just a simple process, let's look at the complete list of steps taken by the compiler in order to compile a C program.

Driver - what we invoked as "gcc" (or "cc" on some systems). This is actually the "engine" that drives the whole set of tools the compiler is made of. We invoke it, and it begins to invoke the other tools one by one, passing the output of each tool as an input to the next tool.

C Pre-Processor - normally called "cpp". It takes a C source file and handles all the pre-processor directives (#include files, #define macros, conditional source code inclusion with #ifdef, etc.). You can invoke it separately on your program, usually with a command like the following, which prints the pre-processed source to standard output:

% gcc -E single_source.c
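As a small illustration of what the pre-processor handles, the sketch below uses each kind of directive mentioned above. All the names in it (BUFFER_SIZE, SQUARE, VERBOSE, LOG, squared_buffer_size) are made up for this example. Running the pre-processor alone on such a file would show the #include replaced by the header's contents, the macros expanded, and only one branch of the #ifdef kept.

```c
#include <stdio.h>               /* replaced by the contents of stdio.h */

#define BUFFER_SIZE 64           /* simple constant macro */
#define SQUARE(x) ((x) * (x))    /* macro with a parameter */

/* Conditional inclusion: only one of the two LOG definitions survives
 * pre-processing, depending on whether VERBOSE is defined. */
#ifdef VERBOSE
#define LOG(msg) fprintf(stderr, "log: %s\n", (msg))
#else
#define LOG(msg) ((void)0)
#endif

/* After pre-processing, the body below contains no macro names:
 * SQUARE(BUFFER_SIZE) has become ((64) * (64)). */
int squared_buffer_size(void)
{
    LOG("computing squared buffer size");
    return SQUARE(BUFFER_SIZE);
}
```

With gcc, the VERBOSE symbol can be defined from the command line using the '-D' flag (for example, gcc -DVERBOSE -E single_source.c), which selects the other branch without editing the source.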

The C Compiler - normally called "cc1". This is the actual compiler, which translates the input file into assembly language. As you saw, we used the "-c" flag to invoke it, along with the C Pre-Processor (and possibly the optimizer too, read on) and the assembler.

Optimizer - sometimes comes as a separate module, and sometimes is found inside the compiler module. It performs optimizations on a representation of the code that is language-neutral. This way, the same optimizer can be used by compilers for different programming languages.

Assembler - sometimes called "as". This takes the assembly code generated by the compiler, and translates it into machine language code kept in object files. With gcc, you can tell the driver to generate only the assembly code, with a command like: gcc -S single_source.c

Linker-Loader - the tool that takes all the object files (and C libraries) and links them together to form one executable file, in a format the operating system supports. A common format these days is known as "ELF". On SunOS systems, and other older systems, a format named "a.out" was used. This format defines the internal structure of the executable file: the location of the data segment, the location of the code segment, the location of debug information and so on.

As you can see, the compilation is split into many different phases. Not all compilers employ exactly the same phases, and sometimes (e.g. for C++ compilers) the situation is even more complex. But the basic idea is quite similar: split the compiler into many different parts, to give the programmer more flexibility, and to allow the compiler developers to re-use as many modules as possible in different compilers for different languages (by replacing the preprocessor and compiler modules), or for different architectures (by replacing the assembler and linker-loader parts).

Compiling a Single-Source C++ Program

All we need to do is use a C++ compiler in place of the C compiler. Suppose our program source is in a file named 'test.cpp':

#include <iostream>

int main(int argc, char* argv[]) {
    std::cout << "hello world" << std::endl;
    return 0;
}

We will use a command such as the following:

% g++ test.cpp -o test_output