CSP301
Linking And Compiling
Doxygen
& LaTeX
CVS
REVISION
The Compiler Action:
- gcc a.c b.c translates to:
  - Running the preprocessor on the input to generate the intermediate code:
    cpp [options] a.c /tmp/a.i
    (a.c is the input file, /tmp/a.i the output; the .i intermediate files can be kept with the -save-temps option)
  - Compiling this code to assembly form:
    cc1 [options] /tmp/a.i -o /tmp/a.s
    (the assembly can be generated with the -S option as well)
  - Assembling the code to generate object files:
    as [options] /tmp/a.s -o /tmp/a.o
  Anything after the preprocessing stage is architecture dependent.
- The same happens with b.c, and we obtain two object files that are linked by the linker to get the final a.out:
    ld [options] /tmp/a.o /tmp/b.o -o a.out
Object Files:
Object files contain the binary representation of code and data from source files, along with the information needed for relocation and linking.
'nm', which we saw in class, is a tool to see this symbol information, while 'objdump' can be used to obtain a lot more information about object files. (Use the man pages to see the plethora of options and information that come with these two tools.)
Types of object files:
- relocatable (e.g. a.o and b.o above) but not executable; as we saw above, they can be linked to give executable files
- executable (a.out from the linking stage); these have all their symbols resolved unless they come from standard Unix libraries
In addition there are shared object files, which can be loaded and relocated at link time or at load time. We will soon learn about their special advantages.
CLASS III
Linking :
Everything compiled as an object (prior to the linking stage) has
- addressing starting with zero as the base, so all addresses have been calculated relative to it
- a whole lot of unresolved symbols
The intention at link time is to resolve as many undefined symbols as possible.
The types of symbols that we can possibly have are:
- Global symbols that the object file defines and exports for others to use; these map to global variables and non-static function definitions.
- Global symbols that the object file references but does not define; the C 'extern' declaration falls within this category.
- Local symbols accessed exclusively by the object file; these include static functions and variables.
The linking process proceeds with these symbols as follows:
- Local symbols: no problem, since they have to be uniquely defined inside the file or the compilation process will complain.
- Global symbols exported by different relocatable files: here we need a unique map from each undefined symbol in the object files to the one object file that defines it. If there are multiple maps, the linker complains about multiple definitions.
To simplify things, let's describe the linking process for a static binary in very simple words. As we scan the files specified to be included in the link, we build two pools: one of undefined symbols and one of defined symbols. We keep resolving the undefined symbols in the pool, appending to the defined pool as new defined symbols are encountered. We do this for all the files being linked and for any static libraries being compiled against. At the end of it all, if the pool of undefined symbols has shrunk to zero, we have a successful attempt at symbol resolution.
Programming Tips: If you get errors reporting redeclaration of variables in files of type (/tmp/xxxxxxxx), this could be a possible reason.
From the way this is done it is apparent that static libraries must be placed at the end of the command line. Special care must be taken when cyclic dependencies exist between the libraries.
Linking also requires relocation, since all the object files have addresses calculated relative to zero.
The steps involved are:
- Merging of similar sections from the object files, assigning new runtime addresses to these sections, and recreation of the symbol tables with the new addresses.
- Recalculation of symbol references in the .text and .data segments.
Have a look at 'objdump -r a.o': we will see the set of relocation records generated in relocatable object files, which are now used to set the entry for any undefined reference right.
Working with Shared Libraries:
Advantages:
- Can be linked even at load time
- Can be shared among many binaries
- Smaller binaries or executables
- Lazy loading: we can load a shared object into memory only when needed
Command format: gcc -shared (make a shared object) -fPIC (standing for Position Independent Code) -o libuseless.so (the output library) a.o b.o
Linking to the shared library:
- Linked at link time: the addresses of code and data in the library are bound at link time; only the loading happens at run time.
- Dynamically linked: both linking and loading are deferred till load time.
Imagine a situation where the library changes after the time of compilation. In that case the addresses in the library may change, causing the binary to break. So, for executables bound to a shared library at link time, only minor changes in the library are possible without relinking.
On the other hand, dynamic linking of shared libraries at run time incurs a cost at every run of the program, but it allows for quick changes in libraries without any hassle of relinking your executables against them. This, however, demands strict adherence to the API on the part of both executable programmers and library programmers, as library changes may otherwise cause unforeseen changes in the behaviour of the executables.
Tools to play with:
gcc, ar, objdump, nm, ldd (use it on any binary to see which libraries it is linked against), strip (to strip off symbol table information; use it only on linked object files)
Moving On: Using Make
Let's get back to the business of code organization again.
As you might remember, there was a sequence of steps that we followed to get an executable. Remembering them and performing them in order may give programmers headaches. You can devise many ways to perform the same sequence of operations with a single command.
Someone got a noble idea: check the files for updates and recompile only those files that are affected by the changes made. This can save a lot of compilation time for large projects.
This notion got formalized into the 'make' utility, which is a framework to do just what we said above. It works on the principle of dependencies that must be satisfied to complete a certain target.
The utility by default parses a file named 'makefile' or 'Makefile'. For example:

target : source file(s)
	command (must be preceded by a tab)

The directive here says that the target file shall be rebuilt only if its source files have been modified. Here 'command' represents all the actions that must be taken, in that order, to fulfil the target directive. (Do 'info make' to obtain all the information on make and the format of makefiles.)
The make process is recursive, i.e. if the execution of one directive depends on another, then the other directive is satisfied before the execution of the first occurs.
Make is capable of performing non-compilation tasks (and is heavily used for installing executables): 'clean', 'install', etc.
Make supports wildcards, implicit rules, macros, etc. (*.c, %.o : %.c). Make supports variables and the creation of an environment (CC = gcc, VPATH = ...(search path for the source files of targets)...).
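A sketch of those features together in one makefile fragment; the variable values and the target name prog are illustrative, and the recipe lines begin with a tab:

```make
CC = gcc
CFLAGS = -Wall

# Pattern rule: build any .o from the .c of the same stem.
# $< is the first prerequisite (the .c file).
%.o: %.c
	$(CC) $(CFLAGS) -c $<

prog: a.o b.o
	$(CC) a.o b.o -o prog
```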
Programming Tips: It's good to see the use of the default variables; they make the task of writing general makefiles easier.
The .PHONY directive should be used whenever the target isn't a file but just a command to be executed.
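For example, the customary 'clean' target produces no file named clean, so it is declared phony; a minimal fragment (the recipe line begins with a tab):

```make
.PHONY: clean
clean:
	rm -f *.o prog
```

Without .PHONY, a stray file named 'clean' in the directory would make the target appear up to date and the recipe would never run.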
-
Thanks --
Avinash