Code generation is the last stage of the compiler. The input to the code generator is an intermediate representation of the source program, and the output is an equivalent target program.
The code generator is the final phase in the following sequence:
- Front end
- Code optimizer
- Code generator
All three phases consult the symbol table.
Code generator requirements:
- The output code must be correct and of high quality, that is, it must use the resources of the target machine effectively.
- The code generator itself must run efficiently.
CODE GENERATOR DESIGN ISSUES
Although the details depend on the target machine and the operating system, the following issues arise in the design of almost every code generator:
- Input to the code generator
- Target programs
- Memory management
- Instruction selection
- Register allocation
- Choice of evaluation order
- Approaches to code generation
INPUT TO THE CODE GENERATOR
The input consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
It is assumed that, before code generation, the front end has scanned, parsed, and translated the source program into a reasonably detailed intermediate representation.
It is also assumed that type checking has already taken place, so that type-conversion operators have been inserted where necessary and semantic errors have already been detected.
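As a rough illustration of this input, the sketch below models a three-address statement and a symbol table as small Python records. The names `Quad`, `Symbol`, `symbol_table`, and `operand_addresses`, as well as the field layout and the addresses, are assumptions made for this example and are not prescribed by the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    """One three-address statement: result := arg1 op arg2."""
    op: str
    arg1: str
    arg2: str
    result: str


@dataclass(frozen=True)
class Symbol:
    """Symbol-table entry; the fields are illustrative only."""
    name: str
    type: str
    address: int  # run-time address, here an offset in a static data area


# Hypothetical symbol table, as the front end might have filled it in.
symbol_table = {
    "a": Symbol("a", "int", 0),
    "b": Symbol("b", "int", 4),
    "c": Symbol("c", "int", 8),
}

# Intermediate representation of the statement a := b + c.
program = [Quad("+", "b", "c", "a")]


def operand_addresses(quad, table):
    """Resolve every name used in a statement to its run-time address."""
    return {name: table[name].address
            for name in (quad.arg1, quad.arg2, quad.result)}
```

Calling `operand_addresses(program[0], symbol_table)` gives the code generator the address of each operand of the statement without re-examining the source program.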
TARGET PROGRAMS
The output of the code generator is the target program, which can take one of the following forms:
- Absolute Machine Language
- Relocatable Machine Language
- Assembly Language
Absolute machine language: has the advantage that it can be placed in a fixed location in memory and executed immediately. A number of “student-job” compilers, such as WATFIV and PL/C, produce absolute code.
Relocatable machine language (object module): allows subprograms to be compiled separately. A set of relocatable object modules can then be linked together and loaded for execution by a linking loader. Linking and loading cost extra effort, but the approach gains flexibility.
Assembly language: makes code generation somewhat easier, because the compiler can generate symbolic instructions and use the macro facilities of the assembler to help generate code. The price is an extra assembly step after code generation.
MEMORY MANAGEMENT
Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator.
It is assumed that a name in a three-address statement refers to the symbol-table entry for that name.
The type in a declaration determines the width (amount of storage) required for the declared name.
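The way widths lead to addresses can be sketched as follows. This is a minimal example that assumes a single static data area, ignores alignment, and uses made-up type widths; `WIDTH` and `assign_offsets` are hypothetical names introduced here.

```python
# Assumed type widths in bytes; real widths depend on the target machine.
WIDTH = {"char": 1, "int": 4, "float": 8}


def assign_offsets(declarations):
    """Map each declared name to a relative address in the data area.

    `declarations` is a list of (name, type) pairs in declaration order.
    Returns (offsets, total_size). Alignment is ignored in this sketch.
    """
    offsets = {}
    next_offset = 0
    for name, type_name in declarations:
        offsets[name] = next_offset      # the name starts at the next free byte
        next_offset += WIDTH[type_name]  # reserve as many bytes as the type needs
    return offsets, next_offset


offsets, size = assign_offsets([("x", "int"), ("y", "float"), ("c", "char")])
```

With these assumed widths, `x` is placed at offset 0, `y` at offset 4, and `c` at offset 12, for a data area of 13 bytes.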
INSTRUCTION SELECTION
The uniformity and completeness of the instruction set are important factors, as are instruction speeds and machine idioms.
A three-address statement of the form
x := y + z
where x, y, and z are statically allocated, can be translated into the following code sequence:
MOV y, R0    /* load y into register R0 */
ADD z, R0    /* add z to R0 */
MOV R0, x    /* store R0 into x */
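A template-driven translation of this kind can be sketched in a few lines. The function names `gen_statement` and `gen_program` and the tuple encoding of statements are assumptions for illustration; only the three-instruction template comes from the text above.

```python
def gen_statement(result, left, op, right):
    """Translate `result := left op right` with one fixed template."""
    opcode = {"+": "ADD", "-": "SUB"}[op]
    return [
        f"MOV {left}, R0",        # load the left operand into R0
        f"{opcode} {right}, R0",  # combine the right operand into R0
        f"MOV R0, {result}",      # store R0 into the result
    ]


def gen_program(statements):
    """Translate statements one at a time, with no knowledge of neighbours."""
    code = []
    for stmt in statements:
        code.extend(gen_statement(*stmt))
    return code


code = gen_program([("x", "y", "+", "z")])
```

Because each statement is expanded in isolation, the generator never notices when a value it needs is already in R0.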
However, statement-by-statement code generation often produces poor code. For example, the sequence of statements
a := b + c
d := a + e
can be translated into:
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth instruction is redundant, because R0 already holds the value of a. The third instruction is redundant as well if a is not used subsequently.
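One simple way to remove the redundant load is a pass over the generated instructions that drops a load immediately following a store of the same value. The sketch below assumes instructions are strings of the form `MOV src, dst` and that the dropped instruction is not the target of a jump; `remove_redundant_loads` is a hypothetical helper, not something defined in the text.

```python
def remove_redundant_loads(code):
    """Drop `MOV a, R` when it directly follows `MOV R, a`."""
    result = []
    for instr in code:
        if result and instr.startswith("MOV ") and result[-1].startswith("MOV "):
            prev_src, prev_dst = [s.strip() for s in result[-1][4:].split(",")]
            src, dst = [s.strip() for s in instr[4:].split(",")]
            if src == prev_dst and dst == prev_src:
                continue  # the register already holds this value
        result.append(instr)
    return result


naive = [
    "MOV b, R0", "ADD c, R0", "MOV R0, a",
    "MOV a, R0", "ADD e, R0", "MOV R0, d",
]
improved = remove_redundant_loads(naive)
```

The pass removes only the reload of a. Deciding whether the store `MOV R0, a` can also go requires knowing whether a is used later, which this sketch does not attempt.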
REGISTER ALLOCATION
Instructions involving register operands are usually shorter and faster than those involving operands in memory.
Therefore, efficient use of registers is very important for generating good code.
The use of registers is divided into two subproblems:
1. During register allocation, the set of variables that will reside in registers at each point in the program is selected.
2. During the subsequent register assignment, the specific register in which each variable will reside is picked.
Finding an optimal assignment of registers to variables is difficult, even with single-register values.
Mathematically, this problem is NP-complete.
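Because an optimal solution is out of reach in general, practical compilers rely on heuristics. The sketch below shows one such heuristic, a greedy colouring of an interference graph; this technique is brought in for illustration and is not described in the text. It assumes the graph (which variables are live at the same time) has already been computed, and the names `allocate_registers` and `interference` are hypothetical.

```python
def allocate_registers(interference, registers):
    """Greedy heuristic: interfering variables must not share a register.

    `interference` maps each variable to the set of variables that are
    live at the same time. Returns (assignment, spilled).
    """
    assignment = {}
    spilled = []
    # Handle the most constrained variables first; ties break by name.
    for var in sorted(interference, key=lambda v: (-len(interference[v]), v)):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in registers if r not in taken]
        if free:
            assignment[var] = free[0]  # assignment: pick a specific register
        else:
            spilled.append(var)        # allocation: this variable stays in memory
    return assignment, spilled


interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
assignment, spilled = allocate_registers(interference, ["R0", "R1"])
```

With two registers, the variables a, b, and c cannot all be kept in registers at once, so the heuristic leaves one of them in memory. The result is valid but not guaranteed to be optimal.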
SELECTION OF EVALUATION ORDER
The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others. Picking the best order is another NP-complete problem.
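The effect of evaluation order on register use can be seen on a single expression tree. The sketch below counts registers under two strategies: always evaluating the left operand first, and evaluating the more demanding operand first (a labelling in the style of Sethi and Ullman, which is not part of the text). It assumes every operand must be loaded into a register and that the tree has no shared subexpressions; the function names and the tuple encoding of trees are made up for this example.

```python
def regs_left_first(tree):
    """Registers needed when the left operand is always evaluated first."""
    if isinstance(tree, str):
        return 1  # a leaf is loaded into one register
    _, left, right = tree
    # The left result occupies one register while the right side is evaluated.
    return max(regs_left_first(left), regs_left_first(right) + 1)


def regs_best_order(tree):
    """Registers needed when the more demanding operand is evaluated first."""
    if isinstance(tree, str):
        return 1
    _, left, right = tree
    l, r = regs_best_order(left), regs_best_order(right)
    # A tie costs one extra register; otherwise the larger need dominates.
    return l + 1 if l == r else max(l, r)


# a + (b + (c + d)), written as nested (operator, left, right) tuples.
tree = ("+", "a", ("+", "b", ("+", "c", "d")))
left_first = regs_left_first(tree)
best = regs_best_order(tree)
```

For this tree, left-to-right evaluation needs four registers, while starting with the innermost sum needs only two.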
CODE GENERATOR APPROACH
The most important criterion for a code generator is that it produces correct code.
The design objectives are:
- Easy to implement
- Easy to test
- Easy to maintain