.CH "Overview" .MH "Philosophy" .SH "Design Considerations" The design of the code generator (hereinafter referred to as VCG, for "V-mode Code Generator") was driven by a number of considerations: .sp .in +5 .ta 6 .ti -5 [bf .][tc]For experimental language translators, code generation should be fast and straightforward. This is necessary both for fast turnaround and ease of debugging in the development stage, and for fast turnaround in typical educational applications. .sp .ti -5 [bf .][tc]The VCG should insulate front-ends from details of storage allocation and data format selection, as well as instruction generation. This encourages inter-language compatibility at the object code level, as well as providing a framework for easily retargetable front-ends. .sp .ti -5 [bf .][tc]The intermediate form (IMF) that is processed by the VCG should be simple to generate and display (for debugging purposes). Furthermore, it should not unduly restrict extension for additional functionality or optimization. .sp .ti -5 [bf .][tc]The output object code should conform to Prime's current standards, and should include at least minimal provisions for separate compilation and run-time debugging. .sp .in -5 .SH "Implementation Approaches" After some time, consideration of the goals above led to the following approaches to the implementation of the VCG: .sp .in +5 .ta 6 .ti -5 [bf .][tc]The basic IMF handled by both the front end and the VCG should be a tree structure. A tree is easily generated from the information available on the semantic stack during a bottom-up parse, and can be generated directly without an explicit stack during a top-down parse. A number of operations like constant folding, reordering of operands of commutative operators, and global context propagation are readily performed on a tree structure. Furthermore, use of a tree can eliminate the need for generation and tracking of temporary variables in the front end. .sp .ti -5 [bf .][tc]The IMF operators should be close to the constructs used in an algorithmic language of the level of, say, Pascal. This permits straightforward translation of most algorithmic languages, and provides enough additional context to simplify many optimization tasks. For example, the IMF resembles the program's flow graph closely enough that simple global register allocation can be performed without graph reduction. .sp .ti -5 [bf .][tc]One of the basic functions of the VCG is the mapping of data descriptions supplied by the front end into physical storage layouts. The goal of complete machine data structure independence in the front end cannot be met without compromising the code generator's utility for languages that allow storage layout specification (C and Ada are notable examples). Therefore, the IMF should contain descriptions of data structures in terms of a small set of primitive data modes that can easily be parameterized in front-end code. Simple variables, structures, and arrays defined in the front end must be converted to single or multiple instances of the following basic machine data modes: 16-bit signed integer, 16-bit unsigned integer, 32-bit signed integer, 32-bit unsigned integer, 32-bit floating point, and 64-bit floating point. .sp .ti -5 [bf .][tc]The IMF tree should be linearized and passed to the VCG as a stream of data in prefix Polish notation. The linearized form partly reflects the usual Software Tools methodology of expressing even complex data transformations as "filters." However, there are other advantages, particularly in storing and interpreting the IMF for debugging. Prefix Polish was chosen because it can be generated easily from the internal representation of the tree, and because it minimizes the amount of state information that must be explicitly maintained by both the front end and the VCG in order to output or input the IMF. .sp .ti -5 [bf .][tc]The final output of the VCG should be a stream of Prime Macro Assembly Language source code. Although the time required to assemble this source imposes a significant penalty on code generator performance, it appears to be unavoidable if the compiler writer is to be insulated from Prime's object code format. (In addition, Prime has scheduled object code format changes, and it would not be wise to invest heavily in the present format.) .in -5 .MH "Structure" The VCG "main loop" simply reads each module present on its input, rebuilds the tree represented by the input, transforms the tree to a linked list of machine instructions, performs register tracking optimizations on that list, and finally converts the list to assembly language and outputs it. .pp The input and output routines are straightforward and relatively uninteresting. .pp The optimization routines amount to about 13 pages of Ratfor code, and work by simulating the effect of each machine instruction on the contents of the six registers that are tracked. At the moment, three types of optimization are performed: redundant loads are eliminated, some memory references are eliminated in favor of register-to-register transfers, and general instruction sequences are replaced with special-case code. .pp The heart of the code generator is the set of transformation routines that convert the tree representation to the doubly-linked list of machine instructions. The transformation routines exhibit a great deal of knowledge about the machine architecture, but actually employ only very simple algorithms for code generation. .pp IMF operators may appear in one of several "contexts," identified internally by the following terms: .sp .in +5 [bf Reach]. An operator evaluated in reach context yields the address of a word in memory containing the result of the operation, if possible. At present, only the object, constant, pointer dereferencing, array indexing, and structure member selection operators yield addresses. All other operators behave as if they were evaluated in "load" context. .sp [bf Load]. An operator evaluated in load context yields a value in a machine register. The particular register used depends only on the basic machine data mode of the operation. Most IMF operators are evaluated only in this context. .sp [bf Void]. An operator evaluated in void context yields side effects only. In a very few cases, this results in an opportunity to exploit special-case machine instructions that perform some calculation without making the result available in a register (incrementing a memory location, for example). .sp [bf Flow]. An operator evaluated in flow context yields a change in flow-of-control rather than a value. For example, a "test for equality" operator would return 1 or 0 in a load context, but in flow context would cause a jump to a given label depending on the outcome of the test. .sp [bf AP]. An operator evaluated in AP context yields an "argument pointer" rather than a value. Argument pointers are used to pass parameters to procedures. .sp .in -5 .pp Context information is propagated top-down by the code generator as it scans the IMF tree. Additional information in the form of register requirements is propagated from the bottom up during the same scan. Together, context and register usage determine with fair accuracy the optimal code sequence to be generated for a given operator. .MH "Input/Output Semantics" .SH "Input Structure" The IMF passed to the VCG consists of a sequence of [ul modules]. A module is a sequence of procedure definitions, static data definitions, and entry point declarations. The static data definitions build a data area that is shared by all procedures in the module, while the procedure definitions build code and data areas that are strictly local to each procedure, and the entry point declarations make the static data area or procedures visible to Prime's link editor. .pp Prime's Fortran compilers currently generate code that is equivalent to one procedure per module under this scheme; Prime's PL/I and Pascal compilers generate code that is equivalent to a single IMF module. The VCG module structure permits compatibility with either of these alternatives, as well as compromise forms that are more suitable for other languages. .bq Note: Separate compilation capability directly affects module structure. At present, there is no way for separately compiled procedures to share a static data area. Furthermore, separately-compiled static objects must be referenced by a unique 8 or fewer character name made visible to the loader. A Fortran COMMON block definition can be used to reduce the number of such external symbols, but COMMON definitions must match exactly in all separately-maintained modules. In addition, note that Prime's current loader software requires that external objects be referenced through an indirect address, which can cause a significant reduction in performance. .eq .pp Each [cu static data definition] allocates space for an object and may specify an initial value for the object. A [cu static data declaration] names an object that is defined outside the current module, but provides no other information about the object. .pp Each [cu procedure definition] consists of information associated with a closed routine defined by the front end. In particular, the procedure's argument types and code tree are included. .pp The bulk of the IMF will be in subtrees defining the code associated with procedures. Most storage allocators, arithmetic operators, and flow controllers are straightforwardly expressed in tree form; a description of these IMF components is available elsewhere. .SH "Output Structure" Each VCG input module generates a single PMA input module, terminated by an END pseudo-op. The PMA input module may be assembled, link-edited, and subsequently executed. The concatenation of all static data definitions and declarations forms a [cu link frame] that is shared by all procedures in the module. Each procedure definition yields an entry control block (ECB) and a chunk of machine code that implements the function of the procedure, including the allocation of space in the procedure's [ul stack frame] for local variables. .EV .fo //- # -//