.CH "Conversion"
The Georgia Tech C compiler is based on the specifications
contained in
[ul The C Programming Language]
by Kernighan and Ritchie.
However, the C compiler environment is not totally compatible
with the Unix C implementation.
Simulation of a Unix environment under Primos can be done only
with an unreasonable loss of performance.
Therefore, Unix C programs require some conversion to execute on
Prime systems.
(Programs that depend intimately upon the Unix process mechanism
or the Unix file system layout are more difficult to convert.
Likewise, programs that make heavy use of Unix inter-process
'signal' interfaces will be difficult to convert.)
.#
.MH "C Program Checker"
The beginnings of a "C Program Checker" exist;
it flags possibly dangerous C program constructs,
such as type mismatches, when it encounters them.
The "C Program Checker" can be called by using the "-y" option
with 'cc', 'ccl', or 'ucc'.
It currently reports on mismatched formal/actual parameters and
misdeclared function return values.
.MH "Incompatibilities With PDP-11 C"
The C compiler is compatible with PDP-11 C where possible.
The following list enumerates those features of the Georgia Tech C
compiler which are not compatible with PDP-11 C.
.SH "Include Statements"
The compiler will complain about semicolons appearing at the end of
include statements.
.pp
Note that the Georgia Tech C compiler automatically includes the
standard definitions in "=cdefs=", so that the typical Unix-style
"#include" of the standard definitions is optional.
The compiler will search for an include file starting with the
current working directory,
through the directories listed with the "-I" compiler option
in the order listed,
and ending with the system include directory "=incl=".
Use of angle brackets (e.g., <filename>) rather than
double quotes (e.g., "filename") in the include statement
directs the compiler to skip the search of the current working directory.
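.pp
The search order described above can be sketched as a small C function.
This is only an illustration, not the compiler's own code:
the name 'include_search_order' is hypothetical, and "." is assumed
to stand for the current working directory.
It is written in ANSI-style C so that it can be tried on a current compiler.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the compiler): stores in 'out' the
 * directories that would be searched for an include file, in order.
 * 'angle' is nonzero for the angle-bracket form and zero for the
 * double-quote form.  "." is an assumed stand-in for the current working
 * directory; "=incl=" is the system include directory named in the text.
 * 'out' must have room for n_idirs + 2 entries.
 * Returns the number of entries stored. */
size_t include_search_order(int angle, const char *idirs[], size_t n_idirs,
                            const char *out[])
{
    size_t n = 0;
    size_t i;

    if (!angle)
        out[n++] = ".";          /* quoted form: current directory first */
    for (i = 0; i < n_idirs; i++)
        out[n++] = idirs[i];     /* "-I" directories, in the order listed */
    out[n++] = "=incl=";         /* system include directory comes last */
    return n;
}
```

With two "-I" directories, the quoted form yields four places to look
and the angle-bracket form yields three.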
.# .pp
.# We are not currently satisfied with location and contents of the
.# standard "include" files. Suggestions for improvement will
.# be appreciated.
.#
.SH "Pointers"
It is currently not possible to make pointers and [bf int]s
the same length.
Pointers are 32 bits, [bf int]s are 16 bits.
The compiler tries to warn of pointer truncation, but cannot
always detect it.
.pp
If NULL pointers are to be passed as arguments, they must be of
pointer type; you cannot pass 0 or 0L as a NULL pointer.
Use the symbolic constant NULL, which is defined in
"=incl=/stdio.h" to be "(char*) 0".
.pp
Pointers to dynamically linked objects cannot be reliably compared.
Pointers to dynamically linked objects
(currently only functions are dynamically linked)
are actually faulting pointers to character strings.
At run time, these pointers are filled in with the correct
linkage address (the links are "snapped") the first time the pointer
is referenced indirectly.
The C compiler must generate a constant pointer to each external
object in each C object file.
If relocatable files are linked together, during execution it is
possible to have one file's constant pointer snapped, and the
other's untouched.
The object code generated by the compiler to compare these pointers
does not reference through the pointers; it merely treats them as
32-bit integers.
Because of this, comparisons of pointers to dynamically linked
objects may give inconsistent results.
A significant performance penalty would be required to guarantee
consistent results in such a limited case.
.#
.SH "Program and Data Object Size Restrictions"
No source file may require more than 65536 words of static data.
The static data for each C source file is compiled into a single
linkage frame, and the linkage frame size restriction is imposed by
the system architecture.
.pp
If you do require very large data objects, you may be able to get around
this restriction with some work.
You must declare the data object as an [bf extern] and write a
Fortran subroutine that declares the data object name as a
common block.
Then when accessing the contents of this large block you must
somehow ensure that an object [bf never] crosses a segment boundary
(start it at the beginning of the next segment just as Fortran does).
If you attempt to address an object (such as a [bf double])
across a segment boundary, part of your reference will simply wrap
around to the beginning of the segment you are trying to
reference beyond.
.pp
No source file may require more than 65536 words of procedure text.
The compiler generates all procedures in the same PMA
(Prime Macro Assembler) module.
Currently PMA restricts the module size to 65536 words.
.pp
No function may generate more than 65536 words of internal-format
PMA (currently around 16K statements).
This is a code-generator workspace restriction.
It has only been encountered with output from YACC --
functions this huge are just not normally found around PDP-11s.
(YACC is an LALR(1) parser generator.
It reads a BNF grammar, and produces a C function
which will parse the grammar.
This generated output has many large tables.)
.SH "Functions"
In C, all arguments are passed by value.
In Georgia Tech C, as long as arguments match in type they are,
in all outward appearances, also passed by value.
However, the internal mechanism for parameter passing is different
from Unix C and will give different side effects if arguments do
not match in type and in number.
.pp
The Prime architecture maintains a stack for local variables and
provides a 64V mode procedure call argument transfer primitive
for passing pointers, but not data values.
We have used this mechanism to take advantage of its speed.
Therefore, pointers are passed by value, just as in Unix C,
but data values are not passed by value.
Instead, a pointer to the data value is passed into the stack frame
of the called procedure, and the data value is then copied into the
local stack frame by the procedure initialization code.
This scheme is transparent as long as there are no type mismatches.
For this reason, an attempt to cast a pointer argument to a
non-pointer type will fail.
.pp
A variable number of arguments can be used, but not in the same
manner as in Unix.
The strategy is to declare as many arguments as you will ever need
(make them pointers so that the compiler does not try to copy them).
You will actually ignore all but the first of these names
in the function.
This trick forces the compiler to leave enough room for your
arguments in the procedure's local stack frame.
When the function is called, you will find the first argument
pointer at the address of the first argument,
the second argument pointer at the address of the first argument plus 3,
the third at the address of the first argument plus 6, etc.
Note that because of software conventions, i.e., the procedure
initialization code, functions that are declared with zero arguments
must be called with exactly zero arguments;
and functions that are declared with one or more arguments must
[ul not]
be called with zero arguments.
.pp
Programs that depend on the order of parameter evaluation will fail.
.pp
You cannot call a function with single precision floating point
arguments, nor can you ever expect a function to return a
single precision floating point value.
Remember, C turns them into double precision.
.pp
If a structure is to be a return value, the compiler adds on
an additional first argument through which it passes a pointer to a
temporary area in the calling procedure for the return value.
Needless to say, type or length mismatches could cause
significant nastiness.
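.pp
As a hedged illustration of the warning about parameter evaluation order,
the sketch below shows one way to remove the dependence:
evaluate the arguments into local variables first.
The functions 'next_item', 'subtract', and 'first_minus_second' are
hypothetical names invented for this example, and the code uses
ANSI-style parameter declarations so that it can be tried on a current
compiler.

```c
static int next_value = 0;

/* Hypothetical function with a side effect: returns 1, 2, 3, ... */
int next_item()
{
    return ++next_value;
}

int subtract(int a, int b)
{
    return a - b;
}

/* Fragile form: subtract(next_item(), next_item()) gives a result that
 * depends on which argument the compiler evaluates first.
 * The form below fixes the order explicitly with local variables. */
int first_minus_second()
{
    int first = next_item();    /* always evaluated first */
    int second = next_item();   /* always evaluated second */

    return subtract(first, second);
}
```

Because the two calls are separate statements, the result no longer
depends on how any particular compiler orders argument evaluation.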
.pp
The side effects of type mismatches are quite predictable and
can be useful for calling non-C procedures.
For example, if you pass a non-pointer argument to a pointer
argument, it will behave exactly as if a pointer had been passed
(i.e. possibly allowing the supposed "value" argument to be modified).
If you pass a pointer argument to a simple variable argument,
it behaves just as if you had passed the value of the argument instead.
.pp
Be wary of non-C routines which modify their arguments
(particularly Subsystem routines like 'ctoi');
if you pass a constant, the "constant" might end up with a
different value in it than it had before the routine was called!
.SH "Arrays"
Although it is possible to index outside of array bounds,
doing so is very dangerous.
In 64V mode, indexed instructions are much faster than
32-bit pointer arithmetic.
As a consequence, the compiler generates 16-bit indexed
instructions wherever possible.
The only side effect of this performance improvement is that
indexing outside the bounds of arrays may not give the expected results.
.# .SH "Environment"
.# No environment is available (look for one in Version 9 of Software Tools).
.# In Software Tools, unlike Unix, the Shell's variables are not available
.# outside the Shell. Therefore, the environment pointer [bf envp] that
.# may be used on Unix is not implemented in the Georgia Tech C compiler.
.SH "Identifiers --- Naming Restrictions"
Because the C compiler originally generated symbolic assembly
language which was then processed by PMA, the Prime Macro Assembler,
variable and function names had to follow PMA's naming conventions,
which require that names begin with an alphabetic character.
To achieve the necessary compatibility, variable and function names
beginning with an underscore are prefixed with "z$".
Even though 'vcg' now generates object code directly,
this naming restriction is still in effect.
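.pp
The renaming rule can be pictured with a short C sketch.
This is not the compiler's code: 'external_name' is a hypothetical
helper, and it assumes a literal reading of the rule above, in which
the underscore is kept after the "z$" prefix.

```c
#include <string.h>

/* Hypothetical helper (not part of the compiler): writes into 'buf' the
 * name that would result from the rule described in the text.
 * Assumption: the leading underscore is kept after the "z$" prefix.
 * Returns 0 on success, or -1 if 'buf' is too small. */
int external_name(const char *ident, char *buf, size_t buflen)
{
    const char *prefix = (ident[0] == '_') ? "z$" : "";
    size_t need = strlen(prefix) + strlen(ident) + 1;   /* +1 for NUL */

    if (need > buflen)
        return -1;
    strcpy(buf, prefix);
    strcat(buf, ident);
    return 0;
}
```

Under that assumption, a name such as "_exit" would become "z$_exit",
while "main" would be left unchanged.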
.pp
Field names within [bf struct]s must be unique since the
C compiler does not maintain a separate symbol table for each
[bf struct].
This behavior is in accordance with K&R and the V7 Unix C compiler.
(Berkeley Unix, System III, and System V all keep separate symbol
tables for each structure.)
.# (We're working on this one, too.)
.SH "Character Representation and Conversion"
Character values run from 128 to 255, not 0 to 127.
.pp
Characters are not sign extended when promoted to integers.
.SH "Numerical"
.pp
Programs that use data of type [bf double] may lose precision
in trade for increased magnitude.
.pp
See the
.ul
SWT Math Library User's Guide
for more details on Prime's floating point hardware and software.
.SH "Library Incompatibilities"
The Unix call 'fork' cannot be efficiently implemented because of
operating system restrictions, and is therefore not available
with Georgia Tech C.
.pp
'Read' and 'write' calls that do not use 'sizeof' to compute the
buffer length will probably have to be changed.
.pp
Programs that open other users' terminals cannot be supported.
.SH "Unix File System Incompatibilities"
Programs that depend intimately on the Unix directory structure
('..', directory layout, links) will not be easily converted.
.pp
Programs that depend on the order and behavior of Unix file
descriptors will not be easily converted.
.pp
You cannot depend on file descriptors 0, 1, and 2 always being
connected to standard input, standard output, and standard error
respectively.
Instead, use the macros STDIN, STDOUT, and STDERR
(defined in "=incl=/swt.h", which is automatically included by "=cdefs=").
.SH "Tabs"
Tabs are not supported in exactly the same manner on the Prime
as in Unix.
C programs which produce tabs in their output should be run
piping their output into the Subsystem program 'detab'
('detab +8' is recommended).
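.pp
Where piping through 'detab' is inconvenient, a program could expand
tabs itself before writing its output.
The following is a minimal sketch of that idea, not a description of
how 'detab' works:
'expand_tabs' and TAB_WIDTH are hypothetical names, and the sketch
assumes that the recommended "+8" corresponds to tab stops every
8 columns.

```c
#include <stddef.h>

#define TAB_WIDTH 8   /* assumption: matches the recommended 'detab +8' */

/* Hypothetical helper: copies 'in' to 'out', replacing each tab with
 * spaces up to the next multiple of TAB_WIDTH columns.  The column
 * count restarts after each newline.
 * Returns the number of characters written (not counting the NUL),
 * or (size_t)-1 if 'out' is too small. */
size_t expand_tabs(const char *in, char *out, size_t outlen)
{
    size_t n = 0;      /* characters written so far */
    size_t col = 0;    /* current output column */

    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            do {
                if (n + 1 >= outlen)
                    return (size_t)-1;
                out[n++] = ' ';
                col++;
            } while (col % TAB_WIDTH != 0);
        } else {
            if (n + 1 >= outlen)
                return (size_t)-1;
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    if (outlen == 0)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For instance, the three-character input "a", tab, "b" becomes "a",
seven spaces, and "b".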
.SH "Static Initializers"
Initializers for static data objects which involve the "address-of"
operator may only consist of "&objectreference".
For example, while the statement "static char *x = &A" is okay,
the statement "static char *x = &A+1" cannot be handled by the
Georgia Tech C compiler.
The restriction arises from the inability of PMA/SEG to handle
address expressions of external symbols when forming 32-bit pointers.
.SH "Registers"
In 64V mode, the Prime is essentially a single accumulator machine.
Thus, while the compiler recognizes the [bf register] keyword,
there is no effect on the size or speed of the generated code.
.SH "The Type void"
Berkeley and System III Unix introduced the new type [bf void]
into the C language.
A [bf void] function is one which is guaranteed not to return a value
(i.e. a true procedure).
Only functions may be declared to be of type [bf void],
although you may also cast a function call to [bf void].
Georgia Tech C does not directly support [bf void],
but you may get around it with the simple statement:
.be
#define void int
.ee
which should allow you to port practically any code which uses
[bf void].
Admittedly, this defeats some of the type checking that the new type
provides, but it will allow you to port code without having to modify it.
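.pp
One way to keep a single source file usable on both kinds of compiler
is to make the definition conditional.
The sketch below assumes a macro named NO_VOID, which is purely
hypothetical and would have to be defined by hand when compiling with a
compiler that lacks [bf void];
'note_call' and 'calls_after_two' are likewise invented for the example.

```c
/* NO_VOID is a hypothetical macro, not something any compiler defines;
 * define it yourself when the compiler has no 'void' type. */
#ifdef NO_VOID
#define void int
#endif

static int call_count = 0;

/* A "true procedure": it returns no value. */
void note_call()
{
    call_count++;
}

int calls_after_two()
{
    note_call();
    (void) note_call();   /* a call cast to void; reads as (int) under the #define */
    return call_count;
}
```

When NO_VOID is left undefined, the code is compiled with the real
[bf void] type and keeps the type checking that the substitution gives up.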