Open64 compiler enables compilation of programs written in multiple languages, for example, FORTRAN, C, C++ to multiple architectures, for instance x86, x86_64, IA-64 to name some. Even though multiple languages are supported by the compiler’s front-end, its back-end operates on a common Intermediate Representation (IR). This IR is WHIRL.
One of the important features in Open64 is the use of common intermediate representation irrespective of the source language. This concept enables common compilation stages to achieve the desired effect over inputs provided in different languages. For instance, a single compiler implementation of the interprocedural analysis stage and loop nest optimization stage optimizes input programs that are provided in different source languages. Futher, optimization passes communicate easily with the common IR and thus the communication overhead between passes is reduced.
Even though the common IR is WHIRL, several forms of WHIRL come to existence via Open64’s gradual lowering compilation process. The main benefit from Open64’s gradual lowering process is in enhancing the expressibility of certain optimizations. Certain optimization may be effective or easily expressible when the input that is applied is at a high level in IR. By not translating code to a low level and by using gradual lowering, open64 enables different granulaties that suit the expressibility of particular optimizations.
WHIRL at its highest levels is much like a high level language program. It is represented using a tree as a hierarchical representation. To enable control flow and data flow analysis, hierarchical WHIRL is lowered to a control flow graph after lowering high-level constructs to a mid-level. To enable SSA optimization, CFGs are translated to SSA form. In its high-level and lower levels, WHIRL consists of WHIRL expressions and statements. Expressions do not cause side-effects and statements may cause side-effects. This design choice also helps in capturing the implicit control flow between statements in the input programs, irrespective of the source language. This document serves to introduce the Open64 compiler developer to a number of compilation concepts in Open64. This document contains a number of examples which developers may find useful. A brief introduction to components in WHIRL is provided in Chapter 2.
WHIRL consists of WHIRL instructions and WHIRL symbol table. This chapter contains a brief introduction to the general structure shared by every WHIRL instruction. Some of the common instructions in WHIRL are shown with examples. WHIRL symbol table entries are refered by WHIRL instructions. The interface provided for compiler passes to access the symbol table is described in this section. Following sections contains number of examples of different types of WHIRL nodes. Further details of WHIRL may be found in [WHIRL, SYMTAB].
F8F8LDID 0 <2,1,A> T<11,.PREDEF_F8,8>
Note: It is important fact that there are two data types in case of indirect stores and loads. First is either for the result type, in case of load, or for operand type in case of store. Other is for the type of the addressing pointer.
Note: Stores or other statement types are never used as operands of expression WNs. Open64 uses this restriction to delimit stateless expression trees from stateful statement types.
Section 2.1 introduces debugging methods for pass writers. Sections 2.2 contains WHIRL expressions and WHIRL Statement descriptions with examples. For more detail further readings of [WHIRL, SYMTAB] are recommended. Section 2.3 demonstrates interfaces to creating WHIRL nodes with examples, demonstrates iterations over WHIRL nodes and introduces the memory pool dynamic allocation system used in Open64. Section 2.4 demonstrates the interfaces to adding annotations to WHIRL nodes.
The compiler provides an option, -keep, which writes out WHIRL of a given source file as a binary file. The file name extension for this binary file is “.B”. The ir_b2a tool is used to translate a WHIRL binary file to human readable form. ir_b2a tool also supports -st option which enables human readable symbol table data in the output. Examples used in this report are generated using the ir_b2a tool (ir_b2a dumps have been edited here for content). Using ir_b2a on high WHIRL is a particularly effective way to understand WHIRL.
Optimization passes in Open64 can printf a human readable version of WHIRL using utility functions for instance, dump_wn, dump_tree. It will be useful to carefully study the result of the following dump routines on small programs prior to working on optimization.
Note: There are versions of the above functions which print to a FILE*. Look for fdump_* (FILE*, WN*) functions in the source code base.
Human readable versions of WN or a WHIRL tree that are obtained using ir_b2a tool or the dump functions generally consists of WN operator, WN types, WN descriptors, Symbol Table entry, Type Table entry and/or other qualifiers when applicable. Human readable WHIRL data in this document have been obtained with ir_b2a.
Scalar loads and stores are said to use direct addressing. Their addresses are at a known location, which is referred by an entry in the symbol table. This entry is a pointer referencing to type ST and it is available in the WN for scalar loads and stores. The utility function WN_st can be used to obtain the ST from a LDID and STID.
F8F8LDID 0 <2,1,A> T<11,.PREDEF_F8,8>
I4INTCONST 22 (0x16)
I4STID 0 <1,43,i> T<4,.predef_I4,4>
For examples on scalar loads and stores that are qualified with field_ids, consider the C++ or C structure in Figure 2.1. Loads and stores to structure fields have an additional qualifier, field_id, compared to loads and stores to regular program variables.
ILOAD and ISTORE are used in situations when accessing arrays and pointers. The example ISTORE in line 9 of Figure 2.2 indicates that the field_id is 0 and also indicates that an 4-byte integer value is stored using a pointer which is a 8-byte type anonymous pointer. Similar reasoning applies to line 18 of Figure 2.2. The LDA operator is used when address of pointer types are required. An example is line 5 in Figure 2.5. Here the variable ’a’ has been renamed by the front-end as ’anon1’. The type of the pointer is of size 8-bytes as per the last field of the type tuple: T<46,anon_ptr., 8> 1.
ARRAY WNs are used to represent array accessing in WHIRL. One of the children in both the ISTORE and the ILOAD in the Figure 2.2 is an ARRAY WN. A more detailed example of WNs for array access in FORTRAN and C are shown in Figure 2.4 and Figure 2.3 respectively.
However, the front-end may not be able to use ARRAYs for array accessing when pointers are used instead of arrays, for instance, as in Figure 2.6. In these instances, the compiler front-end uses lower level address computation WHIRL trees for address generation. The address compute kid of the ISTORE in line 17 of Figure 2.6 computes the address of b[i][j] by first computing b[i] with the ADD WN in line 11, and following that with arithmetic to obtain b[i][j] with the ADD WN in line 16. Notice that the address pointer computation to obtain b[i] needed a 8-byte scaling with the MPY WN in line 9 but the address computation to obtain b[i][j] only needed a scaling of 4-bytes with the MPY in line 15. This is because the size of array elements in b are 4-bytes.
Figure 2.7 contains an example for implicit type conversions and type promotions. Line 13 in Figure 2.7 demonstrates implicit type conversions. It contains an assignment of an integer variable ’a’ to a float variable ’b’. The WHIRL for this line is in the following lines: lines 14-16. The WN that enables the conversion has a WN_operator of CVT, result type of F4 (4-byte float) and operand type of I4 (4-byte integer). Lines 18-22 in Figure 2.7 shows a type promotion of the variable ’b’ to double precision and then adding to the double precision constant (in line 21). Some other relavent examples are in the following lines.
A WN whose operator is BLOCK is used to hold a list (doubly linked) of subtrees. The first WN in the list is obtained by using WN_first utility function and the last WN in the list is obtained by using the WN_last utility function. Unlike other WNs, this node does not support WN_kid functions or the WN_kid_count. While direct access of a kid is not supported for WN blocks, iteration over the BLOCK is possible using the WN_next and WN_prev utility functions. An example WN for BLOCK is in Figure 2.8 between lines 6-10 (this BLOCK is enclosed by another block between lines 5-16).
Call parameters are kids for the CALL WN. An example for a C-function call is in Figure 2.8. An example for a FORTRAN-function call is in Figure 2.9. FORTRAN also supports SUBROUTINEs. An example for subroutine calling is in Figure . It is important to see the differences between the way parameters are passed in FORTRAN (Figure 2.9), wherein they are passed by reference, and in C (Figure 2.8), wherein parameters are passed by value. The parameters are passed by using a PARM WN. Notice the difference between PARM WN in line 8 in Figure 2.8 and the PARM WN in line 31 in Figure 2.9. WN_kid0 of these PARM nodes are LDID loading the passed value in case of C and a LDA which passes the reference to the value in case of FORTRAN. The differences between them show in WHIRL only when passing arguments in caller and there is no difference while loading from parameters inside the callee.
The data type of the call is obtained with the WN_ty utility function. The call WN in line 9 of Figure 2.8 computes a result of type I4 (4-byte integer). WN_kid utility functions should be used to get parameters used in the CALL WN. WN_st on the call WN obtaines the ST entry of the called function.
C supports using function pointers. An example function pointer invocation in C and the corresponding WHIRL is shown in Figure 2.11. The only difference between direct and indirect calls is that WN_kid0 is a load that dereferences the function pointer. The type of the function pointer (type representing the function declarator) is obtained using the WN_ty utility function. Developers are encouraged to study how function pointers are used in open64 to support C++ virtual functions.
If-else structures from the input program are represented as binary conditional structures as shown in Figure 2.12. Each if-structure has three kids: WN_kid0 gives the conditional expression, WN_kid1 gives the body that is executed when the condition evaluates to non-zero value and WN_kid2 gives the body that is executes when the condition evaluates to zero value. WN_kid1 and WN_kid2 are BLOCK WHIRL nodes.
An example for a FORTRAN DO loop is given in Figure 2.13.
An example for a C FOR loop is given in Figure 2.14. Unlike the FORTRAN do loop in Figure 2.13, C for loops are flattened to lower level WHIRL. Lines 9-12 in Figure 2.14 contain the initial assignment to the induction variable, lines 14-25 contain the body, lines 27-31 contains the increment statement and lines 33-47 contain the iteration test.
To WN create and delete. There are functions in open64 that ease management of WHIRL nodes, viz to create, delete and iterate over them. While there is a generic create function, there are specialized forms of create functions for commonly encountered WNs which are recommended. Some example create utility function prototypes are presented in Figure 2.15. The first prototype, WN_Ldid, creates a LDID with pointer type given by argument "TYPE_ID desc" and with result type that is extracted from argument "ST* sym". The use of other arguments is explained in Section 2.2.1. The second prototype, WN_Add creates an ADD expression with type specified by argument “TYPE_ID rtype”, left child specified by argument “WN* l” and right child specified by argument “WN* r”. If a WN pointer is no longer necessary, developers can choose to delete the WN by using the WN_Delete function. The prototype for the WN_Delete function is also in Figure 2.1523.
To WN iterate. Iterating over all WN in a BLOCK in statement order is possible by use of the WN_first, WN_prev and WN_next utility functions (as explained in 2.2.5). Iterating breadth first or depth first over children is possible by using the WN_kid utility functions, along with the WN_kidcount utility function.
To use memory management with memory pools. Open64 enables fixed-sized memory allocation with Memory Pool management. Most passes create a memory pool at start and free the pool at the end of the pass. Use of memory pools is recommended. The procedure to using memory pools are given below.
Annotations are possible for WNs with the use of the WN_MAP_TAB data structure. WN_MAP_TAB is a collection of maps that map a given WN* against its annotation. There is a WN_MAP_TAB for each PU. WN_MAP_TAB holds all annotation tables for its PU and each of the annotation tables are identified witha WN_MAP integral identifier. Developers are pointed to the comments in osprey/common/com/wn_map.h and osprey/common/com/wn_map.cxx. A typical usage scenario for creating a new WN_MAP and adding annotations is given below.
[WHIRL] WHIRL Intermediate Language Specification, 5/10/2000. SGI Corporation. http://nchc.dl.sourceforge.net/project/open64/open64/Documentation/whirl.pdf
[SYMTAB] WHIRL Symbol Table Specification, 5/10/2000. SGI Corporation. http://nchc.dl.sourceforge.net/project/open64/open64/Documentation/symtab.pdf
1The examples in this document are compiled to 64-bit x86_64 object code. Consequently pointers are 8-bytes long. If the examples are compiled to 32-bit object files, the pointers will be 4-bytes long.
2For further prototypes, please refer to the “osprey/common/com/wn.cxx” source file or this web page: http://www-rocq.inria.fr/~pop/doc/common/wn.txt.
3It is important to note that WN is managed as pointers. Developers need to be congnizant of this fact prior to deleting or pointing to or changing the content of a given WHIRL node.