CSc 520
Principles of Programming
Languages
:
Christian Collberg
Department of Computer Science
University of Arizona
- The most important and popular inter-procedural
optimization is inline expansion, that is
replacing the call of a procedure with the
procedure's body.
- Why would you want to perform inlining? There are
several reasons:
- There are a number of things that happen
when a procedure call is made:
- evaluate the arguments of the call,
- push the arguments onto the stack or move them
to argument transfer registers,
- save registers that contain live values
and that might be trashed by the called
routine,
- make the jump to the called routine,
- continued...
- make the jump to the called routine,
- set up an activation record,
- execute the body of the called routine,
- return back to the callee, possibly
returning a result,
- deallocate the activation record.
- Many of these actions don't have to be
performed if we inline the callee in the caller,
and hence much of the overhead associated with
procedure calls is optimized away.
- More importantly, programs written in modern
imperative and OO-languages tend
to be so littered with procedure/method calls. ...
- ... This is the result of programming with abstract data
types. Hence, there is often very little opportunity
for optimization. However, when inlining is performed
on a sequence of procedure calls, the code from the bodies
of several procedures is combined, opening up a larger
scope for optimization.
- There are problems, of course. Obviously,
in most cases the size of the procedure call code will
be less than the size of the callee's body's code. So,
the size of the program will increase as calls are
expanded.
- A larger executable takes longer to load from
secondary storage and may affect the paging behavior of
the machine. More importantly, a routine (or an inner
loop of a routine) that fits within the instruction-cache
before expansion, might be too large for the cache
after expansion.
- Also,larger procedures need more registers
(the register pressure is higher) than small ones.
If, after expansion, a procedure is so large (or
contains such complicated expressions) that the number of
registers provided by the architecture is not enough,
then spill code will have to be inserted when we run
out of registers.
- Several questions remain. Which procedures should
be inlined? Some languages (C++, Ada, Modula-3)
allow the user to specify (through a keyword or
pragma) the procedures that should be eligible for
expansions. However, this implies that a given
procedure should always be expanded, regardless of
the environment in which it is called.
This may not be the best thing to do. For example,
we might consider inlining a call to 1#1 inside a
tightly nested inner loop, but choose not to do so
in a module initialization code that is only executed
once.
- Some compilers don't rely on the user to decide
on what should be inlined. Instead they use
- A static heuristic, such as ``procedures
which are
- shorter than 10 lines and
have fewer than 5 parameters or
- are leaf routines (i.e. don't make
any calls themselves)
are candidates for inlining''.
- A heuristic based on profiling. After
running the program through a profiler
we know how many times each procedure
is likely to be called from each call
site. Only inline small, frequently
called procedures.
- How do we inline across module boundaries?
We need access to the code of the called procedure.
If the procedure is declared in a separately
compiled module, this code is not available.
What do we do? Good question...
- What's the difference between inlining and
macro expansion?
Inlining is performed after semantic analysis,
macro expansion before.
- At which level do we perform the inlining?
- intermediate code
- Most common.
- source code
- Some
source-to-source translators
perform inlining.
- assembly code
- Doable (with some compiler
cooperation), but unusual.
- Build the call graph:
- Create an empty directed graph 2#2.
- Add a node for each routine
and for the main program.
- If procedure 1#1 calls procedure 3#3
then insert a directed edge
4#4.
CallGraph1
- 2#2 is actually a multigraph since
a procedure might make multiple calls
to the same procedure.
- Beware of indirect calls through procedure
parameters or variables, as well as method
invocations!
- Pick routines to inline. Possible heuristics:
- Discard recursive routines
(Perform a topological sort
of the call graph. Cycles
indicate recursion.) or just
inline them one or two levels deep.
- Select routines with indegree=1.
- Select calls to small routines in inner loops.
- Rely on user-defined INLINE pragmas.
- Use profiling information.
- Consider effects on caching, paging,
register pressure, total code size, ...
- ...
- FOR each call
5#5 in 3#3
to inline procedure
6#6, in reverse
topological order of the call graph DO
- Replace the call
5#5 with 1#1's body.
- Replace references to call-by-reference
formal 7#7 with a reference to the corresponding
actual parameter 8#8.
- For each call-by-value formal parameter 7#7
create a new local 9#9.
Insert code to copy the call-by-value
actual 8#8 into 9#9. Replace references
to the call-by-value formal 7#7 with a
reference to its copy 9#9.
- For each of 1#1's local variables 10#10
create a new local 11#11 in 3#3.
Replace references to local variable
10#10 with a reference to 11#11.
Example:
12#12
Topological Order:
CallGraph2
- Performing the inlining in reverse topological
order saves time. For example, expanding V
in S before expanding S in R
and P is faster than expanding S
in P, then S in R, and then
V in P and R.
- Note: there is no path
13#13.
Maybe T could be deleted?
14#14
15#15
16#16
- Consider a method invocation 17#17. The actual procedure called will depend on
the run-time type of 18#18.
- If more than one method can be invoked at a
particular call site, we have to inline all possible
methods. The appropriate code is selected code by
branching on the type of 18#18.
- To improve on method inlining we would
like to find out when a call
m.P() can call exactly one method.
MethodCall
19#19
- For each type 20#20 and method 21#21 in 20#20, find the set
22#22 of method overrides of 21#21 in the inheritance
hierarchy tree rooted in 20#20.
- If 23#23 is of type 20#20, 22#22 contains the methods
that can be called by 24#24.
- We can improve on type hierarchy analysis by
using a variant of the Reaching Definitions
data flow analysis.
25#25
- The C preprocessor (cpp) is a program that preprocesses
a C source file before it is given to the C compiler.
- The preprocessor's job is to remove comments and apply preprocessor
directives that modify the source code.
- Preprocessor directives
begin with #, such as the directive
#include <stdio.h> in hello.c.
Some popular ones are:
- #include 26#26file27#27
- The preprocessor searches the system directories
(e.g. /usr/include) for a file named file and replaces this line
with its contents.
- #define word rest-of-line
- Replaces word with rest-of-line
thoroughout the rest of the source file.
- #if expression 28#28 #else 28#28 #endif
- If expression is non-zero, the lines up to the #else
are included, otherwise the lines between the #else and the
#endif are included.
#define ARRAY_SIZE 1000
char str[ARRAY_SIZE];
#define MAX(x,y) ((x) > (y) ? (x) : (y))
int max_num;
max_num = MAX(i,j);
#define DEBUG 1
#define ARRAY_SIZE 1000
#ifdef DEBUG
printf("got here!")
#endif
#if defined(DEBUG)
#define Debug(x) printf(x)
#else
#define Debug(x)
#endif
Debug(("got here!"));
- A macro is expanded before syntactic and semantic analysis
takes place.
- An inline function is expanded after semantic analysis.
- Hence, the body of a macro is analyzed statically in the
environment of the caller, the body of an inlined procedure
is analyzed in the environment of the callee (itself).
- If foo is an inline procedure it will print "yes".
- If foo is a macro it will not compile.
29#29
- Read Scott: pp. 441-442, 301-303.
- Read the Dragon book: 530-532, 585-602.
Christian S. Collberg
2005-04-22