Thursday, March 17, 2011

the drawing board

for my own convenience, here is a useful link.

Project Proposal: Interprocedural Register Allocation for LLVM

Overview

There is much to be said in the land of register allocation for seeking to avoid having to save and restore callee-save and caller-save registers. Interprocedural register allocation is a strategy that seeks to compare multiple functions against each other for the respective sets of registers used, and subsequently allocate registers for several functions at once to minimise overlap. This can eliminate register saves and restores around function boundaries, saving on both execution time and stack usage.

Schedule

We aim to determine whether cross-function register allocation is an effective optimisation. In theory, we expect it to reduce the number of register saves and restores for many functions, and potentially also to reduce stack usage. These benefits may be tested with a suite of computationally expensive programs and with a suite of memory intensive programs respectively (though we expect the latter may be infeasible to pursue in the time provided, since heap usage will not be affected) (It may also be prudent to test this optimisation's effectiveness as compared to straightforward function inlining.). Our target featuresets are as follows:
  1. 75%: Cross-function register allocation works on simple programs, and its effectiveness can be measured with a rudimentary benchmarking suite.
  2. 100%: Support for programs including recursive call graphs (will require graph analysis regarding strongly connected components), more polished test suite.
  3. 125%: Support for one or more of the following possibilities: Cross-module link time optimisation, caller stack slot re-usage within callees, additional architectures
A week-by-week estimated progress schedule:
  1. Research into LLVM's infrastructure, general planning of implementation details
  2. Implementation
  3. Implementation, possibly starting on benchmark strategies
  4. Wrapping up basic implementation, developing benchmark suite - By April 14th (the checkpoint), we hope to have met the 75% goal.
  5. Polishing basic implementation, adding recursion support, more sophisticated benchmarking, possibly considering one or two of the 125% goals
  6. Final polishing, further consideration of a 125% goal
Software & Implementation Details

Our code will be written in the LLVM framework. All software that will be used as part of this project is either already included in the LLVM source tree or shall be self-supplied.

We will initially be targetting the x86-64 architecture. x86-64 contains 16 general purpose registers that may be used by the compiler, subject to calling conventions specified by Intel. While it is recommended that compilers generally follow calling conventions, interprocedural register allocation seeks to decrease the number of loads and stores (pushes and pops) in emitted code. We will therefore selectively choose to ignore certain conventions.

The CSD fish machines (fish.ics.cs.cmu.edu) currently run Intel Xeon E5520 CPUs. We also both own machines that contain Intel Core 2 Duo CPUs.

Related Work

Peter A. Steenkiste and John L. Hennessy. A simple interprocedural register allocation algorithm and its effectiveness for lisp. ACM Trans. Program. Lang. Syst. 11:1--32, January 1989.

David W. Wall. Global register allocation at link time. SIGPLAN Not., 21:264--275, July 1986.

The above papers consider register allocation in an interprocedural framework. We will take these papers into consideration when designing our implementation in LLVM.

Furthermore, as a version of global register allocation has recently been implemented in GCC, we will be looking closely at the current work that has been done on interprocedural register allocation in GCC. We will also be looking at other research done on interprocedural register allocation.

No comments:

Post a Comment