Lazy code motion

Loop invariant operations

for (i = 0; i < n; i++) {
  A[i] = m + n;
}
change to (by loop-invariant code motion)
t = m + n;
for (i = 0; i < n; i++) {
  A[i] = t;
}

Partial Redundancy Elimination (aka lazy code motion)

Subsumes: common-subexpression elimination and loop-invariant code motion

Algorithm Prelims:

Algorithm Overview:

Some examples

Example1:

BB1-->BB2
BB1-->BB3
BB1-->BB4
BB2-->BB5
BB3-->BB5
BB4-->BB6

BB2: x = a + b
BB5: y = a + b
BB4: a = 10
BB6: z = a + b
change to
BB1-->BB2
BB1-->BB3
BB1-->BB4
BB2-->BB5
BB3-->BB5
BB4-->BB6

BB2: t = a + b
     x = t
BB3: t = a + b
BB5: y = t
BB4: a = 10
     t = a + b
BB6: z = t
Notice that we place t = a+b where a+b is "needed", i.e., where it will be needed on all paths in the future

Example 2

BB1-->BB2
BB2-->BB3
BB3-->BB2
BB3-->BB4

BB2: x = a + b
change to
BB1-->BB2
BB2-->BB3
BB3-->BB2
BB3-->BB4

BB1: t = a+b
BB2: x = t
Again, t=a+b is "needed" on all future paths

Example 3

BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB4-->BB6
BB5-->BB3
BB6-->BB3

BB1: x = a+b
BB4: y=a+b
BB6: a=10
change to
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB4-->BB6
BB5-->BB3
BB6-->newBB
newBB-->BB3

BB1: t = a+b
     x=t
BB4: y=t
BB6: a=10
newBB: t=a+b

Example 4 (not all redundancy can be eliminated)

BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->BB3
BB6-->exit

BB1: x=a+b
BB5: y=a+b
BB6: a=10
change to
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->newBB
newBB-->BB3
BB6-->exit

BB1: t=a+b
     x=t
BB5: y=t
BB6: a=10
newBB: t=a+b
There is some redundancy in the transformed code that is unavoidable. Notice that the path newBB-->BB3-->BB6-->exit computes a+b (in newBB) even though it is never used. The other option is:
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->BB3
BB6-->exit

BB1: t=a+b
     x=t
BB5: t = a+b
     y=t
BB6: a=10
But now a+b is computed twice on the path BB1-->BB2-->BB3-->BB4-->BB5: once in BB1 and again in BB5. (This is the solution that our analysis will yield.)

Pass 1: Needed expressions

Data-flow analysis. An expression e is "needed" (also called anticipated) at a program point p if, on every path from p to the exit, e is computed before any of its operands is redefined; this makes Pass 1 a backward, all-paths (must) analysis. Notice that while we are doing all expressions at once, we could also have done one expression at a time, in which case the domain of values would simply be "needed" (definitely needed) or "not-needed" (we can't be sure whether it is needed).
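As a concrete sketch (hypothetical code, not from the lecture), here is a single-expression version of the backward "needed" analysis, run on Example 1's CFG. `use` marks blocks that compute a+b, and `kill` marks blocks that assign to a or b:

```python
# Hypothetical sketch of the backward "needed" (anticipated-expressions)
# analysis for a single expression, iterated to a fixed point.
def needed_expression(succ, use, kill):
    blocks = list(succ)
    # Must-analysis: start optimistically (True everywhere) and shrink.
    ant_in = {b: True for b in blocks}
    ant_out = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Needed at b's exit iff needed at the entry of ALL successors
            # (an exit block has no successors, so nothing is needed there).
            out = all(ant_in[s] for s in succ[b]) if succ[b] else False
            # Needed at b's entry iff b computes the expression, or it is
            # needed below and b does not redefine an operand.
            inn = use[b] or (out and not kill[b])
            if (out, inn) != (ant_out[b], ant_in[b]):
                ant_out[b], ant_in[b] = out, inn
                changed = True
    return ant_in, ant_out

# Example 1 from above: a+b is computed in BB2, BB5, BB6; BB4 assigns a.
succ = {"BB1": ["BB2", "BB3", "BB4"], "BB2": ["BB5"], "BB3": ["BB5"],
        "BB4": ["BB6"], "BB5": [], "BB6": []}
use  = {b: b in {"BB2", "BB5", "BB6"} for b in succ}
kill = {b: b == "BB4" for b in succ}
ant_in, ant_out = needed_expression(succ, use, kill)
# a+b is needed at BB3's entry (where Example 1 inserted t=a+b), and at
# BB4's exit (after the kill), but not at BB1's exit or BB4's entry.
```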

Proposal 1: insert t = e at the frontier where e becomes needed, i.e., at the earliest point where e is needed on all future paths.

Some examples where proposal1 will fail:

Example1

BB1->BB2
BB2->BB3
BB3->BB4
BB3->exit

BB1: x=a+b
BB4: y=a+b
Now, a+b is needed at BB4's entry but not at BB3's exit (the path to exit does not use it). Yet computing t=a+b on the edge BB3->BB4 is redundant, because the expression a+b is already available (i.e., not missing) at BB4.

Example2

BB1->BB2
BB1->BB3
BB1->BB4
BB2->BB5
BB3->BB5

BB4: a=10
BB5: y=a+b
Here, the frontier occurs at the end of BB1, but that is too early to compute t=a+b: we would tie up an extra register for a long time unnecessarily!

Example3 (a more convincing example of why you should not compute a value too early)

BB1->BB2
BB1->BB3
BB2->BB4
BB4->BB2
BB4->BB5
BB5->BB6
BB3->BB7
BB7->BB6

BB3: a=b+c
BB6: y=b+c
The frontier is at the end of BB1, but placing the computation there would tie up a register for the entire duration of the BB2-BB4 loop, which seems too wasteful.

Pass 2: Missing Expressions

An expression e is "missing" at a program point p if, along some path reaching p, e has not been computed since its operands were last assigned (i.e., e is not available at p). This is a forward, any-path (may) analysis.

Proposal 2

Place the expression at the earliest point where it is both needed and missing.
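A sketch of the forward "missing" pass (hypothetical code, same conventions as before), run on the first failure example for Proposal 1. `comp` marks blocks that compute a+b:

```python
# Hypothetical sketch of the forward "missing" analysis for one expression:
# e is missing at a point if along SOME path to it, e has not been
# computed since its operands were last assigned.
def missing_expression(pred, comp, kill, entry):
    blocks = list(pred)
    # May-analysis: nothing has been computed before the entry; start with
    # False everywhere else and grow to a fixed point.
    m_in = {b: b == entry for b in blocks}
    m_out = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            inn = True if b == entry else any(m_out[p] for p in pred[b])
            # Missing after b unless b computes e; a kill makes it missing.
            out = (inn or kill[b]) and not comp[b]
            if (inn, out) != (m_in[b], m_out[b]):
                m_in[b], m_out[b] = inn, out
                changed = True
    return m_in, m_out

# The failure example for Proposal 1: BB1->BB2->BB3->BB4, BB3->exit,
# with x=a+b in BB1 and y=a+b in BB4.
pred = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB2"],
        "BB4": ["BB3"], "exit": ["BB3"]}
comp = {b: b in {"BB1", "BB4"} for b in pred}   # blocks computing a+b
kill = {b: False for b in pred}                 # nothing assigns a or b
m_in, m_out = missing_expression(pred, comp, kill, "BB1")
# a+b is not missing at BB4's entry: even though it is needed there, we
# must not insert t=a+b on the edge BB3->BB4.
```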

Pass 3: Postponed expressions

An expression e is postponed at a program point p if: for all paths leading to p, we have seen the earliest placement of e but not a subsequent use. Take the third example again:
BB1->BB2
BB1->BB3
BB2->BB4
BB4->BB2
BB4->BB5
BB5->BB6
BB3->BB7
BB7->BB6

BB3: a=b+c
BB6: y=b+c
Here, b+c is postponed at the beginning of BB3 and on the edge BB5->BB6. Latest: the frontier at the end of the "postponed" wave, i.e., the latest points to which the earliest placement can be pushed. This solves the issues with Example 2 and Example 3.
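To make "postponed" and "latest" concrete, here is a hypothetical single-expression sketch run on Example 3 above, taking as given that the earliest placement is at BB1 (the frontier from the first two passes) and that b+c is used in BB3 and BB6:

```python
# Hypothetical sketch: forward "postponable" analysis plus the "latest"
# predicate, for one expression with a given earliest placement.
def postponable_and_latest(succ, earliest, use):
    blocks = list(succ)
    pred = {b: [] for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            pred[s].append(b)
    # Must-analysis: optimistic init, except at the entry (no predecessors).
    post = {b: len(pred[b]) > 0 for b in blocks}

    # Postponable into b iff, on every incoming edge, the expression was
    # placeable at the predecessor (earliest there, or still postponable)
    # and the predecessor did not use it.
    def edge(p):
        return (earliest[p] or post[p]) and not use[p]

    changed = True
    while changed:
        changed = False
        for b in blocks:
            v = bool(pred[b]) and all(edge(p) for p in pred[b])
            if v != post[b]:
                post[b] = v
                changed = True
    # Latest: placeable here, and either used here or not placeable in
    # every successor (the "postponed" wave ends).
    latest = {}
    for b in blocks:
        placeable = earliest[b] or post[b]
        frontier = use[b] or not all(earliest[s] or post[s] for s in succ[b])
        latest[b] = placeable and frontier
    return post, latest

succ = {"BB1": ["BB2", "BB3"], "BB2": ["BB4"], "BB4": ["BB2", "BB5"],
        "BB5": ["BB6"], "BB3": ["BB7"], "BB7": ["BB6"], "BB6": []}
earliest = {b: b == "BB1" for b in succ}      # frontier from passes 1 and 2
use = {b: b in {"BB3", "BB6"} for b in succ}  # a = b+c; y = b+c
post, latest = postponable_and_latest(succ, earliest, use)
```

At this block granularity, latest holds at BB3 (where b+c is used anyway) and at BB5, i.e., just before the BB5->BB6 edge: the end of the postponed wave, which avoids holding a register across the BB2-BB4 loop.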

Pass 4: Cleaning up

BB1->BB2
BB2->BB3

BB1: x=a+b
There is no need to generate a temporary in this case, but our algorithm will. Do a liveness analysis to see whether the expression is used again later (without an intervening "latest" point, since a latest point indicates recomputation); generate the temporary only if it is.
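The cleanup pass can be sketched the same way (hypothetical code): a backward liveness-style analysis asking whether the expression's value is used again after a latest point. On the two-block example above, BB1 is a latest point but the value is never used again, so no temporary is generated:

```python
# Hypothetical sketch of the cleanup pass: backward "used later" analysis.
# The expression's value is live at a block's exit if some successor uses
# it, or keeps it live without an intervening "latest" point (a latest
# point means the expression would be recomputed there anyway).
def used_later(succ, use, latest):
    blocks = list(succ)
    live_out = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            v = any(use[s] or (live_out[s] and not latest[s])
                    for s in succ[b])
            if v != live_out[b]:
                live_out[b] = v
                changed = True
    return live_out

# Straight-line example: BB1: x = a+b, and a+b is never used again.
succ = {"BB1": ["BB2"], "BB2": ["BB3"], "BB3": []}
use = {"BB1": True, "BB2": False, "BB3": False}
latest = {"BB1": True, "BB2": False, "BB3": False}
live_out = used_later(succ, use, latest)
# live_out["BB1"] is False: insert a temporary at a latest point only when
# the value is used later, so here we keep plain "x = a+b".
```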

Lazy code motion (or partial-redundancy elimination) algorithm