Lazy code motion

Loop invariant operations

for (i = 0; i < n; i++) {
  A[i] = m + n;
}
change to (by loop-invariant code motion)
t = m + n;
for (i = 0; i < n; i++) {
  A[i] = t;
}

Partial Redundancy Elimination (aka lazy code motion)

Subsumes: common-subexpression elimination and loop-invariant code motion

Algorithm Prelims:

Algorithm Overview:

Some examples

Example1:

BB1-->BB2
BB1-->BB3
BB1-->BB4
BB2-->BB5
BB3-->BB5
BB4-->BB6

BB2: x = a + b
BB5: y = a + b
BB4: a = 10
BB6: z = a + b
change to
BB1-->BB2
BB1-->BB3
BB1-->BB4
BB2-->BB5
BB3-->BB5
BB4-->BB6

BB2: t = a + b
     x = t
BB3: t = a + b
BB5: y = t
BB4: a = 10
     t = a + b
BB6: z = t
Notice that we place t = a+b where a+b is "needed", i.e., where it will be needed on all paths in the future

Example 2

BB1-->BB2
BB2-->BB3
BB3-->BB2
BB3-->BB4

BB2: x = a + b
change to
BB1-->BB2
BB2-->BB3
BB3-->BB2
BB3-->BB4

BB1: t = a+b
BB2: x = t
Again, t=a+b is "needed" on all future paths

Example 3

BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB4-->BB6
BB5-->BB3
BB6-->BB3

BB1: x = a+b
BB4: y=a+b
BB6: a=10
change to
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB4-->BB6
BB5-->BB3
BB6-->newBB
newBB-->BB3

BB1: t = a+b
     x=t
BB4: y=t
BB6: a=10
newBB: t=a+b

Example 4 (not all redundancy can be eliminated)

BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->BB3
BB6-->exit

BB1: x=a+b
BB5: y=a+b
BB6: a=10
change to
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->newBB
newBB-->BB3
BB6-->exit

BB1: t=a+b
     x=t
BB5: y=t
BB6: a=10
newBB: t=a+b
There is some redundancy in the transformed code that is unavoidable. Notice that the path newBB-->BB3-->BB6-->exit computes a+b (in newBB) even though it is never used. The other option is:
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->BB3
BB6-->exit

BB1: t=a+b
     x=t
BB5: t = a+b
     y=t
BB6: a=10
But now a+b is computed twice on the path BB1-->BB2-->BB3-->BB4-->BB5: once in BB1 and again in BB5. (This is the solution that our analysis will yield.)

Pass 1: Needed expressions

Data-flow analysis. An expression e is "needed" (also called anticipated) at a program point p if, on every path from p to the exit, e is computed before any of its operands is redefined; this makes Pass 1 a backward, all-paths (must) analysis. Notice that while we are doing all expressions at once, we could also have done one expression at a time, in which case the domain of values would simply be "needed" (definitely needed) or "not-needed" (we can't be sure whether it is needed).
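As a concrete sketch (hypothetical code, not from the lecture), here is a single-expression version of the backward "needed" analysis, run on Example 1's CFG. `use` marks blocks that compute a+b, and `kill` marks blocks that assign to a or b:

```python
# Hypothetical sketch of the backward "needed" (anticipated-expressions)
# analysis for a single expression, iterated to a fixed point.
def needed_expression(succ, use, kill):
    blocks = list(succ)
    # Must-analysis: start optimistically (True everywhere) and shrink.
    ant_in = {b: True for b in blocks}
    ant_out = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Needed at b's exit iff needed at the entry of ALL successors
            # (an exit block has no successors, so nothing is needed there).
            out = all(ant_in[s] for s in succ[b]) if succ[b] else False
            # Needed at b's entry iff b computes the expression, or it is
            # needed below and b does not redefine an operand.
            inn = use[b] or (out and not kill[b])
            if (out, inn) != (ant_out[b], ant_in[b]):
                ant_out[b], ant_in[b] = out, inn
                changed = True
    return ant_in, ant_out

# Example 1 from above: a+b is computed in BB2, BB5, BB6; BB4 assigns a.
succ = {"BB1": ["BB2", "BB3", "BB4"], "BB2": ["BB5"], "BB3": ["BB5"],
        "BB4": ["BB6"], "BB5": [], "BB6": []}
use  = {b: b in {"BB2", "BB5", "BB6"} for b in succ}
kill = {b: b == "BB4" for b in succ}
ant_in, ant_out = needed_expression(succ, use, kill)
# a+b is needed at BB3's entry (where Example 1 inserted t=a+b), and at
# BB4's exit (after the kill), but not at BB1's exit or BB4's entry.
```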

Proposal 1: insert t = e at the frontier where e becomes needed, i.e., at the earliest point where e is needed on all future paths.

Some examples where proposal1 will fail:

Example1

BB1->BB2
BB2->BB3
BB3->BB4
BB3->exit

BB1: x=a+b
BB4: y=a+b
Now, a+b is needed at BB4's entry but not at BB3's exit (the path to exit does not use it). Yet computing t=a+b on the edge BB3->BB4 is redundant, because the expression a+b is already available (i.e., not missing) at BB4.

Example2

BB1->BB2
BB1->BB3
BB1->BB4
BB2->BB5
BB3->BB5

BB4: a=10
BB5: y=a+b
Here, the frontier occurs at the end of BB1, but that is too early to compute t=a+b: we would tie up an extra register for a long time unnecessarily!

Example3 (a more convincing example of why you should not compute a value too early)

BB1->BB2
BB1->BB3
BB2->BB4
BB4->BB2
BB4->BB5
BB5->BB6
BB3->BB7
BB7->BB6

BB3: a=b+c
BB6: y=b+c
The frontier is at the end of BB1, but placing the computation there would tie up a register for the entire duration of the BB2-BB4 loop, which seems too wasteful.

Pass 2: Missing Expressions

An expression e is "missing" at a program point p if, along some path reaching p, e has not been computed since its operands were last assigned (i.e., e is not available at p). This is a forward, any-path (may) analysis.

Proposal 2

Place the expression at the earliest point where it is both needed and missing.
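A sketch of the forward "missing" pass (hypothetical code, same conventions as before), run on the first failure example for Proposal 1. `comp` marks blocks that compute a+b:

```python
# Hypothetical sketch of the forward "missing" analysis for one expression:
# e is missing at a point if along SOME path to it, e has not been
# computed since its operands were last assigned.
def missing_expression(pred, comp, kill, entry):
    blocks = list(pred)
    # May-analysis: nothing has been computed before the entry; start with
    # False everywhere else and grow to a fixed point.
    m_in = {b: b == entry for b in blocks}
    m_out = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            inn = True if b == entry else any(m_out[p] for p in pred[b])
            # Missing after b unless b computes e; a kill makes it missing.
            out = (inn or kill[b]) and not comp[b]
            if (inn, out) != (m_in[b], m_out[b]):
                m_in[b], m_out[b] = inn, out
                changed = True
    return m_in, m_out

# The failure example for Proposal 1: BB1->BB2->BB3->BB4, BB3->exit,
# with x=a+b in BB1 and y=a+b in BB4.
pred = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB2"],
        "BB4": ["BB3"], "exit": ["BB3"]}
comp = {b: b in {"BB1", "BB4"} for b in pred}   # blocks computing a+b
kill = {b: False for b in pred}                 # nothing assigns a or b
m_in, m_out = missing_expression(pred, comp, kill, "BB1")
# a+b is not missing at BB4's entry: even though it is needed there, we
# must not insert t=a+b on the edge BB3->BB4.
```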

Pass 3: Postponed expressions

An expression e is postponed at a program point p if: for all paths leading to p, we have seen the earliest placement of e but not a subsequent use. Take the third example again:
BB1->BB2
BB1->BB3
BB2->BB4
BB4->BB2
BB4->BB5
BB5->BB6
BB3->BB7
BB7->BB6

BB3: a=b+c
BB6: y=b+c
Here, b+c is postponed at the beginning of BB3 and on the edge BB5->BB6. Latest: the frontier at the end of the "postponed" wave, i.e., the latest points to which the earliest placement can be pushed. This solves the issues with Example 2 and Example 3.
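To make "postponed" and "latest" concrete, here is a hypothetical single-expression sketch run on Example 3 above, taking as given that the earliest placement is at BB1 (the frontier from the first two passes) and that b+c is used in BB3 and BB6:

```python
# Hypothetical sketch: forward "postponable" analysis plus the "latest"
# predicate, for one expression with a given earliest placement.
def postponable_and_latest(succ, earliest, use):
    blocks = list(succ)
    pred = {b: [] for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            pred[s].append(b)
    # Must-analysis: optimistic init, except at the entry (no predecessors).
    post = {b: len(pred[b]) > 0 for b in blocks}

    # Postponable into b iff, on every incoming edge, the expression was
    # placeable at the predecessor (earliest there, or still postponable)
    # and the predecessor did not use it.
    def edge(p):
        return (earliest[p] or post[p]) and not use[p]

    changed = True
    while changed:
        changed = False
        for b in blocks:
            v = bool(pred[b]) and all(edge(p) for p in pred[b])
            if v != post[b]:
                post[b] = v
                changed = True
    # Latest: placeable here, and either used here or not placeable in
    # every successor (the "postponed" wave ends).
    latest = {}
    for b in blocks:
        placeable = earliest[b] or post[b]
        frontier = use[b] or not all(earliest[s] or post[s] for s in succ[b])
        latest[b] = placeable and frontier
    return post, latest

succ = {"BB1": ["BB2", "BB3"], "BB2": ["BB4"], "BB4": ["BB2", "BB5"],
        "BB5": ["BB6"], "BB3": ["BB7"], "BB7": ["BB6"], "BB6": []}
earliest = {b: b == "BB1" for b in succ}      # frontier from passes 1 and 2
use = {b: b in {"BB3", "BB6"} for b in succ}  # a = b+c; y = b+c
post, latest = postponable_and_latest(succ, earliest, use)
```

At this block granularity, latest holds at BB3 (where b+c is used anyway) and at BB5, i.e., just before the BB5->BB6 edge: the end of the postponed wave, which avoids holding a register across the BB2-BB4 loop.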

Pass 4: Cleaning up

BB1->BB2
BB2->BB3

BB1: x=a+b
There is no need to generate a temporary in this case, but our algorithm will. Do a liveness analysis to see whether the expression is used again later (without an intervening "latest" point, since a latest point indicates recomputation); generate the temporary only if it is.
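The cleanup pass can be sketched the same way (hypothetical code): a backward liveness-style analysis asking whether the expression's value is used again after a latest point. On the two-block example above, BB1 is a latest point but the value is never used again, so no temporary is generated:

```python
# Hypothetical sketch of the cleanup pass: backward "used later" analysis.
# The expression's value is live at a block's exit if some successor uses
# it, or keeps it live without an intervening "latest" point (a latest
# point means the expression would be recomputed there anyway).
def used_later(succ, use, latest):
    blocks = list(succ)
    live_out = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            v = any(use[s] or (live_out[s] and not latest[s])
                    for s in succ[b])
            if v != live_out[b]:
                live_out[b] = v
                changed = True
    return live_out

# Straight-line example: BB1: x = a+b, and a+b is never used again.
succ = {"BB1": ["BB2"], "BB2": ["BB3"], "BB3": []}
use = {"BB1": True, "BB2": False, "BB3": False}
latest = {"BB1": True, "BB2": False, "BB3": False}
live_out = used_later(succ, use, latest)
# live_out["BB1"] is False: insert a temporary at a latest point only when
# the value is used later, so here we keep plain "x = a+b".
```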

Lazy code motion (or partial-redundancy elimination) algorithm