

# Fence Synthesis under the C11 Memory Model

Sanjana Singh, Divyanjali Sharma and Subodh Sharma

(Indian Institute of Technology Delhi, India)

Ishita Jaju

(Uppsala University, Sweden)

Order might be critical for correctness



Order might be critical for correctness



### C/C++11 (C11) memory orders



Order might be critical for correctness







Order might be critical for correctness







memory order specification to ensure performance and correctness should not be left to humans.

Oberhauser et al., ASPLOS'21

- Order might be critical for correctness
- Fences restore order



### Fence synthesis for automated correction



ATVA 2022

### C11 fences

- Tools for ordering restrictions.
- Support degrees of ordering guarantees



### C11 fences

- Tools for ordering restrictions.
- Support degrees of ordering guarantees

Synthesis challenges:

How many and where? Which memory order?



### Existing fence synthesis techniques

- Imprecise (Existing techniques assume an axiomatic definition of ordering)
  - Strong implicit ordering  $\Rightarrow$  miss C11 bugs + insufficient barriers
  - Weak implicit ordering ⇒ unnecessarily strong barriers
- Reduced portability

### Existing fence synthesis techniques

- Imprecise (Existing techniques assume an axiomatic definition of ordering)
  - Strong implicit ordering ⇒ miss C11 bugs + insufficient barriers
  - Weak implicit ordering ⇒ unnecessarily strong barriers
- Reduced portability

#### Fence synthesis for C11

- Precisely detect C11 traces
- Synthesize portable C11 fences

### Fensying: Optimal C11 fence synthesis

### Optimal fence synthesis

- Smallest set of fences
- Weakest type of fences

solution not unique

sb sequenced-beforerf reads-frommo modification-order

**Step 1** get the set of buggy traces



buggy trace generator (BTG): CDSChecker, open source SMC [Norris and Demsky, OOPSLA'13]

Step 2 generate intermediate trace



sb sequenced-beforerf reads-frommo modification-order

sb sequenced-beforerf reads-frommo modification-order

Step 2 generate intermediate trace (additional ordering with fences)



maximum possible fence ordering

*Step 3* detect violations of coherence



C11 coherence conditions:

hb is irreflexive

rf; hb is irreflexive

mo; hb is irreflexive

mo; rf; hb is irreflexive

mo; hb; rfinv is irreflexive

mo; rf; hb; rfinv is irreflexive

sb U rf U mo U (rfinv; mo) is irreflexive

[Lahav et al. PLDI 2017, Lahav Siglog News2019]

*Step 3* detect violations of coherence





*C11 coherence conditions:* 

hb is irreflexive

rf; hb is irreflexive

mo; hb is irreflexive

mo; rf; hb is irreflexive

mo; hb; rfinv is irreflexive

mo; rf; hb; rfinv is irreflexive

sb U rf U mo U (rfinv; mo) is irreflexive

[Lahav et al. PLDI 2017, Lahav Siglog News2019]

Johnson's algorithm for cycle detection

[Johnson, D.B, SICOMP'1975]

Step 4 find the smallest set of fences min-model of a SAT query



Step 4 find the smallest set of fences min-model of a SAT query

Optimal fence synthesis

- Smallest set of fences 🗸
- Weakest type of fences



Step 5 find weakest order





#### Optimal fence synthesis

- Smallest set of fences
- Weakest type of fences

Step 5 find weakest order



Optimal fence synthesis

- Smallest set of fences
- Weakest type of fences





Step 5 find weakest order





### Fensying: Optimal C11 fence synthesis

- Smallest set of fences
- Weakest type of fences

NP-hard [Taheri et al., DISC'19]



### fFensying: near-Optimal C11 fence synthesis

(fast-Fensying)

#### Fensying

- Sound
- Optimal
- Slow
- Doesn't scale

#### **fFensying**

- Sound
- near-Optimal
- Fast
- Scales

Sound: stops a buggy trace that can be stopped.

Optimal: synthesizes precise fences.

Near-optimal: provably optimal for one trace, and empirically optimal for all traces in 99.5% tests







### Fensying

VS

fFensying

*Theorem*: Fensying is sound.

*Theorem*: Fensying is optimal.

*Theorem*: fFensying is sound.

Sound: stops a buggy trace that can be stopped.

Optimal: synthesizes precise fences.

tested on 1389 litmus tests of buggy C11 programs

Fensying and fFensying stop buggy traces

Fensying performs optimally



Litmus tests source: Abdulla at al., OOPSLA'18



<sup>\*</sup> tests that timeout for both Fensying and fFensying



\* tests that timeout for both Fensying and fFensying



\* tests that timeout for both Fensying and fFensying

#### fFensying analysis

#### **≤2** traces for ~85% of tests



\* tests that timeout for both Fensying and fFensying

non-optimal (fFensying)

0.005% tests

extra fences (fFensying)

### 1.57 average



\* tests that timeout for both Fensying and fFensying

### Benefit of portability







### Applications

- Automated correction
- Ease of development (write the most relaxed program and use fensying)
- Automated weakening (weak memory optimization)
- Generating stress tests

# (f)Fensying tool

open source

https://github.com/singhsanjana/fensying

Improve BTG time

Improve fence synthesis time





Improve BTG time

Improve fence synthesis time



Improve BTG time

Improve fence synthesis time



Improve BTG time

Improve fence synthesis time



ε: set of events of buggy trace

E: #pairs of events in ε, in  $O(|ε|^2)$ 

C: #cycles of buggy trace, in  $O(|\epsilon|!)$ 

Improve BTG time

Improve fence synthesis time



#### Depth bound

| Bound    | Avg. Time | Accuracy | Scale  |
|----------|-----------|----------|--------|
| $\infty$ | 7.46      | 100%     | 96.76% |
| 10       | 0.07      | 100%     | 100%   |
| 8        | 0.05      | 100%     | 100%   |
| 6        | 0.05      | 100%     | 100%   |
| 4        | 0.04      | 99.93%   | 100%   |
| 3        | 0.04      | 78.33%   | 100%   |
| 2        | -         | -        | 0%     |

#### Cycle bound

| Bound    | Avg. Time | Accuracy | Scale  |
|----------|-----------|----------|--------|
| $\infty$ | 7.46      | 100%     | 96.76% |
| 4        | 0.05      | 100%     | 100%   |
| 3        | 0.05      | 100%     | 100%   |
| 2        | 0.04      | 100%     | 100%   |
| 1        | 0.05      | 100%     | 100%   |

#### Fence bound

| Bound    | Avg. Time | Accuracy | Scale  |
|----------|-----------|----------|--------|
| $\infty$ | 7.46      | 100%     | 96.76% |
| 5        | 0.17      | 100%     | 100%   |
| 4        | 0.08      | 99.93%   | 100%   |
| 3        | 0.05      | 78.33%   | 100%   |
| 2        | 0.04      | 10.66%   | 100%   |
| 1        | -         | -        | 0%     |

Down d Aren Times Assumes

Improve BTG time

Improve fence synthesis time



#### Depth bound

| Bound    | Avg. Time | Accuracy | Scale  |
|----------|-----------|----------|--------|
| $\infty$ | 7.46      | 100%     | 96.76% |
| 10       | 0.07      | 100%     | 100%   |
| 8        | 0.05      | 100%     | 100%   |
| 6        | 0.05      | 100%     | 100%   |
| 4        | 0.04      | 99.93%   | 100%   |
| 3        | 0.04      | 78.33%   | 100%   |
| 2        | -         | -        | 0%     |

#### Cycle bound

| Bound    | Avg. Time | Accuracy | Scale  |
|----------|-----------|----------|--------|
| $\infty$ | 7.46      | 100%     | 96.76% |
| 4        | 0.05      | 100%     | 100%   |
| 3        | 0.05      | 100%     | 100%   |
| 2        | 0.04      | 100%     | 100%   |
| 1        | 0.05      | 100%     | 100%   |

#### Fence bound

| Bound    | Avg. Time | Accuracy | Scale  |
|----------|-----------|----------|--------|
| $\infty$ | 7.46      | 100%     | 96.76% |
| 5        | 0.17      | 100%     | 100%   |
| 4        | 0.08      | 99.93%   | 100%   |
| 3        | 0.05      | 78.33%   | 100%   |
| 2        | 0.04      | 10.66%   | 100%   |
| 1        | _         | -        | 0%     |

Tests with extra fences

1.3%

3.2%

0%

Thank You Questions? Looking for post-doc positions Indian Institute of Technology Delhi **SERI 2023** Fence Synthesis under the C11 Memory Model

"We still do not have an acceptable way to make our informal (since C++14) prohibition of out-of-thin-air results precise. The primary practical effect of that is that formal verification of C++ programs using relaxed atomics remains unfeasible.

The paper [Lahav et al. PLDI'17] suggests a solution similar to <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html</a> .

We continue to ignore the problem here, but try to stay out of the way of such a solution."

source: <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html">https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html</a> (Bullet 4. under 'Revising the C++ memory model')

# Fensying technique

# Step 3 detect violations of coherence (strong-fensying)

introduce *sc-order* (so) cycle in so ⇒ to cannot be formed



$$R = \rightarrow_{\tau}^{\mathbf{hb}} \cup \rightarrow_{\tau}^{\mathbf{mo}} \cup \rightarrow_{\tau}^{\mathbf{rf}} \cup \rightarrow_{\tau}^{\mathbf{fr}}$$

#### inability to create a total-order



### C11 fences do not restore sequential consistency



Interpreting barriers from memory orders is not precise





### Interpreting barriers from memory orders is not precise





barriers on ARM





barriers on ARM

### Interpreting barriers from memory orders is not precise







barriers on ARM

barriers on power

barriers on ARM

barriers on power

## Verifying optimality



## Reason (≤2 traces for ~85% of tests)

affect assert conditiondoes not affect assert condition

