# FluidCheck: A Redundant Threading-Based Approach for Reliable Execution in Manycore Processors

**Rajshekar Kalayappan, Smruti R. Sarangi**, Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India



# Soft Errors

- Transient (temporary) in nature

[Image source: aviral.lab.asu.edu]

- Caused by particle strikes on the silicon
- Sources of particles:
  - Solar ion flux
  - Explosion of distant stars
  - Impurities in the chip


- Rare events
  - A particle must strike the right place, at the right angle, with the right amount of energy
- Not rare enough to be ignored
  - The critical charge required to flip a bit decreases as feature size and operating voltage shrink
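To make the scaling argument concrete, the sketch below relies on two first-order approximations that do not appear in the original slides: the critical charge is estimated as $Q_{crit} \approx C_{node} \cdot V_{dd}$, and the relative soft error rate is modeled as falling off exponentially with $Q_{crit}$, in the style of commonly used empirical SER models. Every capacitance, voltage, and charge-collection value here is hypothetical and serves only as an illustration.

```python
import math

def critical_charge_fc(node_capacitance_ff: float, vdd_v: float) -> float:
    """First-order estimate: Q_crit ~= C_node * V_dd (fF x V = fC)."""
    return node_capacitance_ff * vdd_v

def relative_ser(q_crit_fc: float, q_collect_fc: float) -> float:
    """Relative SER ~ exp(-Q_crit / Q_s); particle flux and sensitive area held constant (assumption)."""
    return math.exp(-q_crit_fc / q_collect_fc)

# Hypothetical values for an older and a newer technology node
older = critical_charge_fc(node_capacitance_ff=2.0, vdd_v=1.2)  # 2.4 fC
newer = critical_charge_fc(node_capacitance_ff=1.0, vdd_v=0.8)  # 0.8 fC
q_s = 0.5  # assumed charge-collection parameter (fC), kept constant across nodes

print(f"SER increase: {relative_ser(newer, q_s) / relative_ser(older, q_s):.1f}x")
```

The model ignores that the sensitive area per bit also shrinks, which partly offsets the effect. It only illustrates why a smaller $Q_{crit}$ makes bit flips much more likely.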


- Solutions
  - Device-level radiation hardening
    - Radiation-hardened chips lag two to four generations behind their commercial counterparts [Courtland2015]
  - System-level hardening techniques are therefore required
    - Redundancy



#### **Problem Statement**

- To *efficiently* execute a set of applications on a chip multiprocessor of homogeneous SMT-capable cores, while ensuring *reliability* in the face of soft errors

## Related Work: DIVA [Austin1999]

- Designed to provide reliability: a simple checker verifies the instructions executed by the leader pipeline



### Related Work



#### SRT [Reinhardt2000], AR-SMT [Rotenberg1999]

- Leader and checker run as threads on the same SMT core
- Saves area
- Better throughput per core



#### CRT [Mukherjee2002]

- Improvement over SRT: the leader and checker of a pair run on different cores
- Circumvents hazards that arise because a leader and its checker have similar resource requirements
- Better throughput per core



#### Without any checking, throughput = 4.84 instructions per cycle



- Throughput = 3.24 instructions per cycle, owing to:
  - Similarity in resource requirements
  - High-throughput threads placed together



















[Figure: Arbiter]



# Challenges to achieving FluidCheck

- Designing a reactive, phase-based scheduler
- Transferring hints efficiently
- Forwarding cache lines efficiently from the leader to the checker
- Circumventing subtle livelock scenarios

#### Hardware Architecture



### **Overview of Redundant Execution**




















































### Memory Checkpointing

































#### Forwarding Filters

[Figure labels: Leader Pipeline, L1, RFB, "Hit!", "Miss!", "Do Not Forward"]























# Arbiter Logic: I

- Activity metric
  - IPC
  - WIPC(x)
- Mapping a single thread
  - Select the core with the minimum *activity* that has free SMT slots
  - If the activity metric is IPC, the scheme is termed *minIPC*
  - If the activity metric is WIPC(x), the scheme is termed *minWIPC\_x*
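The single-thread mapping rule can be sketched as follows. This is an illustrative model, not FluidCheck's hardware arbiter. It assumes that a core's activity is the sum of the activities of the threads already mapped to it, and it treats each thread's activity (IPC or WIPC(x)) as an opaque number, because the slides do not define WIPC(x). `Core` and `map_thread` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Core:
    core_id: int
    smt_slots: int = 4  # 4-way SMT, as in the simulated configuration
    thread_activities: List[float] = field(default_factory=list)

    def activity(self) -> float:
        # Assumption: core activity = sum of the activities of its resident threads
        return sum(self.thread_activities)

    def has_free_slot(self) -> bool:
        return len(self.thread_activities) < self.smt_slots

def map_thread(cores: List[Core], thread_activity: float) -> Optional[int]:
    """Place a thread on the least-active core that still has a free SMT slot."""
    candidates = [c for c in cores if c.has_free_slot()]
    if not candidates:
        return None  # no free hardware context anywhere
    target = min(candidates, key=lambda c: c.activity())
    target.thread_activities.append(thread_activity)
    return target.core_id

cores = [
    Core(0, thread_activities=[1.0, 1.0]),
    Core(1, thread_activities=[0.5]),
    Core(2, thread_activities=[0.1] * 4),
]
print(map_thread(cores, 0.7))  # prints 1: core 2 is less active but has no free slot
```

With IPC as the activity value this corresponds to *minIPC*. Substituting a WIPC(x) value instead yields *minWIPC\_x* without changing the selection logic.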

### Arbiter Logic: II

- Mapping a Set of Threads
  - Scheduling Policies:
    - Pinned Leaders (SP-PL)
    - Unpinned Leaders (SP-UL)
    - Unpinned Leaders All Leaders First (SP-UALF)
- SMT Fetch Policy
  - Full Simultaneous Issue [Tullsen1995]
  - If *n* threads on a core have activities $A_1, A_2, \ldots, A_n$, then within each block of $B$ cycles the *i*<sup>th</sup> thread receives $\frac{A_i}{\sum_{k=1}^{n} A_k} \times B$ fetch cycles
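The proportional fetch-cycle allocation follows directly from the formula for the $i$<sup>th</sup> thread's share. In this sketch `fetch_cycles` is a hypothetical helper. It returns fractional cycle counts and leaves aside how real hardware would round them to whole cycles, and the even split used when all activities are zero is an assumption the slides do not address.

```python
from typing import List

def fetch_cycles(activities: List[float], block_size: int) -> List[float]:
    """Fetch cycles per thread in a block of `block_size` cycles, proportional to activity."""
    total = sum(activities)
    if total == 0:
        # Assumption (not specified in the slides): split evenly when no activity is measured
        return [block_size / len(activities)] * len(activities)
    return [a / total * block_size for a in activities]

# Hypothetical example: two threads with activities 1.0 and 3.0, block of B = 100 cycles
print(fetch_cycles([1.0, 3.0], block_size=100))  # [25.0, 75.0]
```

The shares always add up to $B$, so a thread's fetch bandwidth follows its activity relative to the other threads on the same core.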

#### **Evaluation: Simulation Parameters**

- 16-core processor, 4-way SMT
- Core configuration based on Intel Sandy Bridge and IBM POWER7

| Parameter           | Value           |
|---------------------|-----------------|
| Pipeline width      | 4               |
| i-cache and d-cache | 32 kB each      |
| Shared L2 cache     | 12 MB           |
| NoC topology        | 2D torus        |
| Hint buffer         | 512 entries     |
| Victim cache        | 32 entries      |
| RFB and LFB         | 64 entries each |

# **Evaluation Methodology**

- Tools
  - Tejas Architectural Simulator
  - McPAT and Orion2 models
- Workloads
  - "low": 16 applications (16 + 16 threads)
  - "medium": 24 applications (24 + 24 threads)
  - "high": 32 applications (32 + 32 threads)
  - In each case, 100 random combinations of SPEC CPU2006 benchmarks were evaluated
- Comparison metric (geometric-mean slowdown over the set $W$ of benchmarks $b$)

 $\sqrt[|W|]{\prod_{b \in W} \frac{\text{cycles taken to reliably execute } b}{\text{cycles taken to unreliably execute } b}} - 1$ 
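The comparison metric is a geometric-mean slowdown. The sketch below computes it from per-benchmark cycle counts. The function name and the dictionary-based input format are assumptions for illustration. It averages logarithms rather than multiplying ratios directly, which avoids overflow when $|W|$ is large.

```python
import math
from typing import Dict

def mean_slowdown(reliable: Dict[str, int], unreliable: Dict[str, int]) -> float:
    """Geometric mean of per-benchmark slowdown ratios over W, minus 1."""
    ratios = [reliable[b] / unreliable[b] for b in reliable]
    # Average the logarithms instead of multiplying the ratios (avoids overflow)
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(log_mean) - 1

# Hypothetical cycle counts for a two-benchmark set W
reliable = {"bench_a": 1_000, "bench_b": 4_000}
unreliable = {"bench_a": 1_000, "bench_b": 1_000}
print(round(mean_slowdown(reliable, unreliable), 6))  # prints 1.0
```

Here one benchmark is unaffected and the other takes 4x as many cycles, so the metric is $\sqrt{1 \times 4} - 1 = 1$, i.e., a 100% mean slowdown. Under the same metric, a value of 0.27 corresponds to a 27% average slowdown.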

#### **Evaluation:** Results



#### FluidCheck's Mapping Ability



#### Performance of Forwarding Filters



# Comparison with Generic Scheduling Schemes



#### Conclusions

- Efficient system-level solutions for handling soft errors are critically needed
- Protecting modern multicore, multithreading-capable processors presents interesting challenges
- Our solution, FluidCheck, achieves reliability with an average slowdown of only 27%, whereas seminal works such as SRT (47%) and CRT (37%) incur much higher slowdowns

#### Extra slides

#### **DIVA** : Checker Operation



#### **DIVA : Execution Assistance**

- The DIVA checker
  - Faces no data hazards
    - Operand value hints are passed from the leader
  - Faces no control hazards
    - The stream of packets from the leader arrives in the correct dynamic order (provided no soft error struck the branch prediction or branch resolution logic)
    - If such a soft error does occur (a rare event), it is detected when the checker evaluates the branch condition
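Here is a minimal sketch of checking with execution assistance. Each packet from the leader carries the operand values it used (the hints) together with its result, so the checker can re-execute every instruction on its own without waiting for earlier results. The packet format, the tiny `OPS` table, and the 1/0 encoding of branch outcomes are hypothetical simplifications and do not reflect DIVA's actual design.

```python
import operator
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical three-operation ISA; "beq" evaluates a branch condition
OPS = {"add": operator.add, "sub": operator.sub, "beq": operator.eq}

@dataclass(frozen=True)
class Packet:
    """One committed instruction as sent by the leader (hypothetical format)."""
    op: str
    operands: Tuple[int, int]  # operand value hints: the checker never waits on earlier results
    leader_result: int         # for branches: 1 if the leader took the branch, else 0

def check(p: Packet) -> bool:
    """Re-execute one instruction with the hinted operands and compare with the leader."""
    a, b = p.operands
    return int(OPS[p.op](a, b)) == p.leader_result

def first_mismatch(stream: List[Packet]) -> Optional[int]:
    """Index of the first packet whose re-execution disagrees, or None if all agree."""
    for i, p in enumerate(stream):
        if not check(p):
            return i
    return None

stream = [
    Packet("add", (2, 3), 5),
    Packet("beq", (4, 5), 1),  # leader claims "taken", but 4 != 5 (e.g., a strike on branch logic)
    Packet("sub", (9, 4), 5),
]
print(first_mismatch(stream))  # prints 1
```

Since each `check` call depends only on its own packet, this loop could be replaced by several checks running in parallel. That independence is one reason the checker can be made simpler or slower.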

#### DIVA : Consequence of Execution Assistance

- What gains can be achieved through execution assistance?
  - Checker can be made simpler
  - Checker can be made slower
  - Checker can be made to do more work

### Resolving Livelock Issues

- Suppose a checker thread faces a decode stall because the ROB is full
- Suppose, further, that another leader thread on the same core occupies the head of the ROB and is waiting on a long-latency miss
- The checker thread is then forced to migrate
- Multiple forced migrations may occur in quick succession, which is detrimental to performance
- Solution: reservation. If a resource is more than 95% full, it accepts no further leader entries
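The reservation rule can be modeled as an admission check on a shared, finite per-core resource such as the ROB. In this simplified model, which is an assumption and not the paper's implementation, entries are allocated one at a time. A leader is refused once occupancy exceeds 95% of capacity, and a checker is refused only when the resource is completely full. `SharedResource` and `try_allocate` are hypothetical names.

```python
from dataclasses import dataclass

RESERVATION_PERCENT = 95  # from the slide: above 95% full, no more leader entries

@dataclass
class SharedResource:
    """Finite per-core structure (e.g., the ROB) shared by leader and checker threads."""
    capacity: int
    occupied: int = 0

    def try_allocate(self, is_leader: bool) -> bool:
        if self.occupied >= self.capacity:
            return False  # completely full: no thread can allocate
        # Integer comparison avoids floating-point edge cases at the threshold
        if is_leader and self.occupied * 100 > RESERVATION_PERCENT * self.capacity:
            return False  # the remaining entries are held back for checker threads
        self.occupied += 1
        return True

rob = SharedResource(capacity=100, occupied=96)
print(rob.try_allocate(is_leader=True))   # False: leader refused above 95%
print(rob.try_allocate(is_leader=False))  # True: checker still admitted
```

Leaders can never take the last few entries, so a stalled checker always has some room left to claim. In this model, that reduces the chance of the forced-migration cascade described above.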