Transactional Memory: Architectural support for lock free data structures. Just like CAS is a transactional primitive for a single memory location, transactional memory implements "k"-CAS, where "k" is as big as the size of the L1 cache on the architecture. Lock free data structures avoid common problems associated with conventional locking techniques in highly concurrent systems: Priority Inversion: occurs when a low priority process is preempted while holding a lock needed by higher priority processes. Convoying: occurs when a process holding a lock is de-scheduled, perhaps by exhausting its scheduling quantum, by a page fault or by some other kind of interrupt. When such interrupt occurs, other processes capable of running may be unable to progress. What is lock convoy? Is it a performance issue? http://en.wikipedia.org/wiki/Lock_convoy Transaction: A transaction is a finite sequence of machine instructions, executed by a single process, satisfying the following properties: Serializability: Transactions appear to execute serially, meaning that the steps of one transaction never appear to be interleaved with the steps of another. Committed transactions are never observed by different processors to execute in different orders. Serializability is a form of "consistency semantics". Atomicity: each transaction makes a sequence of tentative changes to shared memory. When the transaction completes, it either commits, making its changes visible to other processes instantaneously or it aborts, causing its changes to be discarded. Instructions: It provides the following primitives instructions for accessing memory: Load Transactional (LT): reads the value of a shared memory location into a private register. Load Transactional Exclusive (LTX): reads the value of a shared memory location into a private register, hinting that the location is likely to be updated. Store Transactional (ST): tentatively writes a value from a private register to a shared memory location. Transaction Read Set: is the set of locations read by LT. Transaction Write Set: is the set of locations accessed by LTX and ST. Transaction Data Set: is the union of read set and write set. Commit: it attempts to make the transaction's tentative changes permanent. It succeeds only if no other transaction has updated any location in the transaction's data set and no other transaction has read any location in this transaction's write set. Abort: discards all updates to the write set. Validate: tests the current transaction status. It returns true when the current transaction has not aborted (although it may do so later). A false return value indicates that the current transaction has aborted and discards the transaction's tentative updates. What is the use of Validate instruction? Does it improve performance by reducing the number of checks (invariants checks) in the program? Intended Use: transactions are intended to replace short critical sections. Instead of acquiring a lock, executing the critical section, and releasing the lock, a process would typically do Use LT/LTX to read from a set of locations. Use Validate to check that the values read are consistent. Use ST to modify a set of locations and Use COMMIT to make changes permanent. If either the Validate/Commit fails, the process returns to step(1). What is an orphan transaction? Is it an aborted transaction? Notice that the hardware does not have enough information to be able to abort a transaction (for example, to which state should it revert? there was not begin-transaction. Also, what if it has made some changes to the global state?). Implementation: Transactional memory is implemented by modifying standard multiprocessor cache coherence protocol. It exploits access rights, which are usually connected with cache residence. In general, access may be non exclusive (shared) permitting reads, or exclusive, permitting writes. At any time a memory location is either not immediately accessible by any processor (i.e. in memory only), accessible non-exclusively by one or more processors, or accessible exclusively by exactly one processor. The basic idea is simple; any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost. Before a processor P can load the contents of a location, it must acquire non-exclusive access to that location. Before another processor Q can store to that location, it must acquire exclusive access, and must therefore detect and revoke P's access. This implementation aborts any transaction that tries to revoke access of a transactional entry from another active transaction. How timer interrupt does help abort a stalled instruction? Is it because interrupts on the processor aborts the active transaction? Notice that the transaction has a bound on size (L1 size) and time (timer interrupt frequency). Example Implementation: To minimize impact on processing non-transactional loads and stores, each processor maintains two caches, a regular cache for non-transactional operations and a transactional cache for transactional operations. Transactional Cache is a small, fully associative cache with additional logic to facilitate transaction commit and abort. Cache Line State: Each cache line has one of the states in Table 1. The cache augments these states with separate tags shown in Table 2. What tag is associated with the cache line that contains the original value read from the main memory? Is it XCOMMIT? What tag is associated with the cache line that gets updated by transactional operations? Is it XABORT? Busy Signal: Transactions requests can be refused by responding with a BUSY signal. Busy helps prevent transactions from aborting each other too much. When a transaction receives a BUSY response, it aborts and retires, preventing deadlock or continual mutual aborts. Simulations: Does an object of a word size require a lock for synchronization? A word size is the size of data bus. Transactional memory requires "fewer" memory accesses than lock-based programs because there is no lock. Effect seen in experimental results. Performance results are simulations of bus-based and network-based architectures. LL/SC is similar to transactions for single word. Producer/Consumer Benchmark: Code in Figure 2 and comparison in Figure 5. Doubly Linked List Benchmark: The locks are not appropriate for state dependent concurrency. The most important take-away from the experiments section is the code of the example programs.