Duality paper (Lauer and Needham, "On the Duality of Operating System Structures")
----------------------------------------------------------------------------------

An OS has many actors operating on a shared set of resources. An important question is how to organize them, both for performance and for readability/maintainability.

complexity: devices, concurrency, lack of trust

Example: multiple subsystems: VM, FS, Disk driver

Procedure-oriented (Linux, BSD; aka monolithic): a thread makes an FS call directly, or page faults and thereby implicitly makes a VM call, and so on.
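
A minimal Go sketch of this organization (all names are hypothetical, not from any real kernel): the calling thread runs the FS and disk-driver code itself as ordinary nested procedure calls, with shared state guarded by locks rather than owned by separate server processes.

  package main

  import (
      "fmt"
      "sync"
  )

  // Hypothetical disk driver: its code runs in whatever thread calls it.
  type DiskDriver struct {
      mu sync.Mutex
  }

  func (d *DiskDriver) ReadBlock(block int) []byte {
      d.mu.Lock()
      defer d.mu.Unlock()
      return make([]byte, 512) // stand-in for a real device read
  }

  // Hypothetical file system layered on the driver.
  type FS struct {
      mu   sync.Mutex
      disk *DiskDriver
  }

  // ReadFile is "an FS call": a plain procedure call that, still in the
  // caller's thread, calls straight into the disk driver.
  func (fs *FS) ReadFile(name string) []byte {
      fs.mu.Lock()
      defer fs.mu.Unlock()
      return fs.disk.ReadBlock(0)
  }

  func main() {
      fs := &FS{disk: &DiskDriver{}}
      fmt.Println(len(fs.ReadFile("somefile")))
  }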

Message passing (microkernels, e.g. Mach, QNX, NT):
All processes have different address spaces
page 6: synchronization is done using message queues; a process stops modifying data before sending a message about it (to prevent races); there can be, for example, one process per device; processes have priorities
static contexts? address spaces in 1-to-1 correspondence with processes
but for efficiency, may need to do message passing through shared memory
SendMessage[messageChannel, messageBody]
AwaitReply[messageId] returns [messageBody]
WaitForMessage[set of messagePorts] returns [messageBody, messageId, messagePort]
SendReply[messageId, messageBody]
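
A rough Go sketch of these primitives (types and names are mine, not the paper's): a port is modelled as a Go channel of messages, and the message id becomes a per-message reply channel that AwaitReply and SendReply pair up on; WaitForMessage over a set of ports would correspond to a select over several such channels. The reply channel is exactly the extra piece of state that plain SendMessage/WaitForMessage do not give you, which is the point of the question below.

  package main

  import "fmt"

  type Body string

  // A message carries its body plus a reply channel that plays the role of
  // the message id used by AwaitReply/SendReply.
  type Message struct {
      Body  Body
      Reply chan Body
  }

  // A Port is a queue of messages of one class.
  type Port chan Message

  // SendMessage posts a message; the channel is modelled here simply as the
  // port it is bound to.
  func SendMessage(to Port, body Body) Message {
      m := Message{Body: body, Reply: make(chan Body, 1)}
      to <- m
      return m
  }

  // AwaitReply blocks until the receiver answers this particular message.
  func AwaitReply(m Message) Body { return <-m.Reply }

  // SendReply answers the message identified by m.
  func SendReply(m Message, body Body) { m.Reply <- body }

  func main() {
      port := make(Port, 8)
      go func() { // a server process waiting on its single port
          m := <-port // WaitForMessage on one port; a set would be a select
          SendReply(m, "pong: "+m.Body)
      }()
      req := SendMessage(port, "ping")
      fmt.Println(AwaitReply(req)) // prints "pong: ping"
  }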

Why do you need SendReply and AwaitReply; can't you emulate them with SendMessage and WaitForMessage? No: messages can only be received at pre-defined ports, and the point in the program just after a SendMessage is not, in general, associated with such a port, so there is nowhere for the reply to be delivered.

A port is a queue for one class of messages. A message is always sent on a single channel, and each channel is bound to a single port, so a process waiting on one of its ports receives only messages destined for itself.

Question: how would you implement a multiple-consumer queue? Answer: have one listener process receive on the shared port and forward each message to one of the consumers; see the sketch below.
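
A minimal Go sketch of that answer (names are mine): a single dispatcher goroutine is the only receiver on the shared queue and forwards each item to one of several per-consumer ports, round-robin here.

  package main

  import (
      "fmt"
      "sync"
  )

  func main() {
      in := make(chan int)             // the shared queue's input port
      consumers := make([]chan int, 3) // one private port per consumer
      for i := range consumers {
          consumers[i] = make(chan int)
      }

      // Dispatcher: the only process waiting on the shared port.
      go func() {
          next := 0
          for item := range in {
              consumers[next] <- item
              next = (next + 1) % len(consumers)
          }
          for _, c := range consumers {
              close(c)
          }
      }()

      var wg sync.WaitGroup
      for id, c := range consumers {
          wg.Add(1)
          go func(id int, c chan int) {
              defer wg.Done()
              for item := range c {
                  fmt.Printf("consumer %d got %d\n", id, item)
              }
          }(id, c)
      }

      for i := 0; i < 9; i++ {
          in <- i
      }
      close(in)
      wg.Wait()
  }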

process = local data + message ports + message channels
Preemption of the processor occurs if a message arrives for a waiting process of higher priority

Procedure-oriented:
preemption could occur when a thread releases a lock on which a higher-priority thread was waiting.

Claims:
* Provide a map between MPP and Proc
* Performance is identical
  Not true: Proc is typically faster than MPP on current shared-memory architectures because MPP pays for message copying. MPP can be faster on distributed systems (imagine replacing send/receive with call/return across machines: the VM machinery needed to make that work would slow things down).
* It doesn't matter which one you choose. Claims like this are very easy to make but have seldom stood the test of time (e.g., the analogous claim that the choice between Java and assembly doesn't matter).

Implementing a semaphore:
Proc:
  int sem
  condition c
  Entry P()
    while (!sem) {
      wait(c);
    }
    sem--;

  Entry V()
    sem++
    signal(c)
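
A runnable Go counterpart of the monitor version above (a sketch, not the paper's code; Go's sync.Cond gives Mesa-style signalling, hence the loop that re-checks the condition after each wakeup):

  package main

  import (
      "fmt"
      "sync"
  )

  type Semaphore struct {
      mu  sync.Mutex // monitor entry lock
      c   *sync.Cond // condition variable
      sem int
  }

  func NewSemaphore(n int) *Semaphore {
      s := &Semaphore{sem: n}
      s.c = sync.NewCond(&s.mu)
      return s
  }

  func (s *Semaphore) P() {
      s.mu.Lock()
      defer s.mu.Unlock()
      for s.sem == 0 { // Mesa semantics: re-check after every wakeup
          s.c.Wait()
      }
      s.sem--
  }

  func (s *Semaphore) V() {
      s.mu.Lock()
      defer s.mu.Unlock()
      s.sem++
      s.c.Signal()
  }

  func main() {
      s := NewSemaphore(1)
      s.P()
      go s.V()
      s.P() // blocks until the V above runs
      fmt.Println("done")
  }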

MPP:
  begin
    sem : integer
    m : message-body
    i : message-id
    p : port-id
    s : set of port-ids
    initialize    -- sem := initial count; s := {port2}, plus port1 if sem > 0
    do forever
      [m, i, p] <-- waitForMessage[s]
      case p of
        port1 :   -- P request; port1 is in s only while sem > 0
                sem--;
                sendReply(i);
                if (!sem) {
                  s <- s - port1   -- stop listening for P requests until a V
                }                  -- arrives; later P messages queue at port1
        port2 :   -- V request
                sem++;
                s <- s + port1     -- resume accepting any queued P requests
                sendReply(i)
      endcase
    endloop
  end
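
And a runnable Go sketch of the message-passing version (names are mine): the semaphore is a server goroutine that owns the count, P and V arrive as messages carrying a reply channel in place of a message id, and the "remove port1 from the wait set" trick is mirrored by selecting on a nil channel while the count is zero.

  package main

  import "fmt"

  // request models a message whose id is its reply channel.
  type request struct {
      reply chan struct{}
  }

  type Semaphore struct {
      pPort chan request
      vPort chan request
  }

  func NewSemaphore(n int) *Semaphore {
      s := &Semaphore{pPort: make(chan request), vPort: make(chan request)}
      go s.serve(n)
      return s
  }

  func (s *Semaphore) serve(sem int) {
      for {
          pPort := s.pPort
          if sem == 0 {
              pPort = nil // a nil channel is never selected: P requests queue up
          }
          select {
          case r := <-pPort: // P request, only taken while sem > 0
              sem--
              r.reply <- struct{}{}
          case r := <-s.vPort: // V request
              sem++
              r.reply <- struct{}{}
          }
      }
  }

  func (s *Semaphore) P() {
      r := request{reply: make(chan struct{})}
      s.pPort <- r // SendMessage on the P channel
      <-r.reply    // AwaitReply
  }

  func (s *Semaphore) V() {
      r := request{reply: make(chan struct{})}
      s.vPort <- r
      <-r.reply
  }

  func main() {
      s := NewSemaphore(1)
      s.P()
      go s.V()
      s.P() // blocks until the V above is served
      fmt.Println("done")
  }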
           
Proc                               MPP
Monitor invariant                  Loop invariant
Global data                        Global data
Return address                     Message id
Call site                          Channel
Parameters, return value           Fields in the message body
Linking                            Binding channels to ports
Involuntary preemption             Same
Hoare signal semantics             The signal would have to jump straight
                                   to the waiting request, and the
                                   signalling process's priority be lowered
                                   so the signalled process runs first;
                                   this does not seem possible in MPP.
Deadlock?                          Same

Exceptions:
* Proc calls wait in the middle of a procedure. This has no counterpart in MPP; the paper argues it is a poor method of synchronization anyway.
* Proc uses synchronization that is not based on mutual exclusion (e.g., rw locks, RCU, CAS); see the sketch below.
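
For example, a lock-free counter built from compare-and-swap (a sketch using Go's sync/atomic): no lock is held and no single server process owns the counter, so it does not map onto the one-server-process pattern above in any obvious way.

  package main

  import (
      "fmt"
      "sync"
      "sync/atomic"
  )

  func main() {
      var counter int64
      var wg sync.WaitGroup
      for w := 0; w < 4; w++ {
          wg.Add(1)
          go func() {
              defer wg.Done()
              for i := 0; i < 1000; i++ {
                  for { // no lock, no owning process: just retry the CAS
                      old := atomic.LoadInt64(&counter)
                      if atomic.CompareAndSwapInt64(&counter, old, old+1) {
                          break
                      }
                  }
              }
          }()
      }
      wg.Wait()
      fmt.Println(counter) // 4000
  }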

seems like Proc is more "powerful" (you can do more here)
seems like MPP is easier to determinize (less non-determinism)

It is possible to enforce language type systems in both models.

Performance:
The paper divides it into three parts:
1. execution time of the programs themselves (obviously identical, given the one-to-one mapping)
2. computational overhead of the primitive system operations they call (the authors assert these are equal in the two models)
3. queueing and waiting times (identical if 1 and 2 hold)

The paper claims that sending a message and receiving it has the same overhead as a fork/join. True, but the comparison breaks down for synchronous procedure calls, where Proc is much cheaper than MPP.

What if the module is not a monitor? Then MPP serializes requests unnecessarily. One fix is to use threads (Proc) inside an MPP server process to regain parallelism; see the sketch below.
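
A sketch of that fix (my own example, not from the paper): the server still receives every request on its single port, but hands each one to a worker thread inside the server, so independent requests are not serialized behind each other.

  package main

  import (
      "fmt"
      "sync"
      "time"
  )

  type req struct {
      n     int
      reply chan int // stands in for the message id
  }

  func server(port chan req) {
      for m := range port {
          m := m
          go func() { // worker thread inside the server process
              time.Sleep(10 * time.Millisecond) // pretend: independent work
              m.reply <- m.n * m.n
          }()
      }
  }

  func main() {
      port := make(chan req)
      go server(port)

      var wg sync.WaitGroup
      for i := 0; i < 4; i++ {
          wg.Add(1)
          go func(i int) {
              defer wg.Done()
              r := req{n: i, reply: make(chan int, 1)}
              port <- r
              fmt.Println(i, "->", <-r.reply)
          }(i)
      }
      wg.Wait()
  }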