SEDA paper
-----------

This is a 2001 paper: the Internet had taken off, but "highly scalable services" were still relatively new. Typical workload: dynamic content interspersed with static, and peak load orders of magnitude greater than the average (so provisioning replicas for the peak is wasteful).

The paper argues that processes and threads carry high overheads in context-switch time and memory footprint, and that "transparent" resource virtualization by the OS prevents applications from making informed resource-management decisions.

SEDA approach: largely event-driven; in a way, two-tier event-driven. Tier 1 is system-provided and deals with scheduling among stages. Tier 2 is the stages themselves, which the user programs; each stage can be a mix of batching, multi-threading, and event-driven processing (depending on user choice).

Advantage: simpler than ED; more transparent than MT or MP. The app has visibility into queue lengths and can dynamically control each stage's behaviour based on them.

The paper argues that app-level visibility and control allow the user to make her service "well-conditioned", i.e., under high load, throughput saturates (does not decrease) and response time increases linearly with the number of clients.
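
A quick way to see why "linear in the number of clients" is the right target (my derivation, not the paper's): apply Little's law to a closed system with N clients, throughput X, response time R, and think time Z.

    N = X * (R + Z)    =>    R = N / X - Z

Once throughput saturates at some X_max, R ≈ N / X_max - Z: response time grows linearly in N with slope 1 / X_max. Anything worse than linear means the service is not well-conditioned.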

Why is MT not well-conditioned? The scheduler arbitrarily decides (e.g., FCFS) who gets served and who gets queued (unfair), with no sensitivity to the variable nature of different requests. If not implemented carefully, MT overheads can come to dominate and degrade throughput (scheduling overhead, memory overhead, cache and TLB misses).

One solution: resource containers, e.g., "CGI processes will not take more than x% of the CPU." This is a valid technique and solves a big part of the problem, but the container limits must be decided a priori -- SEDA lets the app adapt them dynamically.

What will happen to Figure 2 if you use resource containers (e.g., a bounded thread pool)? The throughput curve will start looking better (close to the ideal), but the latency curve will show large fluctuations across requests (unfair, ungraceful).
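
A minimal Java sketch of a bounded thread pool acting as a resource container (class name, pool size, and queue bound are invented for illustration; the paper does not prescribe them):

    import java.util.concurrent.*;

    public class BoundedPoolDemo {
        public static void main(String[] args) {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    8, 8,                                  // fixed cap: at most 8 concurrent requests
                    0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(64),          // bounded backlog
                    new ThreadPoolExecutor.AbortPolicy()); // excess load is rejected, not queued forever

            for (int i = 0; i < 1000; i++) {
                final int req = i;
                try {
                    pool.execute(() -> handle(req));
                } catch (RejectedExecutionException e) {
                    // Beyond the container's capacity: shed load instead of thrashing.
                    System.out.println("dropped request " + req);
                }
            }
            pool.shutdown();
        }

        static void handle(int req) {
            try { Thread.sleep(10); }                      // simulate per-request work
            catch (InterruptedException ignored) { }
        }
    }

The cap keeps throughput near the ideal, but a request's fate (served promptly, queued, or dropped) depends on arrival order, which is exactly the latency unfairness noted above.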

Problems with large number of threads:
- Large context-switch overhead
- Large memory footprint (cache, TLB misses even if not disk-bound)
- (Bigger issue) Poor scheduling choices can cause wasted work: requests on which effort has already been spent may time out because the scheduler is unaware of application-level state. Could potentially livelock.

Event-driven: what is a continuation? (A sketch follows this list.) The throughput/latency curves look good; what is the problem?
- Programmer needs to ensure that event-processing threads never block. Hard in general!
- Programmer must perform careful scheduling/multiplexing of multiple events.
- Modularity is hard. You need to know whether the module being called is non-blocking, and a module that is non-blocking today may become blocking in a future version. You also need to trust other event handlers not to consume too many resources.
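
A continuation is the per-request state plus "what to do next", reified as an object, because no thread stack survives across the blocking point. A minimal Java sketch (Continuation and asyncRead are invented names, not any real API):

    import java.util.concurrent.*;

    interface Continuation<T> { void resume(T result); }

    class ContinuationDemo {
        static final ExecutorService io = Executors.newSingleThreadExecutor();

        // Instead of: byte[] data = blockingRead(file); process(data);
        // the handler registers a continuation and returns immediately.
        static void asyncRead(String file, Continuation<byte[]> k) {
            io.execute(() -> {
                byte[] data = new byte[0];   // pretend the file was read here
                k.resume(data);              // re-enter event processing with the result
            });
        }

        public static void main(String[] args) {
            asyncRead("index.html", data ->
                    System.out.println("read " + data.length + " bytes; now send response"));
            io.shutdown();
        }
    }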

Structured event queues: prior work, e.g., the Click modular packet router; similar in spirit to SEDA but application-specific.

Stage: event handler, incoming event queue, thread pool. Each thread pulls a batch of events off the queue. Some dynamic parameters: thread pool size, batching size. (Sketched below.)
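
A minimal Java sketch of one stage (my naming, not Sandstorm's actual API); the thread count and batch size are exactly the knobs the dynamic controllers tune:

    import java.util.*;
    import java.util.concurrent.*;

    class Stage<E> {
        interface Handler<E> { void handle(List<E> batch); }

        final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
        final Handler<E> handler;
        volatile int batchSize;                            // tuned by the batching controller

        Stage(Handler<E> handler, int threads, int batchSize) {
            this.handler = handler;
            this.batchSize = batchSize;
            for (int i = 0; i < threads; i++)              // count tuned by the pool controller
                new Thread(this::workerLoop).start();
        }

        void enqueue(E event) { queue.add(event); }        // upstream stages call this

        void workerLoop() {
            List<E> batch = new ArrayList<>();
            while (true) {
                try {
                    batch.add(queue.take());               // block for at least one event
                    queue.drainTo(batch, batchSize - 1);   // then grab up to a full batch
                    handler.handle(batch);
                    batch.clear();
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }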

Compare SEDA with three other programming frameworks:
MT (Multithreaded): implicit communication, system-controlled scheduling
MP (Message Passing): explicit communication, system-controlled scheduling
ED (Event Driven): explicit communication, user-controlled scheduling

SEDA degenerates to MP when each stage is constrained to a single thread (no dynamic controllers, etc.).
SEDA degenerates to MT when one stage implements the entire logic and has multiple threads.
SEDA degenerates to ED when one stage with one thread implements the event loop.

Some examples of dynamic resource control in SEDA:
- Thread pool controller (sketched after this list)
- Batching controller
- Load shedding and backpressure (these can be implemented statically in MT and MP by using resource containers, and in ED by writing a "smart" scheduler).
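
A hypothetical sketch of the thread pool controller's core decision, assuming the stage's pool is a standard java.util.concurrent ThreadPoolExecutor (the thresholds are invented, not the paper's values):

    import java.util.concurrent.*;

    class PoolController implements Runnable {
        static final int QUEUE_THRESHOLD = 100, MAX_THREADS = 20;

        final BlockingQueue<?> queue;      // the stage's incoming event queue
        final ThreadPoolExecutor pool;     // the stage's thread pool

        PoolController(BlockingQueue<?> queue, ThreadPoolExecutor pool) {
            this.queue = queue;
            this.pool = pool;
        }

        public void run() {
            int n = pool.getCorePoolSize();
            if (queue.size() > QUEUE_THRESHOLD && n < MAX_THREADS)
                pool.setCorePoolSize(n + 1);   // backlog growing: add a thread
            else if (queue.isEmpty() && n > 1)
                pool.setCorePoolSize(n - 1);   // stage idle: reclaim a thread
        }
    }

    // Usage: sample each stage periodically, e.g.
    // Executors.newSingleThreadScheduledExecutor()
    //          .scheduleAtFixedRate(new PoolController(q, pool), 0, 200, TimeUnit.MILLISECONDS);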

Async I/O layer:
Each asyncSocket stage services two separate event queues and toggles between them using a timeout mechanism (unlike other stages, which service only one queue). readStage can do load shedding at ingress by introducing artificial delays in its event-processing loop. writeStage could be thresholded to prevent "slow" sockets from consuming too many resources.
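
A Java sketch of the two-queue toggling pattern (names and the 10 ms timeout are invented; the point is one service loop alternating timed polls over two queues so neither starves the other):

    import java.util.concurrent.*;

    class TwoQueueStage {
        final BlockingQueue<Object> userEvents   = new LinkedBlockingQueue<>();
        final BlockingQueue<Object> socketEvents = new LinkedBlockingQueue<>();

        void serviceLoop() throws InterruptedException {
            while (true) {
                // Wait on one queue only up to the timeout, then give
                // the other queue a turn.
                Object e = userEvents.poll(10, TimeUnit.MILLISECONDS);
                if (e != null) handleUser(e);
                e = socketEvents.poll(10, TimeUnit.MILLISECONDS);
                if (e != null) handleSocket(e);
            }
        }

        void handleUser(Object e)   { /* application requests, e.g. write/close */ }
        void handleSocket(Object e) { /* network readiness/completion events */ }
    }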

Figure 11: