Scheduler and Thread Pools Deep Dive

Overview

Coroutine scheduling ultimately depends on thread pools. Understanding queueing and starvation is essential for production troubleshooting.

Core Concepts

dispatchers route work onto pools/executors
pool saturation increases queueing latency
starvation occurs when tasks cannot obtain execution time
blocking calls can consume all available workers
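To make the last two points concrete, here is a minimal sketch that uses a plain `java.util.concurrent` fixed pool instead of a coroutine dispatcher. This is a simplifying assumption: the same effect applies to any bounded pool. Two tasks park in a blocking wait and occupy both workers, so a third, trivial task sits in the queue until they return. The pool size of 2, the 200 ms probe, and the name `quickTaskWasStarved` are arbitrary illustration choices.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Returns true if a quick task could NOT start while both workers were blocked,
// but did run once the blocking calls returned.
fun quickTaskWasStarved(): Boolean {
    val pool = Executors.newFixedThreadPool(2)      // deliberately tiny shared pool
    val release = CountDownLatch(1)                 // stands in for a slow blocking call
    val quickDone = CountDownLatch(1)
    repeat(2) { pool.execute { release.await() } }  // both workers are now occupied
    pool.execute { quickDone.countDown() }          // lands in the queue behind them
    val ranWhileBlocked = quickDone.await(200, TimeUnit.MILLISECONDS)
    release.countDown()                             // the "blocking calls" return
    val ranAfterRelease = quickDone.await(1, TimeUnit.SECONDS)
    pool.shutdown()
    return !ranWhileBlocked && ranAfterRelease
}
```

Nothing failed here. The quick task was simply never given a worker, which is why starvation usually appears as latency rather than as errors.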

Internal Implementation

Dispatchers maintain task queues and worker handoff logic. Fairness is best-effort; pathological workloads can monopolize workers.
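The queue-plus-handoff idea can be modeled with one shared FIFO queue and a fixed set of workers that block on `take()`. `TinyDispatcher` is a hypothetical teaching class. It is not how the kotlinx.coroutines scheduler is built, since that scheduler uses per-worker local queues with work stealing. It does show where queueing latency comes from: a task waits until some worker is free to take it.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

/** Hypothetical, simplified dispatcher: one shared FIFO queue, N workers pulling from it. */
class TinyDispatcher(workers: Int) : AutoCloseable {
    private val queue = LinkedBlockingQueue<Runnable>()
    private val threads = List(workers) { i ->
        Thread({
            try {
                // Handoff: an idle worker blocks here until a task is queued.
                // No error handling: a task that throws kills its worker.
                while (true) queue.take().run()
            } catch (e: InterruptedException) {
                // close() interrupts the workers to stop them.
            }
        }, "tiny-worker-$i").apply { isDaemon = true; start() }
    }

    fun dispatch(task: Runnable) = queue.put(task)
    fun queueDepth(): Int = queue.size
    override fun close() = threads.forEach { it.interrupt() }
}

// Dispatches `tasks` trivial tasks onto two workers and returns how many ran.
fun runTinyDispatcher(tasks: Int): Int {
    val done = AtomicInteger(0)
    val latch = CountDownLatch(tasks)
    TinyDispatcher(workers = 2).use { d ->
        repeat(tasks) { d.dispatch { done.incrementAndGet(); latch.countDown() } }
        latch.await(2, TimeUnit.SECONDS)
    }
    return done.get()
}
```

Because every worker pulls from the same queue, a handful of long-running tasks can leave everything behind them waiting. That is the monopolization described above.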

Typical starvation triggers:

long blocking calls on shared dispatcher
unbounded fan-out launching thousands of jobs
CPU-heavy loops without yielding/cooperative checks
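The third trigger can be sketched with kotlinx.coroutines on a single-threaded dispatcher. The `repeat` loop stands in for CPU-heavy work, and `executionOrder` is a hypothetical helper. Without a suspension point, the loop keeps the thread until it finishes. Calling `yield()` between chunks lets the queued `other` coroutine run before the loop completes.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors

// Runs a "CPU loop" and a short task on ONE thread and records the order they ran in.
fun executionOrder(cooperative: Boolean): List<String> {
    val singleThread = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
    val log = mutableListOf<String>() // only mutated on the single dispatcher thread
    try {
        runBlocking(singleThread) {
            val cpu = launch {
                repeat(3) { i ->
                    log += "cpu-$i"          // stands in for a chunk of CPU-heavy work
                    if (cooperative) yield() // give queued coroutines a turn on this thread
                }
            }
            val other = launch { log += "other" }
            joinAll(cpu, other)
        }
    } finally {
        singleThread.close()
    }
    return log
}
```

Without `yield()`, the order is `cpu-0, cpu-1, cpu-2, other`. With it, `other` runs between chunks. `ensureActive()` is the related check when only cancellation matters: it does not hand the thread to other work.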

Threading Model

The main thread has strict responsiveness requirements. Background pools balance throughput and latency under contention. Treat thread pools as finite resources, not infinite capacity.
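One way to enforce "finite" in code is a `ThreadPoolExecutor` with a bounded queue and an explicit rejection policy. This turns overload into a visible signal instead of an ever-growing backlog. The sketch below uses one worker and a queue of one purely for illustration, and `countRejections` is a hypothetical helper. Real sizes depend on the workload.

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.CountDownLatch
import java.util.concurrent.RejectedExecutionException
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

// One worker, room for one queued task; every further submission is rejected.
fun countRejections(submissions: Int): Int {
    val pool = ThreadPoolExecutor(
        1, 1,                              // core and max threads
        0L, TimeUnit.MILLISECONDS,         // keep-alive (irrelevant when core == max)
        ArrayBlockingQueue<Runnable>(1),   // bounded queue: the pool's real capacity limit
        ThreadPoolExecutor.AbortPolicy(),  // reject instead of queueing without bound
    )
    val gate = CountDownLatch(1)
    var rejected = 0
    repeat(submissions) {
        try {
            pool.execute { gate.await() }  // the first task blocks the only worker
        } catch (e: RejectedExecutionException) {
            rejected++                     // caller must shed load, retry later, or degrade
        }
    }
    gate.countDown()
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.SECONDS)
    return rejected
}
```

With three submissions, one runs, one waits in the queue, and one is rejected. The rejection path forces an explicit decision about what happens under overload.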

Coroutine / Flow Behavior

Flow pipelines can overload pools when expensive operators run in parallel or multiple collectors duplicate heavy upstream work. Sharing and throttling are often more effective than adding more workers.
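Here is a minimal sketch of sharing, assuming kotlinx.coroutines. A cold `flow` re-runs its upstream for every collector. `shareIn` with `replay = 1` runs the upstream once and serves later collectors from the replay cache. The `heavy` flow, its counter, and `upstreamRuns` are hypothetical stand-ins for an expensive operator chain.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import java.util.concurrent.atomic.AtomicInteger

// Counts how often an expensive upstream executes when two collectors read it.
fun upstreamRuns(shared: Boolean): Int = runBlocking {
    val runs = AtomicInteger(0)
    val heavy = flow {
        runs.incrementAndGet()   // each cold collection re-executes this block
        emit(42)                 // stands in for an expensive computed result
    }
    // Child scope for the sharing coroutine, so it can be cancelled when we are done.
    val sharingScope = CoroutineScope(coroutineContext + Job(coroutineContext[Job]))
    val source: Flow<Int> =
        if (shared) heavy.shareIn(sharingScope, SharingStarted.Lazily, replay = 1) else heavy
    val a = source.first()       // first collector
    val b = source.first()       // second collector
    sharingScope.cancel()
    check(a == b)
    runs.get()
}
```

Unshared, the upstream runs twice. Shared, it runs once. That saving does not depend on how many workers the pool has.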

Code Examples

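import kotlinx.coroutines.*

// Hypothetical blocking data source; stands in for the `repository` the example assumes.
interface Repository { fun fetch(id: String): String }

// A view of Dispatchers.IO that runs at most 16 tasks at a time.
val ioLimited = Dispatchers.IO.limitedParallelism(16)

suspend fun loadBatch(repository: Repository, ids: List<String>): List<String> = coroutineScope {
    ids.map { id ->
        async(ioLimited) { repository.fetch(id) } // queued once 16 fetches are in flight
    }.awaitAll()
}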

Common Interview Questions

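Q: What causes thread starvation in coroutine apps? A: Work that holds a worker thread without giving it back. Typical causes are blocking calls on a shared dispatcher, unbounded fan-out that floods the queue with thousands of jobs, and CPU-heavy loops that never suspend or call cooperative checks such as yield(). Queued tasks then wait for a free worker, so latency rises even though nothing has failed.
Q: Why can the IO dispatcher still saturate? A: Dispatchers.IO is elastic but bounded: by default it allows up to 64 threads, or the number of cores if that is larger. Once every thread is parked in a blocking call, further IO tasks queue. Capping each workload with limitedParallelism keeps one slow dependency from occupying the whole pool.
Q: How do you identify scheduler bottlenecks? A: Compare queue depth and task wait time (submission to start) with execution time. If wait time rises while execution time stays flat, the bottleneck is scheduling, not the work itself. Thread dumps that show every worker of one pool parked in the same blocking call confirm starvation.
Q: Should you create custom pools for every feature? A: No. Each pool adds threads, memory, and context switching, and idle pools waste capacity. Prefer shared dispatchers with bounded views such as limitedParallelism, and reserve dedicated pools for particularly expensive or blocking workloads that must be isolated.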

Production Considerations

monitor queue depth and task latency
isolate particularly expensive workloads
avoid blocking shared pools when possible
prefer bounded concurrency to brute-force parallelism
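For the monitoring item, `ThreadPoolExecutor` already exposes useful gauges: active threads, queue depth, and completed tasks. In the sketch below, `PoolSnapshot`, `snapshot`, and `snapshotAfterBlocking` are hypothetical names. Exporting the numbers to a metrics backend is left out.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

data class PoolSnapshot(val active: Int, val queued: Int, val completed: Long)

// Reads the executor's built-in gauges; activeCount and completedTaskCount are approximate.
fun snapshot(pool: ThreadPoolExecutor) = PoolSnapshot(
    active = pool.activeCount,
    queued = pool.queue.size,
    completed = pool.completedTaskCount,
)

// Two workers, five blocking tasks: two hold the workers, three wait in the queue.
fun snapshotAfterBlocking(): PoolSnapshot {
    val pool = ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, LinkedBlockingQueue<Runnable>())
    val gate = CountDownLatch(1)
    repeat(5) { pool.execute { gate.await() } }
    val snap = snapshot(pool)
    gate.countDown()
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.SECONDS)
    return snap
}
```

A queue depth that stays above zero while all workers are active is the saturation pattern worth alerting on.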

Performance Insights

Throughput tuning must consider tail latency and fairness. A slightly lower parallelism cap can improve p95/p99 by reducing contention.
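To illustrate a parallelism cap, the sketch below uses kotlinx.coroutines' `Semaphore`. This is an alternative to a `limitedParallelism` view: it caps in-flight work regardless of which dispatcher runs it. The sketch records the peak number of concurrent guarded sections. The job count, the cap, and `delay(10)` are arbitrary, and `peakConcurrency` is a hypothetical helper. Measuring the actual p95/p99 effect needs a real workload and a latency histogram.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
import java.util.concurrent.atomic.AtomicInteger

// Launches `jobs` coroutines but lets at most `cap` run the guarded section at once.
fun peakConcurrency(jobs: Int, cap: Int): Int = runBlocking {
    val permits = Semaphore(cap)
    val inFlight = AtomicInteger(0)
    val peak = AtomicInteger(0)
    coroutineScope {                  // wait for every job before reading `peak`
        repeat(jobs) {
            launch(Dispatchers.Default) {
                permits.withPermit {
                    val now = inFlight.incrementAndGet()
                    peak.accumulateAndGet(now) { a, b -> maxOf(a, b) }
                    delay(10)         // stands in for a hypothetical downstream call
                    inFlight.decrementAndGet()
                }
            }
        }
    }
    peak.get()
}
```

Lowering `cap` trades some throughput for less contention on whatever the guarded section touches. That tradeoff is what the paragraph above suggests tuning against tail latency.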

Senior-Level Insights

Senior answers should include observability strategy: what signals indicate starvation, and what mitigation playbooks teams follow during incidents.
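Queue depth alone can hide starvation when tasks are few but slow, so a complementary signal is how long each task waits between submission and start. The sketch below assumes a hypothetical `WaitTrackingExecutor` wrapper that keeps only the maximum wait. A production version would feed a histogram so p95/p99 wait time can be alerted on.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executor
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicLong

/** Hypothetical wrapper: records the longest time any task waited between submit and start. */
class WaitTrackingExecutor(private val delegate: Executor) : Executor {
    val maxWaitNanos = AtomicLong(0)

    override fun execute(command: Runnable) {
        val submittedAt = System.nanoTime()
        delegate.execute {
            val waited = System.nanoTime() - submittedAt   // queueing delay for this task
            maxWaitNanos.accumulateAndGet(waited) { a, b -> maxOf(a, b) }
            command.run()
        }
    }
}

// One worker: the second task must wait for the first (~100 ms) before it can start.
fun maxWaitMillis(): Long {
    val pool = Executors.newFixedThreadPool(1)
    val tracked = WaitTrackingExecutor(pool)
    val done = CountDownLatch(2)
    tracked.execute { Thread.sleep(100); done.countDown() }
    tracked.execute { done.countDown() }
    done.await(2, TimeUnit.SECONDS)
    pool.shutdown()
    return TimeUnit.NANOSECONDS.toMillis(tracked.maxWaitNanos.get())
}
```

When wait time climbs while task execution time stays flat, the problem is worker availability rather than the tasks themselves. That distinction is what an incident playbook needs in order to choose between isolating a workload and fixing slow code.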