Scheduler and Thread Pools Deep Dive
Overview
Coroutine scheduling ultimately depends on thread pools. Understanding queueing and starvation is essential for production troubleshooting.
Core Concepts
- dispatchers route work onto pools/executors
- pool saturation increases queueing latency
- starvation occurs when tasks cannot obtain execution time
- blocking calls can consume all available workers
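The last two points can be made concrete with a small experiment. The sketch below is illustrative only: `queueingDelayMillis`, its parameters, and the two-worker pool are hypothetical names and choices, not part of the original text. It measures how long a trivial task waits for a worker while two other tasks occupy the pool, first by blocking and then by suspending.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

// Hypothetical demo: returns how long (in ms) a trivial task waits for a
// worker while two other tasks occupy a two-worker pool.
suspend fun queueingDelayMillis(blocking: Boolean, holdMillis: Long = 400): Long = coroutineScope {
    // A two-worker view of Dispatchers.Default stands in for a small shared pool.
    val pool = Dispatchers.Default.limitedParallelism(2)
    repeat(2) {
        launch(pool) {
            if (blocking) Thread.sleep(holdMillis) // holds the worker thread
            else delay(holdMillis)                 // suspends and frees the worker
        }
    }
    val submitted = System.nanoTime()
    // The probe task can only start once one of the two workers is free.
    val startedAt = async(pool) { System.nanoTime() }
    (startedAt.await() - submitted) / 1_000_000
}
```

Called from `runBlocking`, the blocking variant should report a wait close to `holdMillis`, while the suspending variant should report a wait near zero, because `delay` hands the worker back to the pool.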
Internal Implementation
Dispatchers maintain task queues and worker handoff logic. Fairness is best-effort; pathological workloads can monopolize workers.
Typical starvation triggers:
- long blocking calls on a shared dispatcher
- unbounded fan-out launching thousands of jobs
- CPU-heavy loops without yielding/cooperative checks
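For the third trigger, one common mitigation is to add a cooperative check inside the loop. The following is a minimal sketch, assuming a made-up CPU-bound task; `sumOfSquares` and the `chunk` size are hypothetical and would need tuning for real work.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import kotlinx.coroutines.yield

// Hypothetical CPU-bound task: sums the squares of 1..n on Dispatchers.Default,
// yielding every `chunk` iterations.
suspend fun sumOfSquares(n: Int, chunk: Int = 10_000): Long = withContext(Dispatchers.Default) {
    var total = 0L
    for (i in 1..n) {
        total += i.toLong() * i
        // yield() gives other coroutines on this dispatcher a chance to run
        // and throws CancellationException if the job was cancelled.
        if (i % chunk == 0) yield()
    }
    total
}
```

When the loop only needs to react to cancellation and does not need to give up its worker, `ensureActive()` is a lighter alternative to `yield()`.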
Threading Model
The main thread has strict responsiveness requirements. Background pools balance throughput and latency under contention. Treat thread pools as finite resources, not infinite capacity.
Coroutine / Flow Behavior
Flow pipelines can overload pools when expensive operators run in parallel or multiple collectors duplicate heavy upstream work. Sharing and throttling are often more effective than adding more workers.
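The effect of sharing can be sketched by counting how often an upstream starts. Everything named here (`expensiveUpstream`, `countUpstreamRuns`, the three-item stream) is hypothetical, and `SharingStarted.Lazily` with `replay = 3` is chosen only so that both collectors see every value in this small demo.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical expensive upstream; `runs` counts how many times it is started.
fun expensiveUpstream(runs: AtomicInteger): Flow<Int> = flow {
    runs.incrementAndGet()
    for (i in 1..3) {
        delay(10) // stands in for heavy work
        emit(i)
    }
}

// Collects the stream twice and reports how many times the upstream ran.
suspend fun countUpstreamRuns(share: Boolean): Int {
    val runs = AtomicInteger()
    val sharingScope = CoroutineScope(Dispatchers.Default + Job())
    val source = expensiveUpstream(runs)
    val stream: Flow<Int> =
        if (share) source.shareIn(sharingScope, SharingStarted.Lazily, replay = 3) else source
    coroutineScope {
        // A shared flow never completes on its own, so each collector takes 3 items.
        repeat(2) { launch { stream.take(3).toList() } }
    }
    sharingScope.cancel()
    return runs.get()
}
```

With the cold flow each collector starts its own upstream, so the count is 2; with `shareIn` the upstream runs once and both collectors read from the shared stream.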
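Code Examples
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// A view of Dispatchers.IO that runs at most 16 of these tasks in parallel.
val ioLimited = Dispatchers.IO.limitedParallelism(16)

// `repository` is assumed to be defined elsewhere.
suspend fun loadBatch(ids: List<String>) = coroutineScope {
    ids.map { id ->
        async(ioLimited) { repository.fetch(id) }
    }.awaitAll()
}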
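Common Interview Questions
- Q: What causes thread starvation in coroutine apps? A: Tasks starve when every worker is held by something else: long blocking calls on a shared dispatcher, unbounded fan-out that floods the queue, or CPU-heavy loops that never yield. Lead with correctness, then throughput: choose the dispatcher by workload type, keep critical sections small, cap parallelism, and monitor tail latency and queue depth.
- Q: Why can the IO dispatcher still saturate? A: Dispatchers.IO is larger than the default pool but still finite (by default 64 threads, or the number of cores if that is larger). Enough concurrent blocking calls occupy every worker, and later tasks queue behind them. Cap parallelism per workload and monitor tail latency and queue depth.
- Q: How do you identify scheduler bottlenecks? A: State load and SLO assumptions first, then find the first bottleneck from measurements: growing queue depth and rising task latency, especially at p95/p99, point to a saturated pool. Finish with the scaling strategy you would choose and the fallback behavior for partial failures.
- Q: Should you create custom pools for every feature? A: Usually not. Thread pools are finite resources, so isolate only particularly expensive workloads and prefer bounded parallelism on shared dispatchers elsewhere. Answer with correctness first and throughput second: cancellation model, dispatcher choice, bounded parallelism, and contention or latency measurements that justify any dedicated pool.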
Production Considerations
- monitor queue depth and task latency
- isolate particularly expensive workloads
- avoid blocking shared pools when possible
- prefer bounded concurrency to brute-force parallelism
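Bounded concurrency does not have to be tied to a dispatcher. A minimal sketch, assuming a hypothetical helper named `mapBounded`: a `Semaphore` caps how many transforms are in flight at once, which also limits suspending work that a thread cap alone would not.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Hypothetical helper: maps `items` concurrently, but never runs more than
// `limit` transforms at the same time. Results keep the input order.
suspend fun <T, R> mapBounded(
    items: List<T>,
    limit: Int,
    transform: suspend (T) -> R,
): List<R> = coroutineScope {
    val permits = Semaphore(limit)
    items.map { item ->
        // One lightweight coroutine per item; the permit gates the actual work.
        async { permits.withPermit { transform(item) } }
    }.awaitAll()
}
```

A call such as `mapBounded(ids, limit = 8) { id -> fetch(id) }` (where `fetch` is a placeholder) still creates one coroutine per item, which is cheap, but only eight fetches run concurrently. For very large inputs, processing in chunks may be preferable to creating every coroutine up front.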
Performance Insights
Throughput tuning must consider tail latency and fairness. A slightly lower parallelism cap can improve p95/p99 latency by reducing contention.
Senior-Level Insights
Senior answers should include an observability strategy: what signals indicate starvation, and what mitigation playbooks teams follow during incidents.
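One possible starvation signal is dispatch latency: the time a trivial task waits before a worker picks it up. The probe below is a rough sketch, not an established API; `dispatchLatencyMillis` is a hypothetical name, and how often to sample and where to report the value are left open.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.withContext

// Hypothetical probe: how long (in ms) a trivial task waits before it starts
// running on `dispatcher`.
suspend fun dispatchLatencyMillis(dispatcher: CoroutineDispatcher): Long {
    val submitted = System.nanoTime()
    return withContext(dispatcher) {
        // This line runs only after the dispatcher has scheduled the block.
        (System.nanoTime() - submitted) / 1_000_000
    }
}
```

The probe should be called from a coroutine running on a different dispatcher, because `withContext` does not re-dispatch when the dispatcher is unchanged and the reading would then be close to zero regardless of load. A value that climbs under steady load suggests the pool is saturated or blocked.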