Cost Optimization Deep Dive¶
Overview¶
Cost optimization is central to system design interviews because it forces explicit choices about unit economics (what one request or one user costs to serve), storage tiering, and right-sizing of provisioned capacity.
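To make the unit-economics and storage-tiering terms concrete, here is a minimal sketch of the arithmetic. Every name and number in it (`MonthlyCosts`, the dollar amounts, the request volume, the per-GB prices) is a hypothetical placeholder chosen for illustration, not real provider pricing.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Hypothetical monthly spend, split by the usual big three line items."""
    compute_usd: float
    storage_usd: float
    network_usd: float

    def total(self) -> float:
        return self.compute_usd + self.storage_usd + self.network_usd


def cost_per_million_requests(costs: MonthlyCosts, monthly_requests: int) -> float:
    """Unit cost: total monthly spend normalized to one million requests."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return costs.total() * 1_000_000 / monthly_requests


def tiered_storage_cost(total_gb: float, hot_fraction: float,
                        hot_usd_per_gb: float, cold_usd_per_gb: float) -> float:
    """Monthly storage cost when only hot_fraction of the data stays on the hot tier."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_gb = total_gb * hot_fraction
    cold_gb = total_gb - hot_gb
    return hot_gb * hot_usd_per_gb + cold_gb * cold_usd_per_gb


# Illustrative inputs only; substitute measured spend and traffic.
costs = MonthlyCosts(compute_usd=12_000, storage_usd=3_000, network_usd=1_000)
unit_cost = cost_per_million_requests(costs, monthly_requests=400_000_000)

# Compare keeping everything hot against moving 80% of the data to a cold tier.
all_hot = tiered_storage_cost(100_000, 1.0, hot_usd_per_gb=0.02, cold_usd_per_gb=0.004)
tiered = tiered_storage_cost(100_000, 0.2, hot_usd_per_gb=0.02, cold_usd_per_gb=0.004)

print(f"cost per million requests: ${unit_cost:.2f}")
print(f"storage, all hot: ${all_hot:.2f}; tiered: ${tiered:.2f}")
```

In an interview, stating the unit cost first gives every later architecture choice a number to move.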
Core Concepts¶
- Define success metrics before proposing infrastructure.
- Connect architecture choices to latency, reliability, and cost.
- Keep rollback and migration strategy visible in the design.
Internal Architecture¶
- Separate control plane concerns from request-serving paths.
- Isolate heavy or failure-prone processing behind async boundaries.
- Keep ownership boundaries aligned with team structure.
Data and Request Flow¶
- Document request flow from ingress to persistence.
- Identify where state is source-of-truth vs derived.
- Make retry, dedupe, and idempotency behavior explicit.
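One way to make retry and idempotency behavior explicit is to key each write by a client-supplied idempotency key, so a retried request returns the stored result instead of repeating the side effect. The sketch below is an assumption-laden illustration: `IdempotentHandler`, `charge`, and the key `"order-123"` are hypothetical names, and the in-memory dict stands in for what would normally be a durable store with expiry.

```python
from typing import Callable, Dict


class IdempotentHandler:
    """Wraps a side-effecting handler so each idempotency key runs it at most once.

    Illustration only: results live in an in-memory dict and the class is not
    thread-safe. A real service would need a durable, shared store.
    """

    def __init__(self, handler: Callable[[dict], dict]) -> None:
        self._handler = handler
        self._results: Dict[str, dict] = {}

    def handle(self, idempotency_key: str, payload: dict) -> dict:
        # A retry with a known key returns the recorded result unchanged.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._handler(payload)
        self._results[idempotency_key] = result
        return result


charges = []  # records every time the side effect actually runs


def charge(payload: dict) -> dict:
    """Hypothetical side effect: pretend to charge a customer."""
    charges.append(payload["amount"])
    return {"status": "charged", "amount": payload["amount"]}


handler = IdempotentHandler(charge)
first = handler.handle("order-123", {"amount": 50})
retry = handler.handle("order-123", {"amount": 50})  # simulated client retry

print(first == retry, len(charges))
```

The same key lookup doubles as deduplication for messages redelivered by a queue.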
Scalability and Reliability¶
- Use bounded concurrency and backpressure at choke points.
- Add timeouts, retries with jitter, and fail-fast guards.
- Measure saturation signals (for example CPU utilization, queue depth, and connection-pool usage) and scale before users are affected.
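The "retries with jitter" and "fail-fast" points can be sketched as capped exponential backoff with full jitter and a fixed attempt budget. This is one common variant rather than the only reasonable policy. `TransientError`, `backoff_delay`, `call_with_retries`, and the default base and cap values are hypothetical choices made for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delay(attempt: int, base_s: float = 0.1, cap_s: float = 5.0) -> float:
    """Full jitter: a uniform delay between 0 and min(cap_s, base_s * 2**attempt)."""
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))


def call_with_retries(op: Callable[[], T], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying transient failures until the attempt budget is spent."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: fail fast instead of retrying forever
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


# A no-op sleep keeps the demonstration instant; real callers keep time.sleep.
result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, calls["count"])
```

Jitter spreads retries from many clients over time so they do not arrive in synchronized waves, and the attempt cap keeps retry traffic from multiplying load, and cost, during an incident.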
Code Examples¶
Client -> Gateway -> Service -> Cache/DB
                             -> Queue -> Workers
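The Queue -> Workers leg of the diagram can be sketched with a bounded in-process queue: the service enqueues jobs, a fixed pool of workers drains them, and a full queue rejects new work rather than growing without limit. `BoundedWorkQueue` and its parameters are hypothetical names for this illustration. A production system would more likely put a managed message broker at this boundary.

```python
import queue
import threading
from typing import Any, Callable, List


class BoundedWorkQueue:
    """Fixed-size job queue drained by a fixed number of worker threads."""

    def __init__(self, max_pending: int, num_workers: int,
                 handler: Callable[[Any], None]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)
        self._handler = handler
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]

    def start(self) -> None:
        for worker in self._workers:
            worker.start()

    def submit(self, job: Any) -> bool:
        """Return False instead of blocking when the queue is full (backpressure)."""
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            return False

    def drain(self) -> None:
        """Block until every accepted job has been handled."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            job = self._queue.get()
            try:
                self._handler(job)
            finally:
                self._queue.task_done()


processed: List[int] = []
lock = threading.Lock()


def handle(job: int) -> None:
    with lock:
        processed.append(job)


work = BoundedWorkQueue(max_pending=2, num_workers=2, handler=handle)
# Workers are not running yet, so the third submit finds the queue full.
accepted = [work.submit(job) for job in range(3)]
work.start()
work.drain()
print(accepted, sorted(processed))
```

The caller decides what a rejected `submit` means: return an overload response to the client, retry later, or drop low-priority work.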
Common Interview Questions¶
- Q: How does this design fail under traffic spikes? A: State the load and SLO assumptions first, then name the first component to saturate (for example the cache, the database, or the queue). Explain how backpressure or load shedding protects it and what degraded behavior users see until capacity catches up.
- Q: Which dependency is the primary bottleneck and why? A: Pick the dependency that saturates first at the stated load, justify the choice with its throughput or latency limit, and describe the scaling and consistency strategy that relieves it.
- Q: What is your mitigation strategy for partial outages? A: Limit the blast radius with timeouts, retries with jitter, and fail-fast guards, then explain the fallback behavior for each failed dependency, such as serving derived or cached state while the source of truth recovers.
Production Considerations¶
- Define SLOs and map alerts to user-visible impact.
- Add capacity tests before major launches.
- Keep data retention and compliance requirements explicit.
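An SLO becomes easier to alert on once it is expressed as an error budget. The sketch below assumes a time-based availability SLO over a rolling window. The function names, the 99.9% target, the 30-day window, and the 10 bad minutes are illustrative assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO target."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining_fraction(slo_target: float, window_days: int,
                              bad_minutes: float) -> float:
    """Fraction of the error budget left; a negative value means the SLO is breached."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - bad_minutes) / budget


budget = error_budget_minutes(0.999, window_days=30)  # about 43.2 minutes
remaining = budget_remaining_fraction(0.999, 30, bad_minutes=10.0)
print(f"budget: {budget:.1f} min, remaining: {remaining:.0%}")
```

Alerting on how quickly this budget burns, instead of on raw infrastructure metrics, keeps pages tied to user-visible impact.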
Tradeoffs¶
- Simpler architecture today vs flexibility tomorrow.
- Lower latency vs stronger consistency guarantees.
- Higher redundancy vs higher operating cost.
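The redundancy-versus-cost tradeoff can be put into rough numbers. This sketch assumes that replicas fail independently and that any single healthy replica can serve traffic. Both assumptions are optimistic, because correlated failures are common in practice. The 99% per-replica availability and the $500 monthly replica cost are made-up inputs.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of N replicas under the independent-failure assumption."""
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - replica_availability) ** replicas


def monthly_cost(replica_cost_usd: float, replicas: int) -> float:
    """Cost grows linearly with replica count in this simplified model."""
    return replica_cost_usd * replicas


for n in (1, 2, 3):
    availability = combined_availability(0.99, n)
    cost = monthly_cost(500.0, n)
    print(f"{n} replica(s): availability {availability:.6f}, cost ${cost:.0f}/month")
```

Under these assumptions each extra replica adds the same cost while the availability gain shrinks, which is the shape of the tradeoff to explain in an interview.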
Senior-Level Insights¶
- Strong candidates justify why they rejected alternatives.
- Staff-level answers include phased rollout and safe rollback details.