AOSP

Explain the Binder IPC transaction lifecycle from client stub to server thread

advanced aosp binder ipc internals

Binder is Android's high-performance RPC mechanism that moves typed parcels across process boundaries with kernel arbitration.

In interviews, cover:

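client proxy marshals arguments into a Parcel, and libbinder (IPCThreadState) hands it to the driver with a BINDER_WRITE_READ ioctl on /dev/binder
Binder driver copies the payload once into the target process's mapped buffer and queues the transaction on its work queue
a thread from the target process's Binder thread pool dequeues it and dispatches to the stub's onTransact()
the reply travels back through the same kernel-tracked transaction and wakes the blocked client thread (oneway calls skip this step)
latency comes from context switches, marshalling, and queue contention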
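
The lifecycle above can be modeled as an in-process toy to make the synchronous request/reply shape concrete. Everything here is hypothetical: `ByteBuffer` stands in for `Parcel`, a `BlockingQueue` stands in for the driver's per-process work queue, and pool threads play the Binder threads. It is not the libbinder API and skips the kernel copy entirely.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.*;

// Toy model of one synchronous Binder call (hypothetical classes, not AOSP APIs).
final class ToyTransaction {
    final int code;                                   // like an AIDL transaction code
    final ByteBuffer data;                            // marshalled arguments
    final CompletableFuture<ByteBuffer> reply = new CompletableFuture<>();
    ToyTransaction(int code, ByteBuffer data) { this.code = code; this.data = data; }
}

final class ToyServer {
    static final int TRANSACTION_ADD = 1;             // hypothetical method code
    final BlockingQueue<ToyTransaction> todo = new LinkedBlockingQueue<>();
    private final ExecutorService pool;

    ToyServer(int threads) {
        pool = Executors.newFixedThreadPool(threads, r -> {
            Thread th = new Thread(r, "toy-binder");
            th.setDaemon(true);
            return th;
        });
        for (int i = 0; i < threads; i++) pool.execute(this::loop);
    }

    // Each pool thread blocks on the queue, like a Binder thread waiting in the driver.
    private void loop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ToyTransaction t = todo.take();
                try {
                    t.reply.complete(onTransact(t.code, t.data));
                } catch (RuntimeException e) {
                    t.reply.completeExceptionally(e); // error travels back to the caller
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();       // pool shutdown
        }
    }

    // "Stub" side: unmarshal arguments, run the method, marshal the reply.
    private ByteBuffer onTransact(int code, ByteBuffer data) {
        if (code != TRANSACTION_ADD) throw new IllegalArgumentException("unknown code " + code);
        int a = data.getInt();
        int b = data.getInt();
        ByteBuffer out = ByteBuffer.allocate(4).putInt(a + b);
        out.flip();
        return out;
    }

    void shutdown() { pool.shutdownNow(); }
}

final class ToyProxy {
    private final ToyServer server;
    ToyProxy(ToyServer server) { this.server = server; }

    // "Proxy" side: marshal, enqueue, then block until the reply arrives.
    int add(int a, int b) {
        ByteBuffer data = ByteBuffer.allocate(8).putInt(a).putInt(b);
        data.flip();
        ToyTransaction t = new ToyTransaction(ToyServer.TRANSACTION_ADD, data);
        server.todo.offer(t);                         // unbounded queue: never rejects
        return t.reply.orTimeout(1, TimeUnit.SECONDS).join().getInt();
    }
}
```

Calling `new ToyProxy(server).add(2, 3)` blocks the caller until a pool thread replies with 5; shrinking the pool to one thread and issuing slow calls concurrently is a quick way to see head-of-line queueing.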

Strong answer tip:

explain where tail latency appears under load: binder thread starvation, oversized parcels, and lock contention in server process code.

How do Binder thread pools affect system service throughput and latency?

advanced aosp binder threading performance

Binder thread pool sizing directly controls how many incoming IPC calls a service can process concurrently before queueing delay explodes.

In interviews, cover:

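every process has a pool of Binder worker threads, and a thread blocked in an outgoing synchronous call can be reused to serve a nested incoming call from the same call chain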
undersized pool causes head-of-line blocking for unrelated requests
oversized pool can increase CPU contention and lock pressure
long-running calls should be offloaded from binder thread quickly
instrumentation should track queue depth, p95 call time, timeout rates

Strong answer tip:

recommend short binder handlers that validate, enqueue, and return, moving expensive work to dedicated executors.
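
A minimal sketch of that pattern in plain Java, assuming a hypothetical `UploadService` whose `submitUpload()` would run on a Binder thread: it validates cheaply, hands the expensive part to a dedicated bounded executor, and returns immediately. The class, method, and size limit are illustrative, not framework APIs.

```java
import java.util.concurrent.*;

// Hypothetical service; submitUpload() is the method a Binder thread would call.
final class UploadService {
    static final int MAX_PAYLOAD_BYTES = 64 * 1024;   // illustrative limit

    // Dedicated worker with a bounded queue so a burst cannot grow memory without limit.
    private final ThreadPoolExecutor worker = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(32),
            r -> { Thread t = new Thread(r, "upload-worker"); t.setDaemon(true); return t; },
            new ThreadPoolExecutor.AbortPolicy());

    /** Returns quickly: true if the work was accepted, false if rejected. */
    boolean submitUpload(byte[] payload) {
        // 1. Validate cheaply on the Binder thread.
        if (payload == null || payload.length == 0 || payload.length > MAX_PAYLOAD_BYTES) {
            return false;
        }
        byte[] copy = payload.clone();                // don't retain caller-owned buffers
        // 2. Enqueue the expensive work; 3. return without waiting for it.
        try {
            worker.execute(() -> expensiveUpload(copy));
            return true;
        } catch (RejectedExecutionException queueFull) {
            return false;                             // shed load instead of blocking the Binder thread
        }
    }

    private void expensiveUpload(byte[] data) {
        // Placeholder for slow I/O or computation; intentionally empty in this sketch.
    }

    void shutdown() { worker.shutdown(); }
}
```

Rejecting when the queue is full keeps the Binder thread's cost bounded; the caller learns about overload immediately instead of through a slow call.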

What is a Binder death recipient and when should you use it?

intermediate aosp binder reliability ipc

A death recipient lets a client observe remote process death and clean up stale state instead of silently using a dead binder handle.

In interviews, cover:

call linkToDeath() on the remote IBinder (for AIDL interfaces, obtained via asBinder())
binderDied() fires on a Binder thread once the hosting process dies, so the handle is no longer valid
client must drop the cached interface and trigger a reconnection flow
call unlinkToDeath() on normal teardown to avoid leaking recipients
use idempotent recovery paths to survive rapid process restarts
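
The sketch below models idempotent death handling in plain Java. In real code the callback is `IBinder.DeathRecipient.binderDied()`, registered with `linkToDeath()`; here a hypothetical `RemoteHandle` stands in for the remote interface so the recovery logic runs anywhere. A compare-and-set on the cached reference makes duplicate or late death notifications harmless.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for a remote AIDL interface plus its IBinder (hypothetical).
final class RemoteHandle {
    final int generation;
    RemoteHandle(int generation) { this.generation = generation; }
}

final class ServiceClient {
    private final AtomicReference<RemoteHandle> cached = new AtomicReference<>();
    private final AtomicInteger reconnects = new AtomicInteger();
    private final AtomicInteger nextGeneration = new AtomicInteger();

    RemoteHandle connect() {
        RemoteHandle h = new RemoteHandle(nextGeneration.incrementAndGet());
        cached.set(h);
        // Real code would call h.asBinder().linkToDeath(recipient, 0) here.
        return h;
    }

    // Plays the role of binderDied() for a specific handle. The real callback
    // runs on a Binder thread, hence the atomic operations.
    void onRemoteDied(RemoteHandle dead) {
        // Only the first notification for this handle clears it; duplicates, or a
        // stale notification arriving after a reconnect, are ignored.
        if (cached.compareAndSet(dead, null)) {
            reconnects.incrementAndGet();
            connect();   // real code would rebind asynchronously, ideally with backoff
        }
    }

    RemoteHandle current() { return cached.get(); }
    int reconnectCount() { return reconnects.get(); }
}
```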

Strong answer tip:

connect death handling to real user impact: frozen UI actions, orphan sessions, or stuck foreground features after service crash.

How do you debug slow Binder calls in production traces?

advanced aosp binder perfetto debugging

Slow binder diagnosis needs end-to-end trace correlation across caller, kernel binder events, and server-side critical sections.

In interviews, cover:

capture a Perfetto trace with binder, sched, freq, and userspace (atrace) slices enabled
identify long binder transaction slices and waiting states
separate marshalling overhead from server execution time
inspect lock contention and synchronous call chains in service process
validate fix with before/after p95 and p99 binder latency metrics
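
To make the before/after comparison concrete, here is a nearest-rank percentile over per-call latencies, assuming the durations have already been exported from the trace (for example, binder transaction slice durations) into an array. The helper names are illustrative, not a Perfetto API.

```java
import java.util.Arrays;
import java.util.Locale;

final class LatencyStats {
    /** Nearest-rank percentile (p in (0, 100]) of the samples, in the samples' unit. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Rank is ceil(p * n / 100), converted to a 0-based index.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Formats p50/p95/p99 side by side for a before/after comparison. */
    static String compare(double[] beforeMs, double[] afterMs) {
        StringBuilder sb = new StringBuilder();
        for (double p : new double[] {50, 95, 99}) {
            sb.append(String.format(Locale.ROOT, "p%.0f: %.1f ms -> %.1f ms%n",
                    p, percentile(beforeMs, p), percentile(afterMs, p)));
        }
        return sb.toString();
    }
}
```

Reporting p95 and p99 next to p50 shows whether a fix removed tail stalls or only shifted the median.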

Strong answer tip:

call out anti-pattern: nested synchronous binder calls across services creating cascading tail latency and ANR risk.

Walk through ActivityManagerService process state transitions

advanced aosp ams lifecycle process-state

AMS continuously recalculates process importance and OOM adjustment based on visible components, foreground work, and dependency chains.

In interviews, cover:

state classes: top, foreground service, visible, service, cached
transition triggers: activity visibility, bindings, broadcasts, jobs
OOM adj influences LMKD victim selection under memory pressure
cached processes improve warm start performance but are expendable
policy differs by API level for background execution restrictions

Strong answer tip:

connect process state to user-perceived behavior: startup speed, background reliability, and kill likelihood after app switch.

How does AMS decide what to kill under memory pressure?

advanced aosp ams memory lmkd

AMS and LMKD cooperate: AMS provides process importance signals; LMKD enforces pressure-based kills when reclaim is insufficient.

In interviews, cover:

AMS computes OOM score adjustment from component importance
LMKD monitors PSI and memory watermarks to trigger eviction
cached/background processes are preferred kill targets
kill decisions balance reclaim speed against restart churn
repeated kill loops indicate bad memory budget or background policy
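
A simplified model of victim selection, assuming the policy "consider only processes at or above a minimum oom_score_adj, kill from the highest adj downward, prefer the largest RSS within a level, and stop once the reclaim target is met". Real lmkd also weighs PSI levels, swap, and thrashing, and the adj and RSS values in the usage check are illustrative, not copied from `ProcessList`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class Proc {
    final String name;
    final int oomScoreAdj;   // higher = less important to the user
    final long rssKb;
    Proc(String name, int oomScoreAdj, long rssKb) {
        this.name = name; this.oomScoreAdj = oomScoreAdj; this.rssKb = rssKb;
    }
}

final class ToyLmk {
    /**
     * Picks victims with adj >= minAdj, highest adj first and largest RSS first
     * within equal adj, until at least targetKb would be reclaimed.
     */
    static List<Proc> pickVictims(List<Proc> procs, int minAdj, long targetKb) {
        List<Proc> candidates = new ArrayList<>();
        for (Proc p : procs) {
            if (p.oomScoreAdj >= minAdj) candidates.add(p);
        }
        candidates.sort(Comparator.comparingInt((Proc p) -> p.oomScoreAdj).reversed()
                .thenComparing(Comparator.comparingLong((Proc p) -> p.rssKb).reversed()));
        List<Proc> victims = new ArrayList<>();
        long reclaimed = 0;
        for (Proc p : candidates) {
            if (reclaimed >= targetKb) break;
            victims.add(p);
            reclaimed += p.rssKb;
        }
        return victims;
    }
}
```

With cached processes at adj 900 and 999 and a foreground app at 0, a 150 MB target under `minAdj = 900` takes the adj-999 process first, then the largest adj-900 one, and never touches the foreground app.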

Strong answer tip:

explain that reducing process RSS and startup cost together works better than trying to avoid kills at all costs.

What are common AMS lifecycle race conditions and mitigations?

senior aosp ams lifecycle concurrency

Lifecycle races happen when component callbacks, async work, and process state changes interleave in ways app code did not model safely.

In interviews, cover:

stale callback updating destroyed Activity or Fragment view
service bind/unbind timing races around configuration change
broadcast receiver work running after process priority dropped
mitigation with lifecycle-aware scopes and state guards
idempotent cleanup paths to handle duplicate teardown events

Strong answer tip:

use a concrete example: a network callback arriving after onStop() that triggers illegal UI access or a leaked-window error.
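
One way to express a state guard in plain Java, assuming a hypothetical `ScreenController` whose `onStart()`/`onStop()` are driven by the real lifecycle and whose callbacks are delivered on the main thread: each start bumps a generation counter, and an async callback issued under an older generation is dropped instead of touching the UI. In an app, lifecycle-aware components such as `LifecycleOwner` observers usually provide this bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical controller; assumes all calls happen on one (main) thread.
final class ScreenController {
    private int generation = 0;        // bumped on every start
    private boolean started = false;
    final List<String> rendered = new ArrayList<>();   // stands in for UI updates

    void onStart() { started = true; generation++; }
    void onStop()  { started = false; }

    /** Returns a callback that renders only if the same started session is still active. */
    Consumer<String> guardedCallback() {
        final int issuedIn = generation;
        return result -> {
            if (!started || issuedIn != generation) {
                return;   // stale: screen stopped or restarted since the request was issued
            }
            rendered.add(result);
        };
    }
}
```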

How do background execution limits change AMS behavior across API levels?

senior aosp ams background policy

Android tightened background policy over releases, shifting work from free background services toward explicit scheduled or foreground paths.

In interviews, cover:

API 26 (Android 8.0) background service limits and implicit broadcast restrictions
JobScheduler and WorkManager as policy-compliant execution paths
foreground service requirements and user-visible notifications
app standby buckets and quota effects on deferred work
compatibility strategy for mixed minSdk and targetSdk population

Strong answer tip:

frame policy changes in terms of battery goals and abuse prevention, not merely as platform limitations.

Explain the frame production pipeline from app render to SurfaceFlinger composition

advanced aosp rendering surfaceflinger graphics

Android rendering is a producer-consumer chain where app threads produce buffers and SurfaceFlinger composes them under vsync deadlines.

In interviews, cover:

app UI thread records display list and RenderThread submits GPU work
buffers queued via BufferQueue from producer to consumer side
SurfaceFlinger latches latest ready buffers per layer
Hardware Composer composites layers (SurfaceFlinger falls back to GPU client composition when needed) and presents to the display
misses in any stage create jank or frame drop at display boundary

Strong answer tip:

describe deadline math: the per-frame budget is 1000 ms divided by the refresh rate, about 16.7 ms at 60 Hz, 11.1 ms at 90 Hz, and 8.3 ms at 120 Hz.
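
The helper below (names are illustrative) computes that budget and, as a rough model that assumes a frame starts on a vsync and ignores pipelining between stages, how many refresh periods a frame of a given duration spans.

```java
final class FrameBudget {
    /** Per-frame budget in milliseconds for a display refresh rate in Hz. */
    static double budgetMs(double refreshHz) {
        if (refreshHz <= 0) throw new IllegalArgumentException("refresh rate must be positive");
        return 1000.0 / refreshHz;
    }

    /**
     * Number of vsync periods a frame spanned, assuming it started on a vsync.
     * 1 means it made the deadline; n > 1 means n - 1 missed vsyncs.
     */
    static int vsyncsSpanned(double frameMs, double refreshHz) {
        return Math.max(1, (int) Math.ceil(frameMs / budgetMs(refreshHz)));
    }
}
```

A 12 ms frame fits at 60 Hz but spans two periods at 120 Hz, which is why work that was fine on older panels can jank on high-refresh devices.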

How do WindowManager and InputDispatcher coordinate focus and input routing?

advanced aosp windowmanager input focus

Input routing depends on trusted window focus and z-order policy managed by WindowManager and consumed by InputDispatcher.

In interviews, cover:

focused window token determines primary key event target
touch routing respects hit-testing, obscured-window flags, and split-touch rules
policy checks block touches that pass through untrusted overlays (enforced since Android 12)
an unresponsive input target causes a dispatch timeout, which the dispatcher reports as an input ANR with diagnostics
window transitions must keep focus updates atomic to avoid ghost input

Strong answer tip:

mention security angle: tapjacking defenses and obscured-window checks are part of routing policy, not only UI behavior.

What causes jank in the SurfaceFlinger pipeline and how do you isolate it?

advanced aosp jank graphics perfetto

Jank root cause can be app-side, GPU-side, compositor-side, or scheduler interference; trace correlation is required before optimizing.

In interviews, cover:

app misses: expensive layout, overdraw, sync binder waits
GPU misses: shader cost, texture upload spikes, driver stalls
compositor misses: late buffer latch, composition complexity
scheduler effects: CPU throttling, frequency scaling, thread preemption
use FrameTimeline and Perfetto to attribute missed deadlines accurately

Strong answer tip:

avoid vague answers; name at least one metric per layer (UI frame time, GPU completion, SurfaceFlinger present delay).

How do BufferQueue depth and triple buffering trade latency for smoothness?

advanced aosp graphics bufferqueue latency

Additional in-flight buffers reduce producer blocking and smooth frame delivery, but increase end-to-end input-to-photon latency.

In interviews, cover:

double buffering minimizes latency but risks producer stalls
triple buffering increases tolerance to jitter under variable workload
deep queueing can hide jitter while making interaction feel delayed
ideal setting depends on refresh rate, workload burstiness, UX goals
validate with frame pacing plus touch latency measurements

Strong answer tip:

explain this as an explicit product tradeoff, not a universal setting.

Why does Android use Zygote and copy-on-write process forking?

advanced aosp zygote startup memory

Zygote preloads common classes and resources once, then forks app processes so they share clean pages via copy-on-write.

In interviews, cover:

preload phase amortizes class/resource initialization cost
fork creates child quickly with shared read-only memory pages
dirty pages become private when app mutates state
startup gains come from less initialization and fewer page faults
wrong preload choices can bloat baseline memory for all apps

Strong answer tip:

connect COW behavior to real memory KPIs: PSS growth and page dirtying.

Explain the ART startup path and dex optimization modes in modern Android

advanced aosp art startup compiler

ART combines install-time and runtime compilation to balance install cost, startup speed, and steady-state throughput.

In interviews, cover:

baseline profile driven compilation for hot startup paths
JIT for dynamic hotspots during normal execution
AOT artifacts used where profile confidence is high
dex2oat compiler filter (e.g. verify, speed-profile, speed) and profile quality drive startup variance
stale or missing profiles regress first-run and cold start metrics

Strong answer tip:

mention operational loop: generate profile, ship, measure startup, refresh profile after major code-path changes.

What startup costs are hidden in class loading and static initialization?

senior aosp art startup classloading

Static initialization often hides synchronous work that looks harmless in code review but blocks critical startup path.

In interviews, cover:

class verifier and linker work triggered by first touch
static initializers doing I/O or heavy object graph construction
dependency chains that pull many classes into startup path
lazy initialization or deferral after first frame where acceptable
startup tracing to map class load events to frame misses

Strong answer tip:

give an example of a singleton initialization moved out of Application.onCreate() into a lazy path, with a measured cold-start improvement.
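
A common Java way to move a singleton off the startup path is the initialization-on-demand holder idiom: class-initialization rules (which ART also follows) run the nested holder's static initializer only on first use, so the expensive constructor no longer runs when the outer class is first referenced during startup. `AnalyticsClient` and its cost are hypothetical; the counter exists only so the laziness can be checked.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical expensive singleton (e.g. reads config files, builds a large object graph).
final class AnalyticsClient {
    static final AtomicInteger constructions = new AtomicInteger();

    private AnalyticsClient() {
        constructions.incrementAndGet();   // stands in for slow I/O or graph construction
    }

    // Holder is initialized only when get() is first called; class-initialization
    // locking makes this thread-safe without explicit synchronization.
    private static final class Holder {
        static final AnalyticsClient INSTANCE = new AnalyticsClient();
    }

    static AnalyticsClient get() { return Holder.INSTANCE; }

    void track(String event) {
        // No-op in this sketch.
    }
}
```

Touching `AnalyticsClient.constructions` loads the outer class without constructing the client; only the first `get()` pays the cost, ideally after the first frame.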

How do Baseline Profiles interact with ART and app startup performance?

intermediate aosp art baseline-profile performance

Baseline Profiles tell ART which methods to precompile for startup and critical interactions, reducing JIT warmup penalties.

In interviews, cover:

profile captures method hot paths from representative journeys
install-time compilation uses profile to pre-optimize selected methods
improves cold start and first-use latency after install/update
poor coverage leaves startup paths interpreted or JIT-compiled
must be regenerated when navigation and hot paths evolve

Strong answer tip:

discuss governance: a benchmark gate in CI to catch profile regressions.

Walk through the Android boot flow from bootloader to launcher ready

advanced aosp boot init system-server

Boot flow is a staged chain where each phase establishes trust and starts the next runtime layer until framework services can launch home.

In interviews, cover:

bootloader verifies and loads kernel plus ramdisk
init parses rc scripts, mounts filesystems, starts core daemons
zygote starts and forks system_server, which brings up the framework service graph
package and activity services prepare app/runtime state
launcher intent starts once core readiness conditions are satisfied

Strong answer tip:

identify where boot time is usually spent: I/O init, service startup, and package scanning overhead.

What is system_server and why is it the most critical process?

advanced aosp system-server services reliability

system_server hosts core framework services; instability here cascades across the entire user experience and can force device restart loops.

In interviews, cover:

contains ActivityManager, PackageManager, WindowManager, etc.
services communicate heavily over binder with app processes
a crash in system_server restarts the whole framework (zygote and all apps), while a hang in a critical service triggers Watchdog recovery
strict threading and watchdog boundaries prevent global stalls
service startup order and dependencies affect boot reliability

Strong answer tip:

explain why service teams keep binder handlers short and avoid blocking operations in main/system threads.

How do init rc scripts influence security and boot reliability?

senior aosp init boot security

init rc scripts define service startup, permissions, and mount behavior, making them a high-leverage reliability and security control point.

In interviews, cover:

service class and trigger conditions control startup ordering
wrong permissions or context labels can break service bring-up
restart policies can hide flapping failures or amplify boot loops
property-triggered actions must avoid unsafe race-prone sequencing
rc audits should include least privilege and deterministic ordering

Strong answer tip:

mention validating init changes with boot-time traces and fault-injection tests before broad rollout.

How do watchdog mechanisms protect Android from system service hangs?

senior aosp watchdog reliability system-server

Watchdog monitors critical threads and service responsiveness; if progress stalls beyond thresholds, it triggers diagnostics and recovery actions.

In interviews, cover:

monitored loopers and handler checkpoints in core processes
timeout policy distinguishes transient load from hard deadlock
capture traces/tombstones before restart to preserve root-cause data
avoid false positives via bounded work and asynchronous design
recurring watchdog resets indicate architecture or lock-order defects

Strong answer tip:

connect watchdog events to postmortem quality: restart alone is not success unless diagnostics explain recurrence drivers.
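
A toy version of the checkpoint idea, assuming each monitored thread records a heartbeat after each unit of work and a checker compares heartbeat ages against a timeout. The real system_server Watchdog works by posting checks to monitored handler threads; this sketch, with a hypothetical `HeartbeatWatchdog` class and an injected clock, only shows the stall-detection arithmetic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class HeartbeatWatchdog {
    private final long timeoutMs;
    private final LongSupplier clockMs;                   // injectable for tests
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();

    HeartbeatWatchdog(long timeoutMs, LongSupplier clockMs) {
        this.timeoutMs = timeoutMs;
        this.clockMs = clockMs;
    }

    /** Called by a monitored thread each time it completes a unit of work. */
    void beat(String threadName) { lastBeat.put(threadName, clockMs.getAsLong()); }

    /** Returns threads whose last heartbeat is older than the timeout, sorted by name. */
    List<String> stalled() {
        long now = clockMs.getAsLong();
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (now - e.getValue() > timeoutMs) out.add(e.getKey());
        }
        out.sort(null);                                   // deterministic order
        return out;
    }
}
```

A real checker would dump stacks when `stalled()` is non-empty (the "capture traces before restart" step) and escalate to a restart only if the stall persists.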

Explain the Android sandbox model and SELinux's role in defense in depth

advanced aosp security selinux sandbox

Android security layers isolate apps with UID sandboxing, permission mediation, and SELinux mandatory access control on top of DAC.

In interviews, cover:

per-app UID/process isolation limits direct data access
binder and permission checks gate privileged capabilities
SELinux policy constrains even privileged process behavior
denials can block exploit chains that pass app-level checks
security posture depends on policy quality and update hygiene

Strong answer tip:

clarify that SELinux is not optional hardening; it is core runtime policy enforcement in production Android builds.

How do you debug SELinux denials without weakening policy?

senior aosp selinux security debugging

Denial debugging should identify minimal required allow rules while preserving least privilege and preventing policy drift.

In interviews, cover:

collect avc denial logs and map each to its source context, target context, object class, and permission
verify labeling and domain transition correctness first
prefer fixing context/type assignment over broad allow rule
test policy in realistic scenarios and regression suites
reject permissive shortcuts in release builds
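
Mapping a denial starts with pulling the fields out of the log line. The parser below handles the usual `avc: denied { perms } ... scontext=... tcontext=... tclass=...` shape and ignores other fields; the class name and the sample line in the check are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AvcDenial {
    final List<String> permissions;
    final String scontext, tcontext, tclass;
    final boolean permissive;

    private AvcDenial(List<String> perms, String s, String t, String c, boolean permissive) {
        this.permissions = perms; this.scontext = s; this.tcontext = t;
        this.tclass = c; this.permissive = permissive;
    }

    private static final Pattern PERMS = Pattern.compile("avc:\\s+denied\\s+\\{([^}]*)\\}");

    // Extracts the value of a key=value field, or null if absent.
    private static String field(String line, String key) {
        Matcher m = Pattern.compile("\\b" + key + "=(\\S+)").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Returns null if the line is not an avc denial. */
    static AvcDenial parse(String line) {
        Matcher m = PERMS.matcher(line);
        if (!m.find()) return null;
        List<String> perms = Arrays.asList(m.group(1).trim().split("\\s+"));
        return new AvcDenial(perms, field(line, "scontext"), field(line, "tcontext"),
                field(line, "tclass"), "1".equals(field(line, "permissive")));
    }

    /** The source/target/class/permission tuple to review, not a rule to paste. */
    @Override public String toString() {
        return scontext + " -> " + tcontext + " : " + tclass + " " + permissions;
    }
}
```

Printing the tuple rather than generating an allow rule keeps the review focused on whether a label or domain transition is wrong before any policy is widened.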

Strong answer tip:

explain review process: security sign-off for policy deltas with threat rationale and rollback plan.

What are common privilege escalation paths in Android service architecture?

advanced aosp security privileges services

Escalation paths usually exploit trust boundary mistakes between app code, binder interfaces, and privileged service operations.

In interviews, cover:

missing caller identity validation in binder service methods
confused deputy flows where privileged service executes untrusted intent
exported component abuse forwarding into privileged code path
insufficient input validation on file, URI, or command parameters
mitigate with identity checks, allowlists, and capability minimization

Strong answer tip:

give one concrete guard pattern: enforceCallingPermission plus package signature verification before dangerous operations.

How do runtime permissions map to framework and kernel enforcement?

intermediate aosp permissions security framework

Runtime permission grant state is tracked in framework policy, then enforced at API entry points before lower-level operations execute.

In interviews, cover:

package manager stores grant state per UID and user profile
framework APIs check permission state on privileged operations
app ops can add finer-grained runtime policy gates
kernel/SELinux still enforce final access constraints independently
revocation and one-time grants require robust app fallback behavior

Strong answer tip:

distinguish user-consent policy (framework layer) from capability enforcement primitives (kernel and SELinux layers).

How does ART garbage collection interact with UI jank and latency?

advanced aosp art gc performance

GC pauses, concurrent marking work, and allocator behavior can all affect frame timing, especially when allocation rate spikes on UI paths.

In interviews, cover:

stop-the-world pause windows still exist despite concurrent collectors
allocation churn in UI path increases GC frequency and pause risk
large object and bitmap patterns stress heap fragmentation
tune by reducing allocations, pooling wisely, deferring heavy work
verify with frame metrics plus GC event correlation in traces

Strong answer tip:

focus on prevention via allocation discipline, not collector tuning only.

Explain Linux CFS scheduling effects on Android thread priorities

advanced aosp scheduler threads linux

Android relies on CFS plus cgroups and priority hints to allocate CPU fairly while protecting interactive responsiveness.

In interviews, cover:

runnable threads compete by virtual runtime under CFS
priority and cgroup class influence CPU share and latency
foreground UI/render threads need predictable scheduling budget
background compute can starve interaction if priorities are misused
scheduler tuning must be validated with end-user latency metrics

Strong answer tip:

warn against blindly boosting priorities; it can shift starvation to other critical threads and increase thermal throttling risk.
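
The CPU-share effect of nice values can be sanity-checked with the kernel's nice-to-weight table (`sched_prio_to_weight`, where nice 0 maps to 1024 and each step is roughly 1.25x); Android thread priorities such as `THREAD_PRIORITY_BACKGROUND` (10) are nice values. The sketch copies a few table entries and computes the long-run share of two always-runnable threads on one CPU in the same cgroup. It ignores cgroup hierarchy, uclamp, and wakeup effects, so treat it as a back-of-envelope model.

```java
import java.util.Map;
import java.util.TreeMap;

final class CfsShare {
    // Subset of the kernel's sched_prio_to_weight table (nice -> weight).
    private static final Map<Integer, Integer> WEIGHT = new TreeMap<>();
    static {
        WEIGHT.put(-10, 9548);
        WEIGHT.put(-5, 3121);
        WEIGHT.put(-4, 2501);
        WEIGHT.put(0, 1024);
        WEIGHT.put(5, 335);
        WEIGHT.put(10, 110);
    }

    static int weight(int nice) {
        Integer w = WEIGHT.get(nice);
        if (w == null) throw new IllegalArgumentException("nice " + nice + " not in this subset");
        return w;
    }

    /** Long-run CPU fraction for thread A when A and B are both always runnable on one CPU. */
    static double shareOfA(int niceA, int niceB) {
        double wa = weight(niceA);
        double wb = weight(niceB);
        return wa / (wa + wb);
    }
}
```

A nice 0 UI thread against a nice 10 background worker gets about 90% of a contended CPU; boosting that worker to nice -10 flips the split, giving the worker about 90% against the UI thread (9548 / (9548 + 1024)).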

What thread priority anti-patterns commonly break Android performance?

senior aosp threads scheduler performance

Priority abuse can mask local latency while damaging global system health, causing starvation, lock contention, and thermal instability.

In interviews, cover:

over-prioritizing background workers near UI priority classes
long CPU bursts on inherited high-priority threads
binder handler doing blocking I/O at elevated priority
lock inversion between high and low priority threads
use bounded executors, priority discipline, and trace validation

Strong answer tip:

present a policy: only latency-critical paths get elevated priority, with explicit SLO and rollback criteria.

How do you measure and reduce context-switch overhead in heavy IPC flows?

advanced aosp binder scheduler performance

IPC-heavy architectures can spend meaningful time in scheduling and wakeup overhead rather than business logic.

In interviews, cover:

quantify voluntary and involuntary context switches per request
identify synchronous call chains crossing many processes
collapse unnecessary hops or batch operations to reduce transitions
co-locate tightly coupled services where practical
verify gains with CPU time, tail latency, and energy impact metrics

Strong answer tip:

frame optimization as architecture-level change, not micro-tuning only.

Explain Doze internals and maintenance windows for deferred work

advanced aosp power doze jobs

Doze reduces idle battery drain by deferring network and wake activity, allowing periodic maintenance windows for batched background work.

In interviews, cover:

trigger conditions and progressive idle state deepening
maintenance window cadence and work batching semantics
high-priority FCM messages and setAndAllowWhileIdle()/setExactAndAllowWhileIdle() alarms as explicit policy exceptions
reliability strategy for tasks delayed by long idle periods
battery-user trust tradeoff when requesting exemptions

Strong answer tip:

recommend designing for eventual completion, not exact timing, unless feature is truly user-critical and policy-eligible for exemption.

How do App Standby Buckets and quotas impact background job reliability?

senior aosp power standby-buckets workmanager

Standby bucket classification controls execution quotas, alarms, and network access frequency for less-used apps.

In interviews, cover:

buckets range from active through working set, frequent, and rare to restricted, with progressively tighter quotas
job and alarm throttling behavior by bucket assignment
user interaction can promote bucket and relax constraints
WorkManager should be used with resilient retry/backoff policies
observability needs bucket-aware failure analysis in production

Strong answer tip:

show how product teams set realistic freshness expectations given quota policy instead of assuming near-real-time background execution.
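
For the retry/backoff point, the sketch below computes capped linear and exponential backoff delays. WorkManager exposes `BackoffPolicy.LINEAR` and `BackoffPolicy.EXPONENTIAL` with its own defaults and caps; the `Backoff` class is hypothetical, and the 30 s initial delay and 5 h cap used in the check are assumptions chosen for illustration, not values read from the library.

```java
final class Backoff {
    enum Policy { LINEAR, EXPONENTIAL }

    /**
     * Delay before retry number `attempt` (1-based), capped at maxMs.
     * LINEAR: initial * attempt; EXPONENTIAL: initial * 2^(attempt - 1).
     */
    static long delayMs(Policy policy, long initialMs, int attempt, long maxMs) {
        if (attempt < 1) throw new IllegalArgumentException("attempt is 1-based");
        double raw = policy == Policy.LINEAR
                ? (double) initialMs * attempt
                : initialMs * Math.pow(2, attempt - 1);
        return (long) Math.min(raw, (double) maxMs);   // double math avoids long overflow
    }
}
```

The computed delay is a floor, not a promise: in rare or restricted buckets the platform can defer execution further once the app's quota is exhausted, which is exactly why freshness expectations should be set with quotas in mind.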

WorkManager vs JobScheduler vs ForegroundService at the framework level

advanced aosp workmanager jobscheduler foreground-service

These APIs target different execution guarantees, visibility requirements, and policy constraints under modern Android background limits.

In interviews, cover:

WorkManager for durable deferrable work with constraint awareness
JobScheduler as WorkManager's underlying scheduler on API 23 and above
ForegroundService for user-visible ongoing work requiring immediacy
misuse of foreground service harms UX and policy compliance
choose by SLA, user visibility, and platform policy eligibility

Strong answer tip:

answer with a decision matrix: immediacy, durability, user awareness, and tolerance for delayed execution.
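
The decision matrix can be written down as a small function. The inputs and mapping are a simplified heuristic for interview discussion (the class and enum names are hypothetical), not an official platform rule; real eligibility also depends on foreground-service types and per-release restrictions. JobScheduler has no separate branch because WorkManager sits on top of it on modern API levels.

```java
final class ExecutionChoice {
    enum Api { WORK_MANAGER, FOREGROUND_SERVICE, IN_PROCESS }

    /**
     * @param mustStartNow     work must begin immediately (no tolerance for scheduler delay)
     * @param userAware        the user knowingly started it and expects visible ongoing progress
     * @param mustSurviveDeath work must survive process death or reboot (durability)
     */
    static Api choose(boolean mustStartNow, boolean userAware, boolean mustSurviveDeath) {
        if (mustStartNow && userAware) {
            return Api.FOREGROUND_SERVICE;   // visible notification; policy-eligible use cases only
        }
        if (mustSurviveDeath) {
            return Api.WORK_MANAGER;         // persisted, constraint-aware, may run later
        }
        return Api.IN_PROCESS;               // short work tied to the current screen or process
    }
}
```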

How do wakelock anti-patterns cause battery and thermal regressions?

senior aosp power wakelock battery

Wakelocks prevent sleep transitions; incorrect acquisition or release can drain battery quickly and increase thermal throttling risk.

In interviews, cover:

partial wakelock held across slow network or retry loops
missing timeout or release path in failure branches
chaining wakelock with frequent alarms compounds drain
use WorkManager constraints and opportunistic batching instead
validate with batterystats and thermal event correlation

Strong answer tip:

describe guardrails: strict timeout defaults, ownership logging, and automated tests that assert release behavior on all code paths.
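
The release-on-all-paths guardrail maps naturally onto try/finally, with a timeout as a backstop. The sketch uses a hypothetical `CountingLock` so a test can assert release behavior; on a device the same shape applies to `PowerManager.WakeLock`, whose `acquire(long timeout)` overload releases the lock automatically after the timeout. The 10-minute default is illustrative.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a wake lock that records acquire/release calls.
final class CountingLock {
    private int held = 0;
    long lastTimeoutMs = -1;

    void acquire(long timeoutMs) {
        if (timeoutMs <= 0) throw new IllegalArgumentException("always pass a timeout");
        lastTimeoutMs = timeoutMs;
        held++;
    }

    void release() {
        if (held == 0) throw new IllegalStateException("release without acquire");
        held--;
    }

    boolean isHeld() { return held > 0; }
}

final class WakeGuard {
    static final long DEFAULT_TIMEOUT_MS = 10 * 60 * 1000L;   // illustrative strict default

    /** Runs work while holding the lock and releases it on every exit path. */
    static <T> T runHeld(CountingLock lock, String owner, Supplier<T> work) {
        lock.acquire(DEFAULT_TIMEOUT_MS);
        System.out.println("wakelock acquired by " + owner);   // ownership logging
        try {
            return work.get();
        } finally {
            lock.release();
            System.out.println("wakelock released by " + owner);
        }
    }
}
```

The test pattern worth copying is the failure branch: make the work throw and assert the lock is no longer held.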
