Safety Car: Coroutine Exception Handling in KMP

When a car spins into the gravel, the safety car comes out. Its whole job is containment. One incident happened in one corner, and without intervention the cars behind would pile into it at full speed. The safety car gathers the field, slows everything to a controlled pace, and lets the marshals clear the wreck. The race keeps going. One crash stays one crash.

Coroutines need the same thing. One launch block throws, and if nothing contains it, the failure climbs the job hierarchy, cancels every sibling, and on a bad day takes the whole app down with it. The question isn’t whether a coroutine will throw. It’s what happens to everything around it when it does.

Two coroutines, two completely different failure modes

Before any of the containment tools make sense, you have to know that launch and async fail differently. This trips up people who assume a coroutine is a coroutine.

launch throws eagerly. The moment the exception fires, it propagates up through the job hierarchy. Nobody has to be listening.

scope.launch {
    throw IllegalStateException("boom")
    // propagates immediately, cancels the parent job
}

async holds the exception until you call await. The Deferred carries the failure with it, and the throw only surfaces when you ask for the result.

val deferred = scope.async {
    throw IllegalStateException("boom")
}
// nothing has thrown yet
deferred.await() // the exception surfaces here

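That difference matters for where you put your try/catch. With async, the catch goes around await, not around the async call. I have watched people wrap the async itself, wonder why nothing was caught, and conclude that coroutines are broken. They aren’t. The exception was just waiting at await the whole time. One caveat: catching at await only handles the exception for the caller. Under a regular Job, the failed async has still cancelled its parent, which is where the hierarchy comes in.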
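
Here’s a minimal sketch of where the catch belongs, assuming a JVM target so that runBlocking is available. The function name awaitFailure is mine, purely for illustration. It runs the async inside supervisorScope so the failed child doesn’t cancel the surrounding scope, and it puts the try/catch around await, where the exception actually surfaces.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: reports what the caller observed at await().
fun awaitFailure(): String = runBlocking {
    supervisorScope {
        // The failure is stored in the Deferred; this line itself does not throw.
        val deferred = async<Int> { throw IllegalStateException("boom") }
        try {
            deferred.await()
            "no exception"
        } catch (e: IllegalStateException) {
            // The stored exception is rethrown here, at await().
            "caught at await: ${e.message}"
        }
    }
}
```

Calling awaitFailure() returns "caught at await: boom". Swap supervisorScope for coroutineScope and the catch still runs, but the scope is cancelled by the failed child and rethrows the exception once the block finishes.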

The hierarchy is the difficult part

Here’s the behaviour that surprises people coming from callbacks or plain threads. By default, a child coroutine that fails cancels its parent, and the parent then cancels all of its other children. Structured concurrency treats a failure as a reason to tear down the whole scope.

val scope = CoroutineScope(Job() + Dispatchers.Default)

scope.launch { loadProfile() }   // healthy
scope.launch { loadOrders() }    // throws
scope.launch { loadSettings() }  // gets cancelled too

If loadOrders() throws, the regular Job propagates that upward, the scope is cancelled, and the two healthy coroutines are taken out along with it. For a screen that loads three independent sections, that’s the wrong outcome. One failed section shouldn’t blank the other two.

This is exactly the pile-up the safety car prevents. You want the incident contained to its own corner.
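
To watch the pile-up happen, here’s a small sketch, again assuming a JVM target for runBlocking. The function name is made up, and the empty CoroutineExceptionHandler is only there so the failure doesn’t get printed through the default uncaught-exception path. It is not doing any containment.

```kotlin
import kotlinx.coroutines.*

// Hypothetical demo: returns true if the healthy sibling was cancelled by the failure.
fun siblingCancelledByFailure(): Boolean = runBlocking {
    val quiet = CoroutineExceptionHandler { _, _ -> } // only silences the output
    val scope = CoroutineScope(Job() + Dispatchers.Default + quiet)

    val healthy = scope.launch { delay(10_000) }                        // would happily keep running
    val failing = scope.launch { throw IllegalStateException("boom") }  // fails, cancels the scope's Job

    joinAll(healthy, failing) // returns quickly: both jobs are finished
    healthy.isCancelled
}
```

siblingCancelledByFailure() returns true. The healthy coroutine never did anything wrong, and it was taken out anyway.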

SupervisorJob and supervisorScope: containment for siblings

The fix is to change the parent so that a child’s failure doesn’t bring down its siblings. There are two ways to get there, and picking the right one is mostly about scope lifetime.

For a long-lived scope, like one tied to a screen or a ViewModel, build it with a SupervisorJob:

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Main)

scope.launch { loadProfile() }   // survives
scope.launch { loadOrders() }    // throws, fails alone
scope.launch { loadSettings() }  // survives

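With a SupervisorJob, cancellation still flows down from the parent to its children, but a child’s failure no longer travels up to the parent or sideways to its siblings. A child can fail without dragging its siblings into the gravel with it.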
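
A quick way to convince yourself, sketched for a JVM target. I use Dispatchers.Default here because Dispatchers.Main isn’t available in a plain JVM run, and the function name and the silent handler are my own assumptions for the demo.

```kotlin
import kotlinx.coroutines.*

// Hypothetical demo: returns (failing.isCancelled, healthy.isCancelled).
fun siblingsUnderSupervisor(): Pair<Boolean, Boolean> = runBlocking {
    val quiet = CoroutineExceptionHandler { _, _ -> } // only silences the output
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + quiet)

    val failing = scope.launch { throw IllegalStateException("boom") }
    val healthy = scope.launch { delay(50) } // launched after the failure, still runs to completion

    joinAll(failing, healthy)
    val result = failing.isCancelled to healthy.isCancelled
    scope.cancel() // tidy up the long-lived scope
    result
}
```

The result is (true, false): the failing job ends up cancelled by its own exception, and the healthy one completes normally.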

For a contained block of work where you want the same isolation only inside that block, use supervisorScope:

suspend fun loadDashboard() = supervisorScope {
    launch { loadProfile() }
    launch { loadOrders() }   // can fail independently
    launch { loadSettings() }
}
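
One detail worth sketching: inside supervisorScope, a launch that fails is treated like a root coroutine, so its exception still needs somewhere to go. In the sketch below (JVM target, hypothetical names throughout) each child gets a CoroutineExceptionHandler that records the failure, and the other two sections finish regardless.

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-in for a dashboard load: records what happened to each section.
fun loadDashboardSketch(): List<String> = runBlocking {
    val events = mutableListOf<String>() // safe here: runBlocking keeps everything on one thread
    val handler = CoroutineExceptionHandler { _, t -> events += "failed: ${t.message}" }

    supervisorScope {
        launch(handler) { events += "profile" }
        launch(handler) { throw IllegalStateException("orders") } // fails on its own
        launch(handler) { events += "settings" }
    }
    events
}
```

The returned list contains "profile", "settings", and "failed: orders". The failure was recorded, and neither sibling was cancelled.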

One thing that catches people: a plain SupervisorJob only isolates the direct children of that scope. If a direct child launches its own children with a regular Job, those grandchildren still follow the normal cancel-the-siblings rule among themselves. Supervision isn’t inherited all the way down. It applies at the level where you put it.
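
Here’s that one-level-deep behaviour as a sketch for a JVM target. All names are hypothetical, and the empty handler only keeps the output quiet. A direct child of the supervised scope launches two grandchildren, one of which fails.

```kotlin
import kotlinx.coroutines.*

// Hypothetical demo: returns (grandchild sibling cancelled, its parent cancelled, other direct child cancelled).
fun supervisionIsOneLevelDeep(): Triple<Boolean, Boolean, Boolean> = runBlocking {
    val quiet = CoroutineExceptionHandler { _, _ -> } // only silences the output
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + quiet)
    val siblingRef = CompletableDeferred<Job>()

    val child = scope.launch {
        // These two grandchildren share a regular Job: the launching coroutine itself.
        siblingRef.complete(launch { delay(10_000) })
        launch { throw IllegalStateException("boom") }
    }
    val otherChild = scope.launch { delay(50) } // direct child of the supervisor

    joinAll(child, otherChild)
    val result = Triple(siblingRef.await().isCancelled, child.isCancelled, otherChild.isCancelled)
    scope.cancel()
    result
}
```

The result is (true, true, false). The failing grandchild took down its sibling and their parent, while the supervisor kept the other direct child alive.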

CoroutineExceptionHandler is the marshal, not the airbag

Supervision keeps a failure from spreading. It does nothing about the exception itself, which is still uncaught and still has to go somewhere. That somewhere is CoroutineExceptionHandler.

Think of it as the marshal at the crash site. It doesn’t prevent the crash and it doesn’t run your retry logic. It’s the last station the exception passes through before it would otherwise become a hard app crash, and it gives you one place to log it or report it.

val handler = CoroutineExceptionHandler { _, throwable ->
    log.error("Uncaught in coroutine", throwable)
    crashReporter.record(throwable)
}

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Main + handler)

scope.launch {
    throw IllegalStateException("boom") // routed to handler
}

Two rules decide whether it actually fires, and both catch people out:

It only works with launch, never async. An async keeps its exception in the Deferred, so the handler is bypassed completely. You handle that one at await.

It only works when installed on the scope or on a root coroutine. Putting a handler on a nested launch inside another launch does nothing, because the inner coroutine isn’t a root. The exception propagates up to the root first, and only the root’s context handler runs.
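
The second rule is easy to check with a sketch like this one, assuming a JVM target for runBlocking. The function and handler names are mine. One handler sits on the scope, another on a nested launch, and the function reports which of them actually ran.

```kotlin
import kotlinx.coroutines.*

// Hypothetical demo: returns the name of the handler that received the exception.
fun whichHandlerRuns(): String = runBlocking {
    val handledBy = CompletableDeferred<String>()
    val rootHandler = CoroutineExceptionHandler { _, _ -> handledBy.complete("root") }
    val nestedHandler = CoroutineExceptionHandler { _, _ -> handledBy.complete("nested") }

    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + rootHandler)
    scope.launch {
        // Not a root coroutine: the failure is handed to the parent, so nestedHandler is skipped.
        launch(nestedHandler) { throw IllegalStateException("boom") }
    }

    val name = handledBy.await()
    scope.cancel()
    name
}
```

whichHandlerRuns() returns "root". The nested handler never fires, however carefully it was installed.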

So my mental model is: try/catch for the failures I expect and want to recover from, CoroutineExceptionHandler as the safety net for the ones I didn’t. The handler is not a substitute for handling errors you can actually do something about.

The one exception you must never swallow

CancellationException is how coroutines signal that they’ve been cancelled. It’s not an error. It’s the normal mechanism for stopping work cleanly, and the machinery relies on it propagating.

The trap is the broad catch:

try {
    doNetworkCall()
} catch (e: Exception) {       // also swallows CancellationException
    showError()
}

When you catch Exception, you also catch CancellationException, and now a coroutine that was supposed to stop keeps running as if nothing happened. On a screen the user already left, that means work continuing against a scope that’s meant to be dead. The fix is to let cancellation through:

try {
    doNetworkCall()
} catch (e: CancellationException) {
    throw e                    // never swallow this
} catch (e: Exception) {
    showError()
}

This bites harder in KMP than in Android-only code, because the lifecycle events that trigger cancellation arrive through different platform paths on Android and iOS. If your shared code swallows cancellation, you’ve created a leak that behaves differently on each platform and is miserable to track down.
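
If that pattern shows up a lot, one option is to wrap it once. The helper below is a sketch under my own naming (runCatchingCancellable is not a library function): it behaves like runCatching for ordinary exceptions but rethrows cancellation. The demo around it assumes a JVM target for runBlocking.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: like runCatching, but cancellation is never swallowed.
suspend fun <T> runCatchingCancellable(block: suspend () -> T): Result<T> =
    try {
        Result.success(block())
    } catch (e: CancellationException) {
        throw e // let the coroutine machinery see the cancellation
    } catch (e: Exception) {
        Result.failure(e)
    }

// Hypothetical demo: returns (ordinary failure was captured, code after a cancelled call ran).
fun cancellationDemo(): Pair<Boolean, Boolean> = runBlocking {
    val failure = runCatchingCancellable<Int> { throw IllegalStateException("network down") }

    var ranAfterCancel = false
    val job = launch {
        runCatchingCancellable { delay(10_000) }
        ranAfterCancel = true // skipped: the CancellationException was rethrown
    }
    yield()             // let the job start and suspend inside delay
    job.cancelAndJoin()

    failure.isFailure to ranAfterCancel
}
```

cancellationDemo() returns (true, false): the expected failure comes back as a Result, and the cancelled coroutine stops where it should instead of carrying on.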

What this looks like in a ViewModel

Putting the pieces together, here’s the shape I reach for in a shared ViewModel. The scope is supervised so one failed load doesn’t kill the others, there’s a handler as the last-resort net, and the expected failures are caught explicitly and turned into UI state.

class DashboardViewModel(
    private val repo: DashboardRepository,
) : ViewModel() {

    private val handler = CoroutineExceptionHandler { _, t ->
        crashReporter.record(t)
    }

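    // viewModelScope is already backed by a SupervisorJob. Adding a fresh SupervisorJob()
    // would replace it, so this scope would no longer be cancelled when the ViewModel is cleared.
    private val scope = viewModelScope + handler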

    private val _state = MutableStateFlow<UiState>(UiState.Loading)
    val state: StateFlow<UiState> = _state.asStateFlow()

    fun load() {
        scope.launch {
            try {
                _state.value = UiState.Success(repo.fetch())
            } catch (e: CancellationException) {
                throw e
            } catch (e: Exception) {
                _state.value = UiState.Error(e.message)
            }
        }
    }
}

Expected failures become an Error state the UI can render. Cancellation passes straight through. Anything that slips past those catches, such as an Error or a throw outside the try, still reaches the handler and gets reported instead of going unrecorded. Three layers, each with a clear job.

The line I hold

Coroutine exceptions aren’t an edge case you bolt on at the end. They’re part of the design of any KMP app that does real async work, which is all of them. The safety car exists because incidents are a certainty, not a possibility, and the smart teams plan for the cleanup before the lights go out.

Pick the scope deliberately so failures stay contained. Keep one handler as the net. Catch what you can recover from, and never, ever swallow cancellation. Do that in commonMain, and the crash that would behave one way on Android and another on iOS behaves the same way on both: handled.

The safety car is always on standby. The only question is whether you build the boundary before the incident or scramble for it after the app has already gone down. 🏁

The KMP Bits app is available on App Store and Google Play — built entirely with KMP.