Reliable Async Tests: The Problem

Episode #238 • Jun 19, 2023 • Free Episode

While Swift provides wonderful tools for writing async code, there are gaps in its tools for testing it. Let’s explore the tools it does provide to show where they succeed, and where they fall short.

Introduction

Brandon: Many, many months ago, you and I started a seemingly innocent post on the Swift forums asking others how they have been approaching testing asynchronous code in Swift in a reliable manner. Ever since Swift 5.5 we have had wonderful tools for writing async code in a nice, succinct syntax, and there were even a few basic testing tools provided from the beginning, but we found that those tools don’t cover most situations we encounter in everyday code.

Well, the forum post unfortunately didn’t get very far. There were a number of people that also expressed the difficulties they have had in testing async code, but by the end there were no actionable steps that could be taken today to improve the situation.

Stephen: We struggled with this for months. We had hundreds of CI test failures for things that weren’t true test failures, but instead were only due to the inability to predict how the runtime would process async code. Often we could insert little sleeps and yields to forcibly push things forward, but it was very annoying and ultimately slowed us down quite a bit managing it.

Also, WWDC 2023 has come and gone, and as far as we can tell no new tools were introduced for testing async code, and we are in the same boat as we have been in since the introduction of concurrency to Swift.

Brandon: Well, we finally think we have a pretty nice solution to the problem that can be used today, though we still think there is a lot of room for improvement in the Swift language itself.

So, we are kicking off a new series to explore what it takes to reliably test async code in Swift. We will show the tools that Swift gives us today, show the problems with those tools, and then see what it takes to fix the problems ourselves. Along the way we are going to explore some very advanced topics of Swift’s runtime, including dynamically loading some ABI-stable yet not easily accessible runtime hooks for completely altering the manner in which Swift enqueues asynchronous work.

So, let’s get started!

Async test cases

We are going to build a little toy application step-by-step, and we are going to write tests each step of the way to see what tools Swift and Xcode give us today for testing asynchronous code. We have a fresh project already created, and we’re going to do something quite dangerous by using the Xcode 15 beta for this episode. Don’t worry though, the sample code will still open in Xcode 14.3.

Now, before even writing any application code to test, we are going to hop over to the default test file.

In this file we have the very basics of some scaffolding for a test:

import XCTest
@testable import ReliablyTestingAsync

final class ReliablyTestingAsyncTests: XCTestCase {
  func testBasics() {
  }
}

The primary tool, and in some sense the only tool, that Swift gives us for testing asynchronous code is the fact that we can mark any test method as async:

func testBasics() async {
}

With that done we can now perform any asynchronous code we want inside.

Let’s write a quick and silly assertion just to confirm that async work really is possible in here. I’m going to sleep for 1 second, capturing the current date before and after, and then assert that the two dates are roughly 1 second apart. Because Task.sleep can throw, the test also needs to be marked throws:

func testBasics() async throws {
  let start = Date()
  try await Task.sleep(for: .seconds(1))
  let end = Date()
  XCTAssertEqual(end.timeIntervalSince(start), 1, accuracy: 0.1)
}

This test passes, and it takes about 1 second to run:

Test Suite 'ReliablyTestingAsyncTests' passed.
Executed 1 test, with 0 failures (0 unexpected) in 1.059 (1.059) seconds

So, we indeed can perform async work in here and then perform assertions.
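The sleep-and-measure check above relies on Date, which tracks wall-clock time. As a sketch only, the same measurement can be taken with ContinuousClock (available from Swift 5.7, iOS 16 and macOS 13), whose readings are monotonic. The function name measureSleep is ours, not from the episode:

```swift
// Measures how long an async sleep takes using ContinuousClock, which is
// monotonic and so unaffected by wall-clock changes, unlike Date.
func measureSleep(seconds: Int) async throws -> Duration {
  let clock = ContinuousClock()
  let start = clock.now
  try await Task.sleep(for: .seconds(seconds))
  return start.duration(to: clock.now)
}

let elapsed = try await measureSleep(seconds: 1)
// Combine Duration's seconds and attoseconds components into fractional seconds.
let elapsedSeconds =
  Double(elapsed.components.seconds)
  + Double(elapsed.components.attoseconds) / 1e18
print("Slept for about \(elapsedSeconds) seconds")
```

Like the Date version, this still makes the test take a real second to run; it only changes how the elapsed time is measured.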

So, sounds great in theory! But let’s play around with some abstract units of async work and see just how easy it is to predict the manner in which they execute. For example, consider spinning up an unstructured task and printing inside the task and right after the task:

func testTaskStart() async {
  let task = Task {
    print(#line, { Thread.current }())
  }
  print(#line, { Thread.current }())
  await task.value
}

What order do you think the lines will be printed?

Let’s run to find out:

16 <NSThread: 0x600000d2ad40>{number = 4, name = (null)}
14 <NSThread: 0x600000d354c0>{number = 7, name = (null)}

It seems like the line after creating the task is executed before the line inside the task. We can even run it a bunch of times to see that. It also seems like each time we do this we get a different thread.

Is that always the case?

It’s hard to say, but to give us a little more confidence let’s get some hard data. Let’s turn this test into something we can actually assert on. We can have a little mutable collection of data that we append to and assert on:

func testTaskStart() async {
  var values: [Int] = []
  let task = Task {
    values.append(1)  // 🛑
  }
  values.append(2)
  await task.value
  XCTAssertEqual(values, [2, 1])
}

🛑 Mutation of captured var ‘values’ in concurrently-executing code

This does not compile because we are capturing mutable state in a @Sendable closure, which means that closure can be used in a concurrent fashion, and hence it is not safe to capture mutable state.

We can make this safe by wrapping values in something that properly isolates the data. We can do this by having a class that holds onto a value and a lock, and the only way to access the value is via controlled methods that make sure access is locked.

We’ll paste it into a new file, LockIsolated.swift, and you may recognize this type because we ship it in our dependencies library in order to aid in testing async code:

import Foundation

public final class LockIsolated<Value>:
@unchecked Sendable {
  private var _value: Value
  private let lock = NSRecursiveLock()
  public init(
    _ value: @autoclosure @Sendable () throws -> Value
  ) rethrows {
    self._value = try value()
  }
  public func withValue<T: Sendable>(
    _ operation: (inout Value) throws -> T
  ) rethrows -> T {
    try self.lock.withLock {
      var value = self._value
      defer { self._value = value }
      return try operation(&value)
    }
  }
}

extension LockIsolated where Value: Sendable {
  public var value: Value {
    self.lock.withLock {
      self._value
    }
  }
}
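To see what the lock actually buys us, here is a sketch that hammers a pared-down copy of the type from many child tasks at once. This copy uses explicit lock()/unlock() calls instead of withLock and drops the Sendable constraint on withValue's result; those simplifications, and the appendConcurrently helper, are our own assumptions for illustration:

```swift
import Foundation

// A pared-down LockIsolated: every access to the value goes through the lock.
final class LockIsolated<Value>: @unchecked Sendable {
  private var _value: Value
  private let lock = NSRecursiveLock()

  init(_ value: Value) {
    self._value = value
  }

  // Runs `operation` with exclusive, in-place access to the wrapped value.
  func withValue<T>(_ operation: (inout Value) throws -> T) rethrows -> T {
    lock.lock()
    defer { lock.unlock() }
    return try operation(&_value)
  }

  // Returns a copy of the wrapped value, read under the lock.
  var value: Value {
    withValue { $0 }
  }
}

// Appends 1...count from `count` concurrently running child tasks.
func appendConcurrently(count: Int) async -> [Int] {
  let values = LockIsolated<[Int]>([])
  // withTaskGroup implicitly waits for every child task before returning.
  await withTaskGroup(of: Void.self) { group in
    for n in 1...count {
      group.addTask {
        values.withValue { $0.append(n) }
      }
    }
  }
  return values.value
}

let appended = await appendConcurrently(count: 100)
```

The lock guarantees that no append is lost, but it says nothing about the order in which the tasks get to append, which is why only the count and the sorted contents are safe to assert on.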

With that helper we can now isolate the mutable array to make it safe for accessing in the task:

func testTaskStart() async {
  let values = LockIsolated<[Int]>([])
  let task = Task {
    values.withValue { $0.append(1) }
  }
  values.withValue { $0.append(2) }
  await task.value
  XCTAssertEqual(values.value, [2, 1])
}

It does seem to pass when we run it, but to be really sure let’s run it a whole bunch of times. That way if there is some flakiness or non-determinism, it will hopefully show itself by running it many times.

Let’s try 10,000 times:

Test Suite 'Selected tests' failed.
Executed 10000 tests, with 26 failures (0 unexpected) in 11.520 (19.134) seconds

Well, it does seem like it can fail every once in a while: 26 times out of 10,000, or roughly a quarter of a percent of the time. So it is possible for the code in an unstructured task to start executing before the line that comes right after it.

It is worth pointing out that prior to Xcode 14.3 and iOS 16.4 this did not seem to be the case. In older versions of Xcode and iOS we were able to run this test tens of thousands of times and it seemingly always passed. We of course had no such guarantee, but there was some evidence for it.

If we make the test method @MainActor, then it does seem to pass consistently:

@MainActor
func testTaskStart() async {
  let values = LockIsolated<[Int]>([])
  let task = Task {
    values.withValue { $0.append(1) }
  }
  values.withValue { $0.append(2) }
  await task.value
  XCTAssertEqual(values.value, [2, 1])
}

Running this 10,000 times passes:

Test Suite 'Selected tests' passed.
Executed 10000 tests, with 0 failures (0 unexpected) in 13.339 (16.173) seconds

And it appears to pass consistently because unstructured tasks inherit the actor of the asynchronous context they’re spawned from. Since the test is running on the main actor, the task has no choice but to run after the current context suspends. Otherwise the Task initializer would have to block, which would be a very strange design decision for the Swift concurrency runtime, whose very goal is to provide non-blocking async code.
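The same ordering argument can be sketched without LockIsolated by leaning on main-actor isolation directly. This assumes a hypothetical @MainActor Recorder class and a standalone @MainActor function standing in for the XCTest method:

```swift
// A main-actor-isolated recorder. @MainActor classes are implicitly Sendable,
// so it can be captured by the Task closure below.
@MainActor
final class Recorder {
  var values: [Int] = []
}

@MainActor
func mainActorTaskStart() async -> [Int] {
  let recorder = Recorder()
  // Created from a @MainActor context, this task inherits main-actor isolation,
  // so its body cannot run until the current main-actor job suspends.
  let task = Task {
    recorder.values.append(1)
  }
  // Still running synchronously on the main actor, so this append happens first.
  recorder.values.append(2)
  // Suspending here frees up the main actor so the task can finally run.
  await task.value
  return recorder.values
}

let observed = await mainActorTaskStart()
```

Note that this only pins down the order between one task and the code that created it. As the next tests show, it does not order several tasks relative to one another.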

So already understanding how async code is scheduled and executed seems quite complicated.

What about the order in which multiple tasks start? Is that deterministic? We can write a test that tries to determine that:

func testTaskStartOrder() async {
  let values = LockIsolated<[Int]>([])
  let task1 = Task { values.withValue { $0.append(1) } }
  let task2 = Task { values.withValue { $0.append(2) } }
  _ = await (task1.value, task2.value)
  XCTAssertEqual(values.value, [1, 2])
}

If we run this, right off the bat it fails. On a second run it succeeds. And if we run it 1,000 times we see that it fails about 1% of the time:

Test Suite 'Selected tests' failed.
Executed 1000 tests, with 8 failures (0 unexpected) in 1.736 (2.053) seconds

So the order tasks start executing seems to often be the order they are created, but that’s not guaranteed.

Further, even making this test method @MainActor does not make these tasks start in a consistent order:

@MainActor
func testTaskStartOrder() async {
  let values = LockIsolated<[Int]>([])
  let task1 = Task { values.withValue { $0.append(1) } }
  let task2 = Task { values.withValue { $0.append(2) } }
  _ = await (task1.value, task2.value)
  XCTAssertEqual(values.value, [1, 2])
}

This still fails, and with a higher failure rate.

The reason for this is that unstructured tasks are actually always started on the global executor and then hop to the actor they should be executed on. In this case that means hopping to the @MainActor, since the whole test method is marked as @MainActor. And it’s because of this interplay with the global executor that we are seeing inconsistent ordering.

And the same is true of task groups too:

func testTaskGroupStartOrder() async {
  let values = await withTaskGroup(
    of: [Int].self
  ) { group in
    group.addTask { [1] }
    group.addTask { [2] }
    return await group.reduce(into: [], +=)
  }
  XCTAssertEqual(values, [1, 2])
}

It passes with a single run, but if we run it 1,000 times we see that it fails roughly 18% of the time:

Test Suite 'Selected tests' failed.
Executed 1000 tests, with 178 failures (0 unexpected) in 2.166 (3.127) seconds

And these failure rates compound if you need to depend on many tasks starting in a particular order, say 100 tasks:

func testTaskGroupStartOrder() async {
  let values = await withTaskGroup(
    of: [Int].self
  ) { group in
    for n in 1...100 {
      group.addTask { [n] }
    }
    return await group.reduce(into: [], +=)
  }
  XCTAssertEqual(values, Array(1...100))
}

Running this 1,000 times shows that now it fails over 90% of the time:

Test Suite 'Selected tests' failed.
Executed 1000 tests, with 935 failures (0 unexpected) in 8.988 (9.281) seconds

And so we really cannot depend on the order that tasks start executing, whether it be unstructured tasks or task groups. The same goes for async lets too, but we aren’t going to show that right now.
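None of this means a task group’s results are unusable in a known order. When only the results, not the execution, need to follow creation order, a common workaround is to tag each child’s result with its index and sort afterward. Here is a minimal sketch; the function name and the `index + 1` stand-in work are illustrative assumptions:

```swift
// Runs `count` child tasks and returns their results in *creation* order,
// even though the group hands them back in whatever order they finish.
func resultsInCreationOrder(count: Int) async -> [Int] {
  await withTaskGroup(of: (Int, Int).self, returning: [Int].self) { group in
    for index in 0..<count {
      group.addTask {
        // Stand-in for real async work: pair the result with its index.
        (index, index + 1)
      }
    }
    var tagged: [(Int, Int)] = []
    for await pair in group {
      tagged.append(pair)
    }
    // Sort by the index tag to recover creation order, then drop the tag.
    return tagged.sorted { $0.0 < $1.0 }.map { $0.1 }
  }
}

let ordered = await resultsInCreationOrder(count: 100)
```

This makes the output deterministic without making the scheduling deterministic: the children still start and finish in an unpredictable order.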

And if the order in which tasks start executing isn’t well defined, then surely there are no guarantees on how async work interleaves amongst many tasks either.

To play around with this we will spin up a whole bunch of tasks and in each task we will append to the shared mutable array, then yield, which will suspend the current task, and then append again:

func testYieldScheduling() async {
  let count = 10
  let values = LockIsolated<[Int]>([])
  let tasks = (0...count).map { n in
    Task {
      values.withValue { $0.append(n * 2) }
      await Task.yield()
      values.withValue { $0.append(n * 2 + 1) }
    }
  }
  for task in tasks { await task.value }
}

What do we expect the array to be after all of these tasks execute?

Well, if task scheduling was deterministic, I would think that:

  • At first all 11 tasks are created
  • Then the first task starts and appends 0 to the array
  • Then the first task yields, which allows the second task to run
  • The second task starts and appends 2 to the array
  • Then the second task yields, which allows the third task to run
  • The third task starts and appends 4 to the array
  • and on and on until we get to the last task, which appends 20 to the array and yields
  • Then we come back to the first task because it yielded first, and it gets to append its next number, which is 1
  • Then the next task continues and appends 3
  • and then on and on until the last task appends 21

So, essentially I expect to have an array with all the even numbers less than or equal to 20 at the front and in order, and then all the odd numbers after that:

XCTAssertEqual(
  values.value,
  [
    0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
    1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21
  ]
)

However, this typically does not pass. In fact, if we run the test 1,000 times we see it fails over 80% of the time:

Test Suite 'Selected tests' failed.
Executed 1000 tests, with 856 failures (0 unexpected) in 6.269 (6.763) seconds

So, not only is the starting of a task non-deterministic, but also the scheduling and interleaving of tasks is highly non-deterministic.
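Even though the interleaving is unpredictable, some properties hold on every run: each task performs its two appends in order, and every value appears exactly once. Here is a sketch that asserts only on those invariants, using a hypothetical actor-based Recorder in place of LockIsolated:

```swift
// Collects values appended from many tasks; actor isolation serializes access.
actor Recorder {
  private(set) var values: [Int] = []

  func append(_ value: Int) {
    values.append(value)
  }
}

// Same shape as testYieldScheduling: append, yield, append, in every task.
func yieldScheduling(count: Int) async -> [Int] {
  let recorder = Recorder()
  let tasks = (0...count).map { n in
    Task {
      await recorder.append(n * 2)
      await Task.yield()
      await recorder.append(n * 2 + 1)
    }
  }
  for task in tasks { await task.value }
  return await recorder.values
}

let values = await yieldScheduling(count: 10)
```

Asserting on invariants like these is robust, but it cannot check the exact sequence of states, which is usually what a feature test needs.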

Writing a synchronous test

So, we’ve written a whole bunch of failing tests just to prove that it’s mostly out of our control to predict how the Swift runtime is going to execute concurrent code.

All of this goes to show that we just really cannot depend on the order of execution of things at all in Swift concurrency. And this is kind of to be expected. After all, one of the main benefits of Swift concurrency is that it manages a small number of threads in a thread pool, and dynamically allows tasks to execute their work in the pool. As soon as one task suspends it can give up its thread for another task to come in and do some work, and then later the original task can resume its own work on a completely different thread.

That scheduling process is quite complex. Swift wants to do its best to give tasks equal opportunity to do their work in the pool, and the heuristics it uses to do so can even change depending on what is happening at runtime. And so it’s not too surprising that there is an element of non-determinism inherent in async code.

Stephen: However, not all async code in our applications needs this incredibly powerful, yet unpredictable, scheduling mechanism, and it should not require us to reason about multiple async tasks cooperating to get time on the thread pool to execute their work. In fact, I would say that the vast majority of the async and concurrent code we encounter in our apps is quite simple. A user taps a button, we execute some kind of effectful work, whether it be a network request or database request or whatever, and we receive a response or failure. And maybe while that request is in flight we track some additional state, such as a loading indicator or a progress bar.

Such simple scenarios are not truly non-deterministic. There is a well defined state machine that governs their behavior, and so we would expect to be able to write a test for such situations.

So, let’s move onto something a bit more interesting to test. We are going to build up a little feature from scratch, and to do so we are going to return to one of the most common tropes of ours here on Point-Free: the humble counter app. But we will add a few bells and whistles to it to make it interact with async code in interesting ways, and that will push us to understand how we can reliably test async code in Swift.

Let’s start with the absolute basics, which is a new observable object that encapsulates the logic of a counter:

@MainActor
class NumberFactModel: ObservableObject {
  @Published var count = 0

  func incrementButtonTapped() {
    self.count += 1
  }
  func decrementButtonTapped() {
    self.count -= 1
  }
}

Nothing special so far, but this model is going to get a lot more complicated soon.

And we’ll add a very basic view just to get some UI into place so we can start exercising the logic of the feature:

struct ContentView: View {
  @ObservedObject var model: NumberFactModel

  var body: some View {
    Form {
      Section {
        HStack {
          Button("-") {
            self.model.decrementButtonTapped()
          }
          Text("\(self.model.count)")
          Button("+") {
            self.model.incrementButtonTapped()
          }
        }
      }
      .buttonStyle(.plain)
    }
  }
}

And we’ll fix the preview:

struct ContentView_Previews: PreviewProvider {
  static var previews: some View {
    ContentView(model: NumberFactModel())
  }
}

…and the app entry point:

@main
struct ReliablyTestingAsyncApp: App {
  var body: some Scene {
    WindowGroup {
      ContentView(model: NumberFactModel())
    }
  }
}

OK, this works, but it’s also not very impressive because there is no asynchronous behavior in the model at all.

But, before even introducing such async behavior, let’s get a basic test in place. Let’s create a new test file, NumberFactModelTests.swift, and paste in some basic scaffolding:

import XCTest
@testable import ReliablyTestingAsync

@MainActor
final class NumberFactModelTests: XCTestCase {
}

The only behavior our application has right now is incrementing and decrementing the count, and so it should be quite easy to test:

func testIncrementDecrement() {
  let model = NumberFactModel()
  model.incrementButtonTapped()
  XCTAssertEqual(model.count, 1)
  model.decrementButtonTapped()
  XCTAssertEqual(model.count, 0)
}

And this test passes. We can even run it thousands or tens of thousands of times and it will always pass. This is a completely deterministic test; there is no chance of a failure.

It is of course not impressive at all, but perhaps in the future we will start having a lot more logic in the increment and decrement endpoints. Maybe we’ll start tracking analytics, or persisting data, and when that starts happening we will definitely want test coverage on that behavior.

But what is interesting about this test is just how easy it was to write. Because all the logic is completely synchronous we get to just invoke some endpoints on the model and assert on how state changed inside. This is why synchronous code is so easy to understand and so easy to test. We aspire to have our async code be this easy to test.

Writing an async test

So, let’s add some async behavior to the feature and see what it takes to test it. We are going to add a button to our feature such that when you tap it we fetch a random fact about that number from an external API.

Since we know we are going to want to write tests for this feature we want to control our dependency on this external service from the very beginning. We’ve talked about dependencies a ton on past episodes of Point-Free, and it’s just a necessary bit of upfront work that one needs to do to not only be able to write tests, but also improve our developer experience with working in Xcode previews, and more.

So, we will provide a very basic interface for the dependency:

struct NumberFactClient {
  var fact: (Int) async throws -> String
}

Just one single async and throwing endpoint that returns a fact string when you hand it a number. And note that we are using the “protocol witness” style of designing this dependency rather than a protocol.
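Because the client is a plain struct holding a closure, alternate implementations are just values of the same type. Here is a sketch, assuming the @Sendable annotation this episode adds a little later. The `constant` and `failing` names and the NumberFactError type are hypothetical:

```swift
// The client as a struct of closures (a "protocol witness") instead of a protocol.
struct NumberFactClient: Sendable {
  var fact: @Sendable (Int) async throws -> String
}

// Hypothetical error used by the failing implementation below.
struct NumberFactError: Error {}

extension NumberFactClient {
  // A predictable implementation, e.g. for previews or tests.
  static let constant = Self { number in "\(number) is a good number." }
  // An implementation that always throws, for exercising error paths.
  static let failing = Self { _ in throw NumberFactError() }
}

let constantFact = try await NumberFactClient.constant.fact(42)

var failingError: Error?
do {
  _ = try await NumberFactClient.failing.fact(0)
} catch {
  failingError = error
}
```

With a protocol, each of these would need its own conforming type; here each one is just a static value.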

We are also going to go ahead and go the extra mile by bringing in our Dependencies library, which makes it easy to provide dependencies to features that need them. And it also has some fancy features for making sure that you properly override dependencies in tests, which will be handy in a bit.

So, let’s import our dependencies library:

import Dependencies

And we can have Xcode add the library for us.

With that done there are a few small steps we must take to register our dependency with the library, and then we can immediately start using the @Dependency property wrapper to inject this dependency into any feature that needs access.

First we need to create a conformance to the DependencyKey protocol, which is where we specify which client will be used in tests, in Xcode previews, and when running in a simulator or on a device. You can create a whole separate type for this, much like you would for SwiftUI’s environment values, or in this case we can take a shortcut and conform NumberFactClient to it directly:

extension NumberFactClient: DependencyKey {
}

There is one main requirement we must provide, and that is a static liveValue, which is the implementation of the interface that is actually allowed to make network requests, and is what will be used when the feature is run in simulators and devices:

extension NumberFactClient: DependencyKey {
  static let liveValue = Self { number in
    try await String(
      decoding: URLSession.shared.data(
        from: URL(
          string: "http://numbersapi.com/\(number)"
        )!
      )
      .0,
      as: UTF8.self
    )
  }
}

And because we have strict concurrency mode on we are getting a warning here:

⚠️ Type ‘NumberFactClient’ does not conform to the ‘Sendable’ protocol

But the fix is easy enough. We just need to mark the fact endpoint as a @Sendable closure, which makes the entire client struct sendable:

struct NumberFactClient {
  var fact: @Sendable (Int) async throws -> String
}

Then we need to add a computed property to the DependencyValues type, which is what gives us access to a key path that can be used with the @Dependency property wrapper:

extension DependencyValues {
  var numberFact: NumberFactClient {
    get { self[NumberFactClient.self] }
    set { self[NumberFactClient.self] = newValue }
  }
}

That’s all it takes to register the dependency with the library, and those steps aren’t really that much different from what it takes to register a new environment value with SwiftUI.

With this new dependency defined we will now make our NumberFactModel depend on it using the @Dependency property wrapper:

@MainActor
class NumberFactModel: ObservableObject {
  @Dependency(\.numberFact) var numberFact
  …
}

…as well as hold onto some optional state for the fact we are showing in the UI:

@MainActor
class NumberFactModel: ObservableObject {
  @Published var fact: String?
  …
}

Now let’s actually make use of the dependency. We will implement an endpoint on the model for when a “fact” button is tapped in the UI:

func getFactButtonTapped() {
}

But we have a choice to make. The method definitely needs to perform async work, so we could mark it as such:

func getFactButtonTapped() async {
}

But the method is most likely going to be called from a synchronous context, such as a button action in the view:

Button("Get fact") {
  await self.model.getFactButtonTapped()  // 🛑
}

🛑 Cannot pass function of type ‘() async -> ()’ to parameter expecting synchronous function type

…and we can’t await directly in the action closures of buttons.

So, should we make the method synchronous and instead spin up an unstructured task?

func getFactButtonTapped() {
  Task {
  }
}

There are pros and cons to each of these approaches, but in our opinion there is one style that slightly edges out the other. From our perspective you should always try to preserve a structured programming environment for as long as possible, and that means minimizing spinning up unstructured tasks.

That isn’t always possible. Sometimes you are forced to create an unstructured task, as is the case here. However, we can push the unstructured task to a part of the code where structured programming doesn’t matter as much, such as the view:

Button("Get fact") {
  Task { await self.model.getFactButtonTapped() }
}

There’s not a lot of harm in spinning up an unstructured task in the view because this action closure is just a fire-and-forget mechanism and shouldn’t have a bunch of logic in it anyway. It should just call a method on the model and do nothing else.

The real logic and behavior is going to be in the model, and so that is where we want to preserve structured programming, and for that reason we will make the model have an async method:

func getFactButtonTapped() async {
}

And we’ll force the view to create the unstructured task.
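The testing payoff of this choice can be sketched with two hypothetical, SwiftUI-free models: one whose endpoint is async, and one that spawns its own unstructured task. Only the first can be awaited, so only the first is guaranteed to have a fact by the time the caller checks:

```swift
struct NumberFactClient: Sendable {
  var fact: @Sendable (Int) async throws -> String
}

@MainActor
final class StructuredModel {
  var fact: String?
  let client: NumberFactClient
  init(client: NumberFactClient) { self.client = client }

  // Async endpoint: callers, including tests, can await its completion.
  func getFactButtonTapped() async {
    fact = try? await client.fact(0)
  }
}

@MainActor
final class UnstructuredModel {
  var fact: String?
  let client: NumberFactClient
  init(client: NumberFactClient) { self.client = client }

  // Synchronous endpoint that spawns its own task: callers cannot await it.
  func getFactButtonTapped() {
    Task { self.fact = try? await self.client.fact(0) }
  }
}

@MainActor
func compare() async -> (structured: String?, unstructured: String?) {
  let client = NumberFactClient { "\($0) is a good number." }

  let structured = StructuredModel(client: client)
  await structured.getFactButtonTapped()

  let unstructured = UnstructuredModel(client: client)
  unstructured.getFactButtonTapped()
  // No suspension point yet, so the spawned task has not run on the main actor.
  return (structured.fact, unstructured.fact)
}

let results = await compare()
```

To check the unstructured version, a test would have to sleep or yield and hope the spawned task had finished.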

Now that we have an asynchronous context to work in we can perform any async work we want, such as performing the request for a fact:

func getFactButtonTapped() async {
  do {
    self.fact = try await self.numberFact.fact(self.count)
  } catch {
    // TODO: handle error
  }
}

We should be doing some proper error handling, but we won’t worry about that for now.

We could also spice up this logic a bit by first clearing out the current fact before fetching the new one:

func getFactButtonTapped() async {
  self.fact = nil
  …
}

And let’s do the same when the increment and decrement buttons are tapped:

func incrementButtonTapped() {
  self.fact = nil
  self.count += 1
}
func decrementButtonTapped() {
  self.fact = nil
  self.count -= 1
}
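To see this logic in isolation, here is a sketch of the same model without SwiftUI or Combine. It assumes the client is passed to a hypothetical initializer rather than resolved with @Dependency, and it replays a fetch, increment, fetch flow:

```swift
struct NumberFactClient: Sendable {
  var fact: @Sendable (Int) async throws -> String
}

// The model's logic without ObservableObject/@Published, with the client
// injected through the initializer instead of @Dependency.
@MainActor
final class NumberFactModel {
  var count = 0
  var fact: String?
  private let numberFact: NumberFactClient

  init(numberFact: NumberFactClient) {
    self.numberFact = numberFact
  }

  func incrementButtonTapped() {
    fact = nil
    count += 1
  }

  func decrementButtonTapped() {
    fact = nil
    count -= 1
  }

  func getFactButtonTapped() async {
    fact = nil
    do {
      fact = try await numberFact.fact(count)
    } catch {
      // Error handling is left out here, as in the episode.
    }
  }
}

// Replays the user flow and records the fact after each step.
@MainActor
func replayUserFlow() async -> [String?] {
  let model = NumberFactModel(
    numberFact: NumberFactClient { "\($0) is a good number." }
  )
  var observed: [String?] = []
  await model.getFactButtonTapped()
  observed.append(model.fact)
  model.incrementButtonTapped()
  observed.append(model.fact)
  await model.getFactButtonTapped()
  observed.append(model.fact)
  return observed
}

let flow = await replayUserFlow()
```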

Now with the logic in place let’s display the fact in the view when it is non-nil:

if let fact = self.model.fact {
  Text(fact)
}

However, neither the preview nor the app will work yet, because unfortunately the Numbers API we are using does not work over HTTPS. So we have to provide a special Info.plist entry to tell iOS that it is OK to load HTTP requests.

With that done we can now give our feature a spin in a preview and see that it works.

So, that is some basic async behavior in our feature. What does it take to test it?

Let’s hop over to tests, and before writing a new test let’s make sure that our existing test still passes.

It does! Further, it passes without us even overriding the numberFact dependency we added. This means that the user flow that we are testing does not make any network requests, because if it did then it would cause a test failure.

For example, even just the simple act of accessing the numberFact property on the model like this:

func incrementButtonTapped() {
  _ = self.numberFact
  self.count += 1
}

…causes a test failure:

❌ testIncrementDecrement(): @Dependency(\.numberFact) has no test implementation, but was accessed from a test context:

Location:
  ReliablyTestingAsync/ContentView.swift:26
Dependency:
  NumberFactClient

Dependencies registered with the library are not allowed to use their default, live implementations when run from tests.

To fix, override ‘numberFact’ with a test value. If you are using the Composable Architecture, mutate the ‘dependencies’ property on your ‘TestStore’. Otherwise, use ‘withDependencies’ to define a scope for the override. If you’d like to provide a default value for all tests, implement the ‘testValue’ requirement of the ‘DependencyKey’ protocol.

Our Dependencies library forces you to override any dependencies you use in tests because it is generally not right to access live dependencies in tests. Doing so can accidentally make real network requests, which are slow and flaky, or write files to disk that bleed over into other tests, and it generally causes a lot of problems.

So that’s cool, but let’s undo that change in the model.

OK, that was a fun little digression into how our Dependencies library helps keep us in check, but let’s now write a test for the async behavior in our feature.

We’ll start up another test method:

func testGetFact() async {
}

And we’ll construct a model to test:

let model = NumberFactModel()

But this time we do expect the user flow we are testing to touch the number fact dependency. So, we will need to override it, but let’s first see how far we can get without doing that.

After the model is created we will emulate the user tapping on the “Get fact” button:

model.getFactButtonTapped()

But this method is async, and thanks to the new async capabilities of XCTest we can actually await it here:

await model.getFactButtonTapped()

By the time that suspension finishes we expect the fact state to be populated from the result of the dependency:

await model.getFactButtonTapped()
XCTAssertEqual(model.fact, "???")

But what do we expect the fact to be? Since we are using the live number fact client it is going to make an actual network request to the Numbers API and report back some random fact about 0.

Let’s just run it to see what happens.

Well, we get two failures. The first one we’ve seen before. We are using a live dependency in a test context, and that is usually not the right thing to do:

❌ @Dependency(\.numberFact) has no test implementation, but was accessed from a test context:

Let’s temporarily work around this problem by explicitly telling the Dependencies library that we are OK with using live dependencies for this one test:

func testGetFact() async throws {
  let model = withDependencies {
    $0.context = .live
  } operation: {
    NumberFactModel()
  }
  …
}

Now when we re-run tests that failure goes away. It’s of course still not a good idea to be making live API requests in a test, but let’s keep pushing forward.

There is still one failure, and it is more interesting:

❌ XCTAssertEqual failed: (“Optional(“0 is the coldest possible temperature old the Kelvin scale.”)”) is not equal to (“Optional(“???”)”)

It is showing that we did indeed load a fact from the live Numbers API, meaning we even made a network request, but the string we got back did not match what we provided. So, we might hope we can just replace our dummy string “???” with this fact and everything should be OK:

XCTAssertEqual(
  model.fact,
  """
  0 is the coldest possible temperature old the Kelvin \
  scale.
  """
)

Well, the test does seem to pass, but if we run it enough times we will eventually get a test failure:

❌ XCTAssertEqual failed: (“Optional(“0 is the atomic number of the theoretical element tetraneutron.”)”) is not equal to (“Optional(“0 is the coldest possible temperature old the Kelvin scale.”)”)

This is happening because we can’t possibly predict what fact is being sent back from the server. Sometimes it’s about the atomic number and sometimes about the Kelvin temperature scale.

This is why it’s so important to control dependencies. We can override the number fact client so that it returns something predictable, which lets us see how its data flows through our feature’s logic and behavior:

let model = withDependencies {
  $0.numberFact.fact = { "\($0) is a good number." }
} operation: {
  NumberFactModel()
}

This of course means we aren’t testing the actual interaction of our code with the Numbers API, but also we don’t really need to. The Numbers API is an external service that we do not control, and so for the purposes of tests we can just assume that it is working just fine. We only want to test how our feature behaves once the client returns some data.

Now we can write the assertion in a way that passes 100% of the time, deterministically:

await model.getFactButtonTapped()
XCTAssertEqual(model.fact, "0 is a good number.")

And this test passes! We have our first interesting asynchronous test, and it proves that when we tap the “Get fact” button that eventually a fact will be populated. If we had special error handling in our model we could also test that by providing a version of the NumberFactClient that throws an error, but we won’t worry about that.
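Here is a sketch of what that could look like. It assumes a hypothetical errorMessage property and NumberFactError type, neither of which exists in the episode’s model, and it passes the client through an initializer instead of @Dependency:

```swift
struct NumberFactClient: Sendable {
  var fact: @Sendable (Int) async throws -> String
}

// Hypothetical error thrown by the failing client.
struct NumberFactError: Error {}

@MainActor
final class NumberFactModel {
  var count = 0
  var fact: String?
  // Hypothetical state for surfacing failures; the episode's model has none yet.
  var errorMessage: String?
  private let numberFact: NumberFactClient

  init(numberFact: NumberFactClient) {
    self.numberFact = numberFact
  }

  func getFactButtonTapped() async {
    fact = nil
    errorMessage = nil
    do {
      fact = try await numberFact.fact(count)
    } catch {
      errorMessage = "Could not load a fact."
    }
  }
}

// Starts with a stale fact, then fetches with a client that always throws.
@MainActor
func runFailingFetch() async -> (fact: String?, errorMessage: String?) {
  let model = NumberFactModel(
    numberFact: NumberFactClient { _ in throw NumberFactError() }
  )
  model.fact = "An old fact about 0."
  await model.getFactButtonTapped()
  return (model.fact, model.errorMessage)
}

let failure = await runFailingFetch()
```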

We could also beef this test up a bit to further prove that tapping the increment button clears the fact, and that tapping the “Get fact” button again fetches a new fact about the new number:

model.incrementButtonTapped()
XCTAssertEqual(model.fact, nil)
await model.getFactButtonTapped()
XCTAssertEqual(model.fact, "1 is a good number.")

And this test passes! We can even run it thousands of times and it still passes. I have a lot of confidence that this will always pass when run because we are waiting around for the fetch work to complete, and then asserting on the result after. There isn’t any room for non-determinism here.

Writing a nuanced async test

OK, so we now have an asynchronous test under our belts, and everything seems to work just fine. This might lead you to believe that we have all the tools we need to write tests for async code in Swift.

Brandon: Well, sadly that is not the case. The test we wrote so far is extremely simple. We emulate tapping a button, we wait for some asynchronous work to complete, and then we assert. Most of the time the things we want to test do not fall into such simple steps. Instead we often need to assert on how things changed just before or just after the async work, and sometimes we even need to wiggle ourselves in between two pieces of async work.

This is where Apple’s async testing tools start to fall short, and in fact we can write a test right now that demonstrates the problem.

Let’s take a look.

Recall that we have the specific behavior in our feature that we clear out the current fact while loading the new one:

func getFactButtonTapped() async {
  self.fact = nil
  do {
    self.fact = try await self.numberFact.fact(self.count)
  } catch {
    // TODO: handle error
  }
}

Let’s try to get some test coverage on this.

I’ll copy and paste the other test with a new name:

func testFactClearsOut() async {
  …
}

And instead of invoking getFactButtonTapped twice, let’s short-circuit things a bit and start the model off in a state that already has a fact populated:

let model = withDependencies {
  $0.numberFact.fact = { "\($0) is a good number." }
} operation: {
  NumberFactModel()
}
model.fact = "An old fact about 0."

Then we want to be able to assert that when the “Get fact” button is tapped that the fact state is first cleared out, and then a moment later the new fact will be populated.

But where can we do that? We can’t do it right after the await because we know by that point the new fact has been loaded:

await model.getFactButtonTapped()
XCTAssertEqual(model.fact, nil)  // ???
XCTAssertEqual(model.fact, "0 is a good number.")

How can we wiggle ourselves in between the moment the fact is cleared out and when the fact is actually fetched?

We could try spinning up an unstructured task to perform the getFactButtonTapped logic so that it runs concurrently with the rest of the test:

let task = Task { await model.getFactButtonTapped() }

And then maybe we can assert that the fact is nil, then wait for the task to finish, and then finally assert that the fact state is re-populated:

let task = Task { await model.getFactButtonTapped() }
XCTAssertEqual(model.fact, nil)
await task.value
XCTAssertEqual(model.fact, "0 is a good number.")

However, this does not pass:

❌ testFactClearsOut(): XCTAssertEqual failed: (“Optional(“0 is a good number.”)”) is not equal to (“nil”)

And it shouldn’t be surprising that it doesn’t pass. After all, just a moment ago we saw that, the vast majority of the time, the line directly after creating a task executes before the first line in the task, so of course the fact state couldn’t possibly be cleared out yet.

Maybe we just need to yield to give the task enough time to start up:

let task = Task { await model.getFactButtonTapped() }
await Task.yield()
XCTAssertEqual(model.fact, nil)

Well, it passed, but does it pass deterministically? If I run it a bunch of times:

Test Suite 'NumberFactModelTests' failed.
Executed 1000 tests, with 100 failures (0 unexpected) in 0.955 (1.584) seconds

It fails about 10% of the time.

And we get two different failures:

🛑 XCTAssertEqual failed: (“Optional(“0 is a good number.”)”) is not equal to (“nil”)

18/935 (2%) failed

🛑 XCTAssertEqual failed: (“Optional(“An old fact about 0.”)”) is not equal to (“nil”)

82/998 (8%) failed

So, with a single yield we sometimes don’t wait long enough for the task to start up and clear out the old fact, and sometimes we wait so long that the new fact has already come in.

What we really need here is the ability to precisely control exactly when the fact request finishes. This is actually straightforward to do since we decided to control our dependency. We can create an async stream with its backing continuation:

func testFactClearsOut() async {
  var factContinuation: AsyncStream<String>.Continuation!
  let factStream = AsyncStream { factContinuation = $0 }

  …
}

Even better, since we already depend on the Dependencies library, we can use a tool that comes with it to complete these two steps in one:

func testFactClearsOut() async throws {
  let fact = AsyncStream.makeStream(of: String.self)
  …
}

Further, Swift 5.9 adds this tool to the standard library (SE-0388), so soon you won’t even need our Dependencies library to get access to this functionality.
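On toolchains older than Swift 5.9, a helper along these lines produces the same `(stream, continuation)` pair. This is a hypothetical sketch, not the Dependencies library’s implementation, and `makeStreamBackport` is a made-up name. It relies on `AsyncStream`’s initializer invoking its build closure synchronously, so the continuation is captured before the initializer returns.

```swift
// Hypothetical backport of Swift 5.9's `AsyncStream.makeStream(of:bufferingPolicy:)`.
func makeStreamBackport<Element>(
  of elementType: Element.Type = Element.self,
  bufferingPolicy limit: AsyncStream<Element>.Continuation.BufferingPolicy = .unbounded
) -> (stream: AsyncStream<Element>, continuation: AsyncStream<Element>.Continuation) {
  var continuation: AsyncStream<Element>.Continuation?
  // The build closure runs synchronously inside the initializer,
  // so `continuation` is non-nil by the time we return.
  let stream = AsyncStream(elementType, bufferingPolicy: limit) { continuation = $0 }
  return (stream, continuation!)
}

// Usage: yield a couple of values, finish the stream, then collect everything.
func collectFacts() async -> [String] {
  let fact = makeStreamBackport(of: String.self)
  fact.continuation.yield("0 is a good number.")
  fact.continuation.yield("1 is a good number.")
  fact.continuation.finish()

  var values: [String] = []
  for await value in fact.stream {
    values.append(value)
  }
  return values
}

let values = await collectFacts()
print(values)  // ["0 is a good number.", "1 is a good number."]
```

Returning both halves as a named tuple mirrors the `fact.stream` / `fact.continuation` access pattern used in the test.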

Now that we have a stream we can use it to provide an implementation of the NumberFactClient:

let model = withDependencies {
  $0.numberFact.fact = { _ in
    await fact.stream.first(where: { _ in true })!
  }
} operation: {
  NumberFactModel()
}

And at any point in the test we can emulate the fact endpoint finally returning by yielding some data to the continuation:

let task = Task { await model.getFactButtonTapped() }
await Task.yield()
await Task.yield()
await Task.yield()
await Task.yield()
await Task.yield()
await Task.yield()
XCTAssertEqual(model.fact, nil)
fact.continuation.yield("0 is a good number.")
await task.value
XCTAssertEqual(model.fact, "0 is a good number.")
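Stripped of the model, the gating trick in this test can be sketched on its own. The names `runGatedFetch` and `gatedFact` are hypothetical. The sketch shows that the fake endpoint cannot finish until the test yields into the continuation, and because `AsyncStream` buffers values by default, yielding before the task has even started iterating is still safe. That is what makes the final assertion deterministic. What the gate does not tell us is when the task has reached its `self.fact = nil` line, which is exactly the gap the `Task.yield()` calls are trying to paper over.

```swift
func runGatedFetch() async -> String {
  // Capture the continuation so the test decides when the endpoint "returns".
  var continuation: AsyncStream<String>.Continuation!
  let stream = AsyncStream<String> { continuation = $0 }

  // Hypothetical fake endpoint: suspends until a value is yielded into the stream.
  let gatedFact: @Sendable (Int) async -> String = { _ in
    for await value in stream {
      return value
    }
    return "stream finished without a value"
  }

  // Start the "request" concurrently, like the test's unstructured Task.
  let task = Task { await gatedFact(0) }

  // Whether `task` has started yet is up to the runtime, but it cannot complete
  // until this value arrives. The default `.unbounded` buffer holds it until then.
  continuation.yield("0 is a good number.")
  continuation.finish()

  return await task.value
}

let result = await runGatedFetch()
print(result)  // 0 is a good number.
```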

Does this test pass? If I run it a single time it does seem to pass.

But can we really trust these Task.yields? Are we sure that they are always going to give the system enough time for the task to start up and for the fact state to be cleared out?

Let’s run the test repeatedly 1,000 times to see just how confident we can be in this code:

Test Suite 'NumberFactModelTests' failed.
Executed 1000 tests, with 41 failures (0 unexpected) in 0.955 (1.584) seconds

Ouch. Still 41 test failures out of 1,000. The test now fails only when the yields are not enough for the task to start up and clear out the old fact. At least that means we’re down to a single kind of failure, in which the old fact is sometimes still in state:

🛑 XCTAssertEqual failed: (“Optional(“An old fact about 0.”)”) is not equal to (“nil”)

41/991 (4%) failed

So we have improved things, and adding a few more yields seems to improve things a bit more.

So this is definitely a flaky test, and it is going to cause a world of hurt. You may never personally see a test failure on your machine, but you may start to see mysterious failures on CI, and you’ll have no choice but to rerun the whole suite from scratch just to get a green build. And you may have to do that many, many times.

It’s worth mentioning that the tricks we are employing are exactly how you test this kind of code in Combine too. If you need to wiggle yourself in between the events emitted by a publisher, you can construct a PassthroughSubject, inject it into your dependencies, and then control exactly how it emits over time. So it’s cool to see that testing async/await code follows a very similar pattern.

So, this isn’t great.

But also maybe you could convince yourself that it’s no big deal to get test coverage on this. Having the fact state clear out before loading a new fact seems like a relatively minor bit of behavior.

Well, let’s add just a little bit more functionality that we would definitely want test coverage on: we will show a loading indicator while the fact request is in flight. So, we’ll add some state to the model:

@MainActor
class NumberFactModel: ObservableObject {
  @Published var isLoading = false
  …
}

And we’ll mutate that state when fetching the fact:

func getFactButtonTapped() async {
  self.isLoading = true
  defer { self.isLoading = false }
  self.fact = nil

  …
}

And just to show it works we can add a progress view to the view:

HStack {
  Button("Get fact") {
    Task { await self.model.getFactButtonTapped() }
  }
  if self.model.isLoading {
    Spacer()
    ProgressView()
  }
}

And running the preview shows that it does indeed work, though it is a bit fast, so let’s artificially slow things down to really appreciate the functionality:

try await Task.sleep(for: .seconds(1))
self.fact = try await self.numberFact.fact(self.count)

Now this is a bit of behavior that I think should have test coverage. First of all, there is a very visible indicator to the user that something is happening in the background, and we also run the risk of not cleaning up the state properly, which would cause a loading indicator to appear forever. We should probably get test coverage on making sure that isLoading is reset back to false for both a successful fetch of a fact and a failure.
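The before-and-after half of that coverage is straightforward with the tools we already have. A minimal sketch, assuming a hypothetical closure-based `NumberFactClient` and a trimmed-down `NumberFactModel` rather than the episode’s exact types (`FactError` is also made up): a succeeding client and a throwing client both leave `isLoading` back at `false`. What this sketch cannot do is assert that `isLoading` was `true` while the request was in flight, which is the part that requires wiggling in between.

```swift
// Hypothetical client and model, trimmed down to the loading-state logic.
struct NumberFactClient: Sendable {
  var fact: @Sendable (Int) async throws -> String
}

struct FactError: Error {}

@MainActor
final class NumberFactModel {
  var count = 0
  var fact: String?
  var isLoading = false
  let numberFact: NumberFactClient

  init(numberFact: NumberFactClient) {
    self.numberFact = numberFact
  }

  func getFactButtonTapped() async {
    self.isLoading = true
    // `defer` resets the flag on every exit path from this method.
    defer { self.isLoading = false }
    self.fact = nil
    do {
      self.fact = try await self.numberFact.fact(self.count)
    } catch {
      // TODO: handle error
    }
  }
}

// Success: the fact is populated and the loading flag is reset.
let succeeding = NumberFactModel(
  numberFact: NumberFactClient(fact: { "\($0) is a good number." })
)
await succeeding.getFactButtonTapped()
print(succeeding.fact ?? "nil", succeeding.isLoading)  // 0 is a good number. false

// Failure: no fact, and the loading flag is still reset.
let failing = NumberFactModel(
  numberFact: NumberFactClient(fact: { _ in throw FactError() })
)
await failing.getFactButtonTapped()
print(failing.fact ?? "nil", failing.isLoading)  // nil false
```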

Let’s try testing the behavior. We will copy and paste the previous test, and make a few small tweaks to try to wiggle ourselves in between starting the request and finishing the request:

func testFactIsLoading() async {
  let fact = AsyncStream.makeStream(of: String.self)

  let model = withDependencies {
    $0.numberFact.fact = { _ in
      await fact.stream.first(where: { _ in true })!
    }
  } operation: {
    NumberFactModel()
  }

  let task = Task { await model.getFactButtonTapped() }
  await Task.yield()
  …
  XCTAssertEqual(model.isLoading, true)
  fact.continuation.yield("0 is a great number.")
  await task.value
  XCTAssertEqual(model.fact, "0 is a great number.")
  XCTAssertEqual(model.isLoading, false)
}

If we run the test 1,000 times we will see quite a few failures:

Test Suite 'NumberFactModelTests' failed.
Executed 1000 tests, with 30 failures (0 unexpected) in 1.011 (1.708) seconds

Next time: More problems

So we are now starting to see the problems with the testing tools that are provided by Swift and Xcode. They really only work with the most basic types of async functionality. In particular, where the user does something, async work happens, and then we want to assert on what was changed after that async work.

There are no tools that allow us to deterministically assert on what happens in between units of async work. We have to sprinkle in some Task.yields and hope it’s enough, and as we’ve seen a few times now, often it is not enough. We should probably be yielding a lot more times in these tests, and possibly even waiting for a duration of time to pass, which would unfortunately slow down our test suite. And still we could never be 100% certain that the test won’t flake someday.

Stephen: Let’s add another piece of functionality to our feature that is also quite common in real world development, and that’s effect management. We actually have a bug in our code right now in which if you tap the “Get fact” button multiple times really quickly it is possible to get multiple responses from the API and they can be completely out of order.

We are going to write a test to prove that this bug exists in our code, and then fix the bug and confirm that the test passes, showing that it would have caught the bug in the first place. But first, let’s quickly see how this bug can manifest itself right in the preview…next time!


References

Reliably testing code that adopts Swift Concurrency?

Brandon Williams and Stephen Celis • Friday May 13, 2022

A Swift Forums post from yours truly about the difficulty of testing async code in Swift.

Concurrency Extras

Brandon Williams and Stephen Celis • Tuesday Jul 18, 2023

An open source Swift package that provides powerful tools for making async code easier to test.

Announcing Concurrency Extras: Useful, testable Swift concurrency.

Brandon Williams and Stephen Celis • Tuesday Jul 18, 2023

The announcement of our new Concurrency Extras library, which provides powerful tools for making async code easier to test.