How We 10x Improved Flutter UI Testing With Patrol 2.0



We’re back with another update on Patrol - Patrol 2.0! During the last 2 months, we focused on shipping a new, truly 10x feature: test bundling. In short, test bundling is a new, advanced compatibility layer between the flutter_test package and native testing frameworks.

Test bundling fixes many long-standing problems with end-to-end UI testing in Flutter and unlocks use cases that weren't possible before. Read on to learn more about the issue and how we approached it.

💡Note

  • This article assumes basic knowledge of Flutter’s default testing tools, such as flutter_test and integration_test packages, and flutter test and flutter test integration_test commands.

The integration_test plugin

Our journey today starts with the integration_test plugin. Paraphrasing its tagline:

This package enables self-driving testing of Flutter apps on devices and emulators. It adapts flutter_test results into a format that is compatible with native Android and iOS instrumentation testing.

This sounds serious, but it’s actually a thin layer on top of the flutter_test package.

Let’s consider a widget test:

void main() {
  testWidgets('sign up flow', (WidgetTester tester) async {
    // test code omitted
  });
}

Placing this widget test in the integration_test directory (instead of the test directory) makes it no longer a widget test – it’s now an integration test. The code stays the same as if you were writing a widget test, and all APIs from flutter_test can still be used.

To run that test, the flutter test integration_test command must be used (instead of flutter test). Running that command generates a temporary file containing, among lots of glue code, a call to IntegrationTestWidgetsFlutterBinding.ensureInitialized(). That’s why you don’t have to call that method at the beginning of integration tests.

💡Bindings

  • Conceptually, an app made with Flutter consists of two parts: the cross-platform part where the framework’s and your app’s Dart code lives, and the native embedder part hosting the Flutter Engine. Bindings are a mechanism that lets these two different worlds communicate with each other. They also make it easy to replace one binding with another in a single line of code. That happens when you run flutter test - it executes so fast because it only runs the Flutter part of the app in headless mode, without the native part. This is made possible thanks to TestWidgetsFlutterBinding, which has all the native platform features stubbed out. It’s automatically used during flutter test.

Actually, IntegrationTestWidgetsFlutterBinding is the only public Dart API exposed by integration_test. But since it’s a plugin, there’s some native code in it. What does it do?

Put shortly, the native part of the integration_test plugin makes it possible to integrate with native tooling and to get Dart test results in native format. You may rightfully ask – why would I want that?

I’m certain that an answer to this question deserves its own paragraph.

Why is native tooling important?

iOS and Android have existed for the past 15 years, and the huge developer communities gathered around them have created tons of useful software. That also includes many great testing-related tools and platforms. Cloud device farms (Firebase Test Lab, AWS Device Farm, emulator.wtf, BrowserStack), open-source test runners (Flank, Marathon), test frameworks (way too many!), report generators to every format imaginable – these are just a few examples of the amazing tooling that developers of native mobile apps have at their disposal.

This extensive and well-established ecosystem offers huge advantages. It’s also important for enterprise clients, who often have large infrastructure built on top of it, with custom in-house tooling.

But there’s a problem: all this amazing tooling works only with native test frameworks. For example, for UI tests to run on Firebase Test Lab, they must be written using JUnit (on Android) or XCTest (on iOS).

This means that Flutter developers cannot easily tap into that mature testing ecosystem because they don’t write tests in either of those native frameworks. Instead, they use the official flutter_test package. You can address this problem in two ways:

  • by the Flutter community starting a crusade and advocating for including support for Flutter test APIs in all these existing tools,
  • by developing a bridge connecting the Flutter test APIs with native test frameworks.

As you’ve probably already realized, the first solution is infeasible, so let’s go with the bridge. Thanks to it, Flutter apps could use the existing native test tooling without that tooling needing to support Flutter explicitly.

I think this is exactly what integration_test would be in a perfect world. But we don’t live in a perfect world.

integration_test doesn’t deliver

The integration_test plugin has many problems, big and small. What’s also worrisome is that it hasn’t received any significant improvements since its release in Flutter 2.0 in March 2021.

So, what’s wrong with integration_test?

Unnecessary builds

Flutter issue #115751

Let's consider an app with 3 integration tests:

integration_test
├── sign_up_test.dart
├── location_test.dart
└── sign_out_test.dart

Running flutter test integration_test builds the application 3 times, once for every integration test file:

> sign_up_test.dart
    > Build app
    > Install app
    > Run app and execute tests
    > Kill app
    > Uninstall app
> location_test.dart
    > Build app
    > Install app
    > Run app and execute tests
    > Kill app
    > Uninstall app
> sign_out_test.dart
    > Build app
    > Install app
    > Run app and execute tests
    > Kill app
    > Uninstall app

This is unnecessarily slow because the full app build is performed for each of the 3 Dart test files. The only difference in inputs to these builds is a single integration test file. As the number of tests you have grows, the time it takes to execute them also increases, and there’s no way around that. But requiring a full app rebuild for every test file tips the scale, making the tests’ total build and execution time unbearably long.

Our first idea to solve this problem of unnecessary builds was to bundle tests together. We generated the integration_test/bundled_test.dart file and filled it with references to other tests in the integration_test directory, which we obtained by walking that directory:

import 'package:test/test.dart';
	
import 'notifications_test.dart' as notifications_test;
import 'permission_location_test.dart' as permission_location_test;
	
void main() {
  group('notifications_test.dart', notifications_test.main);
  group('permission_location_test.dart', permission_location_test.main);
}

This worked and fixed the problem of unnecessary builds. Now we could run flutter test integration_test/bundled_test.dart once, and all tests were built into a single app binary.
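
The generation step can be sketched in a few lines of plain Dart. This is only an illustration of the idea, not the generator Patrol actually ships: the function names generateBundledTest and aliasFor are made up for this example, and it assumes a flat integration_test directory whose file names have already been collected (for instance by listing the directory with dart:io).

```dart
/// Builds the source of a bundled test file from a list of test file names.
/// Hypothetical helper: assumes all files live directly in integration_test.
String generateBundledTest(List<String> testFiles) {
  // Sort so that the generated file is stable between runs.
  final sorted = [...testFiles]..sort();
  final buffer = StringBuffer()
    ..writeln("import 'package:test/test.dart';")
    ..writeln();

  // One aliased import per test file.
  for (final file in sorted) {
    buffer.writeln("import '$file' as ${aliasFor(file)};");
  }

  // One group per test file, wrapping that file's main().
  buffer
    ..writeln()
    ..writeln('void main() {');
  for (final file in sorted) {
    buffer.writeln("  group('$file', ${aliasFor(file)}.main);");
  }
  buffer.writeln('}');
  return buffer.toString();
}

/// Turns `sign_up_test.dart` into the import alias `sign_up_test`.
String aliasFor(String file) => file.replaceAll(RegExp(r'\.dart$'), '');

void main() {
  print(generateBundledTest([
    'sign_up_test.dart',
    'location_test.dart',
    'sign_out_test.dart',
  ]));
}
```

Wrapping each file's main() in a group named after the file keeps test names unique even when two files declare tests with the same description.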

But after a while, we found out that this “primitive” test bundling approach had a major flaw. Take a look at this snippet:

void main() {
  patrolTest(
    'some test',
    ($) async {
      await $.pumpWidgetAndSettle(ExampleApp());
      await $('some button').tap();
      // ... omitted more test code
      exit(1); // kills the app
    },
  );
}

which leads us to the next problem…

No isolation between test runs

If something really bad happens to the app under test and it crashes (simulated by the exit() in the snippet above), subsequent tests don’t execute, and no test report will be available. The call to exit() might look off (after all, you never use it in Flutter apps), but it’s here just for demo purposes. For a more realistic example, imagine a native crash occurring in the app, or the app freezing and showing the dreaded App Not Responding dialog.

Notification: Async Examples isn't responding

A crash like this has fatal consequences for the tests.

In Flutter, the tests are built into and run inside the app; since the app died, the tests have also died! No subsequent tests will be executed, and there will be no report, because the report is generated at the end of the test run and the run never gets there.

But a crash is not the only danger of the primitive test bundling approach. Remember, all tests run in the same process. You are in charge of ensuring no state is shared – for example, resetting global variables and ensuring plugins are not initialized more than once.
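
To make the shared-state risk concrete, here is a tiny simulation in plain Dart. Everything in it is hypothetical: pluginInitCount stands in for some global that a plugin keeps, and signUpTestMain and signOutTestMain stand in for the main() functions of two bundled test files.

```dart
// Hypothetical global state, e.g. a plugin that must be initialized only once.
int pluginInitCount = 0;

void initPlugin() {
  pluginInitCount++;
  if (pluginInitCount > 1) {
    throw StateError('plugin initialized $pluginInitCount times');
  }
}

// Stand-ins for the main() functions of two test files.
void signUpTestMain() => initPlugin();
void signOutTestMain() => initPlugin();

/// Runs every "test file" in the same process and records pass or fail.
Map<String, bool> runBundled(Map<String, void Function()> testFiles) {
  final results = <String, bool>{};
  testFiles.forEach((name, body) {
    try {
      body();
      results[name] = true;
    } on StateError {
      results[name] = false;
    }
  });
  return results;
}

void main() {
  final results = runBundled({
    'sign_up_test.dart': signUpTestMain,
    'sign_out_test.dart': signOutTestMain,
  });
  // prints {sign_up_test.dart: true, sign_out_test.dart: false}
  print(results);
}
```

The second file fails only because the first one ran before it in the same process. Run on its own, it would pass, which is exactly the kind of implicit dependency that is painful to debug.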

No sharding

Flutter issue #101296

Sharding means splitting the test suite across many workers (shards), which execute tests in parallel, reducing the total time it takes for a test suite to finish running. It also helps reveal implicit dependencies between tests because usually, tests are split into shards randomly, so there are no guarantees about the order in which they'll be executed.
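
As a rough sketch of what a sharding tool does, the following plain Dart function shuffles a list of test names and deals them out round-robin. The function name shard and its seed parameter are assumptions made for this example; real test runners have their own strategies.

```dart
import 'dart:math';

/// Splits [tests] into [shardCount] shards after shuffling them with [seed].
List<List<String>> shard(List<String> tests, int shardCount, {int seed = 0}) {
  if (shardCount < 1) {
    throw ArgumentError.value(shardCount, 'shardCount', 'must be at least 1');
  }
  // Copy before shuffling so the caller's list is left untouched.
  final shuffled = [...tests]..shuffle(Random(seed));
  final shards = List.generate(shardCount, (_) => <String>[]);
  // Deal tests out round-robin so shard sizes differ by at most one.
  for (var i = 0; i < shuffled.length; i++) {
    shards[i % shardCount].add(shuffled[i]);
  }
  return shards;
}

void main() {
  final shards = shard(
    ['sign_up', 'location', 'sign_out', 'notifications', 'permissions'],
    2,
    seed: 42,
  );
  for (var i = 0; i < shards.length; i++) {
    print('shard $i: ${shards[i]}');
  }
}
```

Note the input: the full list of tests has to be known before anything runs. That requirement is what matters for the rest of this section.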

But because of how integration_test is implemented, sharding is broken.

The integration_test plugin creates native tests only after all Dart tests execute, so there's no way to shard them - they just don't exist at the time when sharding happens!

This problem is not as bad as the previous one, but it's still *bad*. It makes running even medium-sized test suites infeasible because of how long it takes to execute them.

No test run durations

Flutter issue #117386

Compared to the previous problems, this one is merely annoying.

Consider a simple Dart test file:

void main() {
  testWidgets('alpha test', (WidgetTester tester) async {
    await tester.pumpWidget(const MyApp());
    await Future.delayed(const Duration(seconds: 10));
  });
	
  testWidgets('bravo test', (WidgetTester tester) async {
    await tester.pumpWidget(const MyApp());
    await Future.delayed(const Duration(seconds: 10));
  });
}

Each of these tests takes about 10 seconds to execute. Unfortunately, that’s not how their run times are reported. The first test’s duration is reported to be a few hundred milliseconds, and the subsequent ones finish instantly:

Reporting mobile UI test time duration

Why's that?

The cause is the same as before – native tests are created only after Dart tests finish running. When the native part of integration_test receives the results of Dart tests, it creates native tests out of them. But these test cases are simply stubs – their execution is finished immediately after creation. That's why the run times are reported incorrectly.
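
The effect is easy to reproduce in a simplified model. The plain Dart sketch below is not integration_test's code: DartTest, reportWithStubs and reportWithWrappers are names invented for this example. The first function times stubs created after all the work is done; the second times a wrapper around the actual test body.

```dart
/// A hypothetical Dart test: a name plus an async body.
class DartTest {
  DartTest(this.name, this.body);
  final String name;
  final Future<void> Function() body;
}

/// Stub-style reporting: run everything first, then create "native tests"
/// that only replay an already known result.
Future<Map<String, Duration>> reportWithStubs(List<DartTest> tests) async {
  for (final test in tests) {
    await test.body(); // the real work happens here, untimed
  }
  final durations = <String, Duration>{};
  for (final test in tests) {
    final watch = Stopwatch()..start();
    // The stub has no body of its own, so it finishes right away.
    watch.stop();
    durations[test.name] = watch.elapsed;
  }
  return durations;
}

/// Wrapper-style reporting: each "native test" runs the Dart test it wraps.
Future<Map<String, Duration>> reportWithWrappers(List<DartTest> tests) async {
  final durations = <String, Duration>{};
  for (final test in tests) {
    final watch = Stopwatch()..start();
    await test.body();
    watch.stop();
    durations[test.name] = watch.elapsed;
  }
  return durations;
}

Future<void> main() async {
  final tests = [
    DartTest('alpha test',
        () => Future.delayed(const Duration(milliseconds: 100))),
    DartTest('bravo test',
        () => Future.delayed(const Duration(milliseconds: 100))),
  ];
  print(await reportWithStubs(tests));
  print(await reportWithWrappers(tests));
}
```

The stub durations come out close to zero, while the wrapper durations reflect the time each test body actually took.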

💡 See it yourself

  • If you’re curious, here are the links to integration_test's code responsible for creating the native tests out of Dart test results and immediately starting and finishing them:
    - FlutterTestRunner.java on Android
    - FLTIntegrationTestRunner.m on iOS

Summary – overall bad experience

Now that we know what the problem is, we can sketch out the acceptance criteria for a solution: tests must be completely isolated from one another to prevent flakiness and remove implicit dependencies between them.

That’s why we named this approach primitive test bundling. The idea was spot on, but the flaws in implementation disqualified it.

After lots of thinking and workshops, we realized that it was impossible to fix the flaws of primitive test bundling in pure Flutter and Dart. We had to drop down to the native level. That’s how advanced test bundling was born.

Accessibility bridge digression

Wait, what? This is an article about UI testing, and now we’re talking about accessibility?

Yes – because how accessibility works in Flutter is similar to how we implemented advanced test bundling.

Let’s think about how Android and iOS can display accessibility information over Flutter widgets, which exist only in the Flutter part of the app. Android and iOS don’t have the slightest idea what a “Flutter widget” is.

In other words, how is it that when you run this simple code with TalkBack/VoiceOver enabled and tap on the blue rectangle, you hear “Late nights in the middle of June”?

import 'package:flutter/material.dart';

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Center(
        child: Semantics(
          label: 'Late nights in the middle of June',
          child: Container(
            width: 100,
            height: 100,
            color: Colors.blueAccent,
          ),
        ),
      ),
    );
  }
}

This is made possible by a component called accessibility bridge. It’s part of the Flutter Engine, and there’s a separate implementation for every operating system supported by Flutter (because all platforms have different accessibility APIs). The accessibility bridge receives semantics information from the Flutter framework and translates it into a format the operating system can understand. Then the accessibility information is laid on top of Flutter widgets.

This sounds simple in principle, but since Flutter supports many operating systems, and each differs slightly, there are many edge cases. This is done by some hardcore hacking of native accessibility frameworks, but it works reliably, thanks to the incredible engineering done by Google.

Here’s a drawing I sketched out to visualize this process:

accessibility bridge digression process

💡 Accessibility bridge

Eureka

At some point, I realized that what we need to fix the problems above is a component similar to the accessibility bridge, but for tests. After all, the situation is fairly similar: the same concept (a test) exists in both Flutter and native, but there’s no link between them.

Fortunately, since we’re only focusing on Android and iOS (though that’ll probably change in the future), our case is much simpler than Flutter’s accessibility bridge.

More terminology!

On both mobile platforms, end-to-end UI tests work similarly. There are always 2 apps involved:

  • the app itself, which is being tested. It’s often called “app under test”.
  • the instrumentation, where the tests are defined

The instrumentation app runs first and starts executing tests one by one. The first thing each test does is start the app under test. Then, the actual test begins – tapping, entering text, assertions, and so on.

Test suite state lives in the instrumentation process, safe from any fatal crashes that may occur in the app process.

instrumentation process and app process

💡Differences

  • If you get deeper into these native testing frameworks, you’ll discover (sometimes huge) differences between Android and iOS, but this simplified understanding should suffice for this article.

The idea

Now that we know what tools are available, here’s how I imagined a new test bridge would work:

  1. The instrumentation process asks the app process for a list of all Dart tests.
  2. For every Dart test, a native test is created. Its body calls the Dart test and waits for it to finish, returning its result.
  3. The instrumentation executes the native tests one by one.

What’s important is that there are no native tests at compile time - they are only created at runtime in step 2. We refer to this process as the “dynamic creation of tests”.
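
The three steps can be modeled in plain Dart. This is an in-process simulation under heavy assumptions, not how the real bridge is built: AppProcess, NativeTest and Instrumentation are hypothetical classes, a thrown error stands in for an app crash, and calling launchApp stands in for starting a fresh app process.

```dart
/// Stand-in for the app process: it owns the Dart tests.
class AppProcess {
  AppProcess(this._tests);

  final Map<String, Future<void> Function()> _tests;

  /// Step 1: the instrumentation asks for the list of Dart tests.
  List<String> listDartTests() => _tests.keys.toList();

  /// Runs one Dart test by name; throws if it fails or the app "crashes".
  Future<void> runDartTest(String name) => _tests[name]!();
}

/// A native test created at runtime: a name plus a body.
class NativeTest {
  NativeTest(this.name, this.body);
  final String name;
  final Future<void> Function() body;
}

/// Stand-in for the instrumentation process.
class Instrumentation {
  Instrumentation(this.launchApp);

  /// Hypothetical hook that launches a fresh app instance.
  final AppProcess Function() launchApp;

  /// Step 2: create one native test per Dart test.
  List<NativeTest> createNativeTests() {
    return [
      for (final name in launchApp().listDartTests())
        NativeTest(name, () => launchApp().runDartTest(name)),
    ];
  }

  /// Step 3: run native tests one by one; a failure doesn't stop the run.
  Future<Map<String, bool>> runAll() async {
    final report = <String, bool>{};
    for (final test in createNativeTests()) {
      try {
        await test.body();
        report[test.name] = true;
      } catch (_) {
        report[test.name] = false;
      }
    }
    return report;
  }
}

Future<void> main() async {
  final instrumentation = Instrumentation(() => AppProcess({
        'sign up': () async {},
        'location': () async => throw StateError('app crashed'),
        'sign out': () async {},
      }));
  // prints {sign up: true, location: false, sign out: true}
  print(await instrumentation.runAll());
}
```

Because the report lives on the instrumentation side and every native test launches its own app instance, the failing "location" test doesn't prevent "sign out" from running or the report from being produced.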

💡Test is a blurry term

  • In practice, it’s simply a method, so when I say “we dynamically create test cases”, what I actually mean is “we add methods to a class at runtime”. Of course, this is not the whole truth because most programming languages aren’t dynamic enough to do it, but you get the gist. I’ll explain how we do it in more detail later on.

I came to call this approach advanced test bundling. The name test bundling stayed because it still accurately describes what’s going on (the Dart file bundling all other Dart tests is still generated), but the internals are completely different.

Another drawing, this time visualizing how an improved version of integration_test should work:

Test bundling process

💡Nothing fancy, after all

  • This type of situation (where one part of the app ”lives in its own world”, separate from its native part, and various functionality has to be “bridged”) is common in cross-platform frameworks.

Enter – test bundling

You already understand the why and what of test bundling. Now we’re getting to the coolest part - implementation.

Let’s look under the hood and see what parts our new test bridge consists of and how they play together.

Dart implementation

I started the implementation of advanced test bundling from Dart, and right off the bat, I faced a problem. The gist is that package:test doesn’t allow for retrieving the test suite structure before it starts executing, but this is exactly what we need. If you’d like to learn more about this issue, I reported it to the dart-lang/test repository.

We worked around this problem by using non-public APIs of the package:test_api, which is a dependency of package:test. It’s not a perfect solution, but it’s pretty simple, and it works. Also, the APIs we depend on are fairly stable, even though they’re not public. The workaround is to create a special patrol_test_explorer test case which runs first and retrieves the test suite structure. It gets this information from the global Invoker object, which is an internal API from the test_api package.

The code for this lives in the integration_test/bundled_test.dart file. Remember that Patrol CLI automatically generates this file during patrol test or patrol build, and the whole process is transparent to the developer.

// This whole file is generated automatically by Patrol CLI at build time.
	
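// Needed for Completer, and for test() and group().
import 'dart:async';
import 'package:test/test.dart';

// Internal API imports. Not nice, booo.
import 'package:test_api/src/backend/invoker.dart';
import 'package:test_api/src/backend/group.dart';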
	
// Imports of other tests in the integration_test directory.
import 'notifications_test.dart' as notifications_test;
import 'permission_location_test.dart' as permission_location_test;
	
void main() async {
  final testExplorationCompleter = Completer<Group>();
  test('patrol_test_explorer', () {
     final topLevelGroup = Invoker.current!.liveTest.groups.first;
     testExplorationCompleter.complete(topLevelGroup);
  });
	
  group('notifications_test.dart', notifications_test.main);
  group('permission_location_test.dart', permission_location_test.main);
	
  final Group topGroup = await testExplorationCompleter.future;
  // At this point, we have test suite structure!
  // Later, we serve it over gRPC so the native part can query it.
}
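
Once the structure is available, it still has to be turned into something the native side can list. One way to picture that step is flattening the group tree into full test names. The sketch below uses a hypothetical SuiteGroup class as a stand-in for the real Group type, and the test names in main are invented for the example.

```dart
/// Hypothetical stand-in for a node in the test suite tree.
class SuiteGroup {
  SuiteGroup(this.name, {this.groups = const [], this.tests = const []});
  final String name;
  final List<SuiteGroup> groups;
  final List<String> tests;
}

/// Joins non-empty name parts with a space.
String joinNames(String prefix, String name) =>
    [prefix, name].where((part) => part.isNotEmpty).join(' ');

/// Flattens the tree into full test names, depth first.
List<String> flatten(SuiteGroup group, [String prefix = '']) {
  final path = joinNames(prefix, group.name);
  return [
    for (final test in group.tests) joinNames(path, test),
    for (final child in group.groups) ...flatten(child, path),
  ];
}

void main() {
  final root = SuiteGroup('', groups: [
    SuiteGroup('notifications_test.dart', tests: ['shows a notification']),
    SuiteGroup('permission_location_test.dart',
        tests: ['grants permission', 'denies permission']),
  ]);
  // prints:
  // notifications_test.dart shows a notification
  // permission_location_test.dart grants permission
  // permission_location_test.dart denies permission
  flatten(root).forEach(print);
}
```

A flat list of unique names is convenient because each entry can later be requested and run individually.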

Once this problem was fixed, work on the native part could begin.

💡 Omission alert!

  • This, of course, wasn’t the only problem. I omitted:
    - How we execute a specific Dart test once the native side requests it
    - How we know which test has just completed executing (so we can send the status back to the native side)

This article is already much longer than I planned, so I decided to leave these out. Solutions to these problems also depend on accessing some internal test suite state through Invoker.

Android implementation

Here’s a typical “sign-in” UI test written using first-party Android tools. It’s defined at compile time, the testing framework is JUnit4 coupled with AndroidJUnitRunner instrumentation runner, and UI interactions and assertions are done using the very popular Espresso library:

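import static androidx.test.espresso.Espresso.onView;
import static androidx.test.espresso.action.ViewActions.click;
import static androidx.test.espresso.action.ViewActions.typeText;
import static androidx.test.espresso.assertion.ViewAssertions.matches;
import static androidx.test.espresso.matcher.ViewMatchers.isDisplayed;
import static androidx.test.espresso.matcher.ViewMatchers.withId;

import androidx.test.core.app.ActivityScenario;
import androidx.test.ext.junit.runners.AndroidJUnit4;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(AndroidJUnit4.class)
public class ExampleTest {
    // Runs before every test: start the activity under test.
    @Before
    public void launchActivity() {
        ActivityScenario.launch(MainActivity.class);
    }

    @Test
    public void signIn() {
        onView(withId(R.id.editTextUsername)).perform(typeText("charlie_root"));
        onView(withId(R.id.editTextPassword)).perform(typeText("ny4ncat"));
        onView(withId(R.id.buttonSignIn)).perform(click());
        onView(withId(R.id.textViewWelcomeMessage)).check(matches(isDisplayed()));
    }
}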

This is not easily adaptable to what we need in our advanced test bundling approach. Here, tests are defined at compile time, but we have to generate them dynamically (i.e., at runtime) from Dart tests. Adding methods to a class at runtime is hard to do in JVM/ART, so that’s a no-go.

Fortunately, the JUnit4 library has the Parameterized runner. It lets us define a test case once and run it multiple times, feeding it new data each time. Here’s a simple example showing how the + (addition) operation could be tested in a calculator application with the help of the Parameterized runner:

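import static androidx.test.espresso.Espresso.onView;
import static androidx.test.espresso.action.ViewActions.typeText;
import static androidx.test.espresso.assertion.ViewAssertions.matches;
import static androidx.test.espresso.matcher.ViewMatchers.withId;
import static androidx.test.espresso.matcher.ViewMatchers.withText;

import java.util.Arrays;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class CalculatorTest {
    // Each row is one test case: {firstNum, secondNum, expected sum}.
    @Parameters
    public static Iterable<Object[]> testCases() {
        return Arrays.asList(new Object[][]{
                {0, 0, 0},
                {2, 2, 4},
                {3, 2, 5},
                {3, 3, 6}
        });
    }

    private final int firstNum;
    private final int secondNum;
    private final int sum;

    public CalculatorTest(int firstNum, int secondNum, int sum) {
        this.firstNum = firstNum;
        this.secondNum = secondNum;
        this.sum = sum;
    }

    @Test
    public void addTwoNumbers() {
        // omitted code that starts the activity
        onView(withId(R.id.firstNumEditText)).perform(typeText(Integer.toString(firstNum)));
        onView(withId(R.id.secondNumEditText)).perform(typeText(Integer.toString(secondNum)));
        onView(withId(R.id.sumTextView)).check(matches(withText(Integer.toString(sum))));
    }
}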

There are 4 tests, and a new app process is started for each of them to achieve isolation. During each test, an instance of CalculatorTest is created, and then the addTwoNumbers() method is called - it’s all taken care of by the Parameterized runner.

