Quality assurance and automated UI tests form an integral part of any application development. In mobile development, they hold an even greater significance since the stakes are so high when introducing user-friendly, highly competitive, and cost-effective mobile applications with multiplatform usability and strong functional credentials. Factor in how mobile UI behaves differently on different devices, and how the range of screen sizes and densities increases the need to test UI on more than one device, and you’ll understand how complicated, and important, testing and QA are for mobile applications.
On the CreditWise® team at Capital One, we’ve adopted visual testing in order to improve our UI tests. What is CreditWise? CreditWise is a free credit monitoring tool that helps people understand, improve, and monitor their credit and financial health. Because understanding credit can be so critical and complex, the last thing we want is for UI issues to hinder our users’ experience of the app. Therefore, we use visual testing to validate the visual aspects of our application’s UI. In addition to validating that the UI displays the correct content or data, visual testing helps us validate the layout and appearance of each visual element of the UI, as well as the UI as a whole.
As a senior engineer on the CreditWise team, I am always looking into ways to harden our quality assurance process and develop a more robust testing mechanism. Visual testing has helped us improve our testing while reducing the possibility of human errors. However, we encountered a problem with the time spent on the tests’ execution. As our tests grew, we saw:
- As more tests were added, more images for comparison were needed in the test suite.
- Since image comparison defined whether the tests were passing or failing, all the test images had to be compared before the test finished execution.
Consequently, as our test suite grew, the execution time grew as well until we were hitting 2-3 hours of test execution time. This doesn’t work for agile development teams. A 2-3 hour test execution time means you are waiting 2-3 hours for every single change going into the repository. Not only does this make the release process slow and tedious, but it also slows down day-to-day development work, as every single incoming change has to run through these tests as well.
In this blog, I am going to walk through parallel test execution on Android and how my team scaled it using Jenkins. This “divide and push to free device” effort helped us reduce execution time from about 3 hours to roughly 40-45 minutes.
Parallel Execution (Push to Free Device) vs Sharding
In order to improve the execution time, we evaluated various approaches towards speeding up test execution. Naturally, the first step is to spread test execution across multiple devices/emulators of the same kind. In Android’s documentation, sharding is one way to distribute tests to multiple devices. But it didn’t give us the speed gains we were looking for (we’ll see why in a minute), so what were our other options? In my research, I came across a blog post on running Android UI tests by an engineer named Roman Kushnarenko. The idea he described was simple: push each test to whichever device in a multi-device pool is idle during test execution, rather than deciding in advance which device executes which test.
In order to understand how the “push to free device” approach is different, let us compare it with sharding using a generic example. Let’s say we have ten tests in two classes:
- Class A has 5 tests that take 3 minutes each for execution (15 mins total execution time).
- Class B has 5 tests that take 1 minute each for execution (5 mins total execution time).
If we execute these tests on one emulator, we have (5 tests × 3 minutes each) + (5 tests × 1 minute each) = 20 minutes of total execution time. Now let us investigate what happens when we approach these tests using sharding and the “push to free device” approach.
Sharding (Standard ADB sharding) - Pre-Decided Test to Device Approach
Applying the sharding approach to the above scenario, we have ten tests to be split into shards, one shard per device/emulator. The Android instrumentation runner supports test sharding using the numShards and shardIndex arguments. Check out the documentation for more information. The command for sharding takes this form (the test package and runner component, omitted here, go at the end):
adb shell am instrument -w -e numShards 2 -e shardIndex 1
The numShards argument denotes the number of shards, or pieces, into which the available tests should be broken. The shardIndex argument denotes which shard, or piece, is to be executed on the device; it is zero-based. In this case, we would use numShards = 2 with shardIndex = 0 for the 1st device and shardIndex = 1 for the 2nd device.
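As a minimal sketch of how these arguments fit together, the helper below builds one command per device. The function name, the emulator serials, and the test component are hypothetical placeholders, not values from our project; the commands are only built and printed, not executed.

```python
from typing import List


def shard_commands(serials: List[str], test_component: str) -> List[List[str]]:
    """Build one `adb shell am instrument` command per device, one shard each."""
    num_shards = len(serials)
    commands = []
    for shard_index, serial in enumerate(serials):
        commands.append([
            "adb", "-s", serial,                   # target one specific device/emulator
            "shell", "am", "instrument", "-w",
            "-e", "numShards", str(num_shards),    # total number of shards
            "-e", "shardIndex", str(shard_index),  # zero-based shard for this device
            test_component,                        # <test package>/<runner class>
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical serials and test component, for illustration only.
    for command in shard_commands(
        ["emulator-5554", "emulator-5556"],
        "com.example.app.test/androidx.test.runner.AndroidJUnitRunner",
    ):
        print(" ".join(command))
```

Each device gets the same numShards value and its own shardIndex, so the split is fixed before any test starts running.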
The problem in our case is that the split is decided up front, without regard to how long each test takes. If the first shard ends up containing all tests from Class A and the second all tests from Class B, device 1 runs for 15 minutes while device 2 runs for 5 minutes and then sits idle. The total execution time is now 15 minutes. Not the most impressive gain, given we came from 20 minutes.
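The arithmetic for this scenario can be checked with a short sketch. It assumes the class-per-shard split described above and ignores install and startup overhead; the test names (A1..A5, B1..B5) and the helper name are made up for illustration.

```python
from typing import Dict, List


def wall_clock_minutes(shards: List[List[str]], duration: Dict[str, int]) -> int:
    """Shards run in parallel, so the slowest shard determines the total time."""
    return max(sum(duration[test] for test in shard) for shard in shards)


# Class A: five 3-minute tests. Class B: five 1-minute tests.
durations = {f"A{i}": 3 for i in range(1, 6)}
durations.update({f"B{i}": 1 for i in range(1, 6)})

class_a = [name for name in durations if name.startswith("A")]
class_b = [name for name in durations if name.startswith("B")]

single_device = wall_clock_minutes([class_a + class_b], durations)
sharded = wall_clock_minutes([class_a, class_b], durations)

print(f"one emulator: {single_device} min")   # 20 min
print(f"two shards by class: {sharded} min")  # 15 min
```

The second device finishes after 5 minutes and contributes nothing for the remaining 10, which is the idle time the next approach tries to reclaim.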
Parallel Execution - Push to Free Device Pattern
Now let’s look at parallel execution and the push-to-free-device pattern. The idea here is simple - get the list of tests to execute, then push the test at the top of the list to whichever device is idle. Essentially, the list of tests acts like a queue: each time a device becomes free, it takes the next test from the front.
Applying this to our example, the ideal execution time is 10 minutes: 20 minutes of work split across two emulators. Even if the load is not split perfectly - say three big tests and two small tests on Device 1 (11 minutes) and two big tests and three small tests on Device 2 (9 minutes) - the execution time would be 11 minutes, roughly a 45-50% improvement over the single-emulator run.
When this approach is applied to our example, you will first see all the big tests divided between the two emulators: one emulator runs three of them and the other runs two, so the big tests finish in 9 minutes instead of 15. The emulator that ran only two big tests is free after 6 minutes and starts pulling small tests right away, so most of the small tests run while the last big test is still executing. The remaining small tests add only a minute or two at the end, which brings our total execution time to 10-11 minutes, as expected.
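One way to sketch this dispatch pattern is a shared queue with one worker thread per device. This is an illustration under assumptions, not our Fastlane action: the function names and device serials are hypothetical, and run_test stands in for whatever actually runs a single test on a given device (for example, an `adb -s <serial> shell am instrument` call restricted to one test with the runner's `-e class` argument).

```python
import queue
import threading
import time
from typing import Callable, Dict, List


def push_to_free_device(
    tests: List[str],
    devices: List[str],
    run_test: Callable[[str, str], None],
) -> Dict[str, List[str]]:
    """Give each idle device the next pending test until none are left."""
    pending: "queue.Queue[str]" = queue.Queue()
    for test in tests:
        pending.put(test)

    # Each worker appends only to its own list, so no extra locking is needed.
    executed: Dict[str, List[str]] = {device: [] for device in devices}

    def worker(device: str) -> None:
        while True:
            try:
                test = pending.get_nowait()  # take the test at the front of the queue
            except queue.Empty:
                return                       # nothing left, this device is done
            run_test(device, test)           # blocks until the test finishes
            executed[device].append(test)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return executed


if __name__ == "__main__":
    # Stand-in runner: big tests sleep longer than small ones (scaled down).
    def fake_run(device: str, test: str) -> None:
        time.sleep(0.03 if test.startswith("A") else 0.01)

    all_tests = [f"A{i}" for i in range(1, 6)] + [f"B{i}" for i in range(1, 6)]
    print(push_to_free_device(all_tests, ["emulator-5554", "emulator-5556"], fake_run))
```

No device is assigned a test in advance; a device that finishes early simply comes back to the queue sooner and picks up more work.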
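The timing for this example can be simulated in a few lines. The sketch below is an idealized model, assuming tests are handed out in list order (big tests first) and that there is no per-test install or startup overhead; the function name is hypothetical.

```python
import heapq
from typing import List, Tuple


def simulate_push_to_free_device(
    durations: List[int], device_count: int
) -> Tuple[int, List[List[int]]]:
    """Assign each test, in order, to the device that becomes free first."""
    # Heap of (minute at which the device becomes free, device index).
    free_at = [(0, device) for device in range(device_count)]
    heapq.heapify(free_at)
    assigned: List[List[int]] = [[] for _ in range(device_count)]

    for minutes in durations:
        available_at, device = heapq.heappop(free_at)  # earliest free device
        assigned[device].append(minutes)
        heapq.heappush(free_at, (available_at + minutes, device))

    total = max(finish for finish, _ in free_at)       # last device to finish
    return total, assigned


# Five 3-minute tests from Class A, then five 1-minute tests from Class B.
total_minutes, per_device = simulate_push_to_free_device([3] * 5 + [1] * 5, 2)
print(total_minutes)  # 10
print(per_device)     # [[3, 3, 3, 1], [3, 3, 1, 1, 1, 1]]
```

Under these idealized assumptions both emulators finish at the 10-minute mark. Real runs carry some overhead per test, so a figure slightly above the ideal is a reasonable expectation.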
The following diagram attempts to put these ideas in a more understandable form:
In order to apply this to our CreditWise application, we wrote a custom Fastlane action. The action generates a list of tests to be executed and then uses the above approach to distribute them to the available emulators in the system. When running the tests with two emulators, we got roughly a 40% improvement in execution time, close to what we expected. Naturally, our first instinct was to simply increase the number of emulators to further improve the time. However, that wasn’t possible. As soon as the emulator count increased, the system quickly ran out of resources and emulators started crashing. That brings us to the next step we implemented.
Divide And Conquer - Divide and Push to Free Device Pattern
Push to Free Device gave us 40% improvement. But 40% improvement on 3 hours of execution is still just under 2 hours. Though this improvement is significant, it wasn’t enough. Additionally, increasing the number of emulators wasn’t an option. So, we thought, what if we increase the number of nodes we run the emulators on?
These days, almost all projects at Capital One are backed by Continuous Integration (CI) systems. For the CreditWise application, we use Jenkins, since its pipeline allows multiple jobs to be executed in parallel and also supports some light scripting. So, using the Jenkins pipeline, we created the following control flow:
- Pipeline begins by invoking a Jenkins job that runs a custom Fastlane action.
- Job compiles the APK and the Test APK, and creates a list of tests to be executed. These artifacts are then uploaded to Artifactory.
- Pipeline job then downloads these artifacts and reads the list of tests.
- Pipeline job then splits the complete test list into 10 sub-lists of roughly equal size.
- Pipeline job then spins up 10 Jenkins jobs for the automation test run, passing the Artifactory path and one sub-list of tests to each of those jobs.
- Each Jenkins job starts two emulators and distributes its sub-list of tests in the push-to-free-device pattern using the custom Fastlane action.
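The splitting step in this flow can be sketched as follows. The round-robin strategy and the names used here are assumptions chosen for illustration; the steps above only require that the sub-lists be roughly equal in size.

```python
from typing import List


def split_into_sublists(tests: List[str], parts: int) -> List[List[str]]:
    """Deal tests out round-robin so sub-list sizes differ by at most one."""
    return [tests[i::parts] for i in range(parts)]


# Hypothetical test names: 150 tests dealt out to 10 nodes.
all_tests = [f"Test{i:03d}" for i in range(150)]
sublists = split_into_sublists(all_tests, 10)

for node, sublist in enumerate(sublists):
    # Each sub-list would be handed to one job, which then runs it on its
    # own two emulators using the push-to-free-device pattern.
    print(f"node {node}: {len(sublist)} tests, first = {sublist[0]}")
```

Dealing tests out round-robin, rather than cutting the list into contiguous blocks, also tends to spread neighbouring tests from the same class across nodes, which may help when slow tests are clustered together.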
Using this approach, our 150+ tests are distributed across several nodes running two emulators each. The more nodes the pipeline uses, the further the execution time drops. We found that by using ten nodes with two emulators each, we were able to reduce execution time from 3 hours to roughly 40-45 minutes.
For this approach we used a CI solution, but I think any script or application that lets an engineer break the test suite into multiple pieces, push them to various machines, and then collect the results can be used here.
Divide And Push to Free Device Pattern - Scalable and Streamlined
With these improvements in our test execution, it was easy to add more tests to the code base without compromising execution and release time. Since the execution time is now comparatively low, we were also able to get these tests running on every incoming pull request, thereby ensuring quality and thorough testing on every change. Since the tools used here are Fastlane and Jenkins, it is also easy to apply and scale the same approach to iOS development.
I hope this article helped you to at least get started on some ideas for improving test execution time without compromising test quality. After all, quality testing in the early life-cycle of development is what matters most for a user-friendly, highly competitive, and cost-effective mobile application!