Randomization | www.integrity-tf.org

Advanced Topics: Randomization

Generally, acceptance testing relies on reproducibility. A test must always produce the same result, no matter how many times it is run in a row. This usually requires a reproducible state of the testing environment as well as a fixed test script without any "uncertain" elements.

But in some special cases, carefully targeted randomness is desired. Take for example a test script which is not primarily run to validate a certain workflow in an application, but which has the purpose of auto-generating a lot of data in said application, maybe for the purpose of load-testing a backend system. In such cases, randomness is desired, at least in well-defined areas of the script.

The purpose of this article is to outline the limits to randomization in Integrity scripts, as well as some features provided by Integrity to overcome some of those limitations. Reading it is highly recommended before creating scripts with any kind of randomized element!

"Behind the Scenes" of Integrity test execution

Even though it may seem as if Integrity simply interprets a test script during execution, that is not the whole truth. It actually interprets the whole script twice! Understanding this mechanism and its implications is key to understanding the limitations and challenges you face when introducing randomization.

Test execution is divided into two phases:

Dry Run: The Integrity Test Runner runs through the whole test script, starting at the root suite and interpreting all the commands, but it does not execute a single fixture call! Instead, it creates an internal data structure, basically a tree representing an "execution plan" with every single suite, call and test. This structure contains pretty much everything required to render the test execution tree in the Eclipse Test Runner plugin (which is actually the primary reason for its existence), except of course the real test results and any data determined dynamically during test execution, like values returned from calls. The structure is nevertheless sufficient for displaying the execution state in the Eclipse plugin, and it is also used for other remoting tasks like master-to-fork-synchronization. The Dry Run usually finishes quite fast, since no fixtures are called and thus no real actions are performed; for the same reason, this phase is pretty much invisible.
Test Run: After the Dry Run, the Test Runner resets most of its internal state and starts over at the beginning. This time however, fixtures are called for real, and results are recorded. In order to dynamically update the test execution state displayed in connected Eclipse plugin instances, the execution tree from the Dry Run is continuously updated: results are added, strings with placeholders instead of actual parameters are replaced, variable values are inserted et cetera. These updates are forwarded to the plugin and integrated on the fly.
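The two phases can be illustrated with a heavily simplified sketch. Everything in it is hypothetical: the class and method names are made up for illustration and do not come from the Integrity codebase, and the "plan" is a flat list instead of a tree. It only shows the principle that the first pass records steps without calling fixtures, while the second pass calls them and fills results into the already existing plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of a two-phase run (not Integrity's real classes).
class TwoPhaseSketch {

    /** One entry of the execution plan: a description plus a result slot. */
    static final class PlanEntry {
        final String description;
        String result; // stays null until the test run fills it in

        PlanEntry(String description) {
            this.description = description;
        }
    }

    /** One step of the "script": a description and the fixture to invoke for it. */
    static final class Step {
        final String description;
        final Supplier<String> fixture;

        Step(String description, Supplier<String> fixture) {
            this.description = description;
            this.fixture = fixture;
        }
    }

    /** Dry run: walk the script and record every step, but never call a fixture. */
    static List<PlanEntry> dryRun(List<Step> script) {
        List<PlanEntry> plan = new ArrayList<>();
        for (Step step : script) {
            plan.add(new PlanEntry(step.description));
        }
        return plan;
    }

    /** Test run: walk the script again, call the fixtures, update the existing plan. */
    static void testRun(List<Step> script, List<PlanEntry> plan) {
        if (script.size() != plan.size()) {
            throw new IllegalStateException("Test run does not match the dry run plan");
        }
        for (int i = 0; i < script.size(); i++) {
            Step step = script.get(i);
            PlanEntry entry = plan.get(i);
            if (!entry.description.equals(step.description)) {
                throw new IllegalStateException("Test run diverged from dry run at step " + i);
            }
            entry.result = step.fixture.get(); // only now is the fixture really called
        }
    }
}
```

In this sketch the test run fails loudly as soon as it no longer matches the plan. Whether and how the real Test Runner detects such a mismatch is not covered here.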

The key element in this two-phase process is the execution state tree, also called the "SetList" in the Integrity codebase. This data structure is designed to be simple, extensible and easily modifiable in small increments which can be quickly sent to clients attached over a network connection. In order to achieve these goals, it relies on test execution being exactly the same in both the "dry" and the "test" run. In fact, the whole Eclipse test runner plugin relies on this, as do some of the more sophisticated test execution control mechanisms like breakpoints and fork-to-master-synchronization.

Without any random elements in the scripts, fulfilling this requirement is quite simple. But adding randomness in the wrong place can quickly break things!

Adding randomness to the equation

There are two ways in which randomization can be used in Integrity. First, a call fixture may internally use randomization to influence its result value. This does not result in any differences between the execution paths of the "dry" and "test" runs and is thus not problematic at all. Such randomization can always be used without further precautions.

It gets more interesting as soon as one wants to influence the execution path directly. Since Integrity does not support any conditional statements, the only way to do this is to use the suite or call execution multiplier. That multiplier can be fed from a constant value, and constant values can be determined by custom operations, which can contain arbitrary code - for example code that includes random number generation! Doing this "the naive way", however, would result in differences between dry run and test run, which is a sure way to cause loads of strange errors and odd behavior, especially if forks are used (remember: synchronization between forks and their master relies on the same protocol as the Eclipse plugin, which in turn relies on a stable test execution tree during both phases!).
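To make the problem tangible, here is a small sketch of what "the naive way" does to a multiplier. All names are hypothetical and the model is far simpler than the real Test Runner: a call with a multiplier is expanded into one plan entry per repetition, and the multiplier is evaluated anew on each pass through the script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical illustration; none of these names exist in Integrity.
class MultiplierDivergenceSketch {

    /** Expands a call with a multiplier into one plan entry per repetition. */
    static List<String> expand(String callName, IntSupplier multiplier) {
        int repetitions = multiplier.getAsInt(); // evaluated anew on every pass
        List<String> entries = new ArrayList<>();
        for (int i = 1; i <= repetitions; i++) {
            entries.add(callName + " #" + i);
        }
        return entries;
    }

    /** Runs both passes and reports whether they produced the same plan. */
    static boolean phasesAgree(String callName, IntSupplier multiplier) {
        List<String> dry = expand(callName, multiplier);
        List<String> test = expand(callName, multiplier);
        return dry.equals(test);
    }
}
```

With a constant supplier such as `() -> 3`, both passes agree. With a naive supplier such as `() -> new java.util.Random().nextInt(10) + 1`, each pass draws its own number, so the two plans will usually differ in length.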

But there is a solution to these problems. After all, "random" numbers generated by a Random Number Generator (RNG) on a computer usually aren't actually random (except those from certain sources like hardware RNGs). Common RNGs use a so-called seed value to initialize their internal state and afterwards generate random numbers using a fixed, predictable algorithm. These algorithms are designed such that the resulting values are evenly distributed over the range of possible values, and they can generate a practically unlimited number of "random" values. But due to their deterministic nature, they will always generate the exact same sequence of numbers when initialized with the same seed value.
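This determinism is easy to see with the standard java.util.Random class, which is documented to produce identical sequences for identical seeds. The helper below is only an illustration and is not part of Integrity.

```java
import java.util.Random;

// Illustration of seed determinism using the JDK's java.util.Random.
class SeededSequenceSketch {

    /** Returns the first 'count' values generated by an RNG initialized with 'seed'. */
    static double[] firstValues(long seed, int count) {
        Random rng = new Random(seed);
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextDouble(); // value in the range [0.0, 1.0)
        }
        return values;
    }
}
```

Calling `firstValues(42L, 5)` twice yields two arrays with exactly the same contents, no matter when or how often the call is made.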

In order to use random numbers safely for "randomizing" Integrity test execution, an RNG is required which is initialized to the same seed value for both runs, "dry" and "test". A master process must also forward its seed value to all forks it creates, so that forks and master can be properly synchronized. The custom operation class de.gebit.integrity.runner.operations.RandomNumberOperation, which is provided by the Integrity Test Runner itself, implements exactly this behavior!
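The idea can be sketched roughly as follows. This is not the source code of RandomNumberOperation; the class and its methods are hypothetical, and the sketch assumes that every phase (and every fork) requests its random values in the same order.

```java
import java.util.Random;

// Hypothetical sketch of a random source that is re-seeded per phase.
class SharedSeedSketch {

    private final long seed;
    private Random rng;

    SharedSeedSketch(long seed) {
        this.seed = seed;
        this.rng = new Random(seed);
    }

    /** To be called at the start of each phase, so both phases see the same sequence. */
    void reset() {
        rng = new Random(seed);
    }

    /** Returns the next value of the sequence, in the range [0.0, 1.0). */
    double next() {
        return rng.nextDouble();
    }

    /** The seed that a master would hand over to the forks it creates. */
    long getSeed() {
        return seed;
    }
}
```

A "fork" constructed with `new SharedSeedSketch(master.getSeed())` then produces the same values as the master, and a `reset()` between dry run and test run makes the second phase repeat the first one.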

How to use the RandomNumberOperation

Using this operation is pretty easy: just define it...

operationdef random uses de.gebit.integrity.runner.operations.RandomNumberOperation

...and you are ready to go! You can then call this operation to initialize a constant with a random value, which can then safely be used as a suite/call multiplier. You can also use it to generate randomness for any other purpose, for example to feed it into call fixtures which would otherwise have to create randomness internally. The operation does not require a parameter and by default returns a decimal number between 0 and 1 with maximum precision. It optionally accepts an integer value as a postfix parameter, which limits the precision by specifying the number of decimals you want to have (if you specify "3", you'll get numbers like "0.542", for example). The number returned can subsequently be used in simple equations to generate random numbers in any range you require: multiplying it by 100, for example, yields random numbers between 0 and 100.
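The arithmetic involved is simple, as the following Java sketch shows. It is an illustration only: the helper names are made up, and it assumes that limiting the decimals means cutting off the excess digits, which may not be how the operation actually rounds. The clamp in the second method guards against a raw value of exactly 1, since the text above does not say whether that bound is included.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helpers illustrating precision limiting and range scaling.
class RandomScalingSketch {

    /** Cuts a value down to the given number of decimals (assumption: truncation). */
    static BigDecimal limitDecimals(double value, int decimals) {
        return BigDecimal.valueOf(value).setScale(decimals, RoundingMode.DOWN);
    }

    /** Maps a value between 0 and 1 to a whole number from min to max (both inclusive). */
    static int toMultiplier(BigDecimal random, int min, int max) {
        int span = max - min + 1;
        int scaled = min + random.multiply(BigDecimal.valueOf(span)).intValue();
        return Math.min(scaled, max); // only relevant if the input is exactly 1
    }
}
```

For example, `limitDecimals(0.54217, 3)` gives 0.542, and `toMultiplier` turns that value into 6 when asked for a number from 1 to 10.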

If you just start an Integrity test by using the ConsoleTestExecutor, the initial seed used for the RandomNumberOperation is determined randomly. Every execution will thus differ from the one before, although both phases within each execution will run in the exact same way. If, however, you need to perform multiple separate executions of a test in the exact same way, you can manually specify the seed value to use with the --seed command-line parameter. Use the same value to reproduce a previous test execution exactly! This nice little feature is also a good reason to obtain all the "randomness" you require from the RandomNumberOperation, even if it does not affect test execution directly but is just needed inside a call fixture.