Unit tests should be written for each class and each static procedure in the system. They should be written as soon as the specification for the module exists. Once the implementation of a module is complete, additional tests should be written to test its implementation-specific behavior.
In practice, writing unit tests before a module is implemented can be frustrating, especially since changes to the spec may cause a set of tests to be wasted. A good habit is to write new tests each time new behavior is implemented. Don't wait until after development to write tests, since it's likely that a test suite written afterwards will be less complete than one written during development.
This can be done recursively upwards, as larger modules make use of more and more smaller ones. At the top level, this is called a system test and should be run automatically and regularly in any software engineering environment.
This is especially useful since whatever changes were made are fresh in the engineer's mind, so bugs can be tracked down more quickly. Automatic testing systems are vital to make regression testing easy and fast. When a bug is found, tests should be written immediately that fail with the bug and pass once the bug is fixed.
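As a minimal sketch of that last point, suppose a helper named max once started its running maximum at 0 and therefore returned 0 for an all-negative array. The class, the method, and the bug are hypothetical and are not part of the provided code. The regression test records the failing input and stays in the suite after the fix:

```java
// Hypothetical example: the names and the bug are illustrative only.
class MaxRegressionExample {

    // Fixed version. The buggy version initialized 'best' to 0,
    // which gave the wrong answer when every element was negative.
    static int max(int[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        int best = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] > best) {
                best = values[i];
            }
        }
        return best;
    }

    // Regression test: failed against the buggy version (which returned 0)
    // and passes once the bug is fixed.
    static void testMaxAllNegative() {
        int result = max(new int[] {-3, -1, -2});
        if (result != -1) {
            throw new AssertionError(
                "max of all-negative array expected:<-1> but was:<" + result + ">");
        }
    }

    public static void main(String[] args) {
        testMaxAllNegative();
        System.out.println("regression test passed");
    }
}
```

Keeping the test in the automated suite means the same bug cannot quietly return in a later change.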
OPTIONAL: In "real-world" software engineering, there's a lot more to testing [2, 3]:
- Acceptance Tests: Tests run by the customer on the system; can be an alpha test or a beta test.
- Alpha Tests: Tests run by the customer in a developer-controlled environment.
- Beta Tests: Tests run by the customer in their own use environment.
- Performance Tests: Tests that a system meets real-time and memory requirements.
- Stress Tests: Tests how the system copes when loaded beyond its requirements.
- Thread Tests: Tests the processing of a transaction through a multi-process system.
- Back-to-back Tests: Tests multiple versions of the same system against one another.
- Recovery Tests: Tests that a system recovers properly after a failure.
- Security Tests: Tests that the system resists attacks against its correctness and privacy properties.
It is usually impossible to test all possible inputs to a procedure, so we need to focus our tests on the most likely problem spots:
A program should also be tested on inputs outside its expected input space. A number of bugs involve callers accidentally (or intentionally, in the case of hackers) failing to obey a "requires" clause and thus causing an error. One way to alleviate this problem is to check the "requires" clause whenever practical.
Q: Testing at the edge of the integer input space can cause integer overflow; should we spec this behavior?
A: If we want to be pedantic, we would write out explicitly what inputs are handled properly. In practice, we tend to rely on the client to have an intuition of what is feasible based on the properties of the relevant datatype.
// Moves every element of v2 onto the end of v1 (in reverse order), leaving v2 empty.
static void appendVector(Vector v1, Vector v2) {
    if (v1 == null || v2 == null) {
        throw new NullPointerException("input vectors cannot be null");
    }
    while (v2.size() > 0) {
        v1.addElement(v2.lastElement());
        v2.removeElementAt(v2.size() - 1);
    }
}
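A small test driver for this procedure might look like the sketch below. It is only one possible set of cases: a typical input, two empty-vector boundaries, and a null argument from outside the expected input space. The class name AppendVectorTests and the helper check are assumptions made for the example, and generic type parameters have been added to the procedure so the sketch compiles without warnings.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical test driver; names other than appendVector are illustrative.
class AppendVectorTests {

    // Same logic as appendVector above, with generic types added for this sketch.
    static <T> void appendVector(Vector<T> v1, Vector<T> v2) {
        if (v1 == null || v2 == null) {
            throw new NullPointerException("input vectors cannot be null");
        }
        while (v2.size() > 0) {
            v1.addElement(v2.lastElement());
            v2.removeElementAt(v2.size() - 1);
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    // Typical case: v2's elements land on v1 in reverse order and v2 is emptied.
    static void testTypical() {
        Vector<String> v1 = new Vector<>(Arrays.asList("a"));
        Vector<String> v2 = new Vector<>(Arrays.asList("b", "c"));
        appendVector(v1, v2);
        check(v1.equals(Arrays.asList("a", "c", "b")), "typical: v1 was " + v1);
        check(v2.isEmpty(), "typical: v2 should be empty");
    }

    // Boundary case: appending an empty vector changes nothing.
    static void testEmptySecond() {
        Vector<String> v1 = new Vector<>(Arrays.asList("a"));
        Vector<String> v2 = new Vector<>();
        appendVector(v1, v2);
        check(v1.equals(Arrays.asList("a")), "empty v2: v1 was " + v1);
    }

    // Boundary case: appending onto an empty vector.
    static void testEmptyFirst() {
        Vector<String> v1 = new Vector<>();
        Vector<String> v2 = new Vector<>(Arrays.asList("x", "y"));
        appendVector(v1, v2);
        check(v1.equals(Arrays.asList("y", "x")), "empty v1: v1 was " + v1);
    }

    // Outside the expected input space: a null argument must be rejected.
    static void testNullArgument() {
        try {
            appendVector(null, new Vector<String>());
            throw new AssertionError("expected NullPointerException");
        } catch (NullPointerException expected) {
            // the explicit check at the top of appendVector fired
        }
    }

    // Deliberately not run: appendVector(v, v) with a non-empty v never
    // terminates, because each iteration adds one element and removes one.

    public static void main(String[] args) {
        testTypical();
        testEmptySecond();
        testEmptyFirst();
        testNullArgument();
        System.out.println("appendVector tests passed");
    }
}
```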
Black-box testing may not exercise all lines of code in a procedure. Glass-box testing uses knowledge of the program structure to do this. Ideally, glass-box testing should be path-complete: it should run all possible paths through a program. To help determine what the paths are through a program, we use basic-block diagrams.
Basic blocks are sequences of statements that do not contain any branches (conditionals or loops). Basic-block diagrams depict the flow of control in a program from block to block.
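To make the idea concrete, here is a small sketch of a procedure with its basic blocks labeled in comments, followed by inputs chosen from the resulting diagram. The procedure sumOfPositives and the block labels B1 through B6 are hypothetical and exist only for this illustration.

```java
// Hypothetical example: a procedure annotated with its basic blocks.
class BasicBlockExample {

    // Control flow between blocks:
    //   B1 -> B2;  B2 -> B3 or B6;  B3 -> B4 or B5;  B4 -> B5;  B5 -> B2
    static int sumOfPositives(int[] values) {
        int sum = 0;                   // B1: entry
        int i = 0;
        while (i < values.length) {    // B2: loop test
            if (values[i] > 0) {       // B3: branch test
                sum += values[i];      // B4: runs only for positive elements
            }
            i++;                       // B5: loop increment
        }
        return sum;                    // B6: exit
    }

    static void check(int actual, int expected, String path) {
        if (actual != expected) {
            throw new AssertionError(path + " expected:<" + expected
                + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        // Each input is chosen to drive control through a different path.
        check(sumOfPositives(new int[] {}), 0, "B1 B2 B6 (loop body never runs)");
        check(sumOfPositives(new int[] {5}), 5, "one iteration through B4");
        check(sumOfPositives(new int[] {-5}), 0, "one iteration skipping B4");
        check(sumOfPositives(new int[] {3, -1, 4}), 7, "several iterations, both branches");
        System.out.println("all listed paths exercised");
    }
}
```

The four inputs cover zero, one, and several loop iterations and both outcomes of the conditional, which is a finite stand-in for the unbounded set of paths.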
Since there are often an infinite number of paths through a program, we must settle for testing path boundary conditions:
For a given subtype, test through the supertype's spec, the subtype's spec, and the subtype's program structure. The supertype should have its own glass-box test suite.
Q: How do you test an abstract supertype?
A: To test the supertype class itself, one should provide a stub implementation of a subtype: one that provides trivial implementations of the supertype's abstract methods. This stub can then be used to test the supertype's non-abstract methods. Full subtype implementations should also test their behavior through the supertype's spec.
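A minimal sketch of the stub approach follows. The abstract class Shape, its methods, and StubShape are all hypothetical names invented for the illustration; the point is only that the stub's abstract-method implementations are trivial, so any failure must come from the supertype's own code.

```java
// Hypothetical example: testing an abstract supertype through a stub subtype.
class AbstractSupertypeTest {

    // Abstract supertype: area() is abstract, isLargerThan() is concrete.
    static abstract class Shape {
        abstract double area();

        // Non-abstract behavior that we want to test.
        boolean isLargerThan(Shape other) {
            return area() > other.area();
        }
    }

    // Stub subtype: returns a fixed area and does nothing else.
    static class StubShape extends Shape {
        private final double area;

        StubShape(double area) {
            this.area = area;
        }

        @Override
        double area() {
            return area;
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        Shape small = new StubShape(1.0);
        Shape large = new StubShape(2.0);
        check(large.isLargerThan(small), "2.0 should be larger than 1.0");
        check(!small.isLargerThan(large), "1.0 should not be larger than 2.0");
        check(!small.isLargerThan(small), "a shape is not larger than itself");
        System.out.println("supertype tests passed");
    }
}
```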
OPTIONAL: Part of software engineering in industry is creating detailed test plans. Test plans include descriptions of the items to be tested, the testing schedule, the procedures for managing the test process, and the hardware and software requirements for running the tests [3].
In addition to the testing done by the engineers, Quality Assurance (QA) engineers create their own test plans from customer requirement specs and test the systems from a user's point of view.
Some tools run test suites against a system and deduce invariants about the code. The invariants can help programmers discover incorrect program behavior and expand their test suite to exercise the program more thoroughly. If the tool reveals an invariant that doesn't make sense, such as "'terms' is always 'null'", the programmer knows that either the code has a bug or the test suite doesn't vary its inputs enough.
For example, Daikon is an application that runs a large test suite against a program and tracks the values of program variables during execution. Daikon then reports the invariants for each block of code. The user can then match these invariants against their expectations about the code and their test suite.
OPTIONAL: Plenty more [2]:
- Code Auditors: Check quality of software to ensure it meets minimum coding standards.
- Assertion Processors: Tell whether programmer-supplied assertions are actually met during real program executions.
- Test File Generators: Generate typical input files for programs undergoing testing.
- Test Data Generators: Assist user in selecting test data to make program behave in a particular fashion.
- Test Verifiers: Measure internal coverage of a test suite for a program.
- Test Harnesses: Set up programs in a test environment, provide data, and simulate stubs for missing modules.
- Output Comparators: Compare one set of outputs from a program to a reference set to determine if there are any differences. For example, running diff against a program output and the expected output.
- Data Flow Analyzers: Track the flow of data through a program and attempt to find undefined references, incorrect indexing, memory leaks, and other data-related errors. For example, Purify.
A number of classes have been provided: the Fib interface, a test suite for that interface, a recursive implementation and its test suite, a linear implementation and its test suite, and a caching implementation and its test suite. Things to notice:
Q: What happens to RecursiveFib.fib() if it's called with n < 1?
A: Infinite recursion: StackOverflowError. This could be fixed by checking that the requires clause is satisfied.
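One way such a check might look is sketched below. The actual RecursiveFib source is not reproduced in these notes, so the class name CheckedRecursiveFib, the base cases fib(1) == fib(2) == 1, and the choice of IllegalArgumentException are assumptions made for the example.

```java
// Hypothetical sketch: a recursive fib that checks its requires clause.
class CheckedRecursiveFib {

    // requires: n >= 1
    // returns: the nth Fibonacci number, assuming fib(1) == fib(2) == 1
    static int fib(int n) {
        if (n < 1) {
            // Fail fast instead of recursing until the stack overflows.
            throw new IllegalArgumentException("requires n >= 1, but n was " + n);
        }
        if (n <= 2) {
            return 1;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // prints 55
        try {
            fib(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check costs one comparison per call and turns a confusing StackOverflowError into an error message that names the violated precondition.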
Q: RecursiveFib fails testFibThirty and testFibFortySeven with this message: java.lang.InterruptedException: Test time exceeded 100ms. Why?
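A: The number of recursive calls RecursiveFib makes grows exponentially with increasing n: it is bounded above by 2^n and grows roughly like 1.6^n. At n == 30 that is already on the order of a million recursive calls, and at n == 47 it is in the billions, which takes quite some time.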
Q: LinearFib fails testFibFortySeven with this message: fib(47) expected:<2971215073> but was:<-1323752223>. Why?
A: Integer overflow. fib(47) is larger than 2^31 - 1 (2147483647), the largest int value (since an int has 32 bits). The return value looks like a negative number because of two's-complement integer representation: the true result wraps around by 2^32.
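The wraparound can be reproduced with a short sketch. The LinearFib source is not shown in these notes, so the loop below is only an assumed shape for a linear implementation, and the names FibOverflowDemo, fibInt, fibLong, and fibExact are invented for the example. The third variant uses Math.addExact, which throws ArithmeticException instead of silently wrapping.

```java
// Hypothetical sketch: the same linear algorithm with int, long, and checked int.
class FibOverflowDemo {

    // 32-bit arithmetic: silently wraps around past 2^31 - 1.
    static int fibInt(int n) {
        int prev = 0;
        int curr = 1;
        for (int i = 1; i < n; i++) {
            int next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    // 64-bit arithmetic: large enough for fib(47).
    static long fibLong(int n) {
        long prev = 0;
        long curr = 1;
        for (int i = 1; i < n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    // 32-bit arithmetic with overflow detection.
    static int fibExact(int n) {
        int prev = 0;
        int curr = 1;
        for (int i = 1; i < n; i++) {
            int next = Math.addExact(prev, curr); // throws on overflow
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        System.out.println(fibInt(47));  // prints -1323752223
        System.out.println(fibLong(47)); // prints 2971215073
        try {
            fibExact(47);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

Whether to widen the return type, throw, or simply document the largest supported n is a specification decision; the sketch only shows what each choice looks like.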
Q: CachingFib fails testFibTenTwice with this message: 2nd fib(10) expected:<55> but was:<10>. Why? Why didn't testFibOneTwice() also fail?
A: There's a bug in CachingFib: it caches the argument 'n' instead of the result 'fib(n)' (testFibOneTwice() passes because fib(1) == 1). Notice that this error was not caught by any black-box tests. The glass-box tests make sure that both branches of the conditional in CachingFib.fib() are explored.
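The sketch below is a guess at what such a bug might look like, not the actual CachingFib source: the class BuggyCachingFib, the HashMap cache, and the helper compute are all assumptions. It reproduces the reported symptom, and the two-call test shows why a glass-box test that forces the cache-hit branch is needed to expose it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of the kind of bug described above.
class CachingFibSketch {

    static class BuggyCachingFib {
        private final Map<Integer, Integer> cache = new HashMap<>();

        int fib(int n) {
            Integer cached = cache.get(n);
            if (cached != null) {
                return cached;          // branch 1: cache hit
            }
            int result = compute(n);    // branch 2: cache miss
            cache.put(n, n);            // BUG: stores the argument; should be put(n, result)
            return result;
        }
    }

    // Plain linear computation, assuming fib(1) == fib(2) == 1.
    static int compute(int n) {
        int prev = 0;
        int curr = 1;
        for (int i = 1; i < n; i++) {
            int next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        BuggyCachingFib f = new BuggyCachingFib();
        System.out.println(f.fib(10)); // prints 55 (cache miss, computed correctly)
        System.out.println(f.fib(10)); // prints 10 (cache hit returns the argument)
        System.out.println(f.fib(1));  // prints 1
        System.out.println(f.fib(1));  // prints 1 (bug is masked because fib(1) == 1)
    }
}
```

A test that calls fib only once per argument never reaches branch 1, so it passes against the buggy version.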
Q: It's difficult to automate GUI testing since usually a user has to do the point-and-clicking. How might we automate testing a system that has a GUI?
A: One way is to separate the GUI and the functional part of the system into separate modules. The functional part can then be tested with an automated test driver, while the GUI functionality can be tested by hand. However, the two parts still must be tested together (integration testing).
Another way is to use a GUI scripting tool that reads in mouse clicks and causes the system to send its results to some output analyzer. For example, one could script the process of creating an address book through a GUI, dump that data to a file, and compare that file against an expected result.
Q: An alternative to testing is verification: a formal or informal argument that a program works on all inputs. Why don't we usually use verification instead of testing?
A: For non-trivial programs, arguing correctness becomes difficult and very time-consuming. Furthermore, unless the argument refers directly to the program text, bugs can cause the program to violate the argument. Also, most forms of formal verification require a formal specification, which is often as hard as or harder to write than the implementation itself!
Q: Why is regression testing necessary?
A: Changes to one part of a program can break behavior in another part of a program because of mistakes in implementation or bad specifications. Regression testing reveals these errors immediately after they are introduced, allowing the engineer to fix them while the change is fresh.
Q: Creating a large (perhaps even exhaustive) test suite requires generating a lot of test cases with the correct output. How can we create this data and be sure that it's correct?
A: One way is to implement a simple stub that generates correct input and output pairs. Such a stub can be checked by hand, and its output can be used to test a more complex, optimized implementation.
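As a sketch of this idea, the slow but easy-to-check recursive definition can serve as the source of expected outputs for a faster implementation. The names FibOracleTest, referenceFib, fastFib, and countMismatches are invented for the example, and the base cases fib(1) == fib(2) == 1 are assumed.

```java
// Hypothetical sketch: a simple reference implementation used as a test oracle.
class FibOracleTest {

    // Simple reference: slow, but short enough to check by hand.
    static long referenceFib(int n) {
        if (n <= 2) {
            return 1;
        }
        return referenceFib(n - 1) + referenceFib(n - 2);
    }

    // Optimized implementation under test.
    static long fastFib(int n) {
        long prev = 0;
        long curr = 1;
        for (int i = 1; i < n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    // Compares the two implementations on every n in [1, maxN]
    // and returns the number of inputs on which they disagree.
    static int countMismatches(int maxN) {
        int mismatches = 0;
        for (int n = 1; n <= maxN; n++) {
            long expected = referenceFib(n);
            long actual = fastFib(n);
            if (expected != actual) {
                System.out.println("fib(" + n + ") expected:<" + expected
                    + "> but was:<" + actual + ">");
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(25));
    }
}
```

The range is kept small because the reference implementation is exponential; larger inputs for the fast version still need hand-checked expected values or another oracle.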
Because it takes a lot of time to generate good test suites by hand, we often try to minimize the number of partitions in a program's input space that we need to test. Automatic tools make it easier to generate large test suites and run them, allowing us to cover the program input space better.
Q: Should engineers write the tests for their own programs?
A: Yes and no. Yes, because the engineer understands the code and the spec and can quickly write a number of the required tests. No, because an engineer will make assumptions about a program's behavior and either forget to test certain behavior or write tests that assume certain behavior. The best solution is to have the author of the code, other engineers, and customer representatives write tests.