In this section, the proposed G-GA algorithm is described. This method can be considered as a group-based genetic algorithm, in which population members are dynamically divided into 4 groups according to their fitness value. In what follows, we will first define the chromosome structure in the G-GA. Next, the fitness function used in the G-GA will be introduced. Finally, the way of grouping population members and utilizing mutation and crossover operators within each group will be explained.
The Chromosome Structure
The structure of each member of the population depends on: 1) the number of paths in the CFG of the program under the test and 2) the number of input arguments of the program under the test. For example, in the program shown in Fig. 1, the number of paths in its CFG is 3 and it has 2 input arguments. Therefore, the structure of a chromosome for this program can be represented as given in Fig. 2 below. This structure consists of 3 subsections, each is devoted to one path in the CFG. Each subsection has two genes, each gene represents a value for one input argument.
The Fitness function
The structure of the fitness function is the most critical part of any GA; that is, to design a criterion for judging the suitability of a feasible solution. In the problem of automatic test data generation, various fitness functions have been introduced by different authors [3] [15] [16] [17] [20] [25] [30] [33] [53] [63]. In this paper, we have considered two of the most important ones, as given below:
1- Branch distance and approximation level
Branch distance is calculated on the test data for conditional nodes [64]. For a given condition, it determines how close the test data is to satisfy that condition according to the requirement [64]. Approximation level assesses how close an individual is to reaching the target on the basis of its execution part through the control structure [64]. It is determined by counting the number of branching nodes not traversed by a test case while executing the target path [33]. There are some limitations in approximation level-based heuristics
-
If all nodes of a critical path are traversed, then the approximation level becomes zero. However, we cannot still guarantee that the target path is traversed. This can happen in the following scenario: the last conditional node of a CFG has two outgoing edges; one leads to the target path and the other one leads to somewhere else [33].
-
If all branching nodes are the same for two different paths, then the approximation level is zero for both paths. This is a generalization of the above-mentioned problem.
-
Approximation-level heuristics cannot consider multiple critical paths at the same time. This leads to the inefficiency of the automatic test case generation procedure [33].
2- Consider all paths
Using the test data 𝑡 (chromosome) as the input, the program under the test is executed and traversed paths are recorded (Considering the fact that a given chromosome encodes more than one input set to the program under the test). Then, assuming the total number of paths in the program to be and the number of distinct recorded paths for 𝑡 to be, the fitness value of the test data 𝑡 can be calculated using Eq. (1) [11].
$${f}_{t}= \frac{{m}_{t}}{n}$$
1
This way of calculating the fitness of 𝑡 completely ignores differences between different paths within the CFG; Finding test data for covering some paths is significantly harder than finding test data for covering other paths. Considering the above mentioned problems, the IGA method, introduced in this paper, aimed at providing an improved fitness evaluation method which considers not only the number of covered paths by a given chromosome (a set of test data), but also the degree of importance of paths covered by that chromosome. This can be obtained using the following equation:
$${f}_{t}=\left(\alpha \bullet \left(\frac{{m}_{t}}{n}\right)\right)+\left(\beta \bullet \left(\sum _{i=1}^{{m}_{t}} \frac{1}{{s}_{t}^{i}}\right)\right)+\left(\gamma \bullet \left( \frac{{d}_{t}}{HD}\right)\right)$$
2
In Eq. (2), \(\alpha\), \(\beta\), and \(\gamma\) are constant weights balancing between the strength of different factors in the fitness (\(\alpha + \beta + \gamma = 1\)). The first factor, \(\frac{{m}_{t}}{n}\), takes into account the number of unsatisfied paths. The second factor, \(\left(\sum _{i=1}^{{m}_{t}} \frac{1}{{s}_{t}^{i}}\right)\), considers the uniqueness of the test data. For a given path i in the test data t, \({s}_{t}^{i}\) indicates the number of repetitions of that path in all test data available in the current population. A low \({s}_{t}^{i}\) indicates a path which is rarely seen in other chromosomes and thus, is a significant finding. As a result, it must increase the fitness of its constituting chromosome. Finally, the last factor, \(\frac{{d}_{t}}{HD}\), pays attention to the diversity of the population, considering the distance between the given test data t and all other members of the population. Here, dt can be calculated according to the following equation:
$${d}_{t}= \sum _{k=1}^{p}{d}_{tk}$$
3
In the above equation, p is the number of chromosomes in the population, and \({d}_{tk}\), defined according to Eq. (4), is the Euclidean distance [30] between the test data t and a chromosome k in the population.
$${d}_{tk}=\left|{x}_{t}-{x}_{k}\right|= \sqrt{\sum _{j=1}^{l}{({x}_{tj}-{x}_{kj} )}^{2 } } ( k=1, 2, \dots , p)$$
4
In the above equation, l is the length of each chromosome in the population. HD is a summation over the total distance of each members of the population with other members and is calculated according to the Eq. (5), given below.
$$HD= \sum _{t=1}^{p}{d}_{t}$$
5
Grouping the Population
Although GA is simple to implement and converges quickly, it is easily trapped in local optima; it lacks the ability to balance the global exploration and the local exploitation of the population. To tackle this problem, most methods attempt to use adaptive recombination and mutation rates [10] [31]. The main disadvantage of such methods is that near the solution, they lose their exploration capability, which may not be suitable in some circumstances.
In the proposed method, at each generation of the GA, members of the population are divided into 4 equally sized groups, based on their fitness values; weak, acceptable, good, and excellent. The mutation and recombination operators in each group differ as is stated below.
The weak group: The mutation operator in this group works as follows: Let gbest be a randomly selected chromosome from the “excellent” group. Then, every member t of the “weak” group is mutated according to the following equation:
$${x}_{tj}= {x}_{tj}+uniform\left(.5, 1.5\right)\bullet \left(gbes{t}_{j}-{x}_{tj}\right), j=\text{1,2},...,l$$
6
Pseudo code of updating individuals within the weak group is given as follows in Algorithm 1.
Algorithm 1. Updating individuals within the weak group. |
1: Input: very good sub population, chromosome 3: Output: new chromosome 4: Begin 5: Gbest = random choose (excellent group sub population) 6: for index in chromosome: 7: { 8: Update chromosome based on Eq. (6) 9: } 10: return chromosome 11: End |
The acceptable group: In this group, each member is mutated in one of the following ways: single-point mutation, multi-point mutation, or all_excellent_mutation. The all_excellent_mutation is performed as given in Eq. (7) given below. In this equation, excellentk is the kth chromosome in the excellent group.
$${x}_{tj}= \left({x}_{tj}+\sum _{k=1}^{\frac{p}{4}}\left(excellen{t}_{kj}-{x}_{tj}\right)\right)\bullet uniform\left(.1, 2\right), j=\text{1,2},...,l$$
7
Pseudo code of updating individuals within the acceptable group is given in Algorithm 2.
Algorithm 2. Updating individuals within the acceptable group. |
1: Input: very good sub population, chromosome 3: Output: new chromosome 4: Begin 5: If random uniform (0, 1) > 0.5: 6: multi-point mutation (chromosome) 7: else: 8: if random uniform(0, 1) < = 0.5: 9: single-point mutation (chromosome) 10: else: 11: if random uniform(0, 1) < = 0.7: 12: Gbest = random choose (excellent group sub population) 13: Recombination (chromosome, Gbest) 14: else: 15: m = length (excellent group sub population) 16: for index in chromosome: 17: Update chromosome based on Eq. (7) 18: return chromosome 19: End |
The good group: In this group, either of the recombination or the mutation operator is performed as follows: If \(uniform\left(0, 1\right)>.1\), then the recombination would be carried out; otherwise the mutation will be performed. Pseudo code of updating individuals within the acceptable group is given as follows in Algorithm 3.
Algorithm 3. Updating individuals within the good group. |
1: Input: very good sub population, good group sub population, chromosome 3: Output: new chromosome 4: Begin 5: Gbest1 = random choose (excellent group sub population) 6: Gbest2 = random choose (good sub population) 7: If random uniform (0, 1) > .10: 8: Recombination (Gbest1, Gbest2, chromosome) 9: else 10: Mutation 10: return new chromosome 11: End |
The excellent group
In this group, only the local search is done.
This paper uses one of the following two conditions for stopping the G-GA: 1) all paths are covered and 2) a certain number of generations is reached. Pseudo code of the G-GA algorithm is given in the Algorithm 4.
Algorithm 4. Test datas generation based on G-GA. |
1: Input: instrumented version of program under test, number of program input variables, domain of input data, population size, maximum number of generations, \(\alpha , \beta , \gamma , \epsilon\), 3: Output: set of test dates for program under test 4: Begin 5: population = generate random chromosomes (population size) 6: generation = 0 7: while (maximum number of generations > generation) and (not find all paths) do 8: evaluate(population) 9: Determine the group of each chromosome (population) 10: for chromosome in excellent group sub population: 11: excellent group algorithm (chromosome) 12: for chromosome in good group sub population: 13: Good group algorithm (excellent group sub population, good group sub population, chromosome) 14: for chromosome in Acceptable group sub population: 15: Acceptable group algorithm (excellent group sub population, chromosome) 16: for chromosome in Weak group sub population: 17: Weak group algorithm (excellent group sub population, chromosome) 18: Combine group results as next population 19: generation = generation + 1 20: end while 21: return population 22: End |