Automatic Test Data Generation based on the Prime Path Coverage Criterion: A Grouping-based GA Approach

doi:10.21203/rs.3.rs-2796131/v1


Research Article



This work is licensed under a CC BY 4.0 License

Version 1


Software testing is the process of running an application with the goal of finding bugs and subsequently improving its quality, and it plays a key role in ensuring the quality of software systems. Testing is currently considered an industry in its own right within the software field. Given that about 40% of the cost of producing any software is spent on testing, tools that automatically generate test data can significantly reduce the cost of software development. Test data generation can be considered an optimization problem, and thus search algorithms can be used to tackle it. The Genetic Algorithm (GA) is one of the most widely used algorithms in this field. In this paper, we propose a novel GA approach, called Group-based GA (G-GA), which differs from the standard GA in the following ways. First, a new fitness function is utilized that uses search space information to guide the population. Second, the population is divided into four groups, each of which is updated according to its fitness level. Finally, the selection operator is omitted, and thus the algorithm has lower complexity and requires fewer calculations than the standard GA. In addition, the proposed algorithm maintains a good level of exploration and exploitation at each step. Experiments have shown that the proposed G-GA method significantly outperforms the basic GA, its variations, PSO, Tabu Search, and Simulated Annealing in terms of convergence speed and search time.

software testing

test data generation

search algorithms

genetic algorithm

Software testing is one of the most important ways to evaluate software in the industry and one of the main methods of software quality assurance. It reduces the risk of software failure, and its main purpose is to increase the reliability and accuracy of the program being tested. This process is very time-consuming and expensive, accounting for about 40% of the software's production cost, while not adding any functionality to the final product [3]. Software test automation methods are used to reduce the cost and time of testing [1].

One of the automated testing methods is structural testing, in which test cases are generated based on the internal structure of the program. Structural testing can be divided into static and dynamic methods [2]. In static methods, the program source code is not used directly to generate test data; instead, it is converted to a standard form that contains the specific information needed to produce test data as effectively as possible. This is usually done based on symbolic execution [59] [60] [61] [62]. One disadvantage of these methods is that they depend on the type of the source code (the programming language). In addition, such methods have problems with arrays and pointers [3]. In dynamic methods, the program is executed, and thus the problems of static methods are avoided. Here, the test data generation problem is converted into an optimization problem, where search algorithms can be used to find appropriate solutions [3] [4].

The Genetic Algorithm (GA) is one of the most well-known algorithms which has been used in this field [5]. The GA is powerful because [6]:

Its concept is easy to understand,
It supports multi-objective optimization,
It uses probabilistic transition rules, not deterministic rules,
It is good for "noisy" environments,
It can operate on various representations,
It is easily parallelised.

Though the Genetic algorithm has proved to be a fast and powerful problem-solving approach, some limitations are found embedded in it, as discussed below [6]:

One major obstacle of the GA is its sensitivity to the choice of the fitness (evaluation) function. A wrong choice may lead to weak or even wrong solutions.
Other parameters of a GA, like population size, mutation and crossover rate must also be chosen with care. A small population size will not give enough solution space to produce accurate results. A high frequency of genetic change or poor selection scheme will result in disrupting the beneficial schema and the population may enter error catastrophe, changing too fast for selection to ever bring about convergence.
Premature convergence is another issue with the GA. This kind of inefficiency is mostly seen in small problems where even small variations in reproduction rate may cause one genotype to become dominant over others.

To deal with the above-mentioned problems, various methods have been introduced: combining the GA with other algorithms [7] [8] [9], making the GA adaptive to the current state of the search space, and tuning its parameters, such as the probabilities of crossover and mutation [10] [11]. However, combinatorial methods usually have high computational complexity, and adaptive methods need parameter tuning, which is not always possible.

In this paper, we introduce a method for tackling the above-mentioned problems which has low computational complexity and does not need any parameter tuning. In this method, each individual in the population is placed in a group according to its fitness value. The evolution mechanism in each group differs from that of the other groups, because fitter solutions need to be evolved differently from less fit ones. In addition, to be able to effectively use the proposed evolutionary method for the problem of automatic test case generation, we introduce a new fitness function which takes into account the following three important parameters: 1) the number of paths satisfied by a given chromosome, 2) the uniqueness of each test path in that chromosome, and 3) the distance of the given chromosome from the other members of the population. To the best of our knowledge, no method has combined these parameters for this purpose.

In order to evaluate the efficiency of the proposed algorithm, it has been used to generate test data for different programs, and the results have been compared with those of the basic GA, its variations, PSO, Tabu Search, and Simulated Annealing. The results indicate the superiority of the proposed method over existing methods.

The rest of this paper has been organized as follows. Section 2 discusses the related work. Problem description is presented in Section 3. The Proposed Algorithm is explained in Section 4. Section 5 provides experimental results. Finally, Section 6 concludes the paper.

2.1. Software testing using GA

Two methods of improvement have been performed on the genetic algorithm:

1. Improving the genetic algorithm by improving its parameters pc and pm (the recombination and mutation rates).

This section first gives a brief review of the methods for adjusting the crossover and mutation rates (pc, pm). Then, the latest and most typical methods are given. Because there are many ways to improve pc and pm, we divide them into three categories according to how the values of pc and pm are determined: the constant strategy, the time-varying strategy, and the adaptive strategy.

A. Constant Value Strategy:

In the constant value strategy, pc and pm are fixed values during the entire search process. There is no intelligence in this method. In [12], [13], [14], [15], [16] these rates are constant, and thus no feedback is available from the search space. Therefore, the algorithm is easily trapped in local optima.

B. Time-varying Strategies:

In a GA that uses a time-varying strategy, the values of pc and pm are determined according to the number of iterations, and they can either increase or decrease over time [10]. The disadvantage of this strategy is that pc and pm cannot be adjusted according to the state of the chromosomes; they can only increase or decrease, as there is no feedback mechanism.

C. Adaptive Strategy:

The adaptive strategy adjusts pc and pm using one or more feedback parameters obtained by monitoring the search situation of the algorithm (for example, the fitness values of the population members and the population diversity). In [17], [18], [19], and [20], these rates are set adaptively. The disadvantage of this method is that the algorithm has to do a lot of calculations.

2. Improving the genetic algorithm by combining it with other algorithms.

Combining the GA with other algorithms is one of the most common approaches that researchers have used in recent years. Kumar et al. have proposed an approach to automatically generate the test data for data flow testing based on a hybrid adaptive PSO-GA algorithm. The hybrid APSO-GA is developed to overcome the weaknesses of the GA and PSO algorithms, especially in data flow testing. The results obtained show that the hybrid adaptive PSO-GA gives better results than the other algorithms used in the field of test data generation [21]. Chawla et al. present a hybrid Particle Swarm Optimization (PSO) and GA-based heuristic for automatic test data generation, aimed especially at the type and quality of the test data. The performance of the developed algorithm is compared with GA and PSO, and the authors claim that the hybrid PSO-GA algorithm gives better results for test data generation problems [22]. Mala et al. proposed a hybrid GA approach that combines GA with local search techniques for test data generation problems using path testing as the test adequacy criterion [23]. A hybrid approach of GA with tabu search was presented by Rathore et al. It uses the GA approach to gradually intensify the search procedure with an efficient mutation step involving tabu search [24]. Esnaashari and Damia used a combination of genetic algorithms and reinforcement learning to generate test data. They used reinforcement learning to search locally within the genetic algorithm [25]. Rijwan Khan introduced a method for the automated generation of software test data by combining a genetic algorithm and a cuckoo search algorithm. Their goal was to reduce the time and cost of producing test data. The cuckoo search algorithm was used to improve chromosomes. Their experiments have shown that the combination of the two algorithms is better than applying each of them separately [26].
In [27], the genetic algorithm and the simulated annealing algorithm are used to automate the production of test data based on the path coverage criterion, and their results are compared. The results show that, with correctly adjusted parameters, the genetic algorithm is more efficient than the simulated annealing algorithm and achieves maximum coverage in the least number of iterations. Jain et al. have introduced a new 2-step inharmonious approach based on GA and PSO for class testing using data flow criteria [28]. A set of classes is tested to study the performance of the proposed method in terms of the percentage of coverage and the execution time. The results of their experiments have shown that their proposed method is better than the random method in terms of the coverage ratio achieved and the iterations performed. Overall, these improvements can obtain remarkable performance enhancements at the expense of increased algorithm complexity. Other versions of this algorithm, such as genetic programming in [29], have also been used for this purpose.

2.2. Software testing using other search algorithms

A. PSO

Li and Jiang [30], Peng [31], and Latiu et al. [32] all used PSO. Sahoo et al. employed the Adaptive Particle Swarm Optimization technique to generate test data that maximize path coverage. They presented a fitness function that combines branch and path distances. Their technique was more effective than the standard PSO algorithm [33]. A technique for generating test data for multiple path coverage was presented in [34], which combines PSO with metamorphic relations. This approach creates test data using PSO and then repeatedly generates new test data by utilizing metamorphic relations between test data. The suggested strategy decreased the number of evaluations that PSO requires. The findings show that it considerably improves the efficiency of fitness evaluations and time use. Damia et al. proposed a novel variant of the particle swarm optimization (PSO) algorithm for automatic generation of test data for web-based software. In this method, the inertia weight is dynamically calculated in each round of the algorithm according to the fitness of each particle. Experiments on different programs have shown that the proposed method has a better convergence rate than several other variants of PSO [35].

B. ACO

Although ant colony optimization (ACO) has a higher success rate than GA, it has several flaws, including expanding the test suite's size and repeating test data inside the same test suite without improving test coverage. As shown by Srivastava et al. [36] and Dahiya et al. [37], it must also update its pheromone values, which reduces its efficacy and takes a considerable amount of time. The ant colony optimization approach was utilized to produce test data in [38]. The suggested technique is an evolutionary strategy designed to enhance the search performance of ants during local movements and to boost search exploitation. Experiments demonstrated that the suggested strategy is superior to existing strategies for generating test data with regard to branch coverage and convergence time. Bidgoli et al. suggested an ant colony optimization technique to generate test data for covering prime paths. The suggested technique divides the input space such that pheromone values can be stored in the search area. In the local search stage of their methodology, a randomized comparative test is also employed. According to their findings, the test data produced using their methodology has a mutation score that is 9% higher than that of test data produced using EvoSuite, a well-known test data generation tool [39]. The basic ACO method is transformed into a discrete version in [40] to create test data for structural testing. The technical road map for merging the revised ACO algorithm and the testing procedure is shown. Several techniques, including local transfer, global transfer, and pheromone updating, are designed and implemented to increase the algorithm's search capabilities and provide more varied test inputs. Since coverage of program components is a special optimization objective, a customized fitness function is developed in that method by taking the nesting level and predicate type of each branch into account.
A comparison study is conducted in [40] using eight widely known programs to assess the efficacy of the ACO-based test data generation approach. The experimental findings demonstrate that the approach is comparable to particle swarm optimization-based methods and outperforms the existing simulated annealing and genetic algorithms in test data quality and stability. In order to suggest appropriate parameter settings for real-world applications, an assessment of the algorithm parameters is also carried out.

C. ABC

Sheoran et al. [41], Sahin et al. [42], Boopathi et al. [43] and Alazzawi et al. [44] have all used ABC in their research. The authors obtained good results. The artificial bee colony (ABC) method was utilized in [45] to solve the problem of test data creation, and branch coverage criteria were employed as a fitness function to optimize the offered solutions. Comparisons were made using seven well-established and well-documented applications.

D. NSA

SM Mohi-Aldeen et al. presented a technique, based on the Negative Selection Algorithm (NSA), for producing test data that satisfies the path coverage criterion. The findings indicate that the NSA can reduce the amount of test data created, increase the percentage of coverage, and optimize the test data generation process. To evaluate the approach's efficacy, its results were compared to random testing and to a prior study that used a Genetic Algorithm and Ant Colony Optimization. The findings reveal that NSA outperforms the other approaches in reducing the volume of test data needed to cover all program paths, even the most challenging ones [46]. In [47], an algorithm for the generation of test data based on negative selection is presented. The genetic algorithm incorporates a negative selection method, and the test data for the target path is automatically created while the population of the genetic algorithm is continually optimized. Compared to the random technique and the genetic algorithm, the suggested approach improves path coverage and reduces the generation of duplicated test data, as shown by the experimental findings. Using GA, the approach provided in [48] adjusts the production of detectors in the generation phase of NSA and produces a fitness function based on prioritizing the paths. Various benchmark programs with various data types have been utilized. The findings demonstrate that the hybrid technique increased the proportion of covered program paths, even for complex paths. It can reduce the amount of test data created and boost efficiency despite the increasing variety of data types used for input. This strategy enhances the efficacy and efficiency of test data generation and optimizes the search space, hence improving the proportion of covered paths and eliminating duplicate data.

E. FA

Damia et al. generated test data automatically using a combination of the firefly algorithm and asexual reproduction optimization. Their findings demonstrated that the two algorithms worked better together than independently [49]. The authors of [50] propose a novel method for test suite optimization based on the chaotic firefly algorithm. The problem of optimizing test suites for performance and efficiency is modeled with the firefly algorithm, and a test optimization technique based on it is suggested. Tests are conducted on specific benchmark programs, and the simulation results are compared for the ACO, GA, and chaotic firefly algorithms. Regarding branch coverage, the chaotic firefly algorithm surpasses other bio-inspired algorithms such as artificial bee colony, ant colony optimization, and the genetic algorithm.

F. SFLA

The shuffled frog leaping algorithm (SFLA) is presented for generating structural test data in [51]. This method has a good convergence rate and is very easy to implement. Branch coverage is employed as the fitness function in the proposed SFLA to create adequate test data. Seven benchmark programs were utilized to compare the efficiency of the suggested SFLA with the genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), and artificial bee colony (ABC). According to the reported data, the suggested SFLA has an average branch coverage of 99.99 percent, an average success rate of 99.97 percent, and covers all branches in 2.03 generations on average. The authors of [52] propose a search-based test data generation framework for concurrent programs. Additionally, a hybrid meta-heuristic algorithm, called SFLA-VND, is proposed, which can be used in this framework alongside other meta-heuristic algorithms. SFLA-VND is a combination of the shuffled frog leaping algorithm (SFLA) and variable neighborhood descent (VND). The framework has been evaluated on five concurrent benchmark programs by applying the genetic algorithm (GA), ant colony optimization (ACO), particle swarm optimization (PSO), SFLA, and SFLA-VND. Experimental results demonstrate the effectiveness and efficiency of the framework. The results also confirm the superiority of SFLA-VND over some popular meta-heuristic algorithms when they are used for test data generation.

How to automatically generate sets of input parameters, referred to as test data, for a given program so that different paths in that program are covered is the main problem considered in this paper. The main goal here is to minimize the amount of calculations required to generate such test data. This is an industry need due to the fact that large software consists of many methods, and covering all paths within all these methods requires lots of computation. As a result, minimizing the amount of computation needed for covering all paths of a single method can substantially reduce the amount of time needed for comprehensively testing the whole program.

Prior to automatically generating test data for covering all paths within a program, one has to figure out the number of available paths within that program. This can be done using the following two steps: 1) converting the program into its corresponding control flow graph (CFG), and 2) counting the number of paths within the CFG. The first step can be done simply by converting control statements, such as IF, SWITCH, and loops, into branching nodes and other statements into simple nodes. One example of a piece of code along with its CFG can be seen in Fig. 1 [56]. The second step can be accomplished by computing the cyclomatic complexity of the CFG [65].
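As a minimal sketch of the second step, the cyclomatic complexity of a connected CFG with E edges and N nodes can be computed as V(G) = E - N + 2, which gives the number of linearly independent paths. The adjacency dictionary below is a hypothetical CFG for a single if/else statement, not the graph of Fig. 1, and the function name is chosen only for illustration.

```python
def cyclomatic_complexity(cfg):
    """Return V(G) = E - N + 2 for a connected CFG given as an adjacency dict."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2


# Hypothetical CFG: node 1 branches to 2 or 3, and both join again at node 4.
example_cfg = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(cyclomatic_complexity(example_cfg))
```

For this graph there are 4 edges and 4 nodes, so the sketch reports two independent paths, one per branch of the condition.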

We have to mention here that the existence of infeasible paths within a program can result in automatic test data generators for that program falling into infinite loops, trying to cover paths which are impossible to cover. There exist different methods in the literature for detecting infeasible paths [57] [66] [67] [68] [69], and thus, in this paper, we have assumed that such paths are first detected and removed from the set of available paths within the given program.

In this section, the proposed G-GA algorithm is described. This method can be considered as a group-based genetic algorithm, in which population members are dynamically divided into 4 groups according to their fitness value. In what follows, we will first define the chromosome structure in the G-GA. Next, the fitness function used in the G-GA will be introduced. Finally, the way of grouping population members and utilizing mutation and crossover operators within each group will be explained.

The Chromosome Structure

The structure of each member of the population depends on: 1) the number of paths in the CFG of the program under the test and 2) the number of input arguments of the program under the test. For example, in the program shown in Fig. 1, the number of paths in its CFG is 3 and it has 2 input arguments. Therefore, the structure of a chromosome for this program can be represented as given in Fig. 2 below. This structure consists of 3 subsections, each is devoted to one path in the CFG. Each subsection has two genes, each gene represents a value for one input argument.
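The structure described above can be sketched as follows. The flat list layout and the helper names `random_chromosome` and `decode` are assumptions made for illustration; the integer domain [-10, 100] is the one reported for the experiments.

```python
import random


def random_chromosome(num_paths, num_args, low, high, rng=random):
    """Flat chromosome: num_paths subsections, each holding num_args integer genes."""
    return [rng.randint(low, high) for _ in range(num_paths * num_args)]


def decode(chromosome, num_args):
    """Split a flat chromosome into one input tuple per subsection."""
    return [tuple(chromosome[i:i + num_args])
            for i in range(0, len(chromosome), num_args)]


# Program of Fig. 1: 3 paths in the CFG and 2 input arguments, so 6 genes.
chromosome = random_chromosome(num_paths=3, num_args=2, low=-10, high=100)
test_inputs = decode(chromosome, num_args=2)
```

Each tuple in `test_inputs` is one candidate call to the program under test, so a single chromosome can cover up to three paths.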

The Fitness function

The structure of the fitness function is the most critical part of any GA; that is, to design a criterion for judging the suitability of a feasible solution. In the problem of automatic test data generation, various fitness functions have been introduced by different authors [3] [15] [16] [17] [20] [25] [30] [33] [53] [63]. In this paper, we have considered two of the most important ones, as given below:

1- Branch distance and approximation level

Branch distance is calculated on the test data for conditional nodes [64]. For a given condition, it determines how close the test data is to satisfying that condition [64]. The approximation level assesses how close an individual is to reaching the target on the basis of its execution path through the control structure [64]. It is determined by counting the number of branching nodes not traversed by a test case while executing the target path [33]. Approximation level-based heuristics have the following limitations:

If all nodes of a critical path are traversed, then the approximation level becomes zero. However, we cannot still guarantee that the target path is traversed. This can happen in the following scenario: the last conditional node of a CFG has two outgoing edges; one leads to the target path and the other one leads to somewhere else [33].
If all branching nodes are the same for two different paths, then the approximation level is zero for both paths. This is a generalization of the above-mentioned problem.
Approximation-level heuristics cannot consider multiple critical paths at the same time. This leads to the inefficiency of the automatic test case generation procedure [33].
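To make the branch distance idea concrete, the following sketch uses a common formulation from the search-based testing literature for relational predicates over numbers. It is not taken from this paper; the constant `K` and the function name are assumptions.

```python
K = 1  # assumed positive constant added when a relational predicate is false


def branch_distance(op, a, b):
    """How far (a op b) is from being true; 0 means the condition already holds."""
    if op == "==":
        return abs(a - b)
    if op == "<":
        return 0 if a < b else (a - b) + K
    if op == "<=":
        return 0 if a <= b else (a - b) + K
    if op == ">":
        return 0 if a > b else (b - a) + K
    if op == ">=":
        return 0 if a >= b else (b - a) + K
    raise ValueError(f"unsupported operator: {op}")


# The input (3, 7) is 4 units away from satisfying the condition a == b.
print(branch_distance("==", 3, 7))
```

A smaller distance tells the search that the test data is closer to taking the desired branch, which is the guidance that the approximation level alone cannot provide.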

2- Consider all paths

Using the test data 𝑡 (chromosome) as the input, the program under test is executed and the traversed paths are recorded (note that a given chromosome encodes more than one input set for the program under test). Then, assuming the total number of paths in the program to be $n$ and the number of distinct recorded paths for 𝑡 to be ${m}_{t}$, the fitness value of the test data 𝑡 can be calculated using Eq. (1) [11].

$${f}_{t}= \frac{{m}_{t}}{n}$$
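A minimal sketch of Eq. (1) is given below. The toy `program_under_test` is hypothetical and stands in for an instrumented program that reports which path an input traverses; the path labels are arbitrary.

```python
def program_under_test(a, b):
    """Hypothetical instrumented program: returns the ID of the path taken."""
    if a > b:
        return "P1"
    if a == b:
        return "P2"
    return "P3"


def coverage_fitness(test_inputs, total_paths):
    """Eq. (1): f_t = m_t / n, with m_t the number of distinct covered paths."""
    covered = {program_under_test(*args) for args in test_inputs}
    return len(covered) / total_paths


# Two of the three inputs take the same path, so m_t = 2 and n = 3.
print(coverage_fitness([(5, 1), (7, 2), (3, 3)], total_paths=3))
```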

This way of calculating the fitness of 𝑡 completely ignores the differences between paths within the CFG: finding test data for covering some paths is significantly harder than finding test data for covering others. Considering the above-mentioned problems, the G-GA method introduced in this paper provides an improved fitness evaluation which considers not only the number of paths covered by a given chromosome (a set of test data), but also the degree of importance of the paths covered by that chromosome. This can be obtained using Eq. (2):

$${f}_{t}=\left(\alpha \bullet \left(\frac{{m}_{t}}{n}\right)\right)+\left(\beta \bullet \left(\sum _{i=1}^{{m}_{t}} \frac{1}{{s}_{t}^{i}}\right)\right)+\left(\gamma \bullet \left( \frac{{d}_{t}}{HD}\right)\right)$$

In Eq. (2), $\alpha$, $\beta$, and $\gamma$ are constant weights that balance the strength of the different factors in the fitness ($\alpha + \beta + \gamma = 1$). The first factor, $\frac{{m}_{t}}{n}$, takes into account the number of covered paths. The second factor, $\left(\sum _{i=1}^{{m}_{t}} \frac{1}{{s}_{t}^{i}}\right)$, considers the uniqueness of the test data. For a given path i covered by the test data t, ${s}_{t}^{i}$ indicates the number of repetitions of that path in all test data available in the current population. A low ${s}_{t}^{i}$ indicates a path which is rarely seen in other chromosomes and is thus a significant finding; as a result, it must increase the fitness of the chromosome that covers it. Finally, the last factor, $\frac{{d}_{t}}{HD}$, accounts for the diversity of the population by considering the distance between the given test data t and all other members of the population. Here, ${d}_{t}$ is calculated according to Eq. (3):

$${d}_{t}= \sum _{k=1}^{p}{d}_{tk}$$

In the above equation, p is the number of chromosomes in the population, and ${d}_{tk}$, defined according to Eq. (4), is the Euclidean distance [30] between the test data t and a chromosome k in the population.

$${d}_{tk}=\left|{x}_{t}-{x}_{k}\right|= \sqrt{\sum _{j=1}^{l}{({x}_{tj}-{x}_{kj} )}^{2 } } ( k=1, 2, \dots , p)$$

In the above equation, l is the length of each chromosome in the population. HD is the sum of the total distances of all members of the population from the other members and is calculated according to Eq. (5), given below.

$$HD= \sum _{t=1}^{p}{d}_{t}$$
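Equations (2) to (5) can be put together as in the sketch below. It assumes that the covered paths of each chromosome are already known, and it reads ${s}_{t}^{i}$ as the number of chromosomes in the population that cover path i; the function name and the weight values in the example are illustrative choices, not values from this paper.

```python
import math


def gga_fitness(population, covered_paths, n, alpha, beta, gamma):
    """Evaluate Eq. (2) for every chromosome.

    population    -- list of equal-length numeric chromosomes
    covered_paths -- covered_paths[t] is the set of path IDs hit by chromosome t
    n             -- total number of paths in the program
    """
    # s[path]: in how many chromosomes the path appears (assumed reading of s_t^i).
    s = {}
    for paths in covered_paths:
        for path in paths:
            s[path] = s.get(path, 0) + 1

    # Eq. (3) with Eq. (4): summed Euclidean distance to every member.
    d = [sum(math.dist(x_t, x_k) for x_k in population) for x_t in population]
    hd = sum(d)  # Eq. (5)

    fitness = []
    for t, paths in enumerate(covered_paths):
        coverage = len(paths) / n
        uniqueness = sum(1 / s[path] for path in paths)
        diversity = d[t] / hd if hd > 0 else 0.0
        fitness.append(alpha * coverage + beta * uniqueness + gamma * diversity)
    return fitness


population = [[0, 0], [3, 4], [0, 0]]
covered = [{"P1"}, {"P1", "P2"}, {"P1"}]
scores = gga_fitness(population, covered, n=3, alpha=0.5, beta=0.3, gamma=0.2)
```

In this example the second chromosome scores highest: it covers more paths, it is the only one covering `P2`, and it lies farthest from the rest of the population.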

Grouping the Population

Although GA is simple to implement and converges quickly, it is easily trapped in local optima; it lacks the ability to balance the global exploration and the local exploitation of the population. To tackle this problem, most methods attempt to use adaptive recombination and mutation rates [10] [31]. The main disadvantage of such methods is that near the solution, they lose their exploration capability, which may not be suitable in some circumstances.

In the proposed method, at each generation of the GA, members of the population are divided into 4 equally sized groups based on their fitness values: weak, acceptable, good, and excellent. The mutation and recombination operators differ from group to group, as stated below.
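The grouping step might be sketched as follows, assuming that the population is ranked by fitness and cut into quarters. How ties and population sizes that are not a multiple of 4 are handled is not stated in the text; here the last group simply absorbs the remainder.

```python
def split_into_groups(population, fitness):
    """Sort by fitness (ascending) and cut into four equally sized groups."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    quarter = len(population) // 4
    names = ["weak", "acceptable", "good", "excellent"]
    groups = {}
    for g, name in enumerate(names):
        start = g * quarter
        # Assumption: the last group absorbs any remainder.
        end = (g + 1) * quarter if g < 3 else len(population)
        groups[name] = [population[i] for i in order[start:end]]
    return groups


population = [[i] for i in range(8)]
fitness = [0.8, 0.1, 0.5, 0.3, 0.9, 0.2, 0.7, 0.4]
groups = split_into_groups(population, fitness)
```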

The weak group: The mutation operator in this group works as follows: Let gbest be a randomly selected chromosome from the “excellent” group. Then, every member t of the “weak” group is mutated according to the following equation:

$${x}_{tj}= {x}_{tj}+\mathrm{uniform}\left(0.5, 1.5\right)\bullet \left({gbest}_{j}-{x}_{tj}\right), \quad j=1, 2, \dots , l$$

Pseudo code of updating individuals within the weak group is given as follows in Algorithm 1.

Algorithm 1. Updating individuals within the weak group.

Input: excellent group sub population, chromosome
Output: new chromosome
Begin
    Gbest = random choose (excellent group sub population)
    for index in chromosome:
        Update chromosome[index] based on Eq. (6)
    return chromosome
End
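A possible Python rendering of the weak group update is sketched below. Drawing a fresh uniform factor for every gene is an assumption, since Eq. (6) does not say whether the factor is shared across genes, and rounding the result back to the integer input domain is left out.

```python
import random


def update_weak(chromosome, excellent_group, rng=random):
    """Pull a weak chromosome toward a randomly chosen excellent one (Eq. 6)."""
    gbest = rng.choice(excellent_group)
    # Assumption: a fresh uniform(0.5, 1.5) factor is drawn for every gene.
    return [x + rng.uniform(0.5, 1.5) * (g - x) for x, g in zip(chromosome, gbest)]


child = update_weak([0, 20], [[10, 10]], random.Random(0))
```

With a factor between 0.5 and 1.5, each gene moves at least halfway toward the excellent member and may overshoot it by up to half the original gap, which keeps some exploration in the weak group.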

The acceptable group: In this group, each member is mutated in one of the following ways: single-point mutation, multi-point mutation, or all_excellent_mutation. As Algorithm 2 shows, a member may alternatively be recombined with a randomly selected member of the excellent group. The all_excellent_mutation is performed as given in Eq. (7) below, in which excellent_k is the kth chromosome in the excellent group.

$${x}_{tj}= \left({x}_{tj}+\sum _{k=1}^{\frac{p}{4}}\left({excellent}_{kj}-{x}_{tj}\right)\right)\bullet \mathrm{uniform}\left(0.1, 2\right), \quad j=1, 2, \dots , l$$

Pseudo code of updating individuals within the acceptable group is given in Algorithm 2.

Algorithm 2. Updating individuals within the acceptable group.

Input: excellent group sub population, chromosome
Output: new chromosome
Begin
    if random uniform (0, 1) > 0.5:
        multi-point mutation (chromosome)
    else:
        if random uniform (0, 1) <= 0.5:
            single-point mutation (chromosome)
        else:
            if random uniform (0, 1) <= 0.7:
                Gbest = random choose (excellent group sub population)
                Recombination (chromosome, Gbest)
            else:
                m = length (excellent group sub population)
                for index in chromosome:
                    Update chromosome[index] based on Eq. (7)
    return chromosome
End
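The nested choices of Algorithm 2 might look as follows in Python. The text does not define the single-point mutation, multi-point mutation, or recombination operators, so `mutate_points` and `one_point_crossover` are assumed stand-ins; only `all_excellent_mutation` follows Eq. (7) directly. The sketch also assumes chromosomes with at least two genes.

```python
import random

LOW, HIGH = -10, 100  # input domain reported for the experiments


def mutate_points(chromosome, count, rng):
    """Assumed mutation: overwrite `count` random genes with random domain values."""
    child = list(chromosome)
    for j in rng.sample(range(len(child)), count):
        child[j] = rng.randint(LOW, HIGH)
    return child


def one_point_crossover(a, b, rng):
    """Assumed recombination: head of `a` joined to the tail of `b`."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def all_excellent_mutation(chromosome, excellent_group, rng):
    """Eq. (7): shift each gene by its summed offset to all excellent members."""
    return [(x + sum(e[j] - x for e in excellent_group)) * rng.uniform(0.1, 2)
            for j, x in enumerate(chromosome)]


def update_acceptable(chromosome, excellent_group, rng=random):
    """Nested random choices in the same order as Algorithm 2."""
    if rng.uniform(0, 1) > 0.5:
        return mutate_points(chromosome, max(2, len(chromosome) // 2), rng)
    if rng.uniform(0, 1) <= 0.5:
        return mutate_points(chromosome, 1, rng)
    if rng.uniform(0, 1) <= 0.7:
        return one_point_crossover(chromosome, rng.choice(excellent_group), rng)
    return all_excellent_mutation(chromosome, excellent_group, rng)
```

Under these thresholds, multi-point mutation is chosen about half of the time, single-point mutation about a quarter of the time, and the remaining quarter is split roughly 70/30 between recombination and the Eq. (7) update.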

The good group: In this group, either the recombination or the mutation operator is performed as follows: if $\mathrm{uniform}\left(0, 1\right)>0.1$, then recombination is carried out; otherwise, mutation is performed. Pseudo code of updating individuals within the good group is given in Algorithm 3.

Algorithm 3. Updating individuals within the good group.

Input: excellent group sub population, good group sub population, chromosome
Output: new chromosome
Begin
    Gbest1 = random choose (excellent group sub population)
    Gbest2 = random choose (good group sub population)
    if random uniform (0, 1) > 0.1:
        new chromosome = Recombination (Gbest1, Gbest2, chromosome)
    else:
        new chromosome = Mutation (chromosome)
    return new chromosome
End
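Algorithm 3 can be sketched in Python as shown below. The three-parent recombination and the mutation operator are not specified in the text; the sketch assumes a gene-wise uniform crossover over the three parents and a single-gene random reset, and the domain defaults are those reported for the experiments.

```python
import random


def update_good(chromosome, excellent_group, good_group,
                low=-10, high=100, rng=random):
    """Recombine with probability 0.9, otherwise mutate, as in Algorithm 3."""
    gbest1 = rng.choice(excellent_group)
    gbest2 = rng.choice(good_group)
    if rng.uniform(0, 1) > 0.1:
        # Assumed three-parent uniform crossover: each gene comes from one parent.
        return [rng.choice(genes) for genes in zip(gbest1, gbest2, chromosome)]
    # Assumed mutation: overwrite one random gene with a random domain value.
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.randint(low, high)
    return child


child = update_good([1, 2, 3], [[10, 20, 30]], [[40, 50, 60]], rng=random.Random(3))
```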

The excellent group

In this group, only the local search is done.

The G-GA stops when one of the following two conditions is met: 1) all paths are covered, or 2) a certain number of generations is reached. Pseudo code of the G-GA algorithm is given in Algorithm 4.

Algorithm 4. Test datas generation based on G-GA.

1: Input: instrumented version of program under test, number of program input variables,

domain of input data, population size, maximum number of generations,

$\alpha , \beta , \gamma , \epsilon$,

3: Output: set of test dates for program under test

4: Begin

5: population = generate random chromosomes (population size)

6: generation = 0

7: while (maximum number of generations > generation) and (not find all paths) do

8: evaluate(population)

9: Determine the group of each chromosome (population)

10: for chromosome in excellent group sub population:

11: excellent group algorithm (chromosome)

12: for chromosome in good group sub population:

13: Good group algorithm (excellent group sub population, good group sub population, chromosome)

14: for chromosome in Acceptable group sub population:

15: Acceptable group algorithm (excellent group sub population, chromosome)

16: for chromosome in Weak group sub population:

17: Weak group algorithm (excellent group sub population, chromosome)

18: Combine group results as next population

19: generation = generation + 1

20: end while

21: return population

22: End
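The control flow of Algorithm 4 can be sketched in Python as below. This is a minimal skeleton under stated assumptions: split_into_groups ranks the population and cuts it into four equal parts, which is only a placeholder for the paper's fitness-based grouping, and the per-group update rules are passed in as callables. The toy problem (finding the value 42) and all function names are hypothetical.

```python
import random

GROUPS = ("excellent", "good", "acceptable", "weak")


def split_into_groups(population, fitness):
    # Assumption: rank by fitness (higher is better) and cut into four parts.
    ranked = sorted(population, key=fitness, reverse=True)
    quarter = max(1, len(ranked) // 4)
    bounds = [0, quarter, 2 * quarter, 3 * quarter, len(ranked)]
    return {name: ranked[bounds[i]:bounds[i + 1]] for i, name in enumerate(GROUPS)}


def g_ga(population, fitness, updaters, all_paths_covered, max_generations):
    generation = 0
    while generation < max_generations and not all_paths_covered(population):
        groups = split_into_groups(population, fitness)
        next_population = []
        # Every chromosome is updated by the rule of its own group.
        for name in GROUPS:
            for chromosome in groups[name]:
                next_population.append(updaters[name](chromosome, groups))
        population = next_population
        generation += 1
    return population, generation


# Toy usage: one integer gene, "covered" once some chromosome equals [42].
rng = random.Random(0)


def fitness(chromosome):
    return -abs(chromosome[0] - 42)


def local_search(chromosome, groups):
    neighbours = ([chromosome[0] - 1], chromosome, [chromosome[0] + 1])
    return max(neighbours, key=fitness)


def random_reset(chromosome, groups):
    return [rng.randint(-10, 100)]


def covered(population):
    return any(chromosome == [42] for chromosome in population)


updaters = {"excellent": local_search, "good": local_search,
            "acceptable": random_reset, "weak": random_reset}
start = [[rng.randint(-10, 100)] for _ in range(20)]
final, generations = g_ga(start, fitness, updaters, covered, 200)
print(generations, covered(final))
```

Note that no selection step appears in the loop: the next population is simply the union of the updated groups, which mirrors the omission of the selection operator described in the paper.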

Experiments have been performed on several well-known programs that have been used in most of the important articles in this field, and attempts have been made to cover all programming structures, including loops and conditions. Each program has conditional and nested structures that differ from those of the other programs. The following two criteria have been used to perform the evaluations:

Number of fitness evaluations
Percentage of Coverage

However, results for the percentage of coverage criterion are not reported, because in all experiments and for all methods every path was covered, so this criterion was always 100%. In addition to the average number of fitness evaluations, we report its standard deviation and the results of a statistical t-test, which indicates the significance of the difference between the reported averages.
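The paper does not state which variant of the t-test is used. As one possibility, Welch's unequal-variance t statistic and its degrees of freedom can be computed from two samples of fitness-evaluation counts as sketched below; the sample values are made up for illustration and are not results from the paper.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical evaluation counts (in thousands) for two algorithms.
runs_a = [10, 12, 11, 13, 9]
runs_b = [14, 15, 13, 16, 17]
t, df = welch_t(runs_a, runs_b)
print(t, df)
```

A p-value then follows from the t distribution with df degrees of freedom, for example through scipy.stats.ttest_ind(runs_a, runs_b, equal_var=False).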

The G-GA has been compared with different versions of the genetic algorithm, PSO, Simulated Annealing (SA), Tabu Search (TS), and Random Search (RS). Parameter settings of these algorithms are shown in Table 1. In this paper, integer input parameters are selected from the inclusive range [-10, 100]. Each algorithm is executed 50 times for each program. The maximum number of fitness evaluations in each execution is 300,000.

Many tools are available for obtaining the CFG of a given program, such as Visustin, pycfg, or staticfg. To compute the cyclomatic complexity of a given CFG, we have utilized NetworkX.
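For reference, McCabe's cyclomatic complexity of a CFG is V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. The sketch below deliberately uses only the standard library, with the CFG given as a plain edge list; the example graph (an if/else inside a while loop) is hypothetical and is not one of the paper's figures.

```python
def cyclomatic_complexity(edges, components=1):
    """McCabe's V(G) = E - N + 2P for a CFG given as (source, target) pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Node 1: entry, 2: loop condition, 3: if condition, 4: then, 5: else, 6: exit.
cfg_edges = [(1, 2), (2, 3), (2, 6), (3, 4), (3, 5), (4, 2), (5, 2)]
print(cyclomatic_complexity(cfg_edges))  # 7 edges - 6 nodes + 2 = 3
```

With a networkx.DiGraph, the same counts are available from its number_of_edges() and number_of_nodes() methods.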

Table 1

Parameter settings for algorithms used in comparisons
Algorithms	Parameter	Value
G-GA	$\alpha$	0.6
	$\beta$	0.15
	$\gamma$	0.25
Standard GA	Crossover Rate	0.90
	Mutation Rate	0.10
	Selection	Ranking
PSO-GA [15]	Combination of GA and PSO
GABA	Crossover Rate	0.50
	Mutation Rate	0.05
	Selection	Elitism
IAGA [17]	Crossover Rate	Adaptive
	Mutation Rate
	Selection
SA	Initial temperature $T_0$	100
	Epsilon	0.001
	Alpha	0.999
PSO	Acceleration constants $c_1$ and $c_2$	$c_1 = c_2 = 1.4$
	Inertia weight $\omega$	$w_i = \left(1 - f_i\right)\left(1 - \frac{f_{avg}^{2}}{\sum_{i=1}^{ps} f_i}\right) + \mathrm{uniform}(0.1, 0.3)$
TS	Max Tabu Size	Range is [5, 10]
AGA [20]	Crossover Rate	Adaptive
	Mutation Rate	Adaptive
	Selection	Ranking
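The adaptive inertia weight listed for PSO in Table 1 can be evaluated as in the following sketch. It reads the formula literally and assumes that the fitness values f_i are normalized to [0, 1] with a positive sum and that ps is the swarm size; the function name and the sample fitness values are hypothetical.

```python
import random


def inertia_weights(fitness_values, rng=random):
    """Per-particle weights w_i = (1 - f_i) * (1 - f_avg^2 / sum(f)) + U(0.1, 0.3)."""
    total = sum(fitness_values)
    f_avg = total / len(fitness_values)
    shared = 1 - f_avg ** 2 / total  # identical for every particle
    return [(1 - f) * shared + rng.uniform(0.1, 0.3) for f in fitness_values]


weights = inertia_weights([0.2, 0.5, 0.8], rng=random.Random(0))
print(weights)
```

Under this reading, particles with a lower fitness value receive a larger inertia weight, and the random term keeps every weight at or above 0.1.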

Experiment 1:

The triangle classification program is one of the most popular benchmark programs in this field. This program, written in Python, and its corresponding CFG are given in Fig. 3 [48]. In this CFG, the number of paths and the number of variables are both equal to 3. Table 2 compares the results of applying the proposed G-GA algorithm for finding all paths in the triangle classification program with those of the other algorithms. According to this table, the G-GA, with about 13816 fitness evaluations on average, requires fewer evaluations than all other algorithms, although none of the reported p-values falls below 0.05. The closest competitor is the IAGA algorithm, which needs about 13.3% more fitness evaluations than the G-GA. Further analysis of the obtained results is given in the “Analysis of the Results” sub-section, at the end of the experimental results section.

Table 2

Number of fitness evaluations of different algorithms for covering the CFG of the triangle classification program
Algorithms	Number of Evaluations	p-value	Standard Deviation
AGA[20]	16168.2	0.63	4550.0
GA	18660.0	0.39	5110.8
PSO-GA [15]	19225.2	0.28	6182.2
GABA	23266.8	0.14	7619.6
IAGA[17]	15653.4	0.57	3883.2
PSO	22462.2	0.16	6151.5
SA	26404.0	0.23	5517.0
TS	21665.0	0.22	6699.3
G-GA	13816.2	1.0	2078.1
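The percentage gaps quoted in the experiments follow directly from the table means. A small sketch of the arithmetic for Experiment 1, using the G-GA and IAGA means from Table 2 (the function name is hypothetical):

```python
def relative_gap(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline * 100.0


g_ga_mean = 13816.2   # G-GA, Table 2
iaga_mean = 15653.4   # IAGA, Table 2
print(f"{relative_gap(g_ga_mean, iaga_mean):.2f}%")
```

The same function applied to the means of the other tables reproduces the gaps reported for Experiments 3 to 6, up to rounding.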

Experiment 2:

The next program to be tested computes the Fibonacci sequence. It is considered with the aim of evaluating the behavior of the proposed algorithm in covering loops in CFGs. This program and its corresponding CFG are shown in Fig. 4. Table 3 provides the results of applying different algorithms to find test data covering the CFG of this program. As can be seen in this table, the mean number of fitness evaluations of the proposed G-GA algorithm is about 9809, which is the lowest among the reported algorithms. As with Experiment 1, the reasons behind the superiority of the G-GA in this experiment are summarized in the “Analysis of the Results” sub-section.

Table 3

Number of fitness evaluations of different algorithms for covering the CFG of the Fibonacci program
Algorithms	Number of Evaluations	p-value	Standard Deviation
AGA[20]	10713.0	0.68	3560.7
GA	13516.8	0.11	4114.1
PSO-GA [15]	12447.6	0.22	3988.1
GABA	13089.2	0.13	3596.1
IAGA[17]	12182.4	0.29	3845.1
PSO	15099.6	0.07	3872.2
SA	14447.6	0.09	4188.1
TS	15251.4	0.03	3932.4
G-GA	9809.0	1.0	3657.5

Experiment 3:

In this experiment, a program for calculating the roots of a quadratic equation is considered, with the aim of evaluating the behavior of the proposed G-GA algorithm in finding suitable test data for conditions that contain computational terms. This program and its CFG are shown in Fig. 5. Table 4 presents the results in terms of the number of fitness evaluations. These results show that the G-GA outperforms the other algorithms by at least 5.86%. That is to say, for programs whose conditions contain computational terms, the G-GA still performs best, but its advantage over the other algorithms is modest. Further analysis of the obtained results is given at the end of this section.

Table 4

Number of fitness evaluations of different algorithms for covering the CFG of the program for calculating the roots of a quadratic equation
Algorithms	Number of Evaluations	p-value	Standard Deviation
AGA[20]	19573.2	0.68	4458.4
GA	26812.4	0.21	6713.7
PSO-GA [15]	25627.8	0.29	5224.2
GABA	26308.8	0.21	7655.8
IAGA[17]	20397.0	0.63	4254.1
PSO	25567.8	0.16	5041.3
SA	23698.2	0.18	5553.3
TS	21186.0	0.37	5586.9
G-GA	18489.6	1.0	4353.9

Experiment 4:

This experiment has been conducted to evaluate the performance of the proposed G-GA algorithm, in comparison to the other algorithms, when the program under test consists of nested “if” statements. Such a program, along with its corresponding CFG, is shown in Fig. 6. Table 5 presents the results in terms of the number of fitness evaluations. For this experiment, the G-GA obtained substantially better results than the other compared algorithms. The next-best algorithm, IAGA, needs about 23% more fitness evaluations than the G-GA. This means that the G-GA is a markedly better choice than the existing algorithms for generating test data when the CFG consists of nested “if” statements. More on this is explained in the “Analysis of the Results” sub-section.

Table 5

Number of fitness evaluations of different algorithms for covering the CFG of the program with nested “if” statements
Algorithms	Number of Evaluations	p-value	Standard Deviation
AGA[20]	40879.2	0.018	8833.7
GA	42924.78	0.005	8131.9
PSO-GA [15]	40255.2	0.019	8643.9
GABA	50811.2	0.0001	9919.7
IAGA[17]	35753.6	0.162	7318.9
PSO	40144.8	0.0197	6696.2
SA	59277.6	3.56e-06	8645.9
TS	56976.0	7.04e-05	8084.6
G-GA	29068.76	1.0	8902.4

Experiment 5:

In this experiment, the performance of the proposed G-GA algorithm is evaluated for finding test cases for a program with a large number of variables and complex conditions. This program and its CFG are shown in Fig. 7. Table 6 presents the results in terms of the number of fitness evaluations. As in the previous experiments, the best algorithm among the compared ones is the G-GA. However, as in Experiment 3, the results show that for programs with a large number of variables and complex conditions, the G-GA is not a noticeably better candidate than the existing algorithms. The next-best algorithm, PSO-GA, needs only about 5 percent more fitness evaluations.

Table 6

Number of fitness evaluations of different algorithms for covering the CFG of the program with a large number of variables and complex conditions
Algorithms	Number of Evaluations	p-value	Standard Deviation
AGA[20]	95700.0	0.61	4802.7
GA	107250.4	0.33	8237.9
PSO-GA [15]	93590.4	0.73	4302.1
GABA	108448.8	0.20	6462.6
IAGA[17]	100832.8	0.35	3493.5
PSO	104049.6	0.30	5918.9
SA	113411.2	0.11	8978.7
TS	108830.4	0.19	6921.1
G-GA	88936.8	1.0	4118.0

Experiment 6:

This experiment has been conducted to evaluate the performance of the proposed G-GA algorithm, in comparison to the other existing algorithms, for finding test cases for a program with a complex structure and many independent paths. This program is listed in Source Code 1 below, and its flow graph is shown in Fig. 8 (the start vertex in this graph is vertex 3 and the end vertex is vertex 5). Table 7 presents the results in terms of the number of fitness evaluations. As before, the best test data generation algorithm for programs with complex structures and many independent paths is the G-GA. The next-best algorithm, AGA, needs about 13.4 percent more fitness evaluations.

Source Code 1. A program with a complex structure and many independent paths

import math

def calculate(operation):
    # Read the two operands from the user.
    number_1 = int(input("Enter the first number: "))
    number_2 = int(input("Enter the second number: "))
    # 1 is +
    if operation == 1:
        output_number = number_1 + number_2
        print(output_number)
    # 2 is -
    elif operation == 2:
        output_number = number_1 - number_2
        print(output_number)
    # 3 is *
    elif operation == 3:
        output_number = number_1 * number_2
        print(output_number)
    # 4 is /
    elif operation == 4:
        output_number = number_1 / number_2
        print(output_number)
    # 5 is %
    elif operation == 5:
        output_number = number_1 % number_2
        print(output_number)
    # 6 is exponent
    elif operation == 6:
        output_number = pow(number_1, number_2)
        print(output_number)
    # 7 is sqrt
    elif operation == 7:
        output_number = math.sqrt(number_1)
        print(output_number)
    # 8 is sin
    elif operation == 8:
        output_number = math.sin(number_1)
        print(output_number)
    # 9 is cos
    elif operation == 9:
        output_number = math.cos(number_1)
        print(output_number)
    # 10 is log (base 10)
    elif operation == 10:
        output_number = math.log(number_1, 10)
        print(output_number)
    else:
        # No result exists for an invalid operator, so None is returned.
        output_number = None
        print('You have not typed a valid operator, please run the program again.')
    return output_number
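A minimal sketch of how such a program might be exercised by a random test data generator is given below. It assumes that only the operation code decides which branch is taken, so branch_taken is a hypothetical stand-in for an instrumented version of calculate that skips the arithmetic and the console input. The range [-10, 100] and the budget of 300,000 evaluations are taken from the experimental setup; the seed is arbitrary.

```python
import random


def branch_taken(operation):
    """Return which branch of calculate() an operation code selects (0 = else)."""
    return operation if 1 <= operation <= 10 else 0


def random_search(low=-10, high=100, budget=300_000, seed=0):
    rng = random.Random(seed)
    covered = set()
    evaluations = 0
    # Draw operation codes until all 11 branches are hit or the budget is spent.
    while len(covered) < 11 and evaluations < budget:
        operation = rng.randint(low, high)
        covered.add(branch_taken(operation))
        evaluations += 1
    return covered, evaluations


covered, evaluations = random_search()
print(len(covered), "branches covered after", evaluations, "evaluations")
```

Because the function contains no loops, each of the eleven branches of the if/elif chain corresponds to one path through it, so covering all branches here also covers all paths.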

Table 7

Number of fitness evaluations of different algorithms for covering the CFG given in Fig. 8
Algorithms	Number of Evaluations	p-value	Standard Deviation
AGA[20]	6745.6	0.54	1570.3
GA	10098.0	0.008	2142.0
PSO-GA [15]	7853.6	0.17	1779.1
GABA	11588.4	0.003	2872.5
IAGA[17]	7038.8	0.42	1727.7
PSO	9456.0	0.035	2327.8
SA	11258.4	0.003	2607.5
TS	8238.8	0.10	1735.3
G-GA	5946.0	1.0	1683.8

Analysis of the Results

The results obtained from the six experiments above show that the proposed G-GA method outperforms the other compared methods in terms of the number of fitness evaluations. The main reason for this superiority is its ability to escape from local optima. This ability stems from the fact that each member of the population is updated according to the rules of the group to which it belongs. The grouping mechanism in the G-GA makes it capable of appropriately balancing exploration and exploitation.

On the other hand, GA and GA1 are static algorithms that receive no feedback from the state of individuals within the search space, and are thus easily trapped in local optima. This problem is less severe in [15], where a combination of GA and PSO is utilized. However, that method is more complex, and thus takes much more time to achieve suitable results. The methods of [17] and [20] place great emphasis on population diversity. However, increasing the population diversity leads to a slower convergence speed. The PSO algorithm also has weaknesses: its particles may become stuck in local optima, mainly because they converge to a specific point between the global best location and their personal best locations. SA and TS also suffer from local optima traps, and hence are usually unable to find the global optimum.

Software testing consumes a lot of resources but does not add anything to the product in terms of performance. One important task in software testing is test data generation. Automatically generating test data is difficult because, for any real-size application, the input range is very large and the search is tedious. On the other hand, a random input data generator is unlikely to fully test anomalous features and exceptions, which account for only a small portion of the entire input domain. In the last decade, various methods for automatic test data generation have been introduced, the aim of which is to detect the maximum number of errors with the minimum amount of test data. The main problem in the test data generation process is to determine input data for the program under test that meet the specified test criteria. In this paper, an improved genetic algorithm is proposed that strikes a good balance between exploration and exploitation and also requires few calculations. This algorithm can also be used to solve other optimization problems. The method was compared with several other algorithms in the field of test data generation, and the results showed that the proposed method has better speed and convergence.

Conflict of interest: The authors declare that this manuscript has no conflict of interest with any other published sources and has not been published previously (partly or in full). No data have been fabricated or manipulated to support our conclusions.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Beizer, Boris. Software testing techniques. Dreamtech Press, 2003.
Lonetti, Francesca, and Eda Marchetti. "Emerging software testing technologies." Advances in Computers. Vol. 108. Elsevier, 2018. 91-143.
McMinn, Phil. "Search‐based software test data generation: a survey." Software testing, Verification and reliability 14.2 (2004): 105-156.
Harman, Mark, and Bryan F. Jones. "Search-based software engineering." Information and software Technology 43.14 (2001): 833-839.
Whitley, Darrell. "A genetic algorithm tutorial." Statistics and computing 4.2 (1994): 65-85.
Mitchell, Melanie. An introduction to genetic algorithms. MIT press, 1998.
Valdez, Fevrier, Patricia Melin, and Oscar Castillo. "An improved evolutionary method with fuzzy logic for combining particle swarm optimization and genetic algorithms." Applied Soft Computing 11.2 (2011): 2625-2632.
Abd-El-Wahed, Waiel F., A. A. Mousa, and Mohammed A. El-Shorbagy. "Integrating particle swarm optimization with genetic algorithms for solving nonlinear optimization problems." Journal of Computational and Applied Mathematics 235.5 (2011): 1446-1453.
Nemati, Shahla, et al. "A novel ACO–GA hybrid algorithm for feature selection in protein function prediction." Expert systems with applications 36.10 (2009): 12086-12094.
Damia, Amirhosein, Mehdi Esnaashari, and Mohammadreza Parvizimosaed. "Adaptive Genetic Algorithm Based on Mutation and Crossover and Selection Probabilities." 2021 7th International Conference on Web Research (ICWR). IEEE, 2021.
McGinley, Brian, et al. "Maintaining healthy population diversity using adaptive crossover, mutation, and selection." IEEE Transactions on Evolutionary Computation 15.5 (2011): 692-714.
Ahmed, Moataz A., and Fakhreldin Ali. "Multiple-path testing for cross site scripting using genetic algorithms." Journal of Systems Architecture 64 (2016): 50-62.
Manikumar, T., Kumar, A. J. S., & Maruthamuthu, R. (2016). Automated test data generation for branch testing using incremental genetic algorithm. Sādhanā, 41(9), 959-976.
Zhang, Na, Biao Wu, and Xiaoan Bao. "Automatic generation of test cases based on multi-population genetic algorithm." Int. J. Multimedia Ubiquitous Eng 10.6 (2015): 113-122.
Kumar, Sumit, Dilip Kumar Yadav, and Danish Ali Khan. "A novel approach to automate test data generation for data flow testing based on hybrid adaptive PSO-GA algorithm." Int. J. Adv. Intell. Paradigms 9.2/3 (2017): 278-312.
Yao, Xiangjuan, Dunwei Gong, and Wenliang Wang. "Test data generation for multiple paths based on local evolution." Chinese Journal of Electronics 24.1 (2015): 46-51.
Bao, Xiaoan, et al. "Path-oriented test cases generation based adaptive genetic algorithm." PloS one 12.11 (2017): e0187471.
Wu, Danyang, and Xuejun Yu. "Automatic generation of trusted test cases based on adaptive genetic algorithm." Journal of Physics: Conference Series. Vol. 1865. No. 4. IOP Publishing, 2021.
Aleti, Aldeida, and Lars Grunske. "Test data generation with a Kalman filter-based adaptive genetic algorithm." Journal of Systems and Software 103 (2015): 343-352.
Damia, A. H., Mehdi Esnaashari, and M. R. Parvizimosaed. "Software Testing using an Adaptive Genetic Algorithm." Journal of AI and Data Mining 9.4 (2021): 465-474.
Kumar, S., Yadav, D.K., and Khan, D.A., 2017. A novel approach to automate test data generation for data flow testing based on hybrid adaptive PSO-GA algorithm. International Journal of Advanced Intelligence Paradigms, 9(2-3), pp.278-312.
Chawla, P., Chana, I. and Rana, A. (2015) ‘A novel strategy for automatic test data generation using soft computing technique’, Frontiers of Computer Science, Vol. 9, No. 3, pp.346–363.
Mala, D.J., Ruby, E. and Mohan, V. (2012) ‘A hybrid test optimization framework-coupling genetic algorithm with local search technique’, Computing and Informatics, Vol. 29, No. 1, pp.133–164.
Rathore, A., Bohara, A., Prashil, R.G., Prashanth, T.S. and Srivastava, P.R. (2011) ‘Application of genetic algorithm and tabu search in software testing’, in Proceedings of the Fourth Annual ACM Bangalore Conference, ACM, March, p.23.
Esnaashari, Mehdi, and Amir Hossein Damia. "Automation of Software Test Data Generation Using Genetic Algorithm and Reinforcement Learning." Expert Systems with Applications (2021): 115446.
R. Khan, M. Amjad, and A. K. Srivastava, “Optimization of automatic test case generation with cuckoo search and genetic algorithm approaches”, Advances in Computer and Computational Sciences, Springer, Singapore, 2018, pp. 413-423.
M. Mann, O. P. Sangwan, P. Tomar, and S. Singh, “Automatic goal-oriented test data generation using a genetic algorithm and simulated annealing”. 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence). IEEE, 2016, pp. 83-87.
N. Jain, R. Porwal, S. Kumar, S.Varshney, and M. Saraswat, “Automatic data flow class testing based on 2-step heterogeneous process using evolutionary algorithms”. Journal of Statistics and Management Systems, vol. 22, no. 7, pp.1315-1348, 2019.
Nosrati, Mohammad, Hassan Haghighi, and M. Vahidi Asl. "Test data generation using genetic programming." Information and Software Technology 130 (2021): 106446.
Jiang, Shujuan, et al. "Automatic test data generation based on reduced adaptive particle swarm optimization algorithm." Neurocomputing 158 (2015): 109-116.
Peng, N. (2012). A PSO test case generation algorithm with enhanced exploration ability. Journal of Computational Information Systems, 8(14), 5785-5793.
Latiu, G. I., Cret, O. A., & Vacariu, L. (2012, September). Automatic test data generation for software path testing using evolutionary algorithms. In 2012 Third International Conference on Emerging Intelligent Data and Web Technologies (pp. 1-8). IEEE.
Sahoo, R. R., & Ray, M. (2020). PSO based test case generation for critical path using improved combined fitness function. Journal of King Saud University-Computer and Information Sciences, 32(4), 479-490.
Lv, X. W., Huang, S., Hui, Z. W., & Ji, H. J. (2018). Test cases generation for multiple paths based on PSO algorithm with metamorphic relations. Iet Software, 12(4), 306-317.
Damia, A., Esnaashari, M., & Parvizimosaed, M. (2021, May). Automatic Web-Based Software Structural Testing Using an Adaptive Particle Swarm Optimization Algorithm for Test Data Generation. In 2021 7th International Conference on Web Research (ICWR) (pp. 282-286). IEEE.
Srivastava, P. R., Ramachandran, V., Kumar, M., Talukder, G., Tiwari, V., & Sharma, P. (2008, November). Generation of test data using meta heuristic approach. In TENCON 2008-2008 IEEE Region 10 Conference (pp. 1-6). IEEE.
Dahiya, S. S., Chhabra, J. K., & Kumar, S. (2010, April). Application of artificial bee colony algorithm to software testing. In 2010 21st Australian software engineering conference (pp. 149-154). IEEE.
Sharifipour, H., Shakeri, M., & Haghighi, H. (2018). Structural test data generation using a memetic ant colony optimization based on evolution strategies. Swarm and Evolutionary Computation, 40, 76-91.
Bidgoli, A. M., & Haghighi, H. (2020). Augmenting ant colony optimization with adaptive random testing to cover prime paths. Journal of Systems and Software, 161, 110495.
Mao, C., Xiao, L., Yu, X., & Chen, J. (2015). Adapting ant colony optimization to generate test data for software structural testing. Swarm and Evolutionary Computation, 20, 23-36.
Sheoran, Snehlata, Neetu Mittal, and Alexander Gelbukh. "Artificial bee colony algorithm in data flow testing for optimal test suite generation." International Journal of System Assurance Engineering and Management 11.2 (2020): 340-349.
Sahin, Omur, Bahriye Akay, and Dervis Karaboga. "Archive-based multi-criteria Artificial Bee Colony algorithm for whole test suite generation." Engineering Science and Technology, an International Journal 24.3 (2021): 806-817.
Boopathi, Muthusamy, et al. "Quantification of software code coverage using artificial bee colony optimization based on Markov approach." Arabian Journal for Science and Engineering 42.8 (2017): 3503-3519.
Alazzawi, Ammar K., et al. "Pairwise Test Suite Generation Based on Hybrid Artificial Bee Colony Algorithm." Advances in Electronics Engineering. Springer, Singapore, 2020. 137-145.
Aghdam, Z. K., & Arasteh, B. (2017). An efficient method to generate test data for software structural testing using artificial bee colony optimization algorithm. International Journal of Software Engineering and Knowledge Engineering, 27(06), 951-966.
Mohi-Aldeen, S. M., Mohamad, R., & Deris, S. (2016). Application of Negative Selection Algorithm (NSA) for test data generation of path testing. Applied Soft Computing, 49, 1118-1128.
Xia, C. Y., Zhang, Y., Wan, L., Song, Y., Xiao, N., & Guo, B. (2019). Test data generation of path coverage based on negative selection genetic algorithm. Acta Electronica Sinica, 47(12), 2630.
Mohi-Aldeen, S. M., Mohamad, R., & Deris, S. (2020). Optimal path test data generation based on hybrid negative selection algorithm and genetic algorithm. PloS one, 15(11), e0242812.
Damia, A. H., & Esnaashari, M. M. (2020). Automated Test Data Generation Using a Combination of Firefly Algorithm and Asexual Reproduction Optimization Algorithm. International Journal of Web Research, 3(1), 19-28.
Pandey, A., & Banerjee, S. (2021). Test suite optimization using chaotic firefly algorithm in software testing. In Research Anthology on Recent Trends, Tools, and Implications of Computer Programming (pp. 722-739). IGI Global.
Ghaemi, A., & Arasteh, B. (2020). SFLA-based heuristic method to generate software structural test data. Journal of Software: Evolution and Process, 32(1), e2228.
Mirhosseini, Seyed Mohsen, and Hassan Haghighi. "A Search-Based Test Data Generation Method for Concurrent Programs." Int. J. Comput. Intell. Syst. 13.1 (2020): 1161-1175.
Korel, B., 1990. Automated Software Test Data Generation. IEEE transactions on software engineering 16 (8), 870–879
Sahin, Omur, and Bahriye Akay. "Comparisons of metaheuristic algorithms and fitness functions on software test data generation." Applied Soft Computing 49 (2016): 1202-1214.
Yang, Congrui, et al. "An improved adaptive genetic algorithm for function optimization." 2016 IEEE International Conference on Information and Automation (ICIA). IEEE, 2016.
Pressman, Roger S. Software engineering: a practitioner's approach. Palgrave macmillan, 2005.
Gong, Dunwei, and Xiangjuan Yao. "Automatic detection of infeasible paths in software testing." IET software 4.5 (2010): 361-370.
Bidgoli, Atieh Monemi, et al. "Using swarm intelligence to generate test data for covering prime paths." International Conference on Fundamentals of Software Engineering. Springer, Cham, 2017.
Baldoni, Roberto, et al. "A survey of symbolic execution techniques." ACM Computing Surveys (CSUR) 51.3 (2018): 1-39.
Yang, Guowei, et al. "Advances in symbolic execution." Advances in Computers 113 (2019): 225-287.
Ohbayashi, Hiroki, Hideyuki Kanuka, and Chikashi Okamoto. "A preprocessing method of test input generation by symbolic execution for enterprise application." 2018 25th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 2018.
Douglas, Craig C., and Krishanthan Krishnamoorthy. "Static analysis and symbolic execution for deadlock detection in MPI programs." International Conference on Computational Science. Springer, Cham, 2018.
Zhan, Lili. "Optimal Model of Software Testing Path Selection Based on Genetic Algorithm and Its Evolutionary Solution." Wireless Communications and Mobile Computing 2022 (2022).
Chen, Yong, et al. "Comparison of two fitness functions for GA-based path-oriented test data generation." 2009 Fifth International Conference on Natural Computation. Vol. 4. IEEE, 2009.
McCabe, Thomas J. "A complexity measure." IEEE Transactions on software Engineering 4 (1976): 308-320.
Yang, Song, Xuzhou Zhang, and Yun-Zhan Gong. "Infeasible Path Detection Based on Code Pattern and Backward Symbolic Execution." Mathematical Problems in Engineering 2020 (2020): 1-12.
Ngo, Minh Ngoc, and Hee Beng Kuan Tan. "Detecting large number of infeasible paths through recognizing their patterns." Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. 2007.
Ding, Sun, and Hee Beng Kuan Tan. "Detection of infeasible paths: Approaches and challenges." Evaluation of Novel Approaches to Software Engineering: 7th International Conference, ENASE 2012, Warsaw, Poland, June 29-30, 2012, Revised Selected Papers 7. Springer Berlin Heidelberg, 2013.
Delahaye, Mickaël, Bernard Botella, and Arnaud Gotlieb. "Infeasible path generalization in dynamic symbolic execution." Information and Software Technology 58 (2015): 403-418.

No competing interests reported.
