In recent years, Australia has suffered more from bushfires than from other types of natural disasters, because climate change has increased temperatures and decreased rainfall (Yu et al. 2020). Bushfires harm human health and have devastating impacts on the environment and economy (Zhang et al. 2016). For example, 173 people were killed and 1.1 million acres were burned during the Black Saturday bushfires in Australia in 2009 (Ma et al. 2020). Later, in 2013, bushfires destroyed 248 buildings across New South Wales (NSW) (Ma et al. 2020). The most catastrophic bushfire season occurred in the summer of 2019/2020, which devastated firefighters, residents, and animals (Milton and White 2020). The frequency and severity of bushfires are expected to increase in the future as a result of climate change (Milton and White 2020). It is therefore important to model bushfires and to mitigate their negative impacts on humans and the environment. It is also important to identify areas with a high probability of bushfire occurrence in order to achieve better natural hazard management (Tehrany et al. 2021). Various algorithms and methods have been applied to bushfire susceptibility mapping (Tehrany et al. 2021), including statistical methods, artificial intelligence and machine learning (ML) techniques, ensemble techniques, and evolutionary algorithms.
Statistical methods such as frequency ratio (FR), evidential belief function (EBF), and weight of evidence (WOE) have been used to generate bushfire susceptibility maps in several studies (Pourghasemi 2016; Hong et al. 2017, 2019; Jaafari et al. 2017). FR follows an easily understood procedure and simplifies both the problem and its outcome, which makes it possible to analyze large datasets in software such as ArcGIS (Pradhan et al. 2007). Statistical methods can be applied in the ArcGIS environment, which enables the generation of spatial patterns for bushfire prediction maps (Nami et al. 2018).
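As a minimal sketch of the FR idea, the ratio for one class of a conditioning factor can be computed as the share of fire pixels falling in that class divided by the share of all pixels falling in that class. This is the commonly used definition and is assumed here; the function name, the slope classes, and the toy data are hypothetical and are not taken from the studies cited above.

```python
from collections import Counter


def frequency_ratio(classes, fire):
    """Return {class: FR} for one conditioning factor.

    classes: class label of each pixel (e.g., a slope class)
    fire:    1 if the pixel burned, 0 otherwise
    FR = (fire pixels in class / all fire pixels)
         / (pixels in class / all pixels)
    """
    total_pixels = len(classes)
    total_fire = sum(fire)
    if total_pixels == 0 or total_fire == 0:
        raise ValueError("need at least one pixel and one fire pixel")
    class_pixels = Counter(classes)
    class_fire = Counter(c for c, f in zip(classes, fire) if f)
    return {
        c: (class_fire[c] / total_fire) / (n / total_pixels)
        for c, n in class_pixels.items()
    }


# Hypothetical toy data: 10 pixels in three slope classes.
slope = ["flat"] * 5 + ["moderate"] * 3 + ["steep"] * 2
burned = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
fr = frequency_ratio(slope, burned)
```

An FR above 1 indicates that fires are over-represented in a class relative to its area, and an FR below 1 indicates the opposite. In this toy example, the steep class holds half of the fire pixels but only a fifth of the area.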
A bushfire susceptibility mapping case study on the Minudasht forests in Iran showed that Shannon’s entropy (SE) and FR are two promising methods for predicting bushfires, with areas under the curve (AUCs) of 83.16% and 79.85%, respectively (Pourtaghi et al. 2015). EBF is also an appropriate statistical method for bushfire prediction, with an AUC of 81.03% in the Hyrcanian ecoregion in Iran (Nami et al. 2018). Another study showed that logistic regression (LR) and FR had bushfire prediction rates of 88.3% and 85.3%, respectively, in the Thimphu and Paro districts of western Bhutan (Dorji and Ongsomwang 2017).
A study in the Yihuang area, China, showed that linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), FR, and WOE were useful for predicting bushfires. WOE had the highest AUC (82.20%), followed by FR (80.9%), QDA (78.3%), and LDA (78.0%) (Hong et al. 2017).
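To make the AUC values above concrete, the following sketch computes an AUC from predicted susceptibility scores, assuming that the curve in question is the receiver operating characteristic (ROC) curve. Under that assumption, the AUC equals the probability that a randomly chosen fire location receives a higher score than a randomly chosen non-fire location, with ties counted as one half. The function name and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (fire, non-fire) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fire and one non-fire sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fire location scored higher: correct ranking
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: 1 = fire observed, 0 = no fire.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported AUC of 82.20% would mean that roughly four out of five such pairs are ordered correctly.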
Advances in remote-sensing technologies have improved bushfire management and monitoring (Jain et al. 2020). Data-driven bushfire prediction has recently come into use owing to improvements in data quality (e.g., weather data) (Jain et al. 2020). Better data quality has also helped scientists to apply different ML techniques to bushfire susceptibility mapping (Jain et al. 2020). ML techniques can predict bushfires from input data without relying on expert knowledge. They are trained on a portion of the data to find the best-fitting model, which can then be used to generate spatial bushfire prediction maps for the entire bushfire-prone area (Leuenberger et al. 2018; Tonini et al. 2020).
Recent studies have shown that new artificial intelligence methods generate more accurate results than conventional statistical techniques (Hoang and Tien Bui 2018). Different ML methods such as random forest (RF), artificial neural network (ANN), decision tree (DT), support vector machine (SVM), and genetic algorithms (GAs) have been applied to bushfire prediction (Jain et al. 2020). A comparison of multiple ML methods, including RF, ANN, multi-layer perceptron (MLP), Dmine regression (DR), least-angle regression, radial basis function (RBF), self-organizing map, SVM, DT, and LR, showed that RF had the highest AUC (88.0%) for the prediction of bushfires in Mazandaran province, Iran (Gholamnia et al. 2020). Similarly, RF provided promising results across different seasons in the Liguria region of Italy (Tonini et al. 2020). RF also outperformed SVM, ANN, LR, and probit regression (Cao et al. 2017; Ghorbanzadeh et al. 2019).
Unlike deterministic methods, RF does not require prior knowledge of the bushfires, yet achieves accuracy similar to that of deterministic methods (Leuenberger et al. 2018). Other ML methods such as Bayes network (BN), DT, naive Bayes (NB), and multivariate logistic regression (MLR) have been applied to bushfire prediction in Pu Mat National Park, Vietnam (Pham et al. 2020). BN had an AUC of 96.0%, followed by DT (94.0%), NB (93.9%), and MLR (93.7%) (Pham et al. 2020). Kernel logistic regression and SVM were also used to generate bushfire susceptibility maps in Cat Ba National Park, Vietnam, where kernel logistic regression had the higher AUC of 92.2% (Bui et al. 2016).
Ensembles of ML methods have also shown promising outcomes for bushfire susceptibility mapping. An ensemble of different techniques, including an adaptive neuro-fuzzy inference system (ANFIS), GA, and simulated annealing (SA), had the highest AUC of 90.3% for bushfire prediction (Razavi-Termeh et al. 2020). Razavi-Termeh et al. (2020) also reported that an ensemble of RBF and an imperialist competitive algorithm had an AUC of 87.8%. A combination of WOE and a knowledge-based analytical hierarchy process was more accurate than either WOE alone or LR in Huichang County, China (Hong et al. 2019).
Gene expression programming (GEP), a branch of artificial intelligence proposed by Ferreira (2001), can automatically find an explicit function relating the response variables to the conditioning factors without making assumptions about the functional form of the problem (Ferreira 2001; Emamgolizadeh et al. 2015; Hoang and Tien Bui 2018). GEP determines the relationships between dependent variables and conditioning factors, which can be nonlinear (Hosseini and Lim 2021). GEP is a useful tool for natural disaster prediction, such as landslide prediction (Zakaria et al. 2010; Kayadelen 2011; Mousavi et al. 2012; Hoang and Tien Bui 2018; Hosseini and Lim 2021).
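To illustrate what an explicit function means in this context, the sketch below decodes a linear GEP gene into an expression tree and evaluates it. It assumes the Karva notation (K-expressions) commonly used with GEP, in which symbols are read breadth-first and each function takes its arguments from the next unused symbols. The function set, the terminal names a to d, and their values are hypothetical, and the evolutionary search that produces such genes is not shown.

```python
import math

# Hypothetical function set: symbol -> (arity, implementation).
# Any symbol not listed here is treated as a terminal (a conditioning factor).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "Q": (1, lambda a: math.sqrt(a)),  # square root
}


def decode(kexpr):
    """Build an expression tree [symbol, children] from a K-expression."""
    nodes = [[sym, []] for sym in kexpr]
    next_child = 1  # index of the next symbol not yet attached to a parent
    for i, node in enumerate(nodes):
        if i >= next_child:
            break  # remaining symbols are non-coding and are ignored
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        children = nodes[next_child:next_child + arity]
        if len(children) < arity:
            raise ValueError("K-expression ends before the tree is complete")
        node[1] = children
        next_child += arity
    return nodes[0]


def evaluate(node, values):
    """Evaluate the tree for one sample of conditioning-factor values."""
    sym, children = node
    if sym in FUNCTIONS:
        func = FUNCTIONS[sym][1]
        return func(*(evaluate(child, values) for child in children))
    return values[sym]


# "Q*+-abcd" decodes to sqrt((a + b) * (c - d)).
sample = {"a": 3.0, "b": 1.0, "c": 6.0, "d": 2.0}
result = evaluate(decode("Q*+-abcd"), sample)  # sqrt(4 * 4)
```

Because the decoded tree is an ordinary formula, the fitted relationship can be read and reported directly, which is the property highlighted in the paragraph above.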
The main purpose of this research is to investigate the application of GEP to generating bushfire probability maps. GEP is a relatively new approach based on an evolutionary algorithm; therefore, the models it generates are expected to provide important insights into bushfire susceptibility mapping. To implement GEP and measure its capability to produce bushfire susceptibility maps over NSW, which is one of the most bushfire-prone states in Australia, we proposed four ensemble methods: GEP and FR (GEPFR), RF and FR (RFFR), SVM and FR (SVMFR), and LR and FR (LRFR). We also used four baseline methods (GEP, RF, SVM, and FR) for comparison with the ensemble methods. We compared the results of the single and ensemble methods to identify the best method for predicting bushfires in our case study area.