Concrete is the most widely used man-made construction material owing to its affordability, durability, and the widespread availability of its components [1]; approximately 25 billion tons are produced each year [2]. Its extensive use can also be attributed to advantageous properties such as water resistance and plasticity [3].
Despite its high compressive strength, concrete has low tensile strength [4]. As a result, cracks can form in the tensile regions of structural elements, which requires incorporating materials with higher tensile strength into the concrete. Cracking can occur even before loads are applied to the structure: in the fresh state, due to plastic shrinkage or settlement, or in the hardened state, due to drying shrinkage or restrained thermal movement of the structural elements. These cracks make reinforced structures more susceptible to water infiltration and corrosion, so structural interventions and repairs are often required and sometimes unavoidable. Although concrete structures are affordable to build, their restoration costs can be substantial. In fact, the cost of repairing structures affected solely by reinforcement corrosion can range from 1.5% to 3.5% of the gross national product of both developed and developing countries [5].
In light of these challenges, there has been growing interest in developing technologies that can enhance the durability of concrete structures. Research on the use of nanotechnology in concrete, for example, has gained particular relevance, with a substantial increase in published studies in recent years [6, 7]. Another such technology is the addition of fibers to concrete and other cementitious materials, which improves their mechanical properties [8–12]. These improvements include increased post-cracking tensile resistance and enhanced flexural toughness. The fibers act mainly in cracked regions, where they form “bridges” across the cracks that transfer stress and enable the material to absorb energy. Studies on the performance of fiber-reinforced concrete have used various types of fibers, including steel fibers [12–15], synthetic fibers such as polypropylene [15, 16], and natural fibers (e.g., bamboo and hemp) [17]. Even so, expanding the diversity of fibers tested as reinforcement for crack control remains crucial.
Evaluating fibers as concrete reinforcement for crack control involves several laboratory steps (e.g., mixing fibers with fresh concrete, molding test specimens, and conducting mechanical property tests). Computational models that preselect fibers can therefore be highly advantageous, reducing time and material use by screening out inefficient fibers before testing. Accordingly, machine learning (ML) models have been applied to assist the development of fiber-reinforced concrete [7, 18, 19]. ML uses computational methods and learning algorithms to predict material properties from available datasets and has proven to be a valuable tool for reducing time requirements and simplifying complex processes in the creation of new materials. By employing these models, researchers and engineers can preselect the materials that are best suited to their study or application.
Despite the benefits of ML, building an accurate and trustworthy model often requires a large amount of data, and many data sources contain errors, missing values, or incomplete information, all of which can degrade model performance. Moreover, depending on the field of study or knowledge domain, collecting a large and diverse dataset may be impractical or expensive, which limits the development of robust models. A small, non-diverse dataset can also lead to the data imbalance problem: when samples are unevenly distributed, the model becomes biased towards the majority samples and performs poorly on minority samples. Hence, techniques such as data augmentation [20, 21] and synthetic data generation [22, 23] have been used to improve data quality and to obtain adequate and representative training data for ML. Synthetic data are generated statistically from a sample of real data to provide a larger data volume for training and developing ML models. In civil engineering, synthetic data have been used to develop ML models when real data are limited [24, 25], including models that predict concrete properties and formulations [26, 27].
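As a rough illustration of generating synthetic data statistically from a small real sample, the sketch below uses a smoothed bootstrap: it resamples real rows and adds Gaussian noise scaled by each feature's standard deviation. This is only one simple option and is not necessarily the generator used in this work or in the cited studies. The function name `smoothed_bootstrap`, the `bandwidth` parameter, the column meanings, and the numeric values are all hypothetical and chosen purely for illustration.

```python
import random
import statistics


def smoothed_bootstrap(rows, n_new, bandwidth=0.1, seed=0):
    """Return n_new synthetic rows: resampled real rows plus Gaussian jitter.

    Illustrative assumption, not the method of the paper.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    # The per-feature sample standard deviation sets the jitter scale.
    stds = [statistics.stdev([row[j] for row in rows]) for j in range(n_features)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)  # pick one real observation at random
        synthetic.append(
            [rng.gauss(base[j], bandwidth * stds[j]) for j in range(n_features)]
        )
    return synthetic


# Made-up example rows (not measured data):
# [fiber volume fraction (%), fiber length (mm), toughness factor (MPa)]
real_rows = [
    [0.5, 30.0, 1.8],
    [1.0, 30.0, 2.9],
    [0.5, 50.0, 2.2],
    [1.0, 50.0, 3.6],
]

synthetic_rows = smoothed_bootstrap(real_rows, n_new=200)
print(len(synthetic_rows), "synthetic rows generated")
```

Because each synthetic row stays close to a real one, this approach keeps the relationships between features roughly intact but cannot add information that is absent from the real sample. Synthetic rows should therefore be checked against physical limits (e.g., non-negative fiber content) before they are used for training.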
Therefore, given the limited data available on the toughness of concrete incorporating synthetic fibers, this work aimed to generate synthetic data and develop ML models designed explicitly to assist in choosing fibers to be tested as reinforcement for crack control, and to select the best ML algorithm for predicting the toughness factor of fiber-reinforced concrete elements.