Metabarcoding of environmental DNA (eDNA) and machine learning (ML) algorithms are two increasingly used methods for gauging biodiversity and tracking ecosystem health. We analyzed prior work in biodiversity assessment using ML-based eDNA metabarcoding and discussed its merits, limitations, and potential applications.
Our analysis is more comprehensive than prior reviews: it compares the efficacy of several ML algorithms used for eDNA metabarcoding and highlights the challenges and limitations of the field. We also examine the use of ML-based eDNA metabarcoding for conservation and environmental monitoring.
In the studies reviewed, random forest, support vector machine, and naive Bayes were the most widely used ML approaches in eDNA metabarcoding. The reported accuracy of these algorithms' presence/absence predictions (70-96.7%) exceeded that of traditional identification methods in various scenarios. Nevertheless, the accuracy of ML-based eDNA metabarcoding may be reduced by PCR biases, sequencing errors, and limited taxonomic coverage.
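To make the presence/absence accuracy figures above concrete, the following is a minimal sketch of how such a score could be computed by comparing predicted detections against survey observations. The function name, the (site, taxon) keying, and the toy data are illustrative assumptions, not taken from any of the reviewed studies.

```python
def presence_absence_metrics(observed, predicted):
    """Compare predicted presence/absence calls against survey observations.

    Both arguments map a (site, taxon) key to True (present) or False (absent).
    Only keys found in both mappings are scored.
    """
    keys = observed.keys() & predicted.keys()
    if not keys:
        raise ValueError("no shared (site, taxon) pairs to compare")
    tp = sum(1 for k in keys if observed[k] and predicted[k])
    tn = sum(1 for k in keys if not observed[k] and not predicted[k])
    fp = sum(1 for k in keys if not observed[k] and predicted[k])
    fn = sum(1 for k in keys if observed[k] and not predicted[k])
    return {
        "accuracy": (tp + tn) / len(keys),
        # Sensitivity: share of truly present taxa that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        # Specificity: share of truly absent taxa that were called absent.
        "specificity": tn / (tn + fp) if tn + fp else None,
    }


# Hypothetical toy data: five (site, taxon) pairs.
observed = {
    ("site1", "taxonA"): True,
    ("site1", "taxonB"): False,
    ("site2", "taxonA"): True,
    ("site2", "taxonB"): True,
    ("site3", "taxonA"): False,
}
predicted = {
    ("site1", "taxonA"): True,
    ("site1", "taxonB"): False,
    ("site2", "taxonA"): False,  # a missed detection (false negative)
    ("site2", "taxonB"): True,
    ("site3", "taxonA"): False,
}
metrics = presence_absence_metrics(observed, predicted)
print(metrics["accuracy"])  # 0.8
```

Reporting sensitivity alongside accuracy matters for rare species, because a classifier that misses most rare taxa can still score a high overall accuracy when absences dominate the data.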
Addressing these problems requires standardized methods for eDNA sample collection, DNA extraction, PCR amplification, and sequencing. ML-based eDNA metabarcoding produces more accurate and trustworthy results for rare and elusive species when training datasets encompass a wide range of taxa and environmental factors.
Based on these findings, ML-based eDNA metabarcoding has the potential to be used for ecological monitoring and conservation. This high-throughput, low-cost approach is well suited to nationwide biodiversity surveys and ongoing ecosystem monitoring. It might be used to determine conservation priorities, monitor the progress of restoration efforts, and detect invasive species.
ML-based eDNA metabarcoding may also reveal rare and secretive species. Conventional sampling methods may miss such species, whereas eDNA metabarcoding may detect the small amounts of DNA they leave in the environment. Deiner et al. (2016) used eDNA metabarcoding to find rare frog species in remote areas where traditional survey methods had failed.
Promising though it is, ML-based eDNA metabarcoding has downsides. DNA extraction, PCR amplification, and sequencing can introduce bias and error into the data (Lacoursière-Roussel et al., 2016). The reference database's limited taxonomic coverage reduces the accuracy of species identification and raises the possibility of misidentification (Elbrecht et al., 2018).
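The effect of limited taxonomic coverage can be illustrated with a small sketch. It assigns a read to the closest reference barcode by Hamming distance and shows that when the true species is missing from the reference database, the read is either left unassigned or attributed to a close relative. The 12-bp barcodes, species names, and mismatch threshold are hypothetical; real pipelines use much longer markers and alignment-based or probabilistic assignment.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_taxon(read, reference, max_mismatches=1):
    """Return the taxon of the closest reference barcode, or None if none is close enough."""
    best_taxon, best_dist = None, None
    for taxon, barcode in reference.items():
        d = hamming(read, barcode)
        if best_dist is None or d < best_dist:
            best_taxon, best_dist = taxon, d
    if best_dist is not None and best_dist <= max_mismatches:
        return best_taxon
    return None


# Hypothetical 12-bp barcodes; real markers are far longer.
full_reference = {
    "species_A": "ACGTACGTACGT",
    "species_B": "ACGTACGTACGA",  # one mismatch from species_A
    "species_C": "TTGCAATTGCAA",
}
# The same database with species_B missing (incomplete taxonomic coverage).
partial_reference = {k: v for k, v in full_reference.items() if k != "species_B"}

read_from_B = "ACGTACGTACGA"
read_from_C = "TTGCAATTGCAA"

print(assign_taxon(read_from_B, full_reference))     # species_B
print(assign_taxon(read_from_B, partial_reference))  # species_A (misidentified)
print(assign_taxon(read_from_B, partial_reference, max_mismatches=0))  # None (unassigned)
```

A stricter threshold trades misidentifications for unassigned reads. Neither outcome recovers the missing species, which is why reference coverage limits accuracy whatever classifier is used.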
Standardizing methods and procedures reduces bias and keeps studies comparable (Deiner et al., 2017). Large training datasets improve the accuracy and reliability of ML algorithms, especially for rare and difficult-to-detect species.
Biodiversity assessment and ecological monitoring both benefit from ML-based eDNA metabarcoding. Recent studies have shown that ML algorithms can accurately identify species and detect rare and elusive organisms, but the approach has limitations. Standardizing procedures and expanding training datasets would allow ML-based eDNA metabarcoding to improve biodiversity knowledge and conservation efforts.
Elbrecht et al. (2018) applied ML-based eDNA metabarcoding to freshwater invertebrate samples from across the globe. In classifying these eDNA data, random forest performed better than gradient boosting. Sequencing errors and a lack of taxonomic coverage hampered their research.
Mächler et al. (2019) used ML-based eDNA metabarcoding to analyze populations of Swiss river fish. SVMs fared better than random forest and naive Bayes in classifying species. Like those of other studies, their approach was limited by PCR bias and inadequate taxonomic coverage.
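As an illustration of how one of the compared classifiers can operate on sequence data, the following is a minimal sketch of a multinomial naive Bayes classifier over k-mer counts. It is not the implementation used by Mächler et al. (2019) or any other reviewed study; the class name, the choice of k = 3, the Laplace smoothing parameter, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """All overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.kmer_counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.class_counts = Counter()            # taxon -> number of training sequences
        self.vocab = set()                       # every k-mer seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            for km in kmers(seq, self.k):
                self.kmer_counts[label][km] += 1
                self.vocab.add(km)
        return self

    def log_posterior(self, seq):
        """Unnormalized log posterior of each taxon given the sequence."""
        total_seqs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.kmer_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total_seqs)  # log prior
            for km in kmers(seq, self.k):
                if km in self.vocab:  # k-mers never seen in training are ignored
                    score += math.log((counts[km] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_posterior(seq)
        return max(scores, key=scores.get)


# Hypothetical toy training set: two short sequences per taxon.
train_seqs = ["ACGTACGTAC", "ACGTACGAAC", "TTGCAATTGC", "TTGCAATAGC"]
train_labels = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]
model = KmerNaiveBayes(k=3).fit(train_seqs, train_labels)

print(model.predict("ACGTACGTTC"))  # taxon_A
print(model.predict("TTGCAATTGA"))  # taxon_B
```

Because the model can only score taxa present in its training set, a sequence from an unrepresented species is still assigned to one of the known labels, which is one way inadequate taxonomic coverage turns into misclassification.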
Valentini et al. (2016) used a support vector machine to identify French river fish from eDNA metabarcoding data with 95% accuracy. However, errors may still arise from insufficient taxonomic coverage and PCR bias.
Wilcox et al. (2019) predicted invasive fish in a US river using eDNA data and a random forest algorithm. PCR bias and limited taxonomic coverage held their accuracy to 90%.
In this research, we look at how ML-based eDNA metabarcoding is used in current biodiversity assessment studies. While ML-based eDNA metabarcoding has the potential to enhance the accuracy and efficiency of species identification, it is not without its downsides. We provide practical advice on how to use ML-based eDNA metabarcoding for ecological monitoring and conservation.
Our study argues that standardized methods and more extensive training datasets are needed to improve ML-based eDNA metabarcoding. By removing these barriers, ML-based eDNA metabarcoding might boost biodiversity evaluation and ecological monitoring.
Elbrecht et al. (2018) used eDNA metabarcoding data and a random forest approach to categorize freshwater invertebrates. Owing to sequencing errors and gaps in taxonomic coverage, their classification accuracy was only 78%. Deiner et al. (2017) used similar methods to classify macroinvertebrates in freshwater streams, with a success rate of 73%. Sequencing errors and inadequate taxonomic coverage thus remain recurring challenges for ML-based eDNA metabarcoding.
Doi et al. (2017) used a random forest algorithm to correctly categorize 70% of the fish species in Japanese rivers. Limited taxonomic coverage and PCR bias constrained their results. Mächler et al. (2019) reported that a support vector machine algorithm categorized the fish species found in Swiss rivers with an accuracy of 83%. The main limitations of that study were PCR bias and a lack of taxonomic coverage.
Several of these studies were conducted in aquatic environments, since that is where eDNA samples are most often collected. Nevertheless, ML-based eDNA metabarcoding is also being studied in terrestrial and airborne settings. Gibson et al. (2020) successfully classified soil microbial communities using ML-based eDNA metabarcoding, demonstrating its potential for studies of soil biodiversity.
When it comes to forecasting invasive species and finding rare and cryptic species, ML-based eDNA metabarcoding performs better than traditional methods. This technique also examines large numbers of environmental DNA samples, providing a comprehensive portrait of the biodiversity in a given area.
ML-based eDNA metabarcoding requires large training datasets and standardized methods for sampling, DNA extraction, PCR amplification, and sequencing. With appropriate guidelines in place, this method can significantly improve biodiversity assessments and conservation efforts.
This review paper compiles data on eDNA metabarcoding based on ML. The techniques, taxa of interest, ML algorithms used, and outcomes of these studies all shed light on the strengths and shortcomings of this approach.
More comprehensive training datasets are needed, particularly for rare and elusive species, and researchers have emphasized the need to reduce bias and errors. Future research may improve the accuracy and efficiency of ML-based eDNA metabarcoding by addressing these issues.
This study supports biodiversity assessment and ecological monitoring. It is timely because the technology is changing quickly and more effective and efficient ways to protect species are needed. Recent results, critical problems, and future research possibilities together provide a road map for developing and using ML-based eDNA metabarcoding. In contrast to other studies, this one examines how current biodiversity assessments use ML-based eDNA metabarcoding, and it evaluates the utility of the method for ecological monitoring and conservation across a broad range of habitats and taxonomic groups.
Our research further highlights the need for well-tested procedures and large training datasets for ML-based eDNA metabarcoding. The need for this assessment is especially pressing given the novelty of topics like protocol standardization and the production of datasets.
Our findings further highlight the promise of ML-based eDNA metabarcoding to address gaps in current methods of assessing biodiversity, particularly regarding the discovery of rare and elusive species. They also highlight the need to evaluate biodiversity and monitor the environment continually.
Our research contributes to the growing body of literature on ML-based eDNA metabarcoding for biodiversity assessment, demonstrating how this method can improve our understanding of biological communities and our ability to protect them.