The results of the assessment of the profilers against the experimental data for mutagenicity, carcinogenicity and skin sensitisation are shown in Tables 3, 4 and 5, respectively. Further detailed analysis was undertaken to identify over-predicting structural alerts in the carcinogenicity profilers, i.e. those with a precision, or positive predictive value (PPV), lower than 0.5. This analysis was performed to identify structural alerts with little predictive capability, with the aim of improving the precision and overall accuracy of the profiler. Detailed analysis of 13 non-genotoxic carcinogenicity structural alerts was conducted, and the results are presented in Tables 6.1 and 6.2.
An additional analysis was performed for the 30 structural alerts incorporated in the Oncologic Primary Classification carcinogenicity profiler. The purpose of this analysis was to assess which of the structural alerts had a precision (PPV) lower than 0.5. This analysis is presented in Table 6.3.
The cutoff value was set at 0.5 to ensure that none of these profilers had a lower predictive power than the bacterial Ames test, which has also been applied to predict the carcinogenicity of genotoxic substances in rodents. The high predictive power of a positive Ames result, which ranges from 77 to 90% depending on various factors, makes it superior to any other in vitro genotoxicity assay, all of which have a relatively lower performance in terms of predicting genotoxicity (Kazius et al., 2006).
Mutagenicity profilers
The results shown in Table 3 indicate that the accuracy (percentage of positives and negatives correctly predicted) of the mutagenicity profilers varies across the datasets from 51 to 76%. Clearly, whilst 76% can be accepted because it is in line with the level of error generally seen in the experimental data in most databases, 51% is barely better than chance. The micronucleus alerts appear to be general and, as such, significantly over-predict mutagenicity. The most common alert triggered by this profiler is “Hacceptor-path3-Hacceptor”. This alert indicates non-covalent binding of the target chemical to DNA via two bonded atoms connecting two H bond acceptors (Snyder et al. 2006). However, it appears that such a functional grouping is common in both mutagens and non-mutagens. It is likely that the performance of this profiler would improve if this specific alert were omitted.
As expected, both DNA binding profilers work best with the data obtained from Ames type tests (due to the availability of large databases for this test), but do not perform well for chromosome aberration or micronucleus data.
The genotoxicity and non-genotoxicity alerts (ISS) have acceptable true positive results but fail to distinguish the negatives. Overall, these alerts perform best with Ames type data.
The OASIS DNA alerts for Ames, micronucleus and chromosomal aberration predict the results in Ames datasets fairly well, but for both micronucleus and chromosomal aberration data these profilers under-predict positive compounds, with sensitivity rates ranging from 36 to 44%. The ISS Ames test alerts have accuracies over 70% for Ames datasets which, together with MCC values greater than 0.5, indicate that the performance is independent of skewed sample categories. It should be noted, however, that the micronucleus/CA alerts may not be suitable predictors of Ames results. They need to be considered separately and used, along with Ames, to develop the overall weight of evidence.
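The performance measures used in this section (sensitivity, specificity, accuracy, PPV and MCC) all derive from the four confusion-matrix counts. The following is a minimal sketch of those standard definitions; the function name and the example counts are hypothetical and are not taken from Tables 3 to 5.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    nan = float("nan")
    # Sensitivity: fraction of experimental positives that the profiler flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    # Specificity: fraction of experimental negatives that the profiler leaves unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else nan
    # Accuracy: fraction of all substances predicted correctly.
    accuracy = (tp + tn) / total if total else nan
    # PPV (precision): fraction of flagged substances that are experimental positives.
    ppv = tp / (tp + fp) if (tp + fp) else nan
    # Matthews correlation coefficient; conventionally 0 when a marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not data from this study).
example = classification_metrics(tp=40, fp=10, tn=35, fn=15)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which is why it is the measure quoted here when the proportion of positives and negatives in a dataset is skewed.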
The receiver operating characteristic (ROC) curve, which is defined as a plot of test sensitivity as the y coordinate versus 1-specificity, or false positive rate (FPR), as the x coordinate, is an effective method of evaluating the quality or performance of a diagnostic test, which in this case is an in silico profiler. As shown in Fig. 1, the ROC curve analysis showed that the Ames ISS profiler achieved the highest balanced accuracy, combining a high true positive rate with a low false positive rate.
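A profiler that gives a yes/no alert occupies a single point in ROC space. The sketch below shows how such a point and the corresponding balanced accuracy can be computed, assuming the usual definition of balanced accuracy as the mean of sensitivity and specificity. The profiler names and counts are hypothetical placeholders, not the values behind Fig. 1.

```python
def roc_point(tp, fp, tn, fn):
    """Return (false positive rate, true positive rate) for one binary profiler."""
    tpr = tp / (tp + fn)  # sensitivity, plotted on the y axis
    fpr = fp / (fp + tn)  # 1 - specificity, plotted on the x axis
    return fpr, tpr


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    fpr, tpr = roc_point(tp, fp, tn, fn)
    return (tpr + (1.0 - fpr)) / 2.0


# Hypothetical confusion-matrix counts, for illustration only.
hypothetical_profilers = {
    "profiler_A": dict(tp=80, fp=20, tn=80, fn=20),
    "profiler_B": dict(tp=90, fp=60, tn=40, fn=10),
    "profiler_C": dict(tp=30, fp=10, tn=90, fn=70),
}

# Rank profilers from best to worst balanced accuracy.
ranked = sorted(
    hypothetical_profilers,
    key=lambda name: balanced_accuracy(**hypothetical_profilers[name]),
    reverse=True,
)

for name in ranked:
    fpr, tpr = roc_point(**hypothetical_profilers[name])
    ba = balanced_accuracy(**hypothetical_profilers[name])
    print(f"{name}: FPR={fpr:.2f}, TPR={tpr:.2f}, balanced accuracy={ba:.2f}")
```

In this illustration, profiler_B has the highest true positive rate but ranks below profiler_A because of its high false positive rate, which is the kind of trade-off the ROC comparison is meant to expose.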
Carcinogenicity profilers
Both DNA binding profilers performed equally poorly with carcinogens and non-carcinogens from all datasets, with accuracy values rarely above 60% and MCC values indicating a performance barely better than chance.
The ISS carcinogenicity alerts fared a little better in predicting carcinogens, but poor segregation of non-carcinogens reduced the overall effectiveness of this profiler, with accuracy levels between 57% and 68% for the sample datasets and modest to poor performance on skewed datasets, as indicated by MCC values of 0.17 to 0.36. The ROC analysis, shown in Fig. 2, indicates that the ISS carcinogenicity profiler performed best of the four profilers in terms of both true positive rate and false positive rate.
The 13 non-genotoxic carcinogenicity (NGC) structural alerts in the ISS carcinogenicity profiler were analysed individually to assess their performance in terms of positive predictive value (PPV). As shown in Table 6.1, the overall PPV of the ISS NGC structural alerts was 57%: 326 substances were correctly predicted as non-genotoxic carcinogens out of a total of 570 substances identified as containing one of the 13 NGC structural alerts.
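The overall PPV quoted above follows directly from the two counts given in the text. A minimal sketch of the calculation is shown below; the function name is illustrative only.

```python
def positive_predictive_value(true_positives, flagged):
    """PPV (precision): share of flagged substances that are true positives."""
    if flagged <= 0:
        raise ValueError("at least one substance must be flagged")
    if not 0 <= true_positives <= flagged:
        raise ValueError("true positives must lie between 0 and the number flagged")
    return true_positives / flagged


# Counts quoted in the text: 326 correct predictions among 570 flagged substances.
ngc_overall_ppv = positive_predictive_value(326, 570)
print(f"Overall NGC PPV: {ngc_overall_ppv:.2f}")  # prints "Overall NGC PPV: 0.57"
```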
The precision (PPV) of the individual NGC structural alerts ranged from 0.92, the highest value, for trichloro (or fluoro) ethylene and tetrachloro (or fluoro) ethylene, to 0.39 for quercetin type flavonoids. Four of the 13 NGC structural alerts show over-prediction. These are thiocarbonyl, substituted n-alkylcarboxylic acids, quercetin type flavonoids, and phthalate (or butyl) diesters and monoesters. For each of these four alerts, more than 50% of the substances containing the alert are non-carcinogens that are predicted as carcinogens, which lowers the total accuracy of the ISS carcinogenicity profiler.
The thiocarbonyl NGC structural alert was flagged in 106 substances as the only structural alert; any substance containing more than one NGC structural alert was excluded from this analysis to avoid interference. Sixty-two non-carcinogenic substances were falsely predicted as carcinogenic by the thiocarbonyl alert, giving a precision (PPV) of 0.42. Given this over-prediction, it could be suggested that removing the thiocarbonyl alert would increase the precision of the NGC structural alerts as a whole and of the ISS carcinogenicity profiler. This would, however, not be the ideal solution, as any thiocarbonyl NGCs would then be completely out of the scope of the profiler. Instead, it is proposed that further research be carried out to see whether the performance of this alert can be improved using a larger database of thiocarbonyl substances.
Likewise, the other three NGC structural alerts, i.e. substituted n-alkylcarboxylic acids, quercetin type flavonoids and phthalate (or butyl) diesters and monoesters, also showed precision values lower than 0.5, at 0.42, 0.39 and 0.38, respectively. Ignoring these four structural alerts increased the total precision of the NGC structural alerts and, consequently, the performance of the ISS carcinogenicity profiler. The results (Table 6.2) showed that the precision of the ISS NGC structural alerts improved from 0.57 to 0.64 when the four structural alerts were ignored. However, for the reasons mentioned above, it is proposed that further research should be carried out to improve the performance of these alerts within the profiler.
The Oncologic primary classification alerts over-predicted carcinogens, with sensitivity rates of 66–73% achieved at the expense of poor prediction of non-carcinogens (30–46%), resulting in an overall performance that is barely better than chance for most of the datasets.
All 30 structural alerts in the Oncologic primary classification profiler were individually analysed for their precision, as shown in Table 6.3. Four structural alerts over-predicted, i.e. more than 50% of the substances containing the alert were non-carcinogens predicted as carcinogens. These four alerts were carbamate type compounds, organophosphorus type compounds, peroxide type compounds and reactive ketone functional groups. The carbamate type compounds alert was triggered in 63 substances, of which only 22 were true positive carcinogens. The remaining 41 substances flagged by this alert were non-carcinogenic in experimental tests, giving a low precision of 0.35. The second alert with a low precision (approximately 0.36) was organophosphorus type compounds, where 80 of the 126 substances flagged by the alert were wrongly predicted as carcinogenic. The peroxide type compounds alert showed a precision of only 0.27, with just 3 correctly predicted carcinogens out of the 11 substances flagged. The lowest precision within the Oncologic primary classification profiler was for reactive ketone functional groups, at only 0.1: 19 of the 21 flagged substances were over-predicted, meaning that only 2 of the 21 substances flagged by this alert were correctly predicted as carcinogenic.
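The per-alert screening against the 0.5 cutoff can be sketched as follows, using the flagged and true positive counts quoted in the text for the four Oncologic alerts. The variable and function names are illustrative, and the organophosphorus true positive count is derived here as 126 minus 80 rather than read from Table 6.3.

```python
PPV_CUTOFF = 0.5  # alerts below this precision are treated as over-predicting

# alert name -> (substances flagged, true positive carcinogens)
alert_counts = {
    "carbamate type compounds": (63, 22),
    "organophosphorus type compounds": (126, 126 - 80),  # 80 of 126 were false positives
    "peroxide type compounds": (11, 3),
    "reactive ketone functional groups": (21, 2),
}


def alert_ppv(flagged, true_positives):
    """Precision of a single structural alert."""
    if flagged <= 0 or not 0 <= true_positives <= flagged:
        raise ValueError("counts must satisfy 0 <= true_positives <= flagged, flagged > 0")
    return true_positives / flagged


ppv_by_alert = {name: alert_ppv(*counts) for name, counts in alert_counts.items()}

# Alerts below the cutoff, listed from lowest to highest precision.
over_predicting = sorted(
    (name for name, value in ppv_by_alert.items() if value < PPV_CUTOFF),
    key=ppv_by_alert.get,
)

for name in over_predicting:
    flagged, true_positives = alert_counts[name]
    print(f"{name}: {true_positives}/{flagged} correct, PPV = {ppv_by_alert[name]:.2f}")
```

Applying the same loop to all 30 alerts in Table 6.3 would reproduce the screening step; only the four alerts whose counts are given in the text are included here.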
It could therefore be suggested that all four of these structural alerts be removed from the Oncologic primary classification profiler to increase its overall precision and accuracy. Indeed, as shown in Table 6.3, the total precision of the profiler improved by 0.023, i.e. by just over two percentage points.
This would, however, not be the ideal solution, as carcinogenic compounds containing these alerts would then be missed and would fall completely outside the scope of the profiler. Therefore, it is proposed that further research be carried out to see whether the performance of these four alerts can be improved using larger databases of substances containing these alerts.
Skin sensitisation profilers
The performance of the protein binding profilers was not found to be consistent across the sampled datasets. For the CAESAR dataset, these profilers tend to have a low predictivity for non-sensitisers, whilst for the other datasets it is the sensitisers that are not well predicted. Overall, the performance of these predictors is moderate to poor for all the datasets.
The DPRA lysine peptide depletion profilers showed a similar pattern, with performance being uniformly no better than chance for the CAESAR dataset but highly under-predictive for sensitisers in the other datasets. The protein binding potency profiler is also uniformly poor across all datasets, failing to detect the majority of sensitisers, with sensitivity rates between 12% and 29%. A similar pattern is seen with the keratinocyte gene expression profiler, where sensitivity rates were 19–43%.
The overall ROC analysis for all seven profilers, shown in Fig. 3, indicated that the protein binding OASIS profiler has a relatively better performance than the other skin sensitisation profilers in the OECD Toolbox.
The study also carried out further analysis to investigate the contribution of different alerts within a given profiler to its overall performance. For this purpose, non-genotoxic carcinogenicity profilers were investigated as an example. This was because, under most EU regulatory frameworks, tests for carcinogenicity are only required when there is either a positive in vitro mutagenicity/genotoxicity test or there are indications of carcinogenic effects from long-term in vivo studies. This means that, whilst the current risk assessment framework would identify genotoxic carcinogens, it is possible that carcinogenic effects arising from a non-genotoxic mechanism will not be identified without in vivo tests, which are increasingly restricted or banned under different regulatory frameworks.