Each dataset is randomly divided into a training set (80%) and a test set (20%), and all models below are trained and evaluated on the same split. For the dataset distinguishing superconductors from non-superconductors, superconductors are labeled 1 and the 9,399 insulators are labeled 0. After training the XGBoost classification model, the results are shown in Figure 1: on the test set, the AUC is 0.98 and the TPR is 98.48%. Comparisons with other models are given in Table I, which shows that the AUC and TPR of the XGBoost model exceed those of the ATCNN model [23], the only comparable work in the literature to our knowledge.
Table I Experimental results of different classification models
Models | AUC | TPR (%)
GBDT | 0.96 | 96.47
ATCNN [23] | 0.97 | 93.78
SVR | 0.97 | 91.90
Decision Tree | 0.98 | 97.84
Random Forest | 0.98 | 98.23
XGBoost | 0.98 | 98.48
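The evaluation protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: the data here are synthetic, and scikit-learn's GradientBoostingClassifier stands in for XGBoost (the xgboost package is not assumed to be installed); the 80/20 split and the AUC/TPR metrics match the protocol described in the text.

```python
# Sketch of the classification protocol: 80/20 split, gradient-boosted
# classifier, AUC and TPR (recall) on the held-out test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, recall_score

# Synthetic stand-in for the atomic-frequency feature vectors.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # area under ROC curve
tpr = recall_score(y_te, clf.predict(X_te))               # true-positive rate
print(f"AUC = {auc:.2f}, TPR = {tpr:.2%}")
```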
For the collated superconducting critical temperature dataset, this manuscript uses the atomic frequency as the input to the deep learning model, and the model chosen is the deep forest. The experimental results are shown in Figure 2(a): on the test set, the R2, MAE, and RMSE are 0.945, 4.04, and 7.51, respectively. Using the same training and test sets, the comparison with similar literature is shown in Table II, with bold indicating the best result in each column. All algorithms in Table II were implemented and tested in the same hardware and software environment.
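The "atomic frequency" input can be sketched as a fixed-length vector of per-element stoichiometric counts parsed from the chemical formula. The regex-based parser and the truncated element list below are illustrative assumptions, not the authors' exact featurization code.

```python
# Minimal sketch of an atomic-frequency featurizer: map a formula string
# to a vector of per-element counts (truncated element list for brevity).
import re

ELEMENTS = ["H", "Ca", "V", "Zr", "Nb", "Hf", "Pt", "Au", "Bi"]  # illustrative subset

def atomic_frequency(formula: str) -> list:
    counts = dict.fromkeys(ELEMENTS, 0.0)
    # Match an element symbol (capital + optional lowercase) and an
    # optional integer or decimal count; a missing count means 1.
    for sym, num in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        if sym in counts:
            counts[sym] += float(num) if num else 1.0
    return [counts[e] for e in ELEMENTS]

print(atomic_frequency("Au0.5Nb3Pt0.5"))  # Nb -> 3.0, Pt -> 0.5, Au -> 0.5
```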
Table II Critical temperature prediction results on the test set
Network Model | R2 | MAE | RMSE
SVM [22] | 0.714 | 11.36 | 17.06
K-Nearest Neighbors | 0.891 | 5.88 | 10.40
Decision Tree | 0.888 | 5.40 | 10.50
GBDT | 0.852 | 8.33 | 12.16
ExtraTree | 0.892 | 5.32 | 10.40
1DCNN | 0.891 | 6.25 | 10.54
Artificial Neural Network | 0.896 | 6.15 | 10.64
XGBoost | 0.916 | 5.71 | 9.34
Bagging | 0.918 | 4.80 | 9.07
ATCNN [23] | 0.891 | 5.88 | 10.40
Sub-network | 0.920 | 5.41 | 8.78
Random Forest [3] | 0.928 | 4.58 | 8.53
Deep Forest | 0.945 | 4.04 | 7.51
Table II shows that the R2, MAE, and RMSE of the deep forest model on the test set are 0.945, 4.04, and 7.51, respectively, the best values reported in the published literature.
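The regression evaluation used throughout Tables II–IV can be sketched as below. This is a minimal stand-alone illustration: the data are synthetic, and a random forest stands in for the deep forest model (the deep-forest package is not assumed to be installed); only the 80/20 split and the R2/MAE/RMSE metrics mirror the protocol in the text.

```python
# Sketch of the regression evaluation: 80/20 split and R2 / MAE / RMSE
# on the held-out test set.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

X, y = make_regression(n_samples=500, n_features=20, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
pred = reg.predict(X_te)

r2 = r2_score(y_te, pred)
mae = mean_absolute_error(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5  # RMSE = sqrt(MSE)
print(f"R2 = {r2:.3f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```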
The scatter plot for the material Fermi level is shown in Figure 2(b), and the results obtained with different models are shown in Table III, with bold indicating the best value in each column. Table III shows that the sub-network model achieves the most accurate predictions of the Fermi level. On the Fermi level dataset, the MAE of both ExtraTree and the sub-network model is 0.10, lower than the MAE of DFT in [29] and the 0.15 of the ElemNet model in [24]. This means that the accuracy of the models used in this manuscript exceeds that of the DFT calculation. The R2, MAE, and RMSE of the sub-network model, also listed in Table III, are 0.984, 0.10, and 0.14, respectively.
Table III Fermi level prediction results on the test set
Network Model | R2 | MAE | RMSE
1DCNN | 0.952 | 0.17 | 0.25
KNN | 0.867 | 0.26 | 0.40
SVM [16] | 0.887 | 0.24 | 0.38
ExtraTree | 0.966 | 0.10 | 0.20
XGBoost | 0.938 | 0.20 | 0.27
ElemNet [24] | - | 0.15 | -
ATCNN [23] | 0.965 | 0.13 | 0.21
Random Forest [22] | 0.955 | 0.13 | 0.23
DFT [29] | - | 0.136~0.81 | -
Deep Forest | 0.977 | 0.11 | 0.17
Sub-network | 0.984 | 0.10 | 0.14
In this manuscript, the atomic frequency is likewise used as the input to the deep learning model, and the deep forest model is chosen to predict the band gap of the material. On the test set, the MAE, RMSE, and R2 of the deep forest model are 0.27, 0.44, and 0.917, respectively; the corresponding scatter plot is shown in Figure 2(d). The results are shown in Table IV, with bold indicating the best value in each column.
Table IV Band gap prediction results on the test set
Network Model | R2 | MAE | RMSE
SVM [16] | 0.555 | 0.59 | 0.56
ExtraTree | 0.617 | 0.44 | 0.83
1DCNN | 0.649 | 0.52 | 0.82
KNN | 0.697 | 0.45 | 0.79
ATCNN [23] | 0.814 | 0.35 | 0.63
Random Forest [22] | 0.811 | 0.34 | 0.64
XGBoost | 0.818 | 0.39 | 0.64
Bagging | 0.825 | 0.34 | 0.64
Sub-network | 0.866 | 0.34 | 0.55
CGCNN [30] | - | 0.388 | -
DFT [29] | - | 0.6 | -
Deep Forest | 0.917 | 0.27 | 0.44
As shown in Table IV, the MAE, RMSE, and R2 of the deep forest model all outperform those of the other machine learning models, such as ATCNN, CGCNN, and SVM. The model also outperforms the DFT calculation, showing that its accuracy has surpassed that of the DFT method.
To verify the effectiveness of the deep forest model in predicting the superconducting critical temperature, four materials with measured critical temperatures were taken from [21]. The values predicted by the deep forest model in this manuscript are compared with those of [21] in Table V, with bold indicating the best value in each row. Table V shows that the prediction error of the deep forest model for CaBi2 is far lower than that of [21], and the prediction errors for the remaining materials are within 3%.
Table V Critical temperature verification results
Materials | Measured Tc (K) | Tc predicted in [21] (K) | Tc predicted by deep forest (K)
CaBi2 | 2 | 14.85 (642%) | 5.91 (195%)
HfV4Zr | 10 | 10.17 (1.7%) | 9.92 (0.8%)
Au0.5Nb3Pt0.5 | 10 | 10.13 (1.3%) | 10.30 (3%)
Hf0.5Nb0.2V2Zr0.3 | 10 | 10.11 (1.1%) | 9.83 (1.7%)
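The percentages in Table V are relative errors with respect to the measured Tc; for example, the deep forest error for CaBi2 is |5.91 − 2| / 2 ≈ 195%. A one-line check:

```python
# Relative prediction error in percent, as reported in Table V.
def rel_err(pred_tc, measured_tc):
    return abs(pred_tc - measured_tc) / measured_tc * 100.0

# CaBi2: deep forest predicts 5.91 K against a measured 2 K.
print(f"{rel_err(5.91, 2.0):.1f}%")  # approximately 195.5%, reported as 195%
```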
To further verify the practicality of the models, each of the 100,000 materials in the COD database [31] is first classified with the XGBoost model; if it is predicted to be a superconductor, its critical temperature is then predicted with the deep forest model. Finally, candidate superconducting materials with a critical temperature above 90 K are screened out, with a total of 50 materials satisfying the condition. The detailed results are given in the Appendix.