Following the research methods and steps described in the previous chapter, this chapter presents and analyzes the experimental results. Although the results of some individual experiments are not dramatic, optimizing every link in the chain ultimately yields a substantial overall improvement.
4.1 Feature construction and selection
As shown in Fig. 5, when the random forest model is used to plot the top ten features directly, the features on the importance chart do not match expectations: they are not the original data field names, and their individual importance values are very low. Most of the fields in this study are categorical and were expanded into multiple derived fields by one-hot encoding, so in this state the importance of the original data fields cannot be seen.
As shown in Fig. 6, the importance of each original feature can be obtained by summing the importance values of all derived fields belonging to that original field. The figure shows that SN_NAME has the highest importance, higher than SN_TITLE, even though SN_TITLE is the text actually displayed on the label. Because SN_TITLE may be left empty in the settings, its importance drops, and even after value compensation the effect is limited. SN_NAME, although it may be an alias, correlates more strongly with the target variable for the same object because of users' naming habits. ACTION_TYPE has the lowest importance: analysis shows that the label content of a new part number is often copied from a similar product and lightly edited, so new versions and fresh entries cannot be clearly distinguished, which contributes little to the prediction model. The importance of PN_RANK also falls short of expectations; analysis shows that the label style is mainly determined by the process stage, and the variable's content is not governed by any obvious rule.
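The aggregation step above can be sketched in a few lines. This is a minimal illustration, not the study's code: the column names and the "FIELD_value" naming pattern (as produced by, e.g., pandas `get_dummies` with an underscore separator) are assumptions.

```python
# Hypothetical importances of one-hot-derived columns; names follow the
# assumed "FIELD_value" pattern of the encoder.
encoded_importance = {
    "SN_NAME_labelA": 0.05, "SN_NAME_labelB": 0.04,
    "SN_TITLE_x": 0.03, "ACTION_TYPE_new": 0.01,
}

def aggregate_importance(importance, original_fields):
    """Sum the importance of every derived (one-hot) column under its source field."""
    totals = {f: 0.0 for f in original_fields}
    for col, value in importance.items():
        # longest-prefix match first, so e.g. "SN_NAME" wins over a plain "SN"
        for field in sorted(original_fields, key=len, reverse=True):
            if col == field or col.startswith(field + "_"):
                totals[field] += value
                break
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

print(aggregate_importance(encoded_importance,
                           ["SN_NAME", "SN_TITLE", "ACTION_TYPE"]))
```

Summing, rather than averaging, preserves the total share of importance each original field carries across all of its dummy columns.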
4.2 Common model modeling and comparison
The control group in this experiment is based on statistics from actual practice: on average, about 20% of a form's content is changed at each entry. If the entire form were submitted without modification, the accuracy would therefore be 80%; in reality the figure would be even lower once newly added part numbers are taken into account.
Table 11 Experimental results of each artificial intelligence model

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| Blind guess | 0.112 | 0.175 | 0.252 | 0.217 | 0.154 | 0.188 | 0.222 | X |
| Base | 0.800 | | | | | | | |
| KNN | 0.857 | 0.912 | 0.850 | 0.925 | 0.762 | 0.844 | 0.722 | 0.868 |
| SVM | 0.269 | 0.383 | 0.404 | 0.831 | 0.282 | 0.584 | 0.280 | 0.435 |
| Decision tree | 0.199 | 0.297 | 0.390 | 0.743 | 0.252 | 0.568 | 0.447 | 0.473 |
| Random forest | 0.879 | 0.933 | 0.876 | 0.940 | 0.754 | 0.874 | 0.794 | 0.868 |
| GBDT | 0.764 | 0.865 | 0.803 | 0.920 | 0.730 | 0.853 | 0.686 | 0.882 |
| XGBoost | 0.839 | 0.917 | 0.845 | 0.957 | 0.786 | 0.851 | 0.787 | 0.847 |
| NN | 0.888 | 0.896 | 0.883 | 0.931 | 0.838 | 0.896 | 0.802 | 0.860 |
As shown in Table 11, the columns are the label types: All means all types combined, and Others is the remaining data after excluding the six listed types. The rows are the control group and the various models. Reading down the columns, the accuracy on CB_SN is high regardless of model. CB_SN has the fewest distinct values in its target field, but besides the number of target classes, the amount of data shown in Table 6 must also be considered. Dividing the amount of data of each type by the number of distinct target values and sorting gives: CB_SN (83.47) > CARTON (74.53) > BOX (56.78) > PALLET (55.57) > All (48.65) > SN (28.82) > FCC (23.77).
The higher this ratio, the more data is available per class for training and the higher the accuracy tends to be, although this is not absolute, because sparsity between the data volume and the target field must still be considered. CARTON ranks 2nd and BOX 3rd by this ratio, yet both perform worse under SVM and the decision tree than PALLET, which ranks 4th. Reading across the rows, SVM and the decision tree are generally poor, and GBDT's performance is unsatisfactory; KNN, random forest, XGBoost, and the NN all exceed the control group's accuracy and reach the reference standard, performing well, each with its own strengths and weaknesses.
4.3 Evaluation of the effectiveness of data pre-processing
The feature-importance analysis in the previous step showed that SN_TITLE and SN_NAME are the most important fields, and both leave room for further optimization. After value compensation and fuzzy-data processing, each model is retrained and its accuracy observed. Value compensation: SN_NAME is required but may be an alias, while SN_TITLE is optional and may be empty, so an empty SN_TITLE is filled with the value of the SN_NAME field. Fuzzy processing: there is no mandatory, standardized way to fill in these two fields; as long as users at each station can understand them, symbols and capitalization are unrestricted, so the same object may be written in several different ways depending on user habits. Therefore, all symbols are removed from the two fields, leaving only letters, digits, and Chinese characters, which are then converted to uppercase, so that fuzzily similar entries are merged into consistent data.
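The two steps described above, compensation and fuzzy normalization, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout and field names follow the study's schema, but the exact cleaning rules in the original system may differ.

```python
import re

def normalize(text):
    """Keep only letters, digits, and CJK characters, then uppercase,
    so variants such as 'sn-01 label' and 'SN 01 LABEL' collapse to one token."""
    if not text:
        return ""
    kept = re.sub(r"[^0-9A-Za-z\u4e00-\u9fff]", "", text)
    return kept.upper()

def preprocess(record):
    """Value compensation: an empty SN_TITLE is filled from SN_NAME;
    then both fields are fuzzily normalized."""
    title = record.get("SN_TITLE") or record.get("SN_NAME", "")
    return {
        "SN_NAME": normalize(record.get("SN_NAME", "")),
        "SN_TITLE": normalize(title),
    }

print(preprocess({"SN_NAME": "sn-01 label", "SN_TITLE": ""}))
```

After this pass, entries that differ only in symbols, spacing, or capitalization map to identical values, which shrinks the effective cardinality of the one-hot-encoded fields.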
Table 12 Experimental results after data processing

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| KNN | +0.013 | -0.003 | +0.004 | +0.009 | +0.012 | +0.016 | +0.027 | +0.006 |
| SVM | +0.218 | +0.183 | +0.143 | +0.097 | +0.143 | +0.018 | +0.093 | +0.048 |
| Decision tree | +0.090 | +0.096 | +0.064 | +0.111 | +0.180 | -0.002 | +0.000 | +0.018 |
| Random forest | +0.001 | -0.026 | -0.005 | +0.003 | +0.022 | +0.020 | +0.006 | +0.025 |
| GBDT | +0.003 | +0.049 | +0.023 | +0.014 | +0.020 | +0.005 | +0.057 | -0.016 |
| XGBoost | +0.035 | +0.011 | +0.014 | -0.029 | -0.030 | -0.002 | -0.014 | -0.011 |
| NN | +0.006 | +0.010 | -0.015 | +0.014 | +0.007 | +0.000 | +0.039 | +0.011 |
As shown in Table 12, the columns are the label types, the rows are the models, and each cell is the change in accuracy. SVM and the decision tree improve markedly, with accuracy gains of up to 21.8%; for KNN, random forest, GBDT, XGBoost, and the NN the impact is less significant. Some models may already have reached their learning limit on the existing data and features, leaving little room for optimization, while models such as random forest can already cope with missing values and fuzzy data on their own.
4.4 Evaluation of the effectiveness of loop training
The experiments in this section attempt to optimize only the neural network model, which has the best overall performance. They explain the purpose of loop testing and then analyze and discuss the experimental results.
4.4.1 Comparison of effectiveness evaluation of test set segmentation methods
Testing of the neural network revealed that the results were overly optimistic and inconsistent with actual application. Analysis showed that the data has temporal characteristics: if the training and test sets are selected randomly, the training set may contain data from the future relative to the test data. Seeing the answers before verification can cause the model to get out of control, overfit, and fail to learn correctly. This experiment was therefore designed to clarify the actual situation with respect to the test-set segmentation method.
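The time-based split described above can be sketched as follows. This is a generic illustration, not the thesis's own code; the `timestamp` field name is an assumption.

```python
def time_split(records, test_ratio=0.2, key="timestamp"):
    """Order records by time and hold out the most recent fraction as the
    test set, so no future rows leak into training."""
    ordered = sorted(records, key=lambda r: r[key])
    cut = int(len(ordered) * (1 - test_ratio))
    return ordered[:cut], ordered[cut:]

# ten dummy time-ordered records
data = [{"timestamp": t, "y": t % 2} for t in range(10)]
train, test = time_split(data, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Every training record strictly precedes every test record in time, which is the property a random split cannot guarantee.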
Table 13 Experimental results of test set segmentation method

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| Random segmentation | 0.959 | 0.956 | 0.960 | 0.980 | 0.940 | 0.957 | 0.910 | 0.922 |
| Time segmentation | 0.894 | 0.906 | 0.868 | 0.945 | 0.845 | 0.896 | 0.841 | 0.871 |
As shown in Table 13, the columns are the label types and the rows are the test-set segmentation methods. For every type, accuracy under random segmentation is clearly much higher than under time segmentation: random-segmentation accuracy is always above 90%, with CB_SN even reaching 98%, while time-segmentation accuracy falls between 84% and 95%.
4.4.2 Comparison of effectiveness evaluation of loop training
In the baseline setup, 80% of the data is used for training and only 20% for validation. The data in this study is special: most fields are categorical, the entry interval of each part number is inconsistent, and the data distribution is therefore uneven. Because of the temporal characteristics, the test data cannot be selected randomly, so the diversity of the test set is low. This experiment was designed to solve that problem: to increase test-set diversity, partial data is used segment by segment in a loop, so that more of the data has an opportunity to serve as test data.
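The segment-by-segment idea can be sketched as a walk-forward generator over time-ordered row indices. This is a plausible sketch of the scheme described above, not the thesis's exact implementation.

```python
def rolling_windows(n_rows, n_splits):
    """Yield (train_indices, test_indices) pairs in which each consecutive
    segment of time-ordered rows acts once as the test set, while the
    training set consists of all earlier rows (walk-forward sketch)."""
    seg = n_rows // n_splits
    for k in range(1, n_splits):
        train = list(range(0, k * seg))
        test = list(range(k * seg, min((k + 1) * seg, n_rows)))
        yield train, test

# 12 time-ordered rows split into 3 segments
for train_idx, test_idx in rolling_windows(12, 3):
    print(len(train_idx), len(test_idx))
```

Across all loop rounds, every segment after the first appears once as test data, which spreads the evaluation over more of the timeline than a single fixed 20% hold-out.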
Table 14 Results of the loop training experiment

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| NN Base | 0.894 | 0.906 | 0.868 | 0.945 | 0.845 | 0.896 | 0.841 | 0.871 |
| Rolling 2 | 0.913 | 0.932 | 0.910 | 0.920 | 0.830 | 0.901 | 0.791 | 0.866 |
| Rolling 3 | 0.920 | 0.952 | 0.936 | 0.948 | 0.823 | 0.946 | 0.785 | 0.847 |
| Rolling 4 | 0.911 | 0.971 | 0.928 | 0.943 | 0.837 | 0.909 | 0.773 | 0.893 |
| Rolling 5 | 0.909 | 0.973 | 0.937 | 0.942 | 0.804 | 0.887 | 0.716 | 0.879 |
| Rolling 6 | 0.904 | 0.952 | 0.966 | 0.932 | 0.872 | 0.864 | 0.714 | 0.841 |
| Rolling 7 | 0.897 | 0.944 | 0.943 | 0.920 | 0.894 | 0.843 | 0.833 | 0.851 |
| Rolling 8 | 0.913 | 0.957 | 0.950 | 0.886 | 0.831 | 0.875 | 0.785 | 0.808 |
| Rolling 9 | 0.928 | 0.935 | 0.916 | 0.846 | 0.918 | 0.800 | 0.783 | 0.785 |
| Rolling 10 | 0.912 | 0.938 | 0.944 | 0.885 | 0.848 | 0.866 | 0.705 | 0.763 |
As shown in Table 14, the columns are the label types and the rows are the number of loop segments. In this experiment the total amount of test data is the same as in the control group, which shows that merely dispersing the test set and increasing its diversity can indeed improve the results. Only the SN type scores below the control group, which may be related to the fact that much of its data consists of special cases. The table also shows no significant relationship between the best result and the number of segments; it is presumably related more to the diversity of the test-set distribution.
4.4.3 Advanced comparison of effectiveness evaluation of loop training
To further increase diversity, different ways of using partial data were designed so that more data has the opportunity to serve as test data. To avoid partial data slices becoming so small that the model simply memorizes answers, this experiment uses the All type only. The control group is the basic neural network model together with method one from the previous experiment. In method one, every record is used exactly once, whether for training or testing. In method two, every record is placed in the training set once. In method three, every record is used once in the test set. In method four, every record is likewise used once in the test set, but the data used in each round serves as the next round's training set, and a fixed proportion of the newer data is taken as the test set; some records at the beginning and end may be skipped.
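Method three, in which every record appears once in the test set while training always uses only earlier rows, resembles scikit-learn's `TimeSeriesSplit`. The snippet below shows that analogy on dummy data; it is an illustration, not the thesis's own implementation.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 24 time-ordered dummy samples; with n_splits=3 each of the last three
# 6-row segments appears exactly once as a test fold.
X = np.arange(24).reshape(-1, 1)
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    # every test fold starts strictly after the training rows end
    print(train_idx.min(), train_idx.max(), "->", test_idx.min(), test_idx.max())
```

Note that, as in method four's caveat, the earliest segment never serves as test data, because there would be nothing earlier to train on.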
Table 15 Advanced experimental results of loop training

| Data volume | Training set | Test set | NN Base | Method 1 (Base) | Method 2 | Method 3 | Method 4 |
|---|---|---|---|---|---|---|---|
| 1/2 | 9796 | 2450 | 0.894 | 0.913 | 0.917 | 0.922 | 0.899 |
| 1/3 | 6531 | 1633 | | 0.920 | 0.931 | 0.923 | 0.841 |
| 1/4 | 4898 | 1225 | | 0.911 | 0.945 | 0.942 | 0.864 |
| 1/5 | 3918 | 980 | | 0.909 | 0.913 | 0.923 | 0.862 |
| 1/6 | 3265 | 817 | | 0.904 | 0.958 | 0.951 | 0.893 |
| 1/7 | 2798 | 700 | | 0.897 | 0.899 | 0.962 | 0.767 |
| 1/8 | 2448 | 613 | | 0.913 | 0.941 | 0.928 | 0.795 |
| 1/9 | 2176 | 545 | | 0.928 | 0.917 | 0.919 | 0.791 |
| 1/10 | 1959 | 490 | | 0.912 | 0.918 | 0.926 | 0.796 |
As shown in Table 15, the columns are the per-round training and test data volumes of methods one to three, the control group, and the various methods; the rows are the proportion of data used in each loop round. The table shows that methods two and three outperform the control group, but on the numbers alone neither has a clear advantage over the other, and the research data is insufficient for further verification. Method four is generally lower than the control group; analysis suggests the model develops problems because every round of training reaches back to the earliest data. Observing the learning curves, method two's curve is relatively normal, while method three's curve is suspected of overfitting. Method two with a data volume of 1/6 is therefore selected as the better parameter setting.
4.5 Evaluation of the effectiveness of model optimization
The experiments in this section aim to optimize the neural network model, which has the best overall performance; the results are sorted, discussed, and compared.
4.5.1 Comparison of effectiveness evaluation of adjusting the number of epochs
This experiment focuses on the epoch parameter and observes the changes in the learning curve.
As shown in Fig. 7, Fig. 7a is the learning-curve loss graph and Fig. 7b is the learning-curve accuracy graph. Fig. 7a shows that after its turning point the loss curve rises slowly and no longer falls after epoch 60. Fig. 7b shows that the accuracy curve fluctuates constantly with no obvious overall trend; it peaks at epoch 35 and declines after epoch 60. Fig. 7 also shows a gap between the training and test curves and that convergence is reached quickly: with too many epochs, the neural network model may overfit the answers.
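The behaviour described above, stopping before the validation loss starts to climb, is the standard early-stopping heuristic. Below is a minimal, framework-free sketch of that logic (the loss values are made up for illustration; the thesis's training loop is not shown).

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the epoch with the best validation loss, stopping the scan
    once the loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch  # no improvement for `patience` epochs
    return best_epoch

# validation loss falls, bottoms out at epoch 3, then slowly rises
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.70, 0.72, 0.75, 0.8]
print(early_stop_epoch(losses, patience=3))  # 3
```

Deep-learning frameworks provide the same idea as a callback (e.g. Keras's `EarlyStopping` with a `patience` argument), which restores the weights from the best epoch rather than the last one.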
4.5.2 Comparison of effectiveness evaluation of adjusting the test set proportion
This experiment adjusts the test-set ratio and observes the changes in accuracy.
Table 16 Test ratio experiment results

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.917 | 0.934 | 0.912 | 0.914 | 0.831 | 0.878 | 0.833 | 0.855 |
| 0.15 | 0.893 | 0.888 | 0.929 | 0.954 | 0.843 | 0.880 | 0.788 | 0.871 |
| 0.2 | 0.894 | 0.906 | 0.868 | 0.945 | 0.845 | 0.896 | 0.841 | 0.871 |
| 0.25 | 0.806 | 0.892 | 0.874 | 0.947 | 0.803 | 0.879 | 0.803 | 0.858 |
| 0.3 | 0.813 | 0.822 | 0.881 | 0.939 | 0.748 | 0.875 | 0.792 | 0.884 |
As shown in Table 16, the columns are the label types and the rows are the test ratios. For the All type, 0.1 > 0.2 > 0.15 > 0.3 > 0.25; for BOX, 0.1 > 0.2 > 0.25 > 0.15 > 0.3; for CARTON, 0.15 > 0.1 > 0.3 > 0.25 > 0.2; for CB_SN, 0.15 > 0.25 > 0.2 > 0.3 > 0.1; for FCC, 0.2 > 0.15 > 0.1 > 0.25 > 0.3; for PALLET, 0.2 > 0.15 > 0.25 > 0.1 > 0.3; for SN, 0.2 > 0.1 > 0.25 > 0.3 > 0.15; for Others, 0.3 > 0.2 = 0.15 > 0.25 > 0.1.
At first glance, the smaller the test ratio, the higher the accuracy. Analysis shows, however, that the high accuracy at small test ratios is an illusion: the smaller the test ratio, the lower the diversity of the test data, and the harder it is to verify correctness. Conversely, if the test ratio is too high, the diversity of the training data becomes too low and accuracy drops sharply, so 0.2 is ultimately the better test ratio.
Table 17 Test ratios in the loop training experiment

| | NN Base | Rolling(1,9) | Rolling(2,6) | Rolling(3,7) |
|---|---|---|---|---|
| 0.1 | 0.917 | 0.963 | 0.970 | 0.834 |
| 0.15 | 0.893 | 0.929 | 0.936 | 0.923 |
| 0.2 | 0.894 | 0.928 | 0.958 | 0.962 |
| 0.25 | 0.806 | 0.892 | 0.932 | 0.940 |
| 0.3 | 0.813 | 0.874 | 0.933 | 0.923 |
As shown in Table 17, because there are many combinations of loop-training methods and data volumes, this experiment takes only the parameter sets that performed well in the earlier loop-training experiments for further trials. The columns are the control group and the loop-training parameters, where the first parameter is the method and the second is the denominator of the data volume; the rows are the test-set ratios. Under NN Base, 0.1 > 0.2 > 0.15 > 0.3 > 0.25; under Rolling(1,9), 0.1 > 0.15 > 0.2 > 0.25 > 0.3; under Rolling(2,6), 0.1 > 0.2 > 0.15 > 0.3 > 0.25; under Rolling(3,7), 0.2 > 0.25 > 0.15 = 0.3 > 0.1. Overall, 0.2 gives the better result. As in the previous experiment, the high accuracy at small test ratios is an illusion caused by low test-set diversity, while too high a ratio starves the training data, so 0.2 remains the better test ratio.
4.5.3 Comparison of effectiveness evaluation of adjusting the activation function
This experiment adjusts the activation function and observes the changes in accuracy.
Table 18 Activation function experiment results

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| softmax | 0.129 | 0.124 | 0.233 | 0.244 | 0.128 | 0.170 | 0.185 | 0.141 |
| sigmoid | 0.882 | 0.880 | 0.874 | 0.926 | 0.794 | 0.876 | 0.829 | 0.831 |
| elu | 0.885 | 0.896 | 0.877 | 0.946 | 0.835 | 0.894 | 0.838 | 0.852 |
| relu | 0.894 | 0.906 | 0.868 | 0.945 | 0.845 | 0.896 | 0.841 | 0.871 |
| selu | 0.891 | 0.919 | 0.887 | 0.951 | 0.855 | 0.894 | 0.826 | 0.874 |
As shown in Table 18, the columns are the label types and the rows are the activation functions. For All, relu > selu > elu > sigmoid > softmax; for BOX, selu > relu > elu > sigmoid > softmax; for CARTON, selu > elu > sigmoid > relu > softmax; for CB_SN, selu > elu > relu > sigmoid > softmax; for FCC, selu > relu > elu > sigmoid > softmax; for PALLET, relu > selu = elu > sigmoid > softmax; for SN, relu > elu > sigmoid > selu > softmax; for Others, selu > relu > elu > sigmoid > softmax. Overall, relu ≈ selu > elu > sigmoid > softmax, but the differences among relu, selu, elu, and sigmoid are small: less than 1% for All, about 4% for BOX, about 2% for CARTON, about 2.5% for CB_SN, about 6% for FCC, about 2% for PALLET, about 1.5% for SN, and about 4% for Others. Since relu also executes roughly 20% faster, relu is the better setting.
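For reference, the hidden-layer activations compared above can be written out directly. The definitions below are standard (the SELU constants come from Klambauer et al., 2017); they illustrate why softmax, which normalizes a whole vector into a probability distribution, is a poor choice for hidden layers even though it is the usual output layer for classification.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def elu(x, a=1.0):
    return np.where(x > 0, x, a * (np.exp(x) - 1))

def selu(x, a=1.6733, s=1.0507):
    # scaled ELU: the constants make activations self-normalizing
    return s * np.where(x > 0, x, a * (np.exp(x) - 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

x = np.array([-1.0, 0.0, 2.0])
print(relu(x))  # [0. 0. 2.]
```

relu is also the cheapest of the five to evaluate (a single comparison per unit, no exponential), which is consistent with the ~20% speed difference noted above.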
Table 19 Activation functions in the loop training experiment

| | NN Base | Rolling(1,9) | Rolling(2,6) | Rolling(3,7) |
|---|---|---|---|---|
| softmax | 0.129 | 0.278 | 0.365 | 0.763 |
| sigmoid | 0.882 | 0.897 | 0.946 | 0.957 |
| elu | 0.885 | 0.908 | 0.944 | 0.951 |
| relu | 0.894 | 0.928 | 0.958 | 0.962 |
| selu | 0.891 | 0.910 | 0.949 | 0.875 |
As shown in Table 19, because there are many combinations of loop-training methods and data volumes, this experiment takes only the parameter sets that performed well in the earlier loop-training experiments for further trials.
The columns are the control group and the loop-training parameters, where the first parameter is the method and the second is the denominator of the data volume; the rows are the activation functions. Under NN Base, relu > selu > elu > sigmoid > softmax; under Rolling(1,9), relu > selu > elu > sigmoid > softmax; under Rolling(2,6), relu > selu > sigmoid > elu > softmax; under Rolling(3,7), relu > sigmoid > elu > selu > softmax. Overall, relu gives the best result. The differences among relu, selu, elu, and sigmoid are small: about 3% under Rolling(1,9), about 1.5% under Rolling(2,6), and about 9% under Rolling(3,7). Since relu performs better and executes faster, relu is the better setting.
4.5.4 Comparison of effectiveness evaluation of adjusting the pruning ratio
This experiment adjusts the pruning (dropout) ratio and observes the changes in accuracy.
Table 20 Experimental results of pruning ratio

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.893 | 0.899 | 0.887 | 0.943 | 0.842 | 0.892 | 0.835 | 0.855 |
| 0.15 | 0.885 | 0.903 | 0.875 | 0.948 | 0.837 | 0.894 | 0.829 | 0.858 |
| 0.2 | 0.887 | 0.893 | 0.881 | 0.951 | 0.837 | 0.894 | 0.832 | 0.855 |
| 0.25 | 0.894 | 0.906 | 0.868 | 0.945 | 0.845 | 0.896 | 0.841 | 0.871 |
| 0.3 | 0.889 | 0.905 | 0.875 | 0.946 | 0.837 | 0.892 | 0.835 | 0.858 |
As shown in Table 20, the columns are the label types and the rows are the pruning ratios. For the All type, 0.25 > 0.1 > 0.3 > 0.2 > 0.15; for BOX, 0.25 > 0.3 > 0.15 > 0.1 > 0.2; for CARTON, 0.1 > 0.2 > 0.15 = 0.3 > 0.25; for CB_SN, 0.2 > 0.15 > 0.3 > 0.25 > 0.1; for FCC, 0.25 > 0.1 > 0.15 = 0.2 = 0.3; for PALLET, 0.25 > 0.15 = 0.2 > 0.1 = 0.3; for SN, 0.25 > 0.1 = 0.3 > 0.2 > 0.15; for Others, 0.25 > 0.15 = 0.3 > 0.1 = 0.2. Overall, the results for different pruning ratios differ little: less than 1% for All, about 1% for BOX, about 2% for CARTON, about 1% for CB_SN, about 1% for FCC, about 0.5% for PALLET, about 1% for SN, and about 1.5% for Others. Pruning is applied to prevent a neural network with too many neurons from overfitting by memorizing answers. Across types, the gap between the best and worst setting is only about 1%, with no clear advantage either way; the model in this study is not overfitting, so little can be gained from this parameter. Overall, 0.25 performs best.
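The mechanism being tuned here can be sketched as inverted dropout, the variant most frameworks implement: during training a `rate` fraction of neurons is zeroed and the survivors are rescaled, so nothing needs to change at inference time. This is a generic illustration, not the thesis's code.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a `rate` fraction of activations and divide
    the rest by (1 - rate) so the expected activation is unchanged."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(10_000)
out = dropout(a, rate=0.25, rng=rng)
print(out.mean())  # close to 1.0 in expectation
```

Because each forward pass sees a different random sub-network, no single neuron can memorize an answer on its own, which is the overfitting defence discussed above.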
Table 21 Pruning ratios in the loop training experiment

| | NN Base | Rolling(1,9) | Rolling(2,6) | Rolling(3,7) |
|---|---|---|---|---|
| 0.1 | 0.893 | 0.915 | 0.957 | 0.957 |
| 0.15 | 0.885 | 0.928 | 0.954 | 0.955 |
| 0.2 | 0.887 | 0.926 | 0.958 | 0.958 |
| 0.25 | 0.894 | 0.915 | 0.959 | 0.961 |
| 0.3 | 0.889 | 0.921 | 0.955 | 0.960 |
As shown in Table 21, because there are many combinations of loop-training methods and data volumes, this experiment takes only the parameter sets that performed well in the earlier loop-training experiments for further trials. The columns are the control group and the loop-training parameters, where the first parameter is the method and the second is the denominator of the data volume; the rows are the pruning ratios. Under NN Base, 0.25 > 0.1 > 0.3 > 0.2 > 0.15; under Rolling(1,9), 0.15 > 0.2 > 0.3 > 0.1 = 0.25; under Rolling(2,6), 0.25 > 0.2 > 0.1 > 0.3 > 0.15; under Rolling(3,7), 0.25 > 0.3 > 0.2 > 0.1 > 0.15. Overall, 0.25 gives the best result. The differences among pruning ratios are small: about 1% under Rolling(1,9) and about 0.5% under both Rolling(2,6) and Rolling(3,7). Pruning is applied to prevent a neural network with too many neurons from overfitting by memorizing answers; here the gap between the best and worst setting is only about 1%, with no clear advantage either way. The model in this study is not overfitting even after loop training, so little can be gained from this parameter; 0.25 remains the better setting.
4.5.5 Comparison of effectiveness evaluation of adjusting the network depth
After transformation, the input layer of the data in this study has a dimension of about 1,000 and the output layer a dimension of about 256. This experiment adjusts the number of hidden layers and observes the changes in accuracy. Because the amount of data in this study is limited, too deep a network would have too many parameters and overfit, so at most three hidden layers are tested.
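The parameter-count concern above can be made concrete. The sketch below uses the approximate input (1,000) and output (256) dimensions quoted in the text; the hidden width of 512 is an assumption for illustration, since the thesis does not state it here.

```python
def parameter_count(input_dim, hidden_dim, output_dim, n_hidden):
    """Weights + biases of a fully connected network with `n_hidden`
    equally sized hidden layers."""
    total, prev = 0, input_dim
    for _ in range(n_hidden):
        total += prev * hidden_dim + hidden_dim  # weights + biases
        prev = hidden_dim
    total += prev * output_dim + output_dim  # output layer
    return total

for depth in (1, 2, 3):
    print(depth, parameter_count(1000, 512, 256, depth))
```

Each extra hidden layer adds roughly 262k parameters at this width, so with a training set of only a few thousand rows the parameter count quickly dwarfs the data, which is why depths beyond three were not tried.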
Table 22 Network depth experiment results

| | All | BOX | CARTON | CB_SN | FCC | PALLET | SN | Others |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.889 | 0.889 | 0.887 | 0.926 | 0.790 | 0.890 | 0.838 | 0.844 |
| 2 | 0.894 | 0.906 | 0.868 | 0.945 | 0.845 | 0.896 | 0.841 | 0.871 |
| 3 | 0.892 | 0.908 | 0.883 | 0.960 | 0.833 | 0.899 | 0.823 | 0.852 |
As shown in Table 22, the columns are the label types and the rows are the number of hidden layers. For the All type, 2 > 3 > 1; for BOX, 3 > 2 > 1; for CARTON, 1 > 3 > 2; for CB_SN, 3 > 2 > 1; for FCC, 2 > 3 > 1; for PALLET, 3 > 2 > 1; for SN, 2 > 1 > 3; for Others, 2 > 3 > 1. Overall, the differences between depths are small: less than 1% for All, about 2% for BOX, about 2% for CARTON, about 3.5% for CB_SN, about 5.5% for FCC, about 1% for PALLET, about 2% for SN, and about 3% for Others. Deeper networks help with more complex problems, but excessive depth can cause overfitting. The table shows that across types the best and worst depths differ little, with no clear advantage either way. The problem addressed in this research is not especially complicated, and the neural network learns well with the basic structure, so little can be gained from this parameter. Overall, two hidden layers perform best.
Table 23 Network depth in the loop training experiment

| | NN Base | Rolling(1,9) | Rolling(2,6) | Rolling(3,7) |
|---|---|---|---|---|
| 1 | 0.889 | 0.913 | 0.958 | 0.964 |
| 2 | 0.894 | 0.915 | 0.959 | 0.961 |
| 3 | 0.892 | 0.919 | 0.954 | 0.960 |
Because there are many combinations of loop-training methods and data volumes, this experiment takes only the parameter sets that performed well in the earlier loop-training experiments for further trials. As shown in Table 23, the columns are the control group and the loop-training parameters, where the first parameter is the method and the second is the denominator of the data volume; the rows are the number of hidden layers. Under NN Base, 2 > 3 > 1; under Rolling(1,9), 3 > 2 > 1; under Rolling(2,6), 2 > 1 > 3; under Rolling(3,7), 1 > 2 > 3. Overall, two hidden layers perform slightly better, but the differences are very small: about 0.5% under each of Rolling(1,9), Rolling(2,6), and Rolling(3,7).
Deeper networks help with more complex problems, but excessive depth can cause overfitting. The gap between the best and worst depth is only about 0.5%, with no clear advantage either way. The problem addressed in this study is not especially complicated, and even after loop training the neural network learns well with the basic structure, so little can be gained from this parameter; two hidden layers remain the better setting.