In modern dairy farming, the integration of advanced technologies has shifted decision-making toward data-driven approaches. This section presents the experiments and discusses the results of applying big data analytics, coupled with the YOLOv5 algorithm, from the perspective of data quality management and risk assessment in dairy farming. The experiments were implemented in Python. Through an examination of feed behaviour, this work aims to show the value of these tools in improving the productivity and efficiency of dairy farming operations. The images used for risk and feed behaviour analysis were collected from https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/7M108F.
Figure 2 assesses the classification model's performance by comparing its predictions with the actual ground truth within a training dataset. In Fig. 2(a), the model correctly identified and classified 17,418 cases as true positives. Fig. 2(b) evaluates the model's predictions against the actual dataset. The model correctly predicted 4,262 instances as positive, matching the actual positive cases in the dataset (true positives). It misclassified 2,330 cases as negative when they were, in reality, positive in the actual dataset (false negatives). Finally, it correctly identified 3,023 instances as negative (true negatives). The confusion matrix is a standard tool for evaluating the performance of classification models in machine learning and statistics.
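As a minimal sketch of how such counts translate into metrics, the snippet below computes recall from the Fig. 2(b) counts quoted above. The function name `confusion_metrics` is illustrative, not part of the study's code. The false-positive count is not stated in the text, so precision and accuracy are computed only when a value for it is supplied.

```python
def confusion_metrics(tp, fn, tn, fp=None):
    """Return recall, plus precision and accuracy when fp is known.

    tp, fn, tn, fp are the four confusion-matrix counts. fp is optional
    because the text above does not report it for Fig. 2(b).
    """
    metrics = {"recall": tp / (tp + fn)}
    if fp is not None:
        metrics["precision"] = tp / (tp + fp)
        metrics["accuracy"] = (tp + tn) / (tp + tn + fp + fn)
    return metrics


# Counts quoted for Fig. 2(b); the false-positive count is left unspecified.
fig2b = confusion_metrics(tp=4262, fn=2330, tn=3023)
print(f"recall = {fig2b['recall']:.3f}")
```

With these counts, recall is 4,262 / (4,262 + 2,330), roughly 0.65, meaning about a third of the actual positive cases in Fig. 2(b) were missed.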
Figure 3 explores the relationship between feed intake and the predicted probability, highlighting specific values of interest. The highest predicted probability, 0.0477, marks the point at which the outcome associated with feed intake is most likely within this dataset, and suggests a focus area for further investigation or decision-making. The next highest value, 0.0417, is another point where the predicted probability remains relatively high; it could indicate a different level of significance or a potential threshold for decision-making. At the lower end, the values 0.0250 and 0.0230 represent lower predicted probabilities and may correspond to a different scenario or outcome related to feed intake.
Figure 4 presents a feature importance plot showing the percentage gain attributed to each feature in the dataset. Each feature's importance is assessed with respect to the outcome variable, which in this case relates to factors affecting cows: eating time, daily activity time, body condition score, daily rumination time, ketosis risk, drinking gulps, dystocia score, bolus, mastitis risk, chews per minute, and the season of calving. The plot indicates which features have the greatest influence on the outcome variable, as measured by their gain percentages. A higher gain percentage implies that a feature plays a larger role in determining the outcome, making it a priority for further investigation. Analysing this plot supports decision-making by identifying the key factors that contribute to specific outcomes related to cows.
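The gain percentages in such a plot are typically obtained by normalising each feature's raw gain by the total gain. The sketch below illustrates that normalisation. The function name and the raw gain values are hypothetical and are not taken from Fig. 4; only the feature names come from the text.

```python
def gain_percentages(raw_gain):
    """Convert raw per-feature gain into percentages, sorted high to low."""
    total = sum(raw_gain.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    pct = {name: 100.0 * gain / total for name, gain in raw_gain.items()}
    # Rank features from most to least important.
    return dict(sorted(pct.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical raw gains, for illustration only (not values from Fig. 4).
example_gain = {
    "Daily rumination time": 25.0,
    "Eating time": 40.0,
    "Chews per minute": 5.0,
    "Drinking gulps": 10.0,
}
ranked = gain_percentages(example_gain)
for feature, percent in ranked.items():
    print(f"{feature}: {percent:.2f}%")
```

Because the percentages sum to 100, they describe relative rather than absolute importance: removing or adding a feature changes every other feature's share.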
Figure 5 explores the relationship between eating time and the associated label or outcome. Scatter plots are commonly used to display the distribution of, and potential patterns or correlations between, two variables. Eating time represents the duration or frequency of eating behaviour in the dataset, while the label signifies a specific classification or outcome related to this behaviour. By plotting these variables on the same graph, we can visually assess whether there are any discernible trends, clusters, or outliers in the data. Analysing this scatter plot offers insight into how eating time may relate to the label.
Figure 6 illustrates the relationship between daily rumination time and the labelled variable of interest. Daily rumination time is plotted on one axis and the labelled variable on the other. This scatter plot allows us to assess any trends, clusters, or associations between the two. By examining the distribution of the data points, we can judge whether there is a discernible relationship between these two factors and whether daily rumination time has any predictive value for the labelled variable.
Figure 7 shows the relationship between drinking time and the corresponding label or outcome. This information is valuable for identifying potential associations or dependencies between drinking time and the label, which can have implications in domains such as agriculture, health monitoring, or behavioural analysis, depending on the nature of the label and its relevance to the drinking behaviour of interest.
Figure 8 visualizes the relationship between daily activity time and the corresponding label or variable of interest. Daily activity time is a measure of some aspect of the subject under study, such as an animal's behaviour or health status. The plot shows how daily activity time varies across different values of the label. By examining this scatter plot, researchers and analysts can gain insight into any patterns, trends, or correlations between daily activity time and the label.
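One way to quantify the associations that the scatter plots above show visually is a correlation coefficient between a behaviour variable and the label. The sketch below computes the Pearson correlation, which for a binary 0/1 label is the point-biserial correlation. The text does not state that the study computed this quantity, and the data values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical daily activity times and binary labels, for illustration only.
activity_time = [1.0, 2.0, 3.0, 4.0]
label = [0, 0, 1, 1]
r = pearson_r(activity_time, label)
print(f"correlation = {r:.3f}")
```

A coefficient near zero would suggest that the variable has little linear relationship with the label, although nonlinear patterns visible in a scatter plot can still be missed by this measure.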
Figure 9(a) compares the predicted and actual values of eating time. The actual eating time is recorded at 3.0688 and the predicted value at 4.93791, with the RMSE and MAE both reported as 0.01. The R-squared value of 1.00 indicates a near-perfect fit of the model to the observed data. Figure 9(b) shows close agreement between the predicted and actual values of daily rumination time, with an R-squared value of 1.00. This near-perfect fit suggests that the model accurately captures and reproduces the variations in rumination time. The RMSE of 0.23 and MAE of 0.19 reflect the small discrepancies between the actual and predicted values. Figure 9(c) compares the predicted and actual values of drinking time. The actual and predicted values are close, with an RMSE of 0.07 and an MAE of 0.06, indicating that the model captures drinking behaviour well.
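For reference, the three metrics quoted above can be computed as in the following sketch, which uses the standard definitions of RMSE, MAE, and R-squared. The function name and the sample values are hypothetical and are not the study's data.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R-squared) for paired actual and predicted values."""
    n = len(actual)
    if n != len(predicted) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when actual values are constant")
    rmse = sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical behaviour times, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.1, 3.9]
rmse, mae, r2 = regression_metrics(actual, predicted)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

Note that an R-squared reported as 1.00 alongside a non-zero RMSE is a rounded value: an exact R-squared of 1 would require every error to be zero.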
Figure 10 shows the cumulative percentage of declines for sick and healthy cows, along with their respective Kolmogorov-Smirnov (KS) statistics. The KS statistic for sick cows is 0.00959, indicating a greater divergence in the distribution of declines for this group. Healthy cows exhibit a lower KS statistic of 0.00586, suggesting a closer resemblance in the decline distribution among them. The overall KS statistic for both groups is 0.00420, indicating that the cumulative percentages of declines for the two groups overlap to some extent but still differ in their distribution patterns. These findings help in understanding and comparing the health status of cows in the dataset.
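As a rough illustration of what a KS statistic measures, the sketch below computes the two-sample KS statistic, defined as the largest vertical gap between two empirical cumulative distribution functions. How the study derived its per-group and overall values is not described in the text, so this is a generic sketch with hypothetical data rather than a reproduction of Fig. 10.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so check each one.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical scores for two groups, for illustration only.
sick_scores = [1, 2, 3, 4]
healthy_scores = [3, 4, 5, 6]
print(ks_statistic(sick_scores, healthy_scores))
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap), so values close to zero indicate that the two cumulative curves are nearly indistinguishable.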
The Receiver Operating Characteristic (ROC) curve in Fig. 11 illustrates the performance of the ketosis risk prediction model in terms of sensitivity versus specificity. This graph shows the model's ability to discriminate between individuals at risk of ketosis and those who are not. The training area under the curve (AUC) of 0.77 and the test AUC of 0.75 indicate reasonable predictive power, with the training data showing slightly better discrimination. The small gap between the two values suggests that the model's performance carries over to unseen data, making it a promising tool for ketosis risk assessment and management.
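The AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition, counting ties as half. The function name, labels, and scores are hypothetical and are not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = at risk) and model scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

Under this definition, 0.5 corresponds to random ranking, and reversing the sign of the scores turns an AUC of a into 1 - a, which is why values far below 0.5 indicate scores that are inversely related to the labels.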
Figure 12 presents the ROC analysis results for predicting ketosis and mastitis. During training, the model distinguished reasonably well between ketosis-positive and ketosis-negative cases, with an AUC of 0.72. However, when evaluated on a separate testing dataset, its AUC for ketosis dropped to 0.53, only slightly better than random chance. The model's performance in predicting mastitis during training was poor, with an AUC of 0.19. This is well below the 0.5 expected from random guessing, which indicates that the model's scores were inversely related to the mastitis labels during training. On the separate testing dataset for mastitis, the AUC was higher but remained close to random chance, at 0.48. These findings highlight the need for substantial improvements in the model's ability to detect both ketosis and mastitis in dairy cattle, especially on test data, before it can be of practical use in real-world scenarios.
Figure 13 shows the risk scores for ketosis and mastitis in a dairy herd. The percentages represent the likelihood or severity of these two health issues. Ketosis has a risk score of 46%, while mastitis has a higher risk score of 51%. This suggests that mastitis is more likely to occur than ketosis in the herd, and that both conditions require attention and management to maintain the health of the cows.
5.1 Comparison Analysis
The comparison analysis presented in this work aims to evaluate the performance of various classification algorithms, including the Decision Tree algorithm, Support Vector Machine (SVM), and the proposed method. By examining key metrics such as precision and accuracy, this analysis offers insight into the strengths and weaknesses of each approach. Assessing their respective capabilities in addressing the research problem allows us to identify the most suitable algorithm for the given task, guiding informed decision-making and advancing the research objectives.
In this comparison of different techniques, the study evaluates their performance on three key metrics: precision, accuracy, and recall. The "Multiple Machine Learning" approach achieved an accuracy of 90.9%, with a precision of 96.7% and a recall of 87.6%. The "Efficient DenseNet" technique shows a notable improvement, reaching an accuracy of 97.2%, a precision of 98.09%, and a recall of 99.28%. The "Proposed" technique performs best, with an accuracy of 99.8%, a precision of 99.2%, and a recall of 99.4%. These results indicate that the proposed technique classifies cases more accurately than the other two, with fewer false positives and better identification of actual positive cases, making it a promising approach for the task under consideration.
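Precision and recall can be combined into a single figure with the F1 score, their harmonic mean. The text does not report F1, so the sketch below derives it purely from the precision and recall percentages quoted above. The function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as quoted in the comparison above.
reported = {
    "Multiple Machine Learning": (96.7, 87.6),
    "Efficient DenseNet": (98.09, 99.28),
    "Proposed": (99.2, 99.4),
}
f1_by_method = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, f1 in f1_by_method.items():
    print(f"{name}: F1 = {f1:.2f}%")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method with a large gap between precision and recall is penalised more than one with balanced values.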