2.1 Ethical Statement
The trial was carried out in accordance with D.Lgs. 26/2014 and EU Directive 2010/63/EU concerning experiments on animals, and the experimental protocol was approved by the animal welfare committee (Organismo Preposto al Benessere Animale – OPBA – official number 167326) of Padova University according to D.Lgs. 26/2014. All methods were performed in accordance with the OPBA’s guidelines and regulations in compliance with D.Lgs. 26/2014. The research adhered to the ARRIVE guidelines for both study design and reporting. The protocol consisted of the observation, over 27.3 h, of 12 healthy, randomly chosen mid-lactation dairy cows wearing a tri-axial accelerometer, in order to assess the accuracy of a deep learning model in predicting cow behaviour.
2.2 Data collection
Animal husbandry and data collection are described in detail by Balasso and colleagues in a paper reporting the use of classical ML to classify dairy cow behaviour [21]. Briefly, the trial was carried out on an Italian dairy farm raising Italian Red-and-White cows in loose housing conditions. Twelve randomly selected healthy cows with 2.87 ± 0.91 lactations and 180 ± 35 days in milk (mean ± SD) were observed by two trained operators for on average 136 ± 29 min per cow over a period of 12 days. Animals were observed approximately between 1100 h and 1500 h, to capture as wide a variety of behaviours as possible; the operators recorded cow behaviour in real time using Microsoft Excel 2010 (Microsoft, Redmond, WA, USA) on a computer synchronized with the sensor. Inter-observer reliability, computed through Cohen’s kappa [27], was 0.91. During the observation sessions the cows wore a tri-axial (X, Y, Z) accelerometer (model MSR145W, PCE Italia srl, Capannori, LU, Italy), applied to the center of the left paralumbar fossa with an elastic band [21]. The sensor was set to collect data at 5 Hz [21], a frequency high enough to capture short-term behaviours while preserving battery life. The accelerometer was fixed so that, on a standing animal, the axes were in a pre-set position: X vertical, Y parallel to the ground, and Z orthogonal to the cow’s flank. Five behaviours were considered: moving, standing still, feeding, ruminating, and resting, as reported in Table 2.
Table 2. Behaviour description for dairy cow.
Behaviour | Definition1
Standing still | Cows stand still without moving their legs or showing any sign of activity
Feeding | Cows ingest the feed and chew it at the feed bunk
Moving (walking or moving slightly) | Cows walk across the pen or, while standing, perform behaviours other than those described here, characterized by at least one step every 10 seconds
Ruminating | Chewing that begins upon regurgitating a bolus and ends when the bolus is swallowed, in either a standing or lying position
Resting | Cows lie on the floor, not moving nor ruminating
1 Adapted from Balasso et al. [21].
2.3 Dataset preparation
Acceleration data on the X, Y and Z axes were exported as a .csv file using the MSR 5.12.04 software (PCE Italia srl, Capannori, LU, Italy). Data were then imported into Excel 2010 (Microsoft, Redmond, WA, USA), where each row reported the collection time (date, h, min, s, hundredths of a second), the acceleration values on the X, Y and Z axes, and the corresponding behaviour in separate columns. Statistical analyses were performed using R, version 3.2.1 (R Core Team 2013). Tri-axial accelerations were recorded every 0.2 s, corresponding to 27.3 h of observation (n = 490,900). Observations during which the behaviour was unclear were excluded from the dataset, leaving 25.4 h of observation suited for analyses (n = 456,730), including feeding (4.68 h; n = 84,206), moving (4.69 h; n = 84,400), resting (7.84 h; n = 141,055), ruminating (2.98 h; n = 53,744), and standing still (5.18 h; n = 93,325).
A list of metrics was computed over a rolling window of 15 observations: standard deviation (sd), average (avg), percentage change between an observation and the previous one, and the binary value derived from it (0 if the percentage change is negative, 1 otherwise), applied to the X, Y and Z acceleration data, for a total of 15 variables. As reported in Fig. 2, each interval of 40 observations (8 s) with a sliding step of 13 observations (33%) was summarized into one observation unit and associated with a specific behaviour, yielding a dataset of 211,720 observation units and 15 columns. All intervals spanning two different behaviours were excluded. The 8 s interval was chosen because it offered the best compromise for differentiating very short behaviours, such as walking, from the others.
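The rolling metrics and sliding-window segmentation described above can be sketched in NumPy. This is a minimal illustration on synthetic data, not the authors' code: the array sizes, the `rolling_features` helper, and the random input are all hypothetical; only the window of 15 observations, the 40-observation (8 s) segments, and the 13-observation slide come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
acc = rng.normal(size=(200, 3))  # hypothetical X, Y, Z accelerations sampled at 5 Hz

WIN = 15  # rolling window of 15 observations, as in the text

def rolling_features(a, win=WIN):
    """Per-axis rolling sd and average, plus percentage change and its binary sign."""
    # sliding windows over time: shape (n - win + 1, 3, win)
    windows = np.lib.stride_tricks.sliding_window_view(a, win, axis=0)
    sd = windows.std(axis=-1)           # rolling standard deviation per axis
    avg = windows.mean(axis=-1)         # rolling average per axis
    pct = np.diff(a, axis=0) / a[:-1]   # percentage change vs previous observation
    binary = (pct >= 0).astype(int)     # 0 if the change is negative, 1 otherwise
    return sd, avg, pct, binary

sd, avg, pct, binary = rolling_features(acc)

# 8 s segments (40 observations at 5 Hz) with a sliding step of 13 observations (~33%)
SEG, STEP = 40, 13
starts = np.arange(0, acc.shape[0] - SEG + 1, STEP)  # start index of each observation unit
```

Each start index in `starts` marks one 8 s segment that would then be summarized into a single observation unit and labelled with the behaviour observed in that interval.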
To build a predictive model, the dataset was randomly split into training (80% of the observations, n = 169,376) and testing (20% of the observations, n = 42,344) datasets. The latter was used to estimate the performance of the model. All variables were normalized using the mean and standard deviation of the training dataset.
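The split-and-normalize step can be sketched as follows. The data here are synthetic and the 1,000-row size is hypothetical; the point of the sketch is that the mean and standard deviation come from the training split only, so no test-set statistics leak into the model.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=3.0, scale=2.0, size=(1000, 15))  # hypothetical observation units x 15 variables
y = rng.integers(0, 5, size=1000)                    # five behaviour classes

# random 80/20 split into training and testing sets
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# normalize both splits with the TRAINING mean and standard deviation
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
```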
2.4 Data Modelling
As reported in Table 3, a CNN model made of 8 layers was built using 5 kinds of layers: convolution (n = 3), dropout (n = 1), max-pooling (n = 1), flattening (n = 1), and dense (n = 2).
- Convolution is a process in which a small matrix (the kernel, or filter) is slid across the input data, which are transformed on the basis of the filter values. As reported in Table 3, the number of filters in the Conv1d_1, Conv1d_2 and Conv1d_3 layers was set to 128, 64 and 32, respectively. For all three layers, the kernel size was set to 3 and the activation function was the rectified linear unit (ReLU).
- The dropout layer randomly selects neurons that are ignored during training, which helps prevent overfitting. The fraction of neurons dropped at each step is set by a rate; in this model the rate was set to 0.3.
- Max-pooling was used to reduce the size of the tensor and accelerate calculations. It downsamples the input representation by taking the largest value over a window defined by the pool size, which in our case was set to 2.
- Flattening reduces the data to a one-dimensional array, removing every dimension but one so that the subsequent dense layers can read it. As reported in Table 3, the output shape of the layer is 544, which equals the product of 17 and 32, the two dimensions of the previous layer.
- A dense layer consists of a finite number of neurons (mathematical functions) which receive one vector as input and return another as output. The first dense layer was made of 100 neurons with a ReLU activation function and was connected to the last dense layer, with a softmax activation function and a length of 5, equal to the number of activities to be classified by the model. The model was deployed in Python using Keras [28] with a TensorFlow backend.
It is noteworthy that the final layer’s output shape is 5, given that there are 5 behaviours to classify.
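The architecture described above can be sketched in Keras. This is a minimal reconstruction, not the authors' code: the (40, 15) input shape is inferred from the output shapes and parameter counts in Table 3 (40 time steps per 8 s window, 15 variables) rather than stated explicitly, and the optimizer and loss are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the 8-layer CNN in Table 3; the (40, 15) input shape is an
# inference from the reported output shapes, not stated in the text.
model = keras.Sequential([
    layers.Input(shape=(40, 15)),
    layers.Conv1D(128, kernel_size=3, activation="relu"),  # -> (None, 38, 128)
    layers.Conv1D(64, kernel_size=3, activation="relu"),   # -> (None, 36, 64)
    layers.Conv1D(32, kernel_size=3, activation="relu"),   # -> (None, 34, 32)
    layers.Dropout(0.3),                                   # rate = 0.3
    layers.MaxPooling1D(pool_size=2),                      # -> (None, 17, 32)
    layers.Flatten(),                                      # -> (None, 544)
    layers.Dense(100, activation="relu"),
    layers.Dense(5, activation="softmax"),                 # 5 behaviours
])
# optimizer and loss are assumptions; the paper does not report them
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Built this way, the model reproduces the 91,709 trainable parameters reported in Table 3.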
Table 3. Summary of the Deep Learning model architecture, with description of the layers used, output shape, and the number of parameters used in the model for each layer.
Layer (type) | Output Shape | Parameters
Conv1d_1 (Conv1D) | (None, 38, 128) | 5,888
Conv1d_2 (Conv1D) | (None, 36, 64) | 24,640
Conv1d_3 (Conv1D) | (None, 34, 32) | 6,176
dropout_1 (Dropout) | (None, 34, 32) | 0
max_pooling1d_1 (Max-pooling) | (None, 17, 32) | 0
flatten_1 (Flatten) | (None, 544) | 0
dense_1 (Dense) | (None, 100) | 54,500
dense_2 (Dense) | (None, 5) | 505
Total parameters: 91,709
Trainable parameters: 91,709
Non-trainable parameters: 0
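The parameter counts in Table 3 follow from standard formulas: a Conv1D layer has filters × (kernel_size × input_channels + 1) parameters (one bias per filter), and a dense layer has units × (inputs + 1). The short check below reproduces the table, assuming 15 input variables per time step (an inference from the 5,888 parameters of the first layer, not stated explicitly).

```python
def conv1d_params(filters, kernel_size, in_channels):
    # kernel_size * in_channels weights per filter, plus one bias per filter
    return filters * (kernel_size * in_channels + 1)

def dense_params(units, inputs):
    # one weight per input per neuron, plus one bias per neuron
    return units * (inputs + 1)

counts = [
    conv1d_params(128, 3, 15),   # Conv1d_1: 5,888 (15 input variables assumed)
    conv1d_params(64, 3, 128),   # Conv1d_2: 24,640
    conv1d_params(32, 3, 64),    # Conv1d_3: 6,176
    0, 0, 0,                     # dropout, max-pooling and flatten add no parameters
    dense_params(100, 544),      # dense_1: 54,500 (544 = 17 * 32 after flattening)
    dense_params(5, 100),        # dense_2: 505
]
print(sum(counts))  # 91709, matching the total in Table 3
```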
This CNN model was chosen out of three CNN models and one CNN Long Short-Term Memory (LSTM) network, since it gave the best performance. Training took about 90 minutes per model using Google Colaboratory, a cloud-based notebook environment for writing, executing, and sharing code in Google Drive. Google Colaboratory gives free access to GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units) with the characteristics and performance reported in Table 4.
Table 4. Summary of the Graphics Processing Unit (GPU) characteristics and performance made available in Google Colaboratory.
Parameter | Value
GPU | Nvidia K80 / T4
GPU Memory | 12 GB / 16 GB
GPU Memory Clock | 0.82 GHz / 1.59 GHz
Performance | 4.1 TFLOPS / 8.1 TFLOPS
Mixed Precision Support | No / Yes
GPU Release Year | 2014 / 2018
No. CPU Cores | 2
Available RAM | 12 GB (upgradable to 26.75 GB)
CPU, central processing unit; RAM, random access memory.
Fig. 3 reports the learning curve of the model, a line plot showing how model accuracy increases over training. Models are trained over a large number of epochs, allowing the learning algorithm to run until the error from the model has been sufficiently reduced. An epoch is one complete pass in which each sample in the training dataset has had an opportunity to update the internal model parameters; the number of epochs is a hyperparameter defining how many times the learning algorithm works through the whole training dataset [29].
2.5 Model Assessment
Average accuracy (macro and weighted), recall, precision, and F1-score were calculated in order to measure the CNN’s capability in predicting cow behaviour [22]. Once the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) have been determined, accuracy is calculated as accuracy = (TP + TN) / (TP + FP + FN + TN) and gives an overall measure of correctly identified behaviours [21]. In the macro average, all classes are assigned equal weight when contributing to the total; this can be misleading when there is a large class imbalance, in which case a weighted average is more informative, with weights given by the frequency of each class in the ground truth. The other parameters were calculated as follows: Recall = TP / (TP + FN); Precision = TP / (TP + FP); F1-score = (2 × Precision × Recall) / (Precision + Recall). The latter is a single score that balances precision and recall in one number, as reported in the literature [22].
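The metrics above, and the difference between macro and weighted averaging, can be illustrated with a small numeric sketch. The confusion matrix here is entirely hypothetical (it is not the study's result); it only serves to show how per-class TP, FP and FN yield recall, precision, F1, and the two averaging schemes.

```python
import numpy as np

# Hypothetical confusion matrix for the five behaviours
# (rows = observed class, columns = predicted class).
cm = np.array([
    [80,  5,  3,  2, 10],
    [ 4, 70,  6,  8, 12],
    [ 2,  3, 90,  1,  4],
    [ 5,  9,  2, 60,  4],
    [ 6,  8,  1,  3, 82],
])

tp = np.diag(cm).astype(float)   # correctly predicted, per class
fp = cm.sum(axis=0) - tp         # predicted as this class but observed as another
fn = cm.sum(axis=1) - tp         # observed as this class but predicted as another
support = cm.sum(axis=1)         # class frequency in the ground truth

recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

macro_f1 = f1.mean()                                 # every class weighted equally
weighted_f1 = (f1 * support).sum() / support.sum()   # weighted by class frequency
overall_accuracy = tp.sum() / cm.sum()               # fraction of correct predictions
```

With an imbalanced ground truth, `macro_f1` and `weighted_f1` diverge: the weighted version pulls the score toward the performance on the most frequent behaviours.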