Even with the best possible hardware and configuration, CSI data contain noise, and raw phase information cannot be used directly. This is caused first by the environment and reflected radio waves, and second by possible hardware instability. Data preprocessing is therefore an essential part of building a stable and accurate HAR system. The full workflow of the system, from the routers through preprocessing to model prediction, is shown in Fig. 4.
6.1 Data preprocessing
Even with flawless equipment and setup, CSI data inevitably contain noise and artifacts caused by ambient factors, including reflected radio waves, as well as possible device instability. These factors introduce unwanted variability and error into the data. Data preprocessing is therefore a vital part of building a robust and accurate Human Activity Recognition (HAR) system.
Data preprocessing is a core component of the overall HAR system: it bridges the raw CSI data received from the routers and the subsequent model prediction. Through a series of well-defined steps, preprocessing refines the raw data to make it more suitable for effective model training and accurate activity recognition.
The full workflow of the system, from data collection via the routers to the final model predictions, is depicted in Figure 4. The figure highlights how data collection, preprocessing, model training, and activity recognition are interconnected, and underscores the critical role preprocessing plays in improving the quality and reliability of the HAR system. By removing noise and instability effects, preprocessing ensures that the model can make reliable predictions from the input CSI data, even under challenging real-world conditions.
One key step of preprocessing in the context of Human Activity Recognition (HAR) is phase sanitization. Unlike amplitude information, raw phase data cannot be used directly for activity recognition due to intrinsic limitations. The primary difficulty with unprocessed phase data is its sensitivity to carrier frequency offset (CFO) and sampling frequency offset (SFO).
CFO arises when the transmitter and receiver fail to synchronize their clocks and phases accurately before a packet is transmitted. This lack of synchronization can lead to large phase variations, especially in the 5 GHz band, where even small clock errors can produce phase shifts of several π radians. Human activities, by contrast, typically cause phase changes of less than 0.5π radians.
Because of these limitations, the minute phase changes caused by human movement are not detectable in the unprocessed phase data. Preprocessing must therefore compensate for the CFO and SFO effects to make the phase data usable for accurate human activity recognition.
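A rough back-of-envelope calculation illustrates the scale mismatch; the 0.2 ns clock error below is our own illustrative number, not a measured value:

```python
import math

CARRIER_HZ = 5e9        # 5 GHz Wi-Fi band
clock_error_s = 0.2e-9  # illustrative 0.2 ns synchronization error (assumed)

# Phase shift accumulated by an unsynchronized clock: 2*pi*f*dt radians.
cfo_phase = 2 * math.pi * CARRIER_HZ * clock_error_s

# A sub-nanosecond error already produces a full 2*pi shift, far above
# the ~0.5*pi changes caused by human movement.
```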
Sampling frequency offset (SFO) poses a further challenge to using raw phase data for human movement tracking. This offset is introduced by the analog-to-digital converter, and it varies from subcarrier to subcarrier, producing different distortions across the spectrum. Since both CFO and SFO are unknown, the raw phase information cannot be used directly for activity recognition.
However, a practical remedy is the linear transformation described in [22], which mitigates the combined effect of CFO and SFO and makes the phase data usable for activity recognition. The results of applying this phase sanitization procedure are shown in Figure 5.
The raw phase, shown in the left plots, is initially noisy and reveals nothing about the walking activity. After applying the sanitization procedure, the phase data become far less noisy, and a clear phase shift becomes visible during walking. This transformation addresses the distortions introduced by CFO and SFO, turning the phase data into a meaningful input to the preprocessing pipeline and enabling more accurate activity recognition.
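A minimal sketch of such a linear correction (our own implementation, not the exact code of [22]): the slope and mean of the unwrapped phase across subcarriers are subtracted, which cancels the offset terms while preserving activity-induced variations. Subcarrier indices are assumed symmetric around zero:

```python
import numpy as np

def sanitize_phase(raw_phase, subcarrier_idx):
    """Linear phase sanitization: subtract the slope and mean of the
    unwrapped phase across subcarriers to cancel the SFO/CFO terms."""
    phi = np.unwrap(raw_phase)                 # undo 2*pi wrap-arounds
    m = np.asarray(subcarrier_idx, dtype=float)
    a = (phi[-1] - phi[0]) / (m[-1] - m[0])    # slope across subcarriers
    b = phi.mean()                             # constant offset
    return phi - a * m - b

# Synthetic check: a purely linear phase ramp is removed entirely.
m = np.linspace(-58, 58, 114)                  # symmetric subcarrier indices
clean = sanitize_phase(0.02 * m + 1.0, m)
```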
Preprocessing also involves removing outliers from both the amplitude and phase data. These outliers can arise from several sources of noise, including transmission rate changes, power adjustments, thermal noise, and other external factors. It is vital to detect and remove them, since they can distort the signal and introduce spurious information unrelated to human activity.
To address this, the Hampel Identifier [15] is applied. It uses robust statistics, namely the median and the median absolute deviation, to locate outliers in the data. With the Hampel Identifier, preprocessing detects and suppresses these anomalies, yielding cleaner and more trustworthy amplitude and phase data that more faithfully reflect the actual human activities being observed.
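A compact sketch of a Hampel filter (the window size and threshold below are our illustrative defaults, not values from the system): samples that deviate from the local median by more than a few median absolute deviations are replaced by that median:

```python
import numpy as np

def hampel_filter(x, window=5, n_sigmas=3):
    """Replace outliers with the local median; a sample is an outlier
    when it deviates from the window median by more than
    n_sigmas * (scaled median absolute deviation)."""
    x = np.asarray(x, dtype=float).copy()
    k = 1.4826                 # scales MAD to a std estimate for Gaussian data
    half = window // 2
    for i in range(half, len(x) - half):
        win = x[i - half:i + half + 1]
        med = np.median(win)
        mad = k * np.median(np.abs(win - med))
        if np.abs(x[i] - med) > n_sigmas * mad:
            x[i] = med         # suppress the outlier
    return x

# A lone spike in an otherwise flat amplitude trace is removed.
cleaned = hampel_filter(np.r_[np.ones(10), 50.0, np.ones(10)])
```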
Despite these initial filtering stages, some noise typically remains in the CSI data. To further improve data quality, a noise reduction step based on the Discrete Wavelet Transform, as described in [15], was implemented.
The technique transforms the original signal into a set of wavelet coefficients and distinguishes the components that carry the signal from those that do not. Small coefficients typically correspond to noise rather than genuine signal changes; by suppressing them, the noise can be filtered out without degrading the overall signal.
After thresholding, the inverse wavelet transform reconstructs the data, yielding a cleaner and more precise dataset. This ensures that the CSI data used for activity recognition are not only free of unwanted noise but also retain the clarity of the important signal components. The DWT-based noise reduction substantially improves the data quality and thereby the accuracy of the subsequent activity recognition.
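The idea can be sketched with a single-level Haar transform and soft thresholding (a toy version; the actual implementation in [15] may use a different wavelet and decomposition depth):

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar DWT denoising: soft-threshold the detail
    coefficients, then apply the inverse transform."""
    x = np.asarray(x, dtype=float)             # length must be even
    a = (x[0::2] + x[1::2]) / np.sqrt(2)       # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)       # detail (high-pass)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)  # soft threshold
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)             # inverse Haar transform
    y[1::2] = (a - d) / np.sqrt(2)
    return y

# With threshold 0 the signal is reconstructed exactly; a large
# threshold averages out the high-frequency jitter.
exact = haar_denoise(np.arange(8.0), 0.0)
smoothed = haar_denoise(np.array([2.0, 4.0, 2.0, 4.0]), 10.0)
```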
6.2 Machine Learning Model
Analyzing CSI data poses a particular difficulty: conventional approaches such as Support Vector Machines (SVM), decision trees, or fuzzy rule-based classifiers may not be well suited to the complexity and subtlety of this data. We therefore opted for a different approach: neural networks.
Neural networks are highly effective at extracting hidden patterns and relationships from complex, heterogeneous data streams such as CSI data. Their value lies in their ability to discover subtle correlations that standard analytical approaches miss.
A neural network can reveal the underlying structure of the CSI data, allowing us to build a robust model for human activity recognition. Because they adapt and learn from data, neural networks promise higher accuracy in capturing the intricacies of human actions as reflected in CSI measurements, and with them we aim for a more accurate and dependable recognition system.
The dataset used for training and evaluating the neural network model is described in Chapter 5. As noted in Table 1, it was collected using two antennas on each of the receiver and transmitter routers, giving four antenna pairs in total. For each pair, full CSI data was recorded, containing both phase and amplitude over 114 subcarriers.
This arrangement yields a considerable amount of data: 912 features for each received CSI packet (4 antenna pairs × 114 subcarriers × 2 values, amplitude and phase). These features describe the signal fluctuations and patterns captured by the CSI and form the basis for training and fine-tuning the neural network to detect and classify the various human activities.
Working with this rich dataset, which captures the complex interactions and variations of the Wi-Fi channel state, the network can learn meaningful representations from the CSI data and recognize human activities with high accuracy and reliability. The many features extracted from the CSI data help the model represent the nuances of human motion.
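The 912-feature layout can be sketched as follows (`packet_to_features` is our hypothetical helper; the actual feature ordering in the system may differ):

```python
import numpy as np

N_PAIRS, N_SUBCARRIERS = 4, 114    # 2x2 antennas -> 4 pairs, 114 subcarriers

def packet_to_features(csi):
    """Flatten one complex CSI packet (pairs x subcarriers) into a real
    feature vector of amplitudes followed by phases."""
    return np.concatenate([np.abs(csi).ravel(), np.angle(csi).ravel()])

csi = np.ones((N_PAIRS, N_SUBCARRIERS), dtype=complex)  # dummy packet
features = packet_to_features(csi)                      # 4 * 114 * 2 = 912 values
```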
Beyond the complexity of the dataset itself, human transitions between activities do not happen instantaneously; there is usually a transitional interval. Ambient effects and reflected waves in the Wi-Fi channel add further complexity. To recognize activities reliably, the model must be able to learn patterns over time, given the dynamic nature of human actions and the changing environment.
To address this, the input is arranged and fed to the model as sequences with a fixed step, or window, between them. This lets the model capture activity patterns as they unfold across consecutive data points and make predictions from the accumulated information. The goal is to identify the activity performed at the end of each sequence.
Two distinct approaches to this temporal aspect were investigated and evaluated, ensuring the model can adapt to evolving patterns and transitions in human activity over time.
The InceptionTime model, introduced in [23], is an architecture designed specifically for time series classification. It is built on a deep one-dimensional Convolutional Neural Network (CNN) suited to the complexity of time series data, and it aims for high classification accuracy while keeping training time low, making it economical compared to similar architectures.
InceptionTime's strength lies in its ability to capture temporal patterns and relationships within time series, which is especially important for CSI-based human activity recognition. Its deep CNN layers can discover the subtle patterns that discriminate between activities.
Its training efficiency is a considerable benefit, particularly on large datasets, as it reduces the computational cost of model training and makes it a realistic option for real-world applications.
In summary, InceptionTime offers a promising and efficient architecture for time series classification. Its combination of high accuracy and short training time makes it well suited to extracting insights from CSI data and classifying human actions.
The InceptionTime model can be regarded as an ensemble of Inception networks, each ending in a Global Average Pooling layer and a Dense layer with a softmax activation. Its input is a sequence of CSI data of fixed length 1024; each sequence is labeled with the activity performed at the end of that sequence.
The data is split into sequences with a step of 8 time points between starting positions. This segmentation lets the model capture patterns and transitions across the sequences and make predictions from each sequence's cumulative information.
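The segmentation above can be sketched as follows (`make_sequences` is our hypothetical helper returning the start and end index of each labeled window):

```python
def make_sequences(n_packets, seq_len=1024, step=8):
    """Start/end indices of overlapping windows over a packet stream;
    each window is labeled with the activity at its last packet."""
    return [(s, s + seq_len - 1)
            for s in range(0, n_packets - seq_len + 1, step)]

# For a 2048-packet recording this yields 129 windows 8 points apart.
windows = make_sequences(2048)
```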
Cross-entropy loss is used to train the model. It drives the model's parameters toward accurate predictions of the activity class for each input sequence, in line with standard practice for neural network classifiers.
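For a single example, the loss is simply the negative log-probability the softmax assigns to the true class; a minimal sketch:

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example: -log p(true class)."""
    z = logits - np.max(logits)                 # for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))   # log-softmax
    return -log_probs[label]

# With uniform logits over 7 classes the loss equals log(7) ~ 1.946.
uniform_loss = cross_entropy(np.zeros(7), 3)
```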
The implementation is based on the codebase provided by [23], which offers a practical framework for building and training the model and simplifies applying this architecture to CSI-based human activity recognition. Overall, the InceptionTime architecture and data processing pipeline are tailored to the requirements of this task, aiming for high accuracy in classifying human activities from CSI sequences.
After training, the model reaches 38.2% accuracy on the validation set against 99.2% on the training set. This gap is a classic symptom of overfitting: the model performs extremely well on the training data but fails to generalize to unseen data, as the low validation accuracy shows.
The main cause of this overfitting is likely the limited amount of training data. InceptionTime is a large, complex architecture designed to capture nuanced patterns in the CSI data; with such a small dataset, it may simply have memorized the training data rather than learning generalizable patterns. This yields high training accuracy but poor validation accuracy, indicating a lack of robustness on new, unseen data.
To reduce overfitting and improve performance, options include collecting a larger dataset or applying techniques such as data augmentation or regularization. These help the model learn more generalizable features and narrow the gap between training and validation accuracy. Overfitting is a common difficulty in machine learning and must be addressed to build models that generalize to real-world conditions.
The LSTM-based model is a recurrent neural network (RNN) architecture designed for sequence data, which suits the task of CSI-based human activity recognition. RNNs excel at capturing sequential dependencies, and the Long Short-Term Memory (LSTM) variant further improves their ability to model long-term dependencies within sequences.
The core of the model is a single bidirectional LSTM layer with a hidden size of 256. Its bidirectional structure lets it use information from both past and future time steps, improving its ability to capture temporal patterns.
The LSTM layer is followed by four dense layers of sizes 512, 256, 128, and 7, which extract and abstract features from the LSTM's output. They use the Rectified Linear Unit (ReLU) activation, a common choice for introducing non-linearity into neural networks.
The final output layer has 7 units, matching the number of activity classes in the dataset. It applies a softmax activation, as is usual in classification tasks, to produce a probability distribution over the activity classes from which the model makes its predictions.
In essence, the LSTM-based model handles sequential data effectively, capturing both short-term and long-term relationships to classify human activities from the given CSI data. Its architecture and choice of activation functions are tailored to the requirements of the task.
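The described architecture can be sketched in PyTorch as follows (our reconstruction from the text; the exact layer wiring of the original implementation is an assumption):

```python
import torch
import torch.nn as nn

class LstmHar(nn.Module):
    """Bidirectional LSTM (hidden size 256) followed by dense layers of
    sizes 512, 256, 128, and 7 with ReLU, as described in the text. The
    input size of 912 matches the per-packet CSI feature count."""
    def __init__(self, n_features=912, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 256, batch_first=True,
                            bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(512, 512), nn.ReLU(),   # LSTM output is 2 * 256 wide
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_classes),        # softmax applied at inference
        )

    def forward(self, x):                     # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])       # classify from the last step

logits = LstmHar()(torch.randn(2, 16, 912))   # two dummy 16-step chunks
```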
The input to the LSTM-based model consists of CSI sequences of fixed length 1024, each labeled with the activity performed at the end of the sequence. This setup mimics a real-time Human Activity Recognition (HAR) system and reflects the model's practical use.
The data is split into chunks of 16 time points with an 8-point overlap between them. This segmentation lets the model process multiple chunks when making a prediction and capture trends across the sequence.
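The chunking can be sketched as follows (`make_chunks` is our hypothetical helper):

```python
def make_chunks(seq_len=1024, chunk=16, step=8):
    """Start/end indices of 16-point chunks over one 1024-point
    sequence, with an 8-point overlap between consecutive chunks."""
    return [(s, s + chunk) for s in range(0, seq_len - chunk + 1, step)]

# One 1024-point sequence yields 127 overlapping chunks.
chunks = make_chunks()
```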
The model is trained with cross-entropy loss, a widely used loss function for classification tasks. After training, it achieves 61.1% accuracy, 59% precision, and 55% recall on the validation set, and 87.8% accuracy on the test set. The confusion matrix in Figure 6 shows the model's performance on the individual activities.
Notably, the amount of data collected per activity strongly affects the model's accuracy. Activities with little data, such as "get up" and "get down," show lower accuracy (35% and 24%, respectively), likely because of the small amount of training data available for them. Why "standing" and "no person" are classified less accurately requires further investigation.
The model still overfits somewhat, but less than the previous one, and its overall accuracy is substantially higher. This suggests that the LSTM-based approach is better suited to this task, especially when training data is limited. Its ability to capture temporal relationships and generalize well makes it a good choice for this application.