Overall, twenty-three system-related factors were mentioned in 297 of the reviewed articles. Three broader subcategories emerged: hardware, software, and network (Fig. 15).
3.4.1. Hardware
HMD resolution. Higher HMD resolution allows users to see finer details (e.g., read text, spot a target), which in turn can contribute to increased realism and sense of presence (Mehrfard et al. 2019; Perroud et al. 2019). Furthermore, higher resolution can influence the interactivity of users in VR (Jang and Park 2019; Perroud et al. 2019).
Haptic devices. In addition to hearing and seeing in VR, being able to touch and feel the environment can increase the sense of immersion, presence, and enjoyment (Garcia-Valle et al. 2018; Srinivasan and Basdogan 1997). Overall, haptics can be divided into three major categories: (1) grounded, (2) non-grounded, and (3) wearable (Fig. 16).
Grounded haptics provide information on weight sensation and three-dimensional forces. They use mechanisms with a counterbalanced weight that exert forces along multiple axes (Kurita 2021). However, they are restricted in their mobility and size and are thus not optimal for wider use. Non-grounded haptic devices address this restriction: they employ a counteractive mechanism that creates linear or angular momentum and does not require a ground connection, which allows the device to be mobile (Kurita 2021). One limitation of such devices is that they are not powerful enough to generate large forces (Srinivasan and Basdogan 1997). Wearable haptics, however, allow for greater freedom and can be divided into three categories. First, vibro-tactile feedback is aimed at the tactile senses. These devices are useful for guiding users when visual cues are limited. Second, force feedback devices provide stimuli for the kinesthetic sense. There are two types of haptic feedback systems with force feedback capability: (1) exoskeletons and (2) fingertip-mounted devices. Exoskeletons allow users to manipulate virtual objects based on various types of haptic feedback. The primary problem with exoskeletons is the need for user-specific calibration. Given this limitation, fingertip devices are an interesting alternative. Finally, electro-tactile feedback leverages the electrical conductivity of human skin: electrodes pass current onto the skin and thereby evoke a tactile sensation (Garcia-Valle et al. 2018; Kurita 2021). However, because the electrodes pass actual current through the skin of the user, caution is needed, as the stimulation might cause pain.
Field of View (FoV). Higher FoV has been associated with higher immersion as the user can perceive more of the VE (Boger 2017; Kim et al. 2018). Furthermore, it has been related to improved user performance in search, comparison, and walking tasks (McMahan et al. 2012). Additionally, wider FoV facilitates the learning process by allowing the subject to perceive a greater number of environmental stimuli (Rizopoulos and Charitos 2014). On the other hand, there is an association between wider FoV and more acute cybersickness symptoms (Kim et al. 2018).
HMD type. Among the most often used types of HMDs are wireless, phone-in-a-box, stand-alone, and tethered. These headsets vary in their weight, type of lenses, resolution, FoV, audio output, refresh rate, and additional sensors (e.g., eye tracking and hand tracking). Consequently, they also offer different levels of immersion, as well as different degrees of comfort (Angelov et al. 2020).
Tracking devices. Tracking can be divided into two general types: orientational and positional. Orientational tracking systems determine the orientation of HMDs and/or controllers in real three-dimensional space and offer the user 3 degrees of freedom (3DoF): yaw, pitch, and roll (Fig. 17). Positional tracking adds a further 3 degrees of freedom (moving up and down, left and right, backward and forward), for 6DoF in total (Fig. 17). Hence, positional tracking provides more opportunities for complex, realistic user interactions with the environment and is generally more immersive (Angelov et al. 2020).
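The difference between the two tracking types can be illustrated with a minimal data-structure sketch. The `Pose` class and its field names are hypothetical and chosen only for illustration; they do not correspond to any particular headset SDK. Orientation is always present, while position is available only when positional tracking is used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Pose:
    """Hypothetical pose record for an HMD or controller (illustrative only)."""

    # Rotational degrees of freedom, in degrees (orientational tracking).
    yaw: float
    pitch: float
    roll: float
    # Translational degrees of freedom, in metres (x, y, z).
    # None means the device reports orientation only.
    position: Optional[Tuple[float, float, float]] = None

    @property
    def degrees_of_freedom(self) -> int:
        # 3 rotational DoF are always tracked; positional tracking adds 3 more.
        return 6 if self.position is not None else 3


# Orientation-only tracking, e.g. a seated viewer looking around.
seated_viewer = Pose(yaw=30.0, pitch=-5.0, roll=0.0)

# Orientation plus position, e.g. a user walking in a tracked room.
room_scale_user = Pose(yaw=30.0, pitch=-5.0, roll=0.0, position=(0.4, 1.7, -1.2))

print(seated_viewer.degrees_of_freedom, room_scale_user.degrees_of_freedom)
```

Modelling position as optional makes explicit that a 3DoF system cannot register the user leaning or walking, which is why it supports fewer kinds of interaction.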
3.4.2. Software
Image rendering. Rendering software facilitates the process of generating VR content based on a 2D image or 3D models (de Regt et al. 2020). Engines have been able to render scenes of up to 180K polygons in 1/20 second since the late 1990s (Brooks 1999), which enables users to be immersed in VEs that are richer in detail and have higher visual and audio quality.
Tracking algorithms. Interaction with the VE is only possible if the user's position is adequately tracked. Kunz and colleagues (2016) have shown that low latency (< 10 ms), a high update rate (> 100 Hz), and high precision (< 5 mm RMS (root mean square) and < 2 degrees RMS) of tracking can be crucial for achieving a pleasant VR experience. Moreover, gaze fixation tracking allows users to interact with the VE just by glances, which has the potential to reduce cybersickness and might be a solution for users who are physically unable to use controllers (Li et al. 2017). Finally, gaze tracking can lead to energy saving via foveated rendering, which progressively reduces the image details outside of the region of the eye gaze fixation (Li et al. 2017).
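The idea behind foveated rendering can be sketched as a detail level that falls off with angular distance from the gaze point. The sketch below is an illustration under stated assumptions, not the method of Li et al. (2017): the function names, the linear falloff, and all parameter values (a 5-degree full-detail region, a 20-degree falloff band, a minimum detail of 25%) are hypothetical.

```python
import math


def eccentricity_deg(gaze_px, pixel_px, pixels_per_degree):
    """Approximate angular distance (degrees) between the gaze point and a pixel.

    Assumes a constant pixels-per-degree value across the display,
    which is a simplification.
    """
    dx = pixel_px[0] - gaze_px[0]
    dy = pixel_px[1] - gaze_px[1]
    return math.hypot(dx, dy) / pixels_per_degree


def detail_level(ecc_deg, foveal_radius_deg=5.0, falloff_deg=20.0, min_level=0.25):
    """Fraction of full image detail to render at a given eccentricity.

    Full detail inside the foveal radius, then a linear decrease down to
    min_level over falloff_deg degrees. All defaults are assumed values.
    """
    if ecc_deg <= foveal_radius_deg:
        return 1.0
    t = min((ecc_deg - foveal_radius_deg) / falloff_deg, 1.0)
    return 1.0 - t * (1.0 - min_level)


gaze = (960, 540)  # gaze fixation at the centre of a 1920x1080 image
for pixel in [(960, 540), (1160, 540), (1900, 540)]:
    ecc = eccentricity_deg(gaze, pixel, pixels_per_degree=10.0)
    print(pixel, round(ecc, 1), round(detail_level(ecc), 3))
```

Because only the small region around the fixation is rendered at full detail, the rendering workload outside it shrinks, which is the source of the energy saving mentioned above.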
Adaptive streaming. Streaming VR content is highly demanding on the network bandwidth due to the volume of information that needs to reach the HMD. Adaptive streaming optimizes video configurations to deliver the best possible quality to the user at any given time, depending on the changing network conditions. For example, HTTP adaptive streaming algorithms switch to a lower bitrate when the network conditions are bad to avoid stalling of the video (Anwar et al. 2020). Other algorithms make use of the HMD eye-gaze tracking to maximize the quality of the predicted FoV at the expense of the unattended regions (Chiariotti 2021; Wang et al. 2017).
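The viewport-dependent approach can be sketched as follows, assuming the 360-degree video is split into tiles identified by the yaw angle of their centre. This is a simplified illustration rather than the algorithm of the cited works: the function names, the two quality labels, and the 100-degree FoV are assumptions, and the predicted viewing direction is taken as given.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two yaw angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def assign_tile_quality(tile_centers_deg, predicted_yaw_deg, fov_deg=100.0):
    """Request high quality for tiles inside the predicted FoV, low quality elsewhere.

    tile_centers_deg: yaw of each tile centre (hypothetical tiling scheme).
    predicted_yaw_deg: predicted viewing direction, e.g. from gaze or head tracking.
    fov_deg: horizontal field of view of the HMD (assumed value).
    """
    half_fov = fov_deg / 2.0
    return [
        "high" if angular_distance(center, predicted_yaw_deg) <= half_fov else "low"
        for center in tile_centers_deg
    ]


# Eight tiles covering the full horizontal circle in 45-degree steps.
tiles = [0, 45, 90, 135, 180, 225, 270, 315]
print(assign_tile_quality(tiles, predicted_yaw_deg=350.0))
```

With a predicted yaw of 350 degrees, only the tiles centred at 0 and 315 degrees fall inside the FoV, so the remaining six tiles can be fetched at lower quality to save bandwidth.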
3.4.3. Network
Latency. Latency in networking is the amount of time it takes for a packet of data to be captured, transmitted, processed through multiple devices, then received at its destination and decoded. A VR headset requires high robustness against fluctuations of the network quality, since it has been shown that a latency higher than 15 ms not only degrades the viewing quality but also leads to cybersickness (Saxena et al. 2020).
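Since latency accumulates over the stages named in this definition, a simple budget check can show how close a pipeline is to the 15 ms threshold reported by Saxena et al. (2020). The per-stage values below are hypothetical and serve only to illustrate the arithmetic.

```python
LATENCY_THRESHOLD_MS = 15.0  # threshold cited in the text (Saxena et al. 2020)


def end_to_end_latency_ms(stage_latencies_ms):
    """Total latency as the sum of the per-stage latencies (in milliseconds)."""
    return sum(stage_latencies_ms.values())


def exceeds_threshold(stage_latencies_ms, threshold_ms=LATENCY_THRESHOLD_MS):
    """True if the accumulated latency is above the threshold."""
    return end_to_end_latency_ms(stage_latencies_ms) > threshold_ms


# Hypothetical per-stage measurements, not taken from any cited study.
stages = {"capture": 2.0, "transmit": 6.0, "process": 3.0, "decode": 2.5}

print(end_to_end_latency_ms(stages))  # 13.5
print(exceeds_threshold(stages))      # False

# A network fluctuation that adds 4 ms to transmission pushes the total over budget.
congested = dict(stages, transmit=10.0)
print(exceeds_threshold(congested))   # True
```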
Bandwidth. This term refers to the maximum rate of data transfer across a given path. In general, immersive VR experiences require a high-bandwidth connection to deliver high video quality and support an immersive user experience (Saxena et al. 2020). Current wireless communication under 4G does not provide enough bandwidth for wireless streaming on HMDs, which is why different adaptive streaming and rate adaptation algorithms are usually applied (Anwar et al. 2020).
Bit rate. This describes the rate at which bits are transferred from one location to another (Gao and Wen 2021). A higher bitrate leads to more available video representations, which in turn provides a smoother viewing experience. Nevertheless, a higher bit rate can increase the operational cost of the service provider for transcoding and storing more video representations (Gao and Wen 2021).
Delay. Delay refers to the time it takes for the first byte to arrive (Grzelka et al. 2019). It is related not only to the quality of the network but also to the tracking algorithms used. In fact, research has shown that the overall user experience in VR decreases as the delay in rotation (established by tracking the user's position) increases beyond 11 ms (Grzelka et al. 2019).
Rate adaptation. Because of its immersive nature, VR requires higher resolution, which in turn requires higher bandwidth. This poses a greater challenge in terms of wireless communication speed, quality, and stability (Jiang et al. 2020). An effective solution for video streaming under unstable network conditions is adaptive bit rate (ABR) streaming, in which each video is encoded into many representations of different bit rates. The user's client then dynamically selects the most suitable representation according to the current network conditions. As such, the rate adaptation mechanism is vital to the performance of ABR streaming. Rate adaptation can increase the quality of the overall user experience by increasing the video and audio quality of the content and decreasing the delay (Gao et al. 2019; Gao and Wen 2021; Jiang et al. 2020).
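One simple family of rate adaptation mechanisms is throughput-based selection: the client estimates the available throughput from recent downloads and picks the highest bit rate that fits within it. The sketch below illustrates this family only and is not the mechanism proposed in the cited works; the function names, the harmonic-mean estimator, the safety factor of 0.8, and the bit rate ladder are all assumptions.

```python
def estimate_throughput_kbps(samples_kbps):
    """Harmonic mean of recent throughput samples.

    The harmonic mean is dominated by the slowest samples, so a single
    short burst of high throughput does not inflate the estimate.
    """
    return len(samples_kbps) / sum(1.0 / s for s in samples_kbps)


def select_representation(bitrates_kbps, samples_kbps, safety=0.8):
    """Pick the highest bit rate that fits within a fraction of the estimate.

    Falls back to the lowest representation if none fits, so that
    playback can continue under poor network conditions.
    """
    budget = safety * estimate_throughput_kbps(samples_kbps)
    affordable = [b for b in sorted(bitrates_kbps) if b <= budget]
    return affordable[-1] if affordable else min(bitrates_kbps)


# Hypothetical bit rate ladder (kbit/s) for one video.
ladder = [1000, 2500, 5000, 10000, 20000]

print(select_representation(ladder, [12000, 12000, 12000]))  # stable network
print(select_representation(ladder, [800, 900, 1000]))       # poor network
```

In the stable case the budget is about 9600 kbit/s, so the 5000 kbit/s representation is chosen; in the poor case no representation fits and the client falls back to the lowest one. The safety factor trades some quality for a lower risk of stalling when the throughput estimate is too optimistic.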