Head-mounted displays cause discomfort. This is commonly attributed to conflicting depth cues, most prominently between vergence, which is consistent with object depth, and accommodation, which is adjusted to the near-eye displays. It is possible to adjust the camera parameters used for rendering the virtual environment, specifically the interocular distance and vergence angles, to minimize this conflict. This requires dynamic adjustment of the parameters based on object depth. In an experiment based on a visual search task, we evaluate how dynamic adjustment affects visual comfort compared to fixed camera parameters. We collect objective as well as subjective data. Results show that dynamic adjustment decreases common objective measures of visual comfort, such as pupil diameter and blink rate, by a statistically significant margin. The subjective evaluation of categories such as fatigue and eye irritation shows a similar trend but was inconclusive. This suggests that rendering with fixed camera parameters is the better choice for head-mounted displays, at least in scenarios similar to the ones used here.
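The geometric relationship such dynamic adjustment relies on can be sketched in a few lines. This is an illustrative calculation under a symmetric viewing geometry, not the authors' implementation:

```python
import math

def vergence_angle_deg(ipd_m, depth_m):
    """Vergence angle (in degrees) when both eyes fixate a point at
    depth_m straight ahead, given an interocular distance of ipd_m.
    Each eye rotates inward by atan((ipd / 2) / depth), so the total
    vergence angle is twice that."""
    return 2.0 * math.degrees(math.atan((ipd_m / 2.0) / depth_m))

# A renderer performing dynamic adjustment would recompute the virtual
# cameras' convergence each frame from the fixated object's depth:
# near objects demand a much larger vergence angle than far ones.
```

For example, with a 64 mm interocular distance, a target 0.5 m away requires roughly 7.3 degrees of vergence, while one at 10 m requires only about 0.37 degrees; a fixed-parameter renderer ignores this variation, while dynamic adjustment tracks it.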
Vibrotactile and friction texture displays are good options for artificially presenting the roughness and frictional properties of textures, respectively. These two types of displays are compatible with touch panels and exhibit complementary characteristics. We combine vibrotactile and electrostatic friction texture displays to improve the quality of virtual textures, considering that actual textured surfaces exhibit both properties. We investigate their composition ratios when displaying roughness textures. Grating roughness scales with six surface wavelength values are generated under 11 display conditions; in nine of these, vibrotactile and friction stimuli are combined at different composition ratios. A forced-choice experiment on subjective realism indicates that a vibrotactile stimulus combined with a slight variable-friction stimulus is effective for presenting high-quality textures.
In this paper, visual perception principles are used to build an artificial perception model aimed at developing an algorithm for detecting junctions in line drawings of polyhedral objects that are vectorized from hand-drawn sketches. The detection is performed in 2D, before any 3D model is available, and minimal information about the shape depicted by the sketch is used. The goal of this approach is to detect junctions not only in careful sketches created by skilled engineers and designers, but also when skilled people draw casually to quickly convey rough ideas. Current approaches for extracting junctions from digital images are mostly incomplete: they simply merge endpoints that are near each other, ignoring the fact that different vertices may be represented by different (but close) junctions, and that the endpoints of lines depicting edges that share a common vertex may not be close to each other, particularly in quickly sketched drawings. We describe and validate a new algorithm that uses these perceptual findings to merge tips of line segments into 2D junctions that are assumed to depict 3D vertices.
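The naive strategy the abstract argues against can be made concrete: merging endpoints purely by proximity. A minimal sketch of that baseline follows (illustrative only, with a hypothetical distance threshold; the paper's perceptual algorithm is more sophisticated than this):

```python
def merge_endpoints(points, radius):
    """Greedy proximity merge: group 2D endpoints whose pairwise distance
    is at most `radius` (via union-find), and return one candidate
    junction per group, placed at the group's centroid."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (xi, yi), (xj, yj) = points[i], points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            for g in groups.values()]
```

As the abstract points out, this baseline fails in exactly the cases that matter for casual sketches: a single vertex drawn as several nearby but distinct junctions, or strokes meeting at a vertex whose endpoints stop well short of each other.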
Cognitive load assessment is crucial for user studies and human-computer interaction design. As a noninvasive and easy-to-use category of measures, current photoplethysmogram (PPG)-based assessment methods rely on single or small-scale predefined features to recognize responses induced by people's cognitive load, and are therefore not stable in assessment accuracy. In this study, we propose a machine-learning method that uses 46 kinds of PPG features together to improve measurement accuracy for cognitive load. We test the method on 16 participants through the classical n-back tasks (0-back, 1-back, and 2-back). The accuracy of the machine-learning method in differentiating levels of cognitive load induced by task difficulty reaches 100% for 0-back vs. 2-back tasks, outperforming the traditional HRV-based and single-PPG-feature-based methods by 12%-55%. When using "leave-one-participant-out" subject-independent cross-validation, 87.5% binary classification accuracy was reached, which is at the state-of-the-art level. The proposed method can also support real-time cognitive load assessment through beat-to-beat classification, with better performance than the traditional single-feature-based real-time evaluation method.
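The evaluation protocol, leave-one-participant-out cross-validation, can be sketched compactly. This is an illustrative version with synthetic one-dimensional features and a simple nearest-centroid classifier, not the authors' 46-feature pipeline:

```python
def nearest_centroid_fit(samples):
    """samples: list of (feature_vector, label). Returns per-class centroids."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Label of the centroid closest (squared Euclidean) to x."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def lopo_accuracy(data_by_participant):
    """Leave-one-participant-out: train on all other participants,
    test on the held-out one; average accuracy over folds."""
    accs = []
    for held_out in data_by_participant:
        train = [s for p, ss in data_by_participant.items()
                 if p != held_out for s in ss]
        centroids = nearest_centroid_fit(train)
        test = data_by_participant[held_out]
        correct = sum(predict(centroids, x) == y for x, y in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)
```

Holding out each participant in turn (rather than shuffling samples across participants) is what makes the 87.5% figure subject-independent: the classifier never sees any data from the person it is tested on.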
In recent years, the quality of real-time rendering has reached new heights - realistic reflections, physically based materials and photometric lighting are all becoming commonplace in modern game engines and even interactive virtual environments, such as virtual reality (VR). As the strive for realism continues, there is a need to investigate the effect of photorealism on users' perception, particularly for interactive, emotional scenarios in VR. In this paper, we explored three main topics where we predicted photorealism would make a difference: the illusion of being present with the virtual person and in the environment, an altered emotional response towards the character, and a more subtle response, comfort with being in close proximity to the character. We present a perceptual experiment, with an interactive expressive virtual character in VR, which was designed to induce particular social responses in people. Our participant pool was large (N = 797) and diverse in terms of demographics. We designed a between-group experiment, where each group saw either the realistic rendering or one of our stylized conditions (simple and sketch style), expressing one of three attitudes: Friendly, Unfriendly or Sad. While the render style did not particularly affect the level of comfort with the character or increase the illusion of presence with it, our main finding shows that the photorealistic character changed participants' emotional responses compared to the stylized versions. We also found a preference for realism in VR, reflected in the affinity for, and higher place illusion in, the scenario rendered in the realistic style.
Virtual environments for gaming and simulation provide dynamic and adaptive experiences but, despite advances in multisensory interfaces, these are still primarily visual experiences. In order to support real-time dynamic adaptation, interactive virtual environments could implement techniques to predict and manipulate human visual attention. One promising way of developing such techniques is to base them on psychophysical observations, an approach that requires a sound understanding of visual attention allocation. Understanding how this allocation of visual attention changes depending on a user's task offers clear benefits in developing these techniques and improving virtual environment design. With this aim, we investigated the effect of task on visual attention in interactive virtual environments. We recorded fixation data from participants completing freeview, search and navigation tasks in three different virtual environments. We quantified visual attention differences between conditions by measuring the predictiveness of a low-level saliency model and its corresponding color, intensity and orientation feature conspicuity maps, and by measuring fixation center bias, depth and duration, as well as saccade amplitude. Our results show that task does affect visual attention in virtual environments. Navigation relies more than search or freeview on intensity conspicuity to allocate visual attention. Navigation also produces fixations that are more central, longer, and deeper into the scenes. Further, our results suggest that it is difficult to distinguish between freeview and search tasks. These results provide important guidance for designing virtual environments for human interaction, as well as identifying future avenues of research for developing `attention-aware' virtual worlds.
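The intensity conspicuity referenced above comes from center-surround differences in Itti-style saliency models. A toy sketch follows (illustrative only: it uses a single in-place 3x3 surround rather than the model's multi-scale image pyramids):

```python
def intensity_conspicuity(img):
    """img: 2D list of intensities in [0, 1]. Conspicuity at each pixel
    is the absolute difference between the pixel (center) and the mean
    of its in-bounds neighbors (a crude single-scale surround)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neigh = [img[a][b]
                     for a in range(max(0, i - 1), min(h, i + 2))
                     for b in range(max(0, j - 1), min(w, j + 2))
                     if (a, b) != (i, j)]
            out[i][j] = abs(img[i][j] - sum(neigh) / len(neigh))
    return out
```

An isolated bright pixel in a uniform field gets the highest conspicuity value, which is the "pop-out" behavior such feature maps are built to capture; the full saliency model combines intensity with color and orientation maps before predicting fixations.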
HTTP-based Adaptive Streaming (HAS) is the dominant Internet video streaming application. One specific HAS approach, Dynamic Adaptive Streaming over HTTP (DASH), is of particular interest as it is a widely deployed, standardized implementation. Prior academic research has focused on networking and protocol issues, and has contributed an accepted understanding of performance and possible performance issues in large deployment scenarios. Our work extends the current understanding of HAS by focusing directly on the impact of the choice of video quality adaptation algorithm on end-user perceived quality. In congested network scenarios, the details of the adaptation algorithm determine the amount of bandwidth consumed by the application as well as the quality of the rendered video stream. HAS leads to user-perceived changes in video quality through intentional changes in the quality of video segments, or through unintentional impairments caused by video decoder artifacts such as pixelation and stutters, or by short or long stalls in the rendered video when the playback buffer becomes empty. The HAS adaptation algorithm attempts to find the optimal balance between avoiding buffer stalls and maximizing video quality. In this paper, we present results from a user study that was designed to provide insights into `best practice guidelines' for a HAS adaptation algorithm. The study methodology contributes a unique method for gathering a continuous quantitative subjective measure of user-perceived quality using a Wii Remote.
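The stall-vs-quality tradeoff an adaptation algorithm must resolve can be illustrated with a minimal rate-selection rule. This is a generic buffer- and throughput-aware sketch with made-up parameter values, not the DASH standard's behavior or the specific algorithm studied in the paper:

```python
def select_bitrate(bitrates_mbps, est_throughput_mbps, buffer_s,
                   segment_s, reserve_s=2.0):
    """Pick the highest bitrate whose estimated segment download time
    still leaves at least `reserve_s` seconds of playback in the
    buffer; otherwise fall back to the lowest bitrate to avoid a stall."""
    for rate in sorted(bitrates_mbps, reverse=True):
        download_s = segment_s * rate / est_throughput_mbps
        if buffer_s - download_s >= reserve_s:
            return rate
    return min(bitrates_mbps)
```

With a healthy 10 s buffer and 4 Mbit/s estimated throughput, the rule risks a 5 Mbit/s rendition (temporarily draining the buffer); with only 4 s buffered it drops to 1 Mbit/s. Real adaptation algorithms differ mainly in how aggressively they make exactly this tradeoff, which is why their choice is visible to end users.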
Scene recognition is an essential component of both machine and biological vision. Recent advances in computer vision using deep convolutional neural networks (CNNs) have demonstrated impressive sophistication in scene recognition, through training on large datasets of labeled scene images [Zhou et al. 2018, 2014]. One criticism of CNN-based approaches is that performance may not generalize well beyond the training image set [Torralba and Efros 2011], and may be hampered by minor image modifications, which in some cases are barely perceptible to the human eye [Goodfellow et al. 2015; Szegedy et al. 2013]. While 'adversarial examples' may be unlikely in natural contexts, during many real-world visual tasks scene information can be degraded or limited due to defocus blur, camera motion, sensor noise, or occluding objects. Here, we quantify the impact of several image degradations (both common and more exotic) on indoor/outdoor scene classification using CNNs. For comparison, we use human observers as a benchmark, and also evaluate performance against classifiers using limited, manually selected descriptors. While the CNNs outperformed the other classifiers and rivaled human accuracy for intact images, our results show that their classification accuracy is more affected by image degradations than that of human observers. On a practical level, however, accuracy of the CNNs remained well above chance for a wide range of image manipulations that disrupted both local and global image statistics. Surprisingly, we find that a simple classifier based on the mean hue, saturation, and value (HSV) of images is the best option for classifying very degraded scenes. We also examine the level of image-by-image agreement with human observers, and find that the CNNs' agreement with observers varied substantially as a function of the nature of the image manipulation.
Together, these results suggest that CNN-based scene classification techniques are relatively robust to image degradations; however, the pattern of classifications obtained for ambiguous images does not appear to closely reflect the strategies employed by human observers.
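The surprisingly effective mean-HSV baseline amounts to a nearest-centroid rule over just three numbers per image. A hedged sketch with made-up training data follows (the paper's actual datasets and learned statistics differ):

```python
def mean_hsv(pixels):
    """pixels: list of (h, s, v) tuples; returns the per-channel mean."""
    n = len(pixels)
    return tuple(sum(p[k] for p in pixels) / n for k in range(3))

def train_hsv_classifier(labeled_images):
    """labeled_images: list of (pixels, label). Builds one centroid per
    label in mean-HSV feature space."""
    feats = {}
    for pixels, label in labeled_images:
        feats.setdefault(label, []).append(mean_hsv(pixels))
    return {label: tuple(sum(f[k] for f in fs) / len(fs) for k in range(3))
            for label, fs in feats.items()}

def classify(centroids, pixels):
    """Assign the label of the nearest centroid (squared Euclidean)."""
    f = mean_hsv(pixels)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(f, centroids[lab])))
```

Because the feature is a global average, it survives degradations (blur, noise, occlusion) that destroy the local and mid-level structure CNNs rely on, which is one plausible reading of why it wins on very degraded scenes.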
We compared the perceptual validity of human avatar walking animations driven by six different representations of human movement using a graphics Turing test. All six representations are based on movement primitives (MPs), which are predictive models of full-body movement that differ in their complexity and prediction mechanism. Assuming that humans are experts at perceiving biological movement from noisy sensory signals, it follows that these percepts should be describable by a suitably constructed Bayesian ideal observer model. We build such models from MPs and investigate whether the perceived naturalness of human animations is predictable from approximate Bayesian model scores of the MPs. We found that certain MP-based representations are capable of producing movements that are perceptually indistinguishable from natural movements. Furthermore, approximate Bayesian model scores of these representations can be used to predict perceived naturalness. In particular, we showed that movement dynamics are more important for the perceived naturalness of human animations than single-frame poses. This indicates that perception of human animations is highly sensitive to their temporal coherence. More generally, our results add evidence for a shared MP representation of action and perception. Even though the motivation of our work is primarily drawn from neuroscience, we expect that our results will be applicable in virtual and augmented reality settings, when perceptually plausible human avatar movements are required.