Similar Documents
1.
This paper introduces the use of a visual attention model to improve the accuracy of gaze tracking systems. Visual attention models simulate the selective attention part of the human visual system. For instance, in a bottom-up approach, a saliency map is defined for the image and gives an attention weight to every pixel of the image as a function of its colour, edge or intensity. Our algorithm uses an uncertainty window, defined by the gaze tracker accuracy, and located around the gaze point given by the tracker. Then, using a visual attention model, it searches for the most salient points, or objects, located inside this uncertainty window, and determines a novel, and hopefully, better gaze point. This combination of a gaze tracker together with a visual attention model is considered as the main contribution of the paper. We demonstrate the promising results of our method by presenting two experiments conducted in two different contexts: (1) a free exploration of a visually rich 3D virtual environment without a specific task, and (2) a video game based on gaze tracking involving a selection task. Our approach can be used to improve real-time gaze tracking systems in many interactive 3D applications such as video games or virtual reality applications. The use of a visual attention model can be adapted to any gaze tracker and the visual attention model can also be adapted to the application in which it is used.
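The core correction step lends itself to a compact illustration. Below is a minimal sketch, assuming a pre-computed bottom-up saliency map and a square uncertainty window; the function name and parameters are illustrative and not taken from the paper:

```python
import numpy as np

def correct_gaze(saliency_map, gaze_xy, uncertainty_px):
    """Snap a raw gaze estimate to the most salient pixel in its uncertainty window.

    saliency_map   : 2-D array (H, W) from any bottom-up attention model
    gaze_xy        : (x, y) gaze point reported by the tracker, in pixels
    uncertainty_px : half-size of the uncertainty window, from tracker accuracy
    """
    h, w = saliency_map.shape
    x, y = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    # Clamp the uncertainty window to the image borders.
    x0, x1 = max(0, x - uncertainty_px), min(w, x + uncertainty_px + 1)
    y0, y1 = max(0, y - uncertainty_px), min(h, y + uncertainty_px + 1)
    window = saliency_map[y0:y1, x0:x1]
    # The most salient location inside the window becomes the corrected gaze point.
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    return (x0 + dx, y0 + dy)
```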

2.
Shape features are a key factor affecting the visual ergonomics of human-computer interaction interfaces. To make such interfaces better fit users' physiological and psychological characteristics and improve the user experience, a computational model of the visual saliency of shape features in interaction interfaces is needed. Building on an analysis of how strongly shape features influence visual saliency, and focusing on the typical shapes found in interaction interfaces, the proposed method uses an inscribed square to divide a shape into several parts, computes the visual saliency of each part from its associated triangles, and takes the maximum as the visual saliency of the whole shape, thereby enabling quantitative analysis and computation of shape saliency. An eye-tracking experiment verifies the effectiveness of the method.

3.
Performing typical network tasks such as node scanning and path tracing can be difficult in large and dense graphs. To alleviate this problem we use eye-tracking as an interactive input to detect tasks that users intend to perform and then produce unobtrusive visual changes that support these tasks. First, we introduce a novel fovea based filtering that dims out edges with endpoints far removed from a user's view focus. Second, we highlight edges that are being traced at any given moment or have been the focus of recent attention. Third, we track recently viewed nodes and increase the saliency of their neighborhoods. All visual responses are unobtrusive and easily ignored to avoid unintentional distraction and to account for the imprecise and low-resolution nature of eye-tracking. We also introduce a novel gaze-correction approach that relies on knowledge about the network layout to reduce eye-tracking error. Finally, we present results from a controlled user study showing that our methods led to a statistically significant accuracy improvement in one of two network tasks and that our gaze-correction algorithm enables more accurate eye-tracking interaction.
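A rough sketch of the fovea-based filtering described above: edge opacity falls off with the distance of an edge's nearer endpoint from the current gaze position. The fall-off curve and all names are assumptions for illustration only:

```python
import numpy as np

def edge_opacity(edges_xy, gaze_xy, fovea_radius=150.0, floor=0.15):
    """Dim edges whose endpoints lie far from the user's view focus.

    edges_xy     : array of shape (E, 2, 2) -- two (x, y) endpoints per edge
    gaze_xy      : current gaze position (x, y) in screen pixels
    fovea_radius : distance (px) within which edges stay fully visible
    floor        : minimum opacity so dimmed edges remain faintly visible
    """
    gaze = np.asarray(gaze_xy, dtype=float)
    # Distance of each edge to the gaze point = distance of its nearer endpoint.
    d = np.linalg.norm(edges_xy - gaze, axis=2).min(axis=1)
    # Smooth fall-off outside the foveal radius; fully opaque inside it.
    alpha = np.exp(-np.maximum(d - fovea_radius, 0.0) / fovea_radius)
    return np.clip(alpha, floor, 1.0)
```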

4.

In recent years, significant progress has been achieved in the field of visual saliency modeling. Our focus is video saliency, which differs substantially from image saliency and can be detected more reliably by adding gaze information derived from viewers' eye movements while they watch the video. In this paper we propose a novel gaze-based saliency method to predict video attention, inspired by the widespread use of mobile smart devices equipped with cameras. It is a non-contact method for predicting visual attention and imposes no additional hardware burden. Our method first extracts bottom-up saliency maps from the video frames, and then constructs a mapping from eye images, captured by the camera in synchronization with the video frames, to the screen region. Finally, the top-down gaze information and the bottom-up saliency maps are combined by point-wise multiplication to predict video attention. The proposed approach is validated on two datasets, the public MIT dataset and a dataset we collected, against four common methods, and the experimental results show that our method achieves state-of-the-art performance.
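The fusion step is simple enough to sketch. Assuming the top-down gaze information is represented as a Gaussian centred on the estimated on-screen gaze point (an assumption for illustration; the paper's exact gaze map may differ), the point-wise multiplication looks like this:

```python
import numpy as np

def fuse_gaze_and_saliency(bottom_up, gaze_xy, sigma=40.0):
    """Combine a bottom-up saliency map with top-down gaze by point-wise multiplication.

    bottom_up : 2-D saliency map (H, W), values in [0, 1]
    gaze_xy   : estimated on-screen gaze point (x, y), in pixels
    sigma     : spread (px) of the Gaussian placed around the gaze point
    """
    h, w = bottom_up.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gaze_map = np.exp(-((xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2) / (2 * sigma ** 2))
    fused = bottom_up * gaze_map          # point-wise multiplication
    return fused / (fused.max() + 1e-8)   # renormalise to [0, 1]
```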


5.
In this paper we present a novel mechanism to obtain enhanced gaze estimation for subjects looking at a scene or an image. The system makes use of prior knowledge about the scene (e.g. an image on a computer screen) to define a probability map of the scene the subject is gazing at, in order to find the most probable location. The proposed system helps in correcting the fixations which are erroneously estimated by the gaze estimation device by employing a saliency framework to adjust the resulting gaze point vector. The system is tested on three scenarios: using eye tracking data, enhancing a low accuracy webcam based eye tracker, and using a head pose tracker. The correlation between the subjects in the commercial eye tracking data is improved by an average of 13.91%. The correlation on the low accuracy eye gaze tracker is improved by 59.85%, and for the head pose tracker we obtain an improvement of 10.23%. These results show the potential of the system as a way to enhance and self-calibrate different visual gaze estimation systems.

6.
This paper presents a real-time framework for computationally tracking objects visually attended by the user while navigating in interactive virtual environments. In addition to the conventional bottom-up (stimulus-driven) saliency map, the proposed framework uses top-down (goal-directed) contexts inferred from the user's spatial and temporal behaviors, and identifies the most plausibly attended objects among candidates in the object saliency map. The computational framework was implemented on the GPU, exhibiting computational performance adequate for interactive virtual environments. A user experiment was also conducted to evaluate the prediction accuracy of the tracking framework by comparing objects regarded as visually attended by the framework to actual human gaze collected with an eye tracker. The results indicated that the accuracy was at a level well supported by the theory of human cognition for visually identifying single and multiple attentive targets, especially owing to the addition of top-down contextual information. Finally, we demonstrate how the visual attention tracking framework can be applied to managing the level of detail in virtual environments, without any hardware for head or eye tracking.

7.
3D gaze tracking from a single RGB camera is very challenging due to the lack of information in determining the accurate gaze target from a monocular RGB sequence. The eyes tend to occupy only a small portion of the video, and even small errors in estimated eye orientations can lead to very large errors in the triangulated gaze target. We overcome these difficulties with a novel lightweight eyeball calibration scheme that determines the user-specific visual axis, eyeball size and position in the head. Unlike previous calibration techniques, we do not need the ground truth positions of the gaze points. In the online stage, gaze is tracked by a new gaze fitting algorithm, and refined by a 3D gaze regression method to correct for bias errors. Our regression is pre-trained on several individuals and works well for novel users. After the lightweight one-time user calibration, our method operates in real time. Experiments show that our technique achieves state-of-the-art accuracy in gaze angle estimation, and we demonstrate applications of 3D gaze target tracking and gaze retargeting to an animated 3D character.

8.
In this paper we propose a system for the analysis of user generated video (UGV). UGV often has a rich camera motion structure that is generated at the time the video is recorded by the person taking the video, i.e., the "camera person." We exploit this structure by defining a new concept known as camera view for temporal segmentation of UGV. The segmentation provides a video summary with unique properties that is useful in applications such as video annotation. Camera motion is also a powerful feature for identification of keyframes and regions of interest (ROIs) since it is an indicator of the camera person's interests in the scene and can also attract the viewers' attention. We propose a new location-based saliency map which is generated based on camera motion parameters. This map is combined with other saliency maps generated using features such as color contrast, object motion and face detection to determine the ROIs. In order to evaluate our methods we conducted several user studies. A subjective evaluation indicated that our system produces results that are consistent with viewers' preferences. We also examined the effect of camera motion on human visual attention through an eye tracking experiment. The results showed a high dependency between the distribution of the viewers' fixation points and the direction of camera movement, which is consistent with our location-based saliency map.
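A small sketch of how the location-based saliency map could be fused with the other cue maps by a weighted sum; the per-cue normalisation and default weights are assumptions, not the authors' exact scheme:

```python
import numpy as np

def combine_saliency_maps(maps, weights=None):
    """Fuse several saliency maps (e.g. camera-motion location, colour contrast,
    object motion, face detection) into one ROI map.

    maps    : dict of name -> 2-D array, all with the same shape
    weights : dict of name -> float; defaults to equal weighting
    """
    names = list(maps)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    combined = np.zeros_like(next(iter(maps.values())), dtype=float)
    for n in names:
        m = maps[n].astype(float)
        m = (m - m.min()) / (m.max() - m.min() + 1e-8)  # normalise each cue to [0, 1]
        combined += weights.get(n, 0.0) * m
    return combined / (combined.max() + 1e-8)
```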

9.
We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e. we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.

10.
For many applications in graphics, design and human computer interaction, it is essential to reliably estimate the visual saliency of images. In this paper, we propose a visual saliency detection method that combines the respective merits of color saliency boosting and global region based contrast schemes to achieve more accurate saliency maps. Our method is compared with existing saliency detection methods when evaluated using four publicly available datasets. Experimental results show that our method consistently outperforms current state-of-the-art methods in predicting human fixations. We also demonstrate how the extracted saliency map can be used for image classification.

11.
We present a novel approach to optimally retarget videos for varied displays with differing aspect ratios by preserving salient scene content discovered via eye tracking. Our algorithm performs editing with cut, pan and zoom operations by optimizing the path of a cropping window within the original video while seeking to (i) preserve salient regions, and (ii) adhere to the principles of cinematography. Our approach is (a) content agnostic as the same methodology is employed to re-edit a wide-angle video recording or a close-up movie sequence captured with a static or moving camera, and (b) independent of video length and can in principle re-edit an entire movie in one shot. Our algorithm consists of two steps. The first step employs gaze transition cues to detect time stamps where new cuts are to be introduced in the original video via dynamic programming. A subsequent step optimizes the cropping window path (to create pan and zoom effects), while accounting for the original and new cuts. The cropping window path is designed to include maximum gaze information, and is composed of piecewise constant, linear and parabolic segments. It is obtained via L1-regularized convex optimization which ensures a smooth viewing experience. We test our approach on a wide variety of videos and demonstrate significant improvement over the state-of-the-art, both in terms of computational complexity and qualitative aspects. A study performed with 16 users confirms that our approach results in a superior viewing experience as compared to gaze driven re-editing [JSSH15] and letterboxing methods, especially for wide-angle static camera recordings.
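The cropping-window path optimisation can be approximated with an off-the-shelf convex solver. The sketch below, assuming cvxpy is available, penalises first, second and third differences with L1 norms to favour piecewise constant, linear and parabolic motion; it is a simplified stand-in for the paper's formulation, and all parameter values are illustrative:

```python
import cvxpy as cp
import numpy as np

def crop_window_path(gaze_x, frame_w, crop_w, lam1=5.0, lam2=50.0, lam3=500.0):
    """Optimise the horizontal path of a cropping window over T frames.

    gaze_x   : array (T,) of per-frame gaze x-coordinates (salient content)
    frame_w  : width of the original frames in pixels
    crop_w   : width of the retargeted (cropped) window in pixels
    lam1..3  : weights of the L1 penalties on 1st/2nd/3rd differences,
               encouraging static, panning and parabolic segments
    """
    T = len(gaze_x)
    x = cp.Variable(T)                               # window centre per frame
    data_term = cp.sum_squares(x - gaze_x)           # keep gaze inside the window
    smooth = (lam1 * cp.norm1(cp.diff(x, 1))
              + lam2 * cp.norm1(cp.diff(x, 2))
              + lam3 * cp.norm1(cp.diff(x, 3)))
    constraints = [x >= crop_w / 2, x <= frame_w - crop_w / 2]
    cp.Problem(cp.Minimize(data_term + smooth), constraints).solve()
    return np.asarray(x.value)
```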

12.
3D garment capture is an important component for various applications such as free-viewpoint video, virtual avatars, online shopping, and virtual cloth fitting. Due to the complexity of the deformations, capturing 3D garment shapes requires controlled and specialized setups. A viable alternative is image-based garment capture. Capturing 3D garment shapes from a single image, however, is a challenging problem and the current solutions come with assumptions on the lighting, camera calibration, complexity of human or mannequin poses considered, and more importantly a stable physical state for the garment and the underlying human body. In addition, most of the works require manual interaction and exhibit high run-times. We propose a new technique that overcomes these limitations, making garment shape estimation from an image a practical approach for dynamic garment capture. Starting from synthetic garment shape data generated through physically based simulations from various human bodies in complex poses obtained through Mocap sequences, and rendered under varying camera positions and lighting conditions, our novel method learns a mapping from rendered garment images to the underlying 3D garment model. This is achieved by training Convolutional Neural Networks (CNNs) to estimate 3D vertex displacements from a template mesh with a specialized loss function. We illustrate that this technique is able to recover the global shape of dynamic 3D garments from a single image under varying factors such as challenging human poses, self occlusions, various camera poses and lighting conditions, at interactive rates. Improvement is shown if more than one view is integrated. Additionally, we show applications of our method to videos.

13.
We aim to identify the salient objects in an image by applying a model of visual attention. We automate the process by predicting those objects in an image that are most likely to be the focus of someone's visual attention. Concretely, we first generate fixation maps from the eye tracking data, which express the ground truth of people's visual attention for each training image. Then, we extract high-level features based on the bag-of-visual-words image representation as input attributes, along with the fixation maps, to train a support vector regression model. With this model, we can predict a new query image's saliency. Our experiments show that the model is capable of providing a good estimate of human visual attention in test image sets with one salient object and multiple salient objects. In this way, we seek to reduce the redundant information within the scene, and thus provide a more accurate depiction of the scene.
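A minimal sketch of the learning setup: bag-of-visual-words histograms as input attributes and fixation-map values as regression targets for a support vector regression model. The random arrays below are placeholders standing in for real features and eye-tracking data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder training data: one bag-of-visual-words histogram per image region,
# paired with the mean fixation-map value of that region (the regression target).
X_train = np.random.rand(500, 200)   # 500 regions x 200 visual words
y_train = np.random.rand(500)        # ground-truth saliency from fixation maps

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X_train, y_train)

# Predict the saliency of regions from a new query image.
X_query = np.random.rand(50, 200)
saliency_scores = model.predict(X_query)
```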

14.
As eye-tracking technology matures and gaze-input products aimed at end users reach the market, gaze-based interaction is becoming increasingly practical. However, because the eyes are not innately a control organ, any visual feedback in the user interface, whether dynamic or static, may interfere with the user's eye movements during gaze interaction and thus affect gaze input (the gaze-point coordinates). Therefore, through two eye-pointing experiments, this work systematically evaluates the effect of target colour on gaze interaction from two perspectives: the spatial distribution of gaze points and the ergonomics of gaze interaction. The results show that although static visual feedback such as target colour does not affect the stability of the gaze-point coordinates while the user fixates a target, it does significantly affect the user's saccadic process and thereby the ergonomics of eye-pointing tasks. The effect is especially pronounced when the gaze has to travel a long distance.

15.
Objective: Classical eye fixation prediction models usually fuse high- and low-level features through skip connections, which makes it hard to balance the importance of features across levels, and they ignore the fact that human gaze is biased towards the centre of an image. To address this, this paper proposes an image feature extraction method that incorporates attention mechanisms and optimizes the extracted features with a Gaussian learning module, improving the accuracy of fixation prediction. Method: A new fixation prediction model based on a multiple attention mechanism (MAM) is proposed. It combines three different attention mechanisms to weight, in space, channel and level respectively, the features extracted by a ResNet-50 backbone augmented with dilated convolutions. The network consists of a feature extraction module, a multiple attention module and a Gaussian learning optimization module. The dilated convolutions capture receptive fields of different sizes while keeping the resolution of the feature maps unchanged; the multiple attention module automatically balances the rich low-level detail and the high-level global semantics, fully exploiting the channel and spatial information of the feature maps and preventing over-reliance on high-level features; the Gaussian learning module automatically selects a suitable Gaussian blur kernel to blur the saliency map, addressing the centre bias of human viewing. Results: Experiments on the public SALICON (saliency in context) dataset show that, compared with …

16.
In this paper, we address simultaneous markerless motion and shape capture from 3D input meshes of partial views onto a moving subject. We exploit a computer graphics model based on kinematic skinning as the template tracking model. This template model consists of vertices, joints and skinning weights learned a priori from registered full-body scans, representing true human shape and kinematics-based shape deformations. Two data-driven priors are used together with a set of constraints and cues for setting up sufficient correspondences. A Gaussian mixture model-based pose prior of successive joint configurations is learned to soft-constrain the attainable pose space to plausible human poses. To make the shape adaptation robust to outliers and non-visible surface regions and to guide the shape adaptation towards realistically appearing human shapes, we use a mesh-Laplacian-based shape prior. Both priors are learned/extracted from the training set of the template model learning phase. The output is a model adapted to the captured subject with respect to shape and kinematic skeleton, as well as the animation parameters to resemble the observed movements. With example applications, we demonstrate the benefit of such footage. Experimental evaluations on publicly available datasets show the achieved natural appearance and accuracy.

17.
We present an adaptive slicing scheme for reducing the manufacturing time for 3D printing systems. Based on a new saliency-based metric, our method optimizes the thicknesses of slicing layers to save printing time and preserve the visual quality of the printing results. We formulate the problem as a constrained ℓ0 optimization and compute the slicing result via a two-step optimization scheme. To further reduce printing time, we develop a saliency-based segmentation scheme to partition an object into subparts and then optimize the slicing of each subpart separately. We validate our method with a large set of 3D shapes ranging from CAD models to scanned objects. Results show that our method saves printing time by 30-40% and generates 3D objects that are visually similar to the ones printed with the finest resolution possible.
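The paper's two-step ℓ0 optimisation is involved; the sketch below is only a greedy stand-in that captures the intent, with thinner layers where saliency is high and thicker layers elsewhere. All names and thickness bounds are assumptions:

```python
import numpy as np

def adaptive_slices(saliency_per_mm, z_max, t_min=0.1, t_max=0.4):
    """Greedy stand-in for saliency-aware adaptive slicing.

    saliency_per_mm : callable z -> saliency in [0, 1] of the geometry near height z
    z_max           : object height in mm
    t_min, t_max    : finest / coarsest printable layer thickness in mm
    Returns a list of layer thicknesses: salient regions get thin layers
    (better visual quality), non-salient regions get thick layers (faster print).
    """
    layers, z = [], 0.0
    while z < z_max - 1e-9:
        s = float(np.clip(saliency_per_mm(z), 0.0, 1.0))
        t = t_max - s * (t_max - t_min)     # high saliency -> thin layer
        t = min(t, z_max - z)               # do not overshoot the top of the object
        layers.append(t)
        z += t
    return layers

# Example: an object whose salient detail sits near the top gets thinner layers there.
thicknesses = adaptive_slices(lambda z: z / 100.0, z_max=100.0)
```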

18.
Effective composition in visual arts relies on the principle of movement, where the viewer's eye is directed along subjective curves to a center of interest. We call these curves subjective because they may span the edges and/or center-lines of multiple objects, as well as contain missing portions which are automatically filled by our visual system. By carefully coordinating the shape of objects in a scene, skilled artists direct the viewer's attention via strong subjective curves. While traditional 2D sketching is a natural fit for this task, current 3D tools are object-centric and do not accommodate coherent deformation of multiple shapes into smooth flows. We address this shortcoming with a new sketch-based interface called Flow Curves which allows coordinating deformation across multiple objects. Core components of our method include an understanding of the principle of flow, algorithms to automatically identify subjective curve elements that may span multiple disconnected objects, and a deformation representation tailored to the view-dependent nature of scene movement. As demonstrated in our video, sketching flow curves requires significantly less time than using traditional 3D editing workflows.

19.
Advanced Robotics, 2013, 27(10): 1057-1072
It is an easy task for the human visual system to gaze continuously at an object moving in three-dimensional (3-D) space. While tracking the object, human vision seems able to comprehend its 3-D shape with binocular vision. We conjecture that, in the human visual system, the function of comprehending the 3-D shape is essential for robust tracking of a moving object. In order to examine this conjecture, we constructed an experimental system of binocular vision for motion tracking. The system is composed of a pair of active pan-tilt cameras and a robot arm. The cameras are for simulating the two eyes of a human while the robot arm is for simulating the motion of the human body below the neck. The two active cameras are controlled so as to fix their gaze at a particular point on an object surface. The shape of the object surface around the point is reconstructed in real-time from the two images taken by the cameras based on the differences in the image brightness. If the two cameras successfully gaze at a single point on the object surface, it is possible to reconstruct the local object shape in real-time. At the same time, the reconstructed shape is used for keeping a fixation point on the object surface for gazing, which enables robust tracking of the object. Thus these two processes, reconstruction of the 3-D shape and maintaining the fixation point, must be mutually connected and form one closed loop. We demonstrate the effectiveness of this framework for visual tracking through several experiments.

20.
Accurate simulation of all the senses in virtual environments is a computationally expensive task. Visual saliency models have been used to improve computational performance for rendered content, but this is insufficient for multi-modal environments. This paper considers cross-modal perception and, in particular, whether and how olfaction affects visual attention. Two experiments are presented in this paper. Firstly, eye-tracking data is gathered from a number of participants to gain an impression of where and how they view virtual objects when smell is introduced, compared to an odourless condition. Based on the results of this experiment, a new type of saliency map in a selective-rendering pipeline is presented. A second experiment validates this approach, and demonstrates that participants rank images as better quality, when compared to a reference, for the same rendering budget.
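One way such an olfaction-aware saliency map could be assembled for a selective-rendering pipeline is sketched below; the additive bias around the odour source's screen projection and all parameter values are assumptions, not the paper's measured model:

```python
import numpy as np

def smell_modulated_saliency(visual_saliency, odour_xy, strength=0.5, sigma=120.0):
    """Boost visual saliency around the on-screen projection of an odour source.

    visual_saliency : 2-D map (H, W) from a standard visual saliency model
    odour_xy        : screen-space (x, y) position of the object emitting the smell
    strength        : how strongly olfaction biases attention (0 = ignore smell)
    sigma           : spatial extent (px) of the olfactory bias
    """
    h, w = visual_saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    odour_map = np.exp(-((xs - odour_xy[0]) ** 2 + (ys - odour_xy[1]) ** 2) / (2 * sigma ** 2))
    fused = (1.0 - strength) * visual_saliency + strength * odour_map
    return fused / (fused.max() + 1e-8)   # normalised map drives the rendering budget
```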
