Similar Documents
 Found 20 similar documents (search time: 34 ms)
1.
A Computational Model of Visual Selective Attention Based on Synergetic Perception
In task-related visual attention, a task-based visual attention saliency map must be built to guide attention. To this end, synergetic perception theory, which closely parallels the human cognitive process, is used to study a task-based computational model of visual attention. First, synergetic recognition theory is applied to the visual perception of ambiguous and multi-stable patterns, yielding a theory of synergetic visual perception. Then the patterns in synergetic visual perception are mapped to the low-level visual features extracted by the visual attention model, and the properties of the bias matrix are used to compute the task-induced bias among these low-level features; from this bias and the low-level features, a task-based visual attention saliency map is generated. Finally, a computational model of visual selective attention based on synergetic perception theory is proposed. Experiments applying the algorithm to task-based visual search show that it is effective and cognitively plausible.

2.
This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency respectively. The two saliency maps are then fused after a normalization operation. In the bottom-up attention model, we use the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we utilize high-level stimuli in news video, such as faces, persons, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested using three popular evaluation metrics over two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method in saliency detection of news videos compared to several state-of-the-art methods.
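As a rough illustration of the normalize-then-fuse step described in this abstract, the following Python sketch combines a bottom-up and a top-down saliency map; the equal fusion weights are an assumption, since the abstract does not state the coefficients used.

```python
import numpy as np

def normalize_map(m, eps=1e-8):
    """Scale a saliency map to the range [0, 1]."""
    m = m.astype(np.float64)
    return (m - m.min()) / (m.max() - m.min() + eps)

def fuse_saliency(bottom_up, top_down, w_bu=0.5, w_td=0.5):
    """Fuse normalized bottom-up and top-down saliency maps by a
    weighted sum (weights here are illustrative, not from the paper)."""
    return w_bu * normalize_map(bottom_up) + w_td * normalize_map(top_down)
```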

3.
We propose a biologically-motivated computational model for learning task-driven and object-based visual attention control in interactive environments. In this model, top-down attention is learned interactively and is used to search for a desired object in the scene by biasing the bottom-up attention, in order to form a need-based and object-driven state representation of the environment. Our model consists of three layers. First, in the early visual processing layer, the most salient location of a scene is derived using the biased saliency-based bottom-up model of visual attention. Then a cognitive component in the higher visual processing layer performs an application-specific operation, such as object recognition, at the focus of attention. From this information, a state is derived in the decision making and learning layer. Top-down attention is learned by the U-TREE algorithm, which successively grows an object-based binary tree. Internal nodes in this tree check the existence of a specific object in the scene by biasing the early vision and object recognition parts. Its leaves point to states in the action value table, and motor actions are associated with the leaves. After performing a motor action, the agent receives a reinforcement signal from the critic. This signal is alternately used for modifying the tree or updating the action selection policy. The proposed model is evaluated on visual navigation tasks, where the obtained results support the applicability and usefulness of the developed method for robotics.

4.
Extraction of visual features for lipreading
The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as movements of the head, convey additional information. Integrating speech cues from many sources improves intelligibility, especially when the acoustic signal is degraded. The paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively. The third, bottom-up, method uses a nonlinear scale-space analysis to form features directly from the pixel intensities. All methods are compared on a multitalker visual speech recognition task of isolated letters.
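As a hedged sketch of the shape-based, top-down route described above (PCA of lip-contour vectors, whose low-dimensional scores would then feed a hidden Markov model), assuming hypothetical landmark and component counts:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: each row is a flattened lip-contour shape vector,
# e.g. the (x, y) coordinates of 22 landmark points in one video frame.
shapes = np.random.rand(500, 44)  # 500 frames, 22 landmarks * 2 coords

# Project the shapes onto their principal modes of variation; the
# low-dimensional scores serve as per-frame lipreading features.
pca = PCA(n_components=10)
features = pca.fit_transform(shapes)  # shape (500, 10), one row per frame
```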

5.
Understanding and reproducing complex human oculomotor behaviors using computational models is a challenging task. In this paper, two studies are presented, which focus on the development and evaluation of a computational model to show the influences of cyclic top-down and bottom-up processes on eye movements. To explain these processes, reinforcement learning was used to control eye movements. The first study showed that, in a picture-viewing task, different policies obtained from different picture-viewing conditions produced different types of eye movement patterns. In another visual search task, the second study illustrated that feedback information from each saccadic eye movement could be used to update the model's eye movement policy, generating different patterns in the following saccade. These two studies demonstrate the value of an integrated reinforcement learning model in explaining both top-down and bottom-up processes of eye movements within one computational model.
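The abstract does not reproduce the paper's update rule, so as a generic illustration of reinforcement-learning control of saccades, here is a minimal tabular Q-learning sketch; the state/action discretization and all constants are assumptions.

```python
import numpy as np

n_states, n_actions = 100, 8   # e.g. discretized gaze states, saccade directions
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9        # learning rate and discount factor (assumed)

def update_policy(s, a, reward, s_next):
    """One Q-learning step: feedback from a saccade updates the
    eye-movement policy, changing which saccade is favored next."""
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

def next_saccade(s, epsilon=0.1):
    """Epsilon-greedy selection of the next saccade."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(Q[s].argmax())
```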

6.
Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground-truth segmentations, we train a fragment-based segmentation algorithm that takes both bottom-up and top-down cues into account simultaneously, in contrast to most existing algorithms, which train the top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for the CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high-quality segmentations.

7.
A biologically inspired object-based visual attention model is proposed in this paper. The model includes a training phase and an attention phase. In the training phase, all training targets are fused into a target class and all training backgrounds into a background class; for each feature, a weight is computed as the ratio of the mean target-class saliency to the mean background-class saliency. In the attention phase, for an attended scene, all feature maps are combined into a top-down salience map using the weight vector through a hierarchical method. Then the top-down and bottom-up salience maps are fused into a global salience map that guides visual attention. Finally, the size of each salient region is obtained by maximizing entropy. The merit of our model is that it can attend to a target object of a learned class wherever it appears within the corresponding background class. Experimental results indicate that when the attended target object does not always appear against the background seen in the training images, our model outperforms Navalpakkam's model and the top-down approach of VOCUS.
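Since the weight rule is stated explicitly (per-feature weight = mean target-class saliency divided by mean background-class saliency), a direct numpy sketch is possible; the array layout is an assumption.

```python
import numpy as np

def train_weights(target_sal, background_sal):
    """Per-feature weights from the training phase: mean target-class
    saliency over mean background-class saliency. Both inputs are
    assumed to have shape (n_features, H, W)."""
    t = target_sal.reshape(target_sal.shape[0], -1).mean(axis=1)
    b = background_sal.reshape(background_sal.shape[0], -1).mean(axis=1)
    return t / (b + 1e-8)

def top_down_salience(feature_maps, weights):
    """Weighted combination of feature maps (n_features, H, W) into a
    single top-down salience map (H, W)."""
    return np.tensordot(weights, feature_maps, axes=1)
```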

8.
This paper presents a model of 3D object recognition motivated by the robust properties of the human visual system (HVS). The HVS achieves remarkable efficiency and robustness in object identification tasks. Its robust properties include visual attention, contrast mechanisms, feature binding, multi-resolution processing, size tuning, and part-based representation; in addition, bottom-up and top-down information are combined cooperatively. Based on these facts, a plausible computational model integrating them under a Monte Carlo optimization technique is proposed. In this scheme, object recognition is regarded as a parameter optimization problem: the bottom-up process is used to initialize the parameters in a discriminative way, and the top-down process is used to optimize them in a generative way. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation in visible and infrared band images.
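The abstract frames recognition as Monte Carlo parameter optimization without giving details, so the following is only a generic Metropolis-style search over pose parameters; the proposal width, temperature, and score function are all assumptions.

```python
import numpy as np

def metropolis_pose_search(score, init_pose, n_iter=1000, sigma=0.05, temp=0.1):
    """Generic Metropolis-style optimization of pose parameters.
    `score(pose)` is a user-supplied match score between the projected
    model and the image (higher is better)."""
    pose = np.asarray(init_pose, dtype=float)
    cur_score = score(pose)
    best, best_score = pose.copy(), cur_score
    for _ in range(n_iter):
        cand = pose + np.random.normal(0.0, sigma, size=pose.shape)
        s = score(cand)
        # Accept improvements always, worse candidates probabilistically.
        if s > cur_score or np.random.rand() < np.exp((s - cur_score) / temp):
            pose, cur_score = cand, s
            if s > best_score:
                best, best_score = cand.copy(), s
    return best, best_score
```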

9.
Guiding Attention by Cooperative Cues
A common assumption in visual attention is based on the rationale of "limited capacity of information processing". From this viewpoint, little consideration is given to how different information channels or modules cooperate, because cells in the processing stages are forced to compete for the limited resource. To examine the mechanism behind the cooperative behavior of information channels, a computational model of selective attention is implemented based on two hypotheses. Unlike the traditional view of visual attention, cooperative behavior is assumed to be a dynamic integration process between bottom-up and top-down information. Furthermore, top-down information is assumed to provide a contextual cue during the selection process and to guide attentional allocation among many bottom-up candidates. The results from a series of simulations with still and video images showed several interesting properties that cannot be explained by the competitive aspect of selective attention alone.

10.
A Computational Model of Visual Selective Attention
A computational model of visual attention for intelligent robots is proposed. Inspired by biology, the model imitates both the bottom-up and the top-down processes of human visual selective attention. Multiple low-level features are extracted from the input image at multiple scales; the amplitude spectrum of each feature map is analyzed in the frequency domain, and the corresponding feature saliency maps are constructed in the spatial domain. From the saliency map, the position of each attention focus and the size of the attended region are computed, and visual attention shifts among the foci according to the given task. Experiments on multiple natural images are reported with qualitative and quantitative analyses. The results agree with human visual attention, showing that the model is effective in both attention quality and computational speed.
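The abstract analyzes the amplitude spectrum of each feature map in the frequency domain but does not spell out the formulation; the classic spectral-residual recipe below is a stand-in sketch of that style of computation, not the paper's exact method.

```python
import numpy as np
import cv2

def spectral_saliency(gray):
    """Frequency-domain saliency from the amplitude spectrum of a
    single-channel feature map, following the spectral-residual idea."""
    f = np.fft.fft2(gray.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # Spectral residual: log amplitude minus its local average.
    residual = log_amp - cv2.blur(log_amp, (3, 3))
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = cv2.GaussianBlur(sal, (9, 9), 2.5)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```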

11.
This paper proposes a visual attention servo control (VASC) method that uses a Gaussian mixture model (GMM) for task-specific applications of mobile robots. In particular, a low-dimensional bias feature template is obtained using the GMM to make the attention process efficient. An image-based visual servo (IBVS) controller is used to search for a desired object in a scene through an attention system that forms a task-specific state representation of the environment. First, task definition and object representation in semantic memory (SM) are proposed, and the bias feature template is obtained by GMM reduction of features from high to low dimension. Second, intensity, color, size, and orientation features are extracted to build the feature set, and the mean-shift method is used to segment the visual scene into discrete proto-objects. Given a task-specific object, top-down bias attention is evaluated to generate the saliency map in combination with bottom-up saliency-based attention. Third, a visual attention servo controller is developed to integrate the IBVS controller and the attention system for robotic cognitive control; a rule-based arbitrator switches between the episodic memory (EM)-based controller and the IBVS controller depending on whether the robot has obtained the desired attention point in the image. Finally, the proposed method is evaluated on task-specific object detection under different conditions and on visual attention servo tasks. The obtained results validate the applicability and usefulness of the developed method for robotics.
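As a sketch of the high-to-low dimension reduction step, the following fits a GMM whose component means act as a compact bias feature template; the data, dimensionalities, and component count are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training data: rows are high-dimensional feature vectors
# (intensity, color, size, and orientation responses) sampled from the
# task-specific target object.
features = np.random.rand(1000, 32)

# Fit a small GMM; its component means form a compact, low-dimensional
# bias feature template usable for top-down attention biasing.
gmm = GaussianMixture(n_components=4, covariance_type="diag").fit(features)
bias_template = gmm.means_  # shape (4, 32)
```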

12.
Objective: To study pedestrian detection across multiple scenes, a pedestrian detection method based on semantic features under a visual attention mechanism is proposed. Method: First, on the basis of primary visual features and combined with the semantic feature of pedestrian skin color, a static visual attention model in the spatial domain is built by organically combining bottom-up, data-driven visual attention with top-down, task-driven visual attention. Then, combined with the semantic features of motion information, motion saliency is computed from the entropy of motion vectors to build a dynamic visual attention model in the temporal domain. On this basis, a spatiotemporally fused visual attention model is constructed by weighted feature fusion, yielding a visual saliency map, and pedestrian detection is completed by selecting the foci of visual attention. Results: Experiments were carried out on the Matlab R2012a platform using standard datasets and self-captured videos. Compared with other visual attention models in simulation, the proposed method achieves good pedestrian detection, with a detection accuracy of 93% on the test videos. Conclusion: The method is robust across different scenes and can be used to improve the intelligence of existing video surveillance systems.
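The temporal cue above is computed from the entropy of motion vectors; a minimal numpy sketch of per-block motion entropy follows (the histogram bin count is an assumption).

```python
import numpy as np

def motion_entropy(mv_magnitudes, n_bins=16):
    """Entropy of a block's motion-vector magnitude histogram, usable
    as the temporal (motion) saliency cue described above."""
    hist, _ = np.histogram(mv_magnitudes, bins=n_bins)
    p = hist / (hist.sum() + 1e-8)
    p = p[p > 0]                       # drop empty bins before the log
    return float(-(p * np.log2(p)).sum())
```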

13.
This paper presents a real-time framework for computationally tracking objects visually attended by the user while navigating in interactive virtual environments. In addition to the conventional bottom-up (stimulus-driven) saliency map, the proposed framework uses top-down (goal-directed) contexts inferred from the user's spatial and temporal behaviors, and identifies the most plausibly attended objects among candidates in the object saliency map. The computational framework was implemented on the GPU, exhibiting computational performance adequate for interactive virtual environments. A user experiment was also conducted to evaluate the prediction accuracy of the tracking framework by comparing the objects regarded as visually attended by the framework to actual human gaze collected with an eye tracker. The results indicated that the accuracy was at a level well supported by the theory of human cognition for visually identifying single and multiple attentive targets, especially owing to the addition of top-down contextual information. Finally, we demonstrate how the visual attention tracking framework can be applied to managing the level of detail in virtual environments without any hardware for head or eye tracking.

14.
An Ontology-Based Extended Retrieval Method for Knowledge Management
In knowledge management systems, to effectively resolve the mismatch between user queries and documents caused by different expressions of the same concept, a bidirectional extended retrieval method is proposed; it is ontology-based and uses task-scenario-oriented structured descriptions as the semantic index of information-item content. Extended retrieval is realized through two mechanisms, compatible matching and knowledge networking, corresponding to top-down and bottom-up approaches respectively, and a query rewriting template (QRT) is adopted to search for knowledge relevant to the current task. Based on the original query and the ontology, the QRT generates a large number of sub-queries and propagates relevance weights from the original query to them. The top-down approach, via the knowledge networking mechanism, retrieves relevant knowledge items through the organization and task ontologies; the bottom-up approach searches for similar tasks in the task scenario and obtains the knowledge items containing those task descriptions. Both approaches apply the QRT to realize ontology-based knowledge retrieval. Experimental results show that the method improves the retrieval efficiency and accuracy of knowledge management systems.
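As a toy illustration of a query rewriting template (QRT), the sketch below expands an original query term through ontology relations into weighted sub-queries; the ontology content, relation names, and weights are entirely hypothetical.

```python
# Hypothetical mini-ontology: each term maps relations to related terms.
ontology = {
    "fault diagnosis": {
        "synonym": ["failure analysis"],
        "narrower": ["bearing fault diagnosis"],
    },
}
# Assumed weight assigned to each relation type when propagating
# relevance from the original query to the sub-queries.
relation_weight = {"synonym": 0.9, "narrower": 0.7}

def rewrite(term, base_weight=1.0):
    """Generate (sub-query, weight) pairs from an original query term."""
    sub_queries = [(term, base_weight)]
    for relation, terms in ontology.get(term, {}).items():
        for t in terms:
            sub_queries.append((t, base_weight * relation_weight[relation]))
    return sub_queries

print(rewrite("fault diagnosis"))
```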

15.
This letter presents an improved cue integration approach to reliably separate coherent moving objects from their background scene in video sequences. The proposed method uses a probabilistic framework to unify bottom-up and top-down cues in a parallel, "democratic" fashion. The algorithm makes use of a modified Bayes rule where each pixel's posterior probability of figure or ground layer assignment is derived from likelihood models of three bottom-up cues and a prior model provided by a top-down cue. Each cue is treated as independent evidence for figure-ground separation. The cues compete with and complement each other dynamically by adjusting their relative weights from frame to frame according to cue quality measured against the overall integration; at the same time, the likelihood or prior models of the individual cues adapt toward the integrated result. These mechanisms enable the system to organize under the influence of visual scene structure without manual intervention. A novel contribution here is the incorporation of a top-down cue, which improves the system's robustness and accuracy and helps handle difficult and ambiguous situations, such as abrupt lighting changes or occlusion among multiple objects. Results on various video sequences are demonstrated and discussed. (Video demos are available at http://organic.usc.edu:8376/~tangx/neco/index.html.)
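The paper's modified Bayes rule and dynamic weight adjustment are not reproduced in the abstract; as an approximate illustration of weighted cue integration, the log-linear pooling sketch below computes a per-pixel figure posterior from bottom-up cue likelihoods and a top-down prior.

```python
import numpy as np

def figure_posterior(likelihoods, weights, prior):
    """Per-pixel posterior of the 'figure' layer.

    likelihoods: list of (p_figure, p_ground) pairs of HxW arrays,
    one pair per bottom-up cue; weights: per-cue reliabilities;
    prior: HxW top-down prior probability of 'figure'."""
    log_fig = np.log(prior + 1e-8)
    log_gnd = np.log(1.0 - prior + 1e-8)
    for (p_fig, p_gnd), w in zip(likelihoods, weights):
        log_fig += w * np.log(p_fig + 1e-8)
        log_gnd += w * np.log(p_gnd + 1e-8)
    # Normalize the two log scores into a posterior probability.
    return 1.0 / (1.0 + np.exp(log_gnd - log_fig))
```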

16.
This paper presents a computational method of feature evaluation for modeling saliency in visual scenes. This is highly relevant in visual search studies, since visual saliency underlies the deployment of visual attention. Visual saliency can also become important in computer vision applications, as it can reduce computational requirements by restricting processing to those regions of a scene containing relevant information. The method is based on Bayesian theory to describe the interaction between top-down and bottom-up information. Unlike other approaches, it evaluates and selects visual features before saliency estimation, which can reduce complexity and, potentially, improve the accuracy of the saliency computation. To this end, we present an algorithm for feature evaluation and selection. A two-color conjunction search experiment is used to illustrate the theoretical framework of the proposed model, and the practical value of the method is demonstrated with video segmentation of instruments in a laparoscopic cholecystectomy operation.

17.
王凤娇  田媚  黄雅平  艾丽华 《计算机科学》2016,43(1):85-88, 115
Visual attention is an important part of the human visual system. Most existing visual attention models emphasize bottom-up attention, give little consideration to top-down semantics, and rarely provide attention models specific to different image categories. Eye-tracking technology can objectively and accurately capture subjects' foci of attention, but it is still seldom applied in visual attention modeling. Therefore, a category-specific visual attention model, CMVA, is proposed that combines bottom-up and top-down attention; for each image category it trains a classified visual attention model on eye-movement data to predict visual saliency. Experimental results show that the model outperforms eight existing visual attention models.

18.
In order to explore the selective attention mechanism and the dual-task information-processing model, two experiments were carried out involving a visual search task and a visual detection task. The results showed that the early period of attention selection is controlled in a bottom-up manner. With respect to the dual-task information-processing model, the results showed that central information processing follows a sequential model for tasks that use the same perceptual resource, causing a bottleneck in information processing. Our study suggests that a simple and prominent signal could be used to attract drivers' attention before emergent events. Moreover, any human-machine interface design in driving-associated systems should account for this information-processing bottleneck. With respect to signal type, signals that are targeted and easy to categorize are two useful properties to consider.

19.
Research on Moving Object Detection Based on Visual Attention Computation
To detect moving objects more accurately in video scenes with global motion, a moving object detection method combining bottom-up motion attention with a top-down particle filter is proposed. The motion vector field (MVF) is estimated by multi-scale variable-block motion estimation; from it a motion attention model is built, a motion attention saliency map is obtained, and the initial distribution of motion attention is derived. A top-down particle filter algorithm based on the target's color information then adjusts the distribution of motion attention, concentrating attention on the target so that the moving object can be extracted. Experimental results show that the method detects targets more accurately in scenes with global motion.
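As a sketch of the top-down, color-based particle filter stage, one predict-reweight-resample cycle over 2-D target positions could look like the following; the Gaussian motion model and its parameters are assumptions, and `observe` stands for any color-histogram likelihood.

```python
import numpy as np

def particle_filter_step(particles, weights, observe, motion_std=5.0):
    """One cycle of a color-based particle filter. `particles` is an
    (N, 2) array of candidate target positions; `observe(p)` returns
    the observation likelihood (e.g. color-histogram similarity)."""
    # Predict: diffuse the particles with Gaussian motion noise.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Reweight by the observation likelihood.
    weights = weights * np.array([observe(p) for p in particles])
    weights /= weights.sum() + 1e-12
    # Resample in proportion to the weights.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```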

20.
For the purpose of extracting attention regions from distorted videos, a distortion-weighted spatiotemporal visual attention model is proposed. Visual attention regions are acquired in a bottom-up manner from the spatial and temporal saliency maps. Meanwhile, a blocking-artifact saliency map is detected from intensity gradient features. An attention selection step, directed in a top-down manner, identifies the visual attention region with the relatively most severe blocking artifacts as the Focus of Attention (FOA). Experimental results show that, compared with Walther's and You's models, the proposed model not only accurately analyzes spatiotemporal saliency based on intensity, texture, and motion features, but also estimates the blocking artifacts caused by distortion.
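The blocking-artifact map is said to be detected from intensity gradient features; the crude sketch below samples gradients along an assumed 8x8 block grid, where blocking discontinuities concentrate (the paper's exact detector may differ).

```python
import numpy as np

def blockiness_map(gray, block=8):
    """Rough blocking-artifact cue: absolute intensity differences
    across the horizontal and vertical block-grid boundaries."""
    g = gray.astype(np.float64)
    sal = np.zeros_like(g)
    # Differences across vertical block boundaries (between columns).
    sal[:, block::block] = np.abs(g[:, block::block] - g[:, block - 1:-1:block])
    # Differences across horizontal block boundaries (between rows).
    sal[block::block, :] += np.abs(g[block::block, :] - g[block - 1:-1:block, :])
    return sal / (sal.max() + 1e-8)
```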
