Similar Literature
20 related documents found.
1.
2.
Parallel pattern recognition requires great computational resources and is NP-complete. From an engineering point of view, it is desirable to achieve good performance with limited resources. For this purpose, we develop a serial model for visual pattern recognition based on the primate selective attention mechanism. The idea behind selective attention is that not all parts of an image carry information; if we can attend only to the relevant parts, we can recognize the image more quickly and with fewer resources. We simulate the primitive, bottom-up attentive level of the human visual system with a saliency scheme, and the more complex, top-down, temporally sequential associative level with observable Markov models. In between, a neural network analyses image parts and generates posterior probabilities as observations for the Markov model. We test our model first on a handwritten numeral recognition problem and then apply it to a more complex face recognition problem. Our results indicate the promise of this approach in complicated vision applications.
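As a rough illustration of this serial, attention-driven recognition loop (not the authors' implementation), the sketch below visits the most salient patches in order and accumulates per-class evidence from a placeholder patch classifier; the saliency map and `patch_posteriors` callable are assumed inputs, and the naive independent accumulation stands in for the full observable Markov model.

```python
import numpy as np

def recognize_serially(saliency, patch_posteriors, n_fixations=5):
    """Visit the most salient patch locations in order and accumulate class evidence.

    saliency         : 2D array, one saliency value per candidate patch location
    patch_posteriors : callable (row, col) -> 1D array of class posteriors
                       (stands in for the neural network analysing each image part)
    Note: evidence is accumulated as if observations were independent; the paper's
    observable Markov model would additionally use state-transition probabilities.
    """
    sal = saliency.astype(float).copy()
    log_evidence = None
    for _ in range(n_fixations):
        r, c = np.unravel_index(np.argmax(sal), sal.shape)    # next fixation point
        post = np.clip(patch_posteriors(r, c), 1e-12, 1.0)    # observation at this fixation
        log_evidence = np.log(post) if log_evidence is None else log_evidence + np.log(post)
        sal[r, c] = -np.inf                                    # inhibition of return
    probs = np.exp(log_evidence - log_evidence.max())
    return probs / probs.sum()

# Toy usage: random saliency map and a dummy 10-class "classifier"
rng = np.random.default_rng(0)
sal = rng.random((16, 16))
dummy = lambda r, c: rng.dirichlet(np.ones(10))
print(recognize_serially(sal, dummy).round(3))
```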

3.
In this paper, a new artificial neural network model is proposed for visual object recognition, in which the bottom-up, sensory-driven pathway and the top-down, expectation-driven pathway are fused during information processing, and their corresponding weights are learned from the fused neuron activities. During supervised learning, the target labels are used to update the bottom-up synaptic weights of the network. Meanwhile, the hypotheses generated by the bottom-up pathway produce expectations on the sensory inputs through the top-down pathway. These expectations are constrained by the real sensory data, which are used to update the top-down synaptic weights accordingly. To further improve visual object recognition performance, a multi-scale histograms of oriented gradients (MS-HOG) method is proposed to extract local features of visual objects from images. Extensive experiments on different image datasets demonstrate the efficiency and robustness of the proposed model with MS-HOG features, compared with other state-of-the-art methods.
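A minimal sketch of multi-scale HOG extraction in the spirit of MS-HOG: compute a standard HOG descriptor at several image scales and concatenate the results. The scale set and HOG cell/block parameters below are assumptions, not the values from the paper.

```python
import numpy as np
from skimage import transform
from skimage.feature import hog

def multiscale_hog(image, scales=(1.0, 0.75, 0.5)):
    """Concatenate HOG descriptors computed at several scales of a grayscale image."""
    feats = []
    for s in scales:
        scaled = transform.rescale(image, s)          # resample to the current scale
        feats.append(hog(scaled,
                         orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2),
                         block_norm='L2-Hys',
                         feature_vector=True))
    return np.concatenate(feats)

# Toy usage on a random 64x64 "image"
img = np.random.rand(64, 64)
print(multiscale_hog(img).shape)
```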

4.
Even though visual attention models using bottom-up saliency can speed up object recognition by predicting object locations, in the presence of multiple salient objects saliency alone cannot discern target objects from the clutter in a scene. Using a metric named familiarity, we propose a top-down method for guiding attention towards target objects, in addition to bottom-up saliency. To demonstrate the effectiveness of familiarity, the unified visual attention model (UVAM), which combines top-down familiarity and bottom-up saliency, is applied to SIFT-based object recognition. The UVAM is tested on 3600 artificially generated images containing COIL-100 objects with varying amounts of clutter, and on 126 images of real scenes. Recognition times are reduced by 2.7× and 2×, respectively, with no reduction in recognition accuracy, demonstrating the effectiveness and robustness of the familiarity-based UVAM.
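The abstract does not specify how familiarity and saliency are fused in the UVAM; the snippet below is only a plausible sketch in which a normalized top-down familiarity map re-weights the bottom-up saliency map before candidate fixations are ranked. The linear blend and the weight `w_td` are assumptions for illustration.

```python
import numpy as np

def combine_attention(saliency, familiarity, w_td=0.5):
    """Blend a bottom-up saliency map with a top-down familiarity map.

    Both maps are normalized to [0, 1]; w_td controls the top-down contribution.
    (Illustrative only -- the paper's actual UVAM fusion rule may differ.)
    """
    def norm(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m
    return (1.0 - w_td) * norm(saliency) + w_td * norm(familiarity)

# Rank candidate fixation points on the combined map, most salient first
combined = combine_attention(np.random.rand(32, 32), np.random.rand(32, 32))
order = np.dstack(np.unravel_index(np.argsort(combined, axis=None)[::-1], combined.shape))[0]
print(order[:5])  # first five candidate fixation points as (row, col)
```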

5.
Objective: To study pedestrian detection across multiple scenes, a pedestrian detection method based on semantic features under a visual attention mechanism is proposed. Method: First, on top of low-level visual features, the semantic feature of pedestrian skin color is incorporated, and a static spatial visual attention model is built by organically combining bottom-up, data-driven visual attention with top-down, task-driven visual attention. Then, incorporating the semantic feature of motion information, motion saliency is computed from the entropy of motion vectors to build a dynamic temporal visual attention model. On this basis, a spatiotemporally fused visual attention model is constructed by weighted feature fusion, yielding a visual saliency map; pedestrian detection is completed by selecting the focus of attention. Results: Experiments were conducted on standard datasets and self-captured videos on the Matlab R2012a platform. Compared with other visual attention models, the proposed method achieves good pedestrian detection performance, with a detection accuracy of 93% on the test videos. Conclusion: The method is robust across different scenes and can be used to improve the intelligence of existing video surveillance systems.
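To illustrate the temporal part of the pipeline (motion saliency from the entropy of motion vectors), here is a hedged sketch that assumes a dense block-wise motion-vector field is already available; the window size and bin count are placeholders, not the paper's settings.

```python
import numpy as np

def motion_saliency_entropy(mv_x, mv_y, n_bins=16):
    """Motion saliency as the entropy of the local motion-direction histogram.

    mv_x, mv_y : 2D arrays of motion-vector components, one per block.
    (Illustrative sketch; the paper's entropy computation may be defined differently.)
    """
    angles = np.arctan2(mv_y, mv_x)                      # motion directions per block
    h, w = angles.shape
    sal = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = angles[max(0, i - 2):i + 3, max(0, j - 2):j + 3].ravel()
            hist, _ = np.histogram(win, bins=n_bins, range=(-np.pi, np.pi))
            p = hist / hist.sum()
            p = p[p > 0]
            sal[i, j] = -np.sum(p * np.log2(p))          # Shannon entropy of directions
    return sal

print(motion_saliency_entropy(np.random.randn(20, 20), np.random.randn(20, 20)).shape)
```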

6.
This paper presents a model of 3D object recognition motivated by the robust properties of the human visual system (HVS). The HVS shows remarkable efficiency and robustness in object identification tasks; its robust properties include visual attention, contrast mechanisms, feature binding, multi-resolution processing, size tuning, and part-based representation, and bottom-up and top-down information are combined cooperatively. Based on these observations, a plausible computational model integrating them under a Monte Carlo optimization technique is proposed. In this scheme, object recognition is regarded as a parameter optimization problem: the bottom-up process initializes the parameters in a discriminative way, and the top-down process optimizes them in a generative way. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation in visible and infrared band images.
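The control flow described above (discriminative bottom-up initialization followed by generative Monte Carlo refinement) can be illustrated with a generic random-walk Metropolis sketch. The score function, parameterization, and proposal scale below are placeholders for illustration only, not the paper's formulation.

```python
import numpy as np

def refine_by_metropolis(init_params, log_score, n_iters=500, step=0.05, seed=0):
    """Top-down refinement of bottom-up initial parameters via random-walk Metropolis.

    init_params : 1D array proposed by the bottom-up (discriminative) stage
    log_score   : callable params -> log of a generative match score (higher is better)
    """
    rng = np.random.default_rng(seed)
    params = init_params.astype(float)
    best, cur = params.copy(), log_score(params)
    best_val = cur
    for _ in range(n_iters):
        cand = params + rng.normal(scale=step, size=params.shape)   # random-walk proposal
        val = log_score(cand)
        if np.log(rng.random()) < val - cur:                        # Metropolis acceptance
            params, cur = cand, val
            if val > best_val:
                best, best_val = cand.copy(), val
    return best

# Toy usage: a synthetic unimodal score peaked at (0.3, -0.2)
score = lambda p: -np.sum((p - np.array([0.3, -0.2])) ** 2)
print(refine_by_metropolis(np.zeros(2), score).round(2))
```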

7.
Under natural viewing conditions, human observers selectively allocate their attention to subsets of the visual input. Since overt allocation of attention appears as eye movements, the mechanism of selective attention can be uncovered through computational studies of eye-movement prediction. Since top-down attentional control in a task is expected to modulate eye movements significantly, models that take a bottom-up approach based on low-level local properties are not expected to suffice for prediction. In this study, we introduce two representative models, apply them to a facial discrimination task with morphed face images, and evaluate their performance by comparing them with human eye-movement data. The results show that the models are not good at predicting eye movements in this task.

8.
A visual computation model incorporating an attention mechanism
A visual model based on an attention mechanism is proposed. Its main features are: the attention process is divided into three levels, which respectively simulate biological pupil focusing, eye movement, and head movement; a new variable-structure non-uniform sampling mapping is proposed to simulate the characteristics of the biological retina; the fusion of the data-driven bottom-up process and the knowledge-driven top-down process is emphasized; and a novel tree-like knowledge representation and an attention-shift control mechanism based on depth-first search of that tree are proposed.
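The variable-structure non-uniform sampling mapping is not specified in the abstract; the following is only a generic foveated (log-polar) sampling sketch of the kind commonly used to approximate retinal sampling, dense near the fixation point and sparse in the periphery.

```python
import numpy as np

def logpolar_samples(image, center, n_rings=12, n_wedges=24, r_min=2.0):
    """Sample an image non-uniformly around a fixation point on a log-polar grid.

    Returns an (n_rings, n_wedges) array of pixel values.
    (Generic foveated sampling; the paper's variable-structure mapping may differ.)
    """
    h, w = image.shape
    cy, cx = center
    r_max = min(h, w) / 2.0
    radii = np.geomspace(r_min, r_max, n_rings)                    # logarithmic ring spacing
    thetas = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)   # angular wedges
    ys = np.clip((cy + radii[:, None] * np.sin(thetas)).astype(int), 0, h - 1)
    xs = np.clip((cx + radii[:, None] * np.cos(thetas)).astype(int), 0, w - 1)
    return image[ys, xs]

img = np.random.rand(128, 128)
print(logpolar_samples(img, center=(64, 64)).shape)   # (12, 24)
```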

9.
A computational model of visual selective attention
A computational visual attention model for intelligent robots is proposed. Inspired by biology, the model imitates both the bottom-up and top-down processes of human visual selective attention. Multiple low-level features are extracted from the input image at multiple scales; the amplitude spectrum of each feature map is analyzed in the frequency domain, and the corresponding feature saliency maps are constructed in the spatial domain. From the saliency map, the position of the focus of attention and the size of the attended region are computed, and attention is shifted among the foci according to the given task. Experiments on multiple natural images are reported with qualitative and quantitative analysis. The results are consistent with human visual attention, showing that the model is effective in terms of both attention quality and computation speed.
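The frequency-domain analysis of the amplitude spectrum is not detailed in the abstract; the sketch below uses the well-known spectral-residual recipe as a stand-in for that step, turning one feature map into a spatial saliency map and picking the focus of attention at its peak. Filter sizes are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def spectral_saliency(feature_map):
    """Frequency-domain saliency of one feature map (spectral-residual-style stand-in)."""
    f = np.fft.fft2(feature_map)
    log_amp = np.log(np.abs(f) + 1e-8)                          # log amplitude spectrum
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)        # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, sigma=2.5)                      # smooth spatial saliency map

fmap = np.random.rand(64, 64)
sal = spectral_saliency(fmap)
focus = np.unravel_index(np.argmax(sal), sal.shape)             # focus of attention (row, col)
print(focus, sal.shape)
```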

10.
In visual question answering (VQA), a natural language answer is generated for a given image and a question related to that image. The VQA task has advanced significantly through efficient attention mechanisms. However, current VQA models use region features or object features that are not adequate to improve the accuracy of generated answers. To deal with this issue, we use a Two-way Co-Attention Mechanism (TCAM), which can fuse different visual features (region, object, and concept) from diverse perspectives. These diverse features lead to different sets of answers, and there is an inherent relationship between them. We develop a powerful attention mechanism that exploits both aspects by using bottom-up and top-down TCAM to extract discriminative feature information. We propose a Collective Feature Integration Module (CFIM) to combine multimodal attention features and thus capture the valuable information in these visual features through the TCAM. Further, we formulate a Vertical CFIM for fusing features belonging to the same class and a Horizontal CFIM for combining features belonging to different types, thus balancing the influence of top-down and bottom-up co-attention. Experiments are conducted on two significant datasets, VQA 1.0 and VQA 2.0. On VQA 1.0, the overall accuracy of the proposed method is 71.23 on the test-dev set and 71.94 on the test-std set; on VQA 2.0, it is 75.89 on the test-dev set and 76.32 on the test-std set. These results clearly reflect the superiority of the proposed TCAM-based approach over existing methods.

11.
A visual attention model for robot object tracking
Inspired by human behavior, a robot object tracking model is proposed on the basis of a visual attention mechanism, consistent with the theory of topological perception. The model integrates image-driven, bottom-up attention and object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. The bottom-up component segments the whole scene into the ground region and the salient regions. Guided by a top-down strategy realized through a topological graph, the object regions are separated from the salient regions; the remaining salient regions are treated as barrier regions. To evaluate the model, a mobile robot platform was developed and experiments were carried out on it. The experimental results indicate that processing an image with a resolution of 752×480 pixels takes less than 200 ms and that the extracted object regions are complete. A comparison with an existing model shows that the proposed model has advantages for robot object tracking in terms of speed and efficiency.

12.
In image classification based on the bag-of-visual-words framework, the image patches used to create image representations affect classification performance significantly. Currently, however, patches are sampled mainly by processing low-level image information, or are simply extracted regularly or randomly. These methods are not effective, because the patches they extract are not necessarily discriminative for image categorization. In this paper, we propose to exploit both bottom-up information, obtained by processing low-level image information, and top-down information, obtained by exploring statistical properties of training image grids, to extract image patches. In the proposed approach, an input image is divided into regular grids, each of which is evaluated based on its bottom-up and/or top-down information. Every grid is then assigned a saliency value based on this evaluation, so that a saliency map can be created for the image. Finally, patches are sampled from the input image on the basis of the obtained saliency map. Furthermore, we propose a method to fuse these two kinds of information. The proposed methods are evaluated on both object categories and scene categories, and experimental results demonstrate their effectiveness.
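A small sketch of the grid-scoring-then-sampling idea: fuse two per-grid score maps into a saliency map and draw patch locations with probability proportional to it. The scoring functions themselves are placeholders here, as are the fusion weight and patch count.

```python
import numpy as np

def sample_patches(bottom_up, top_down, n_patches=50, alpha=0.5, seed=0):
    """Draw grid indices with probability proportional to a fused grid saliency map.

    bottom_up, top_down : 2D arrays of per-grid scores (placeholders for the paper's
                          low-level and statistics-based grid evaluations)
    alpha               : weight of the top-down term in the fusion
    """
    rng = np.random.default_rng(seed)
    def norm(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else np.full_like(m, 1.0 / m.size)
    saliency = (1 - alpha) * norm(bottom_up) + alpha * norm(top_down)
    probs = saliency.ravel() / saliency.ravel().sum()
    idx = rng.choice(probs.size, size=n_patches, replace=True, p=probs)
    return np.column_stack(np.unravel_index(idx, saliency.shape))   # (row, col) grid coords

coords = sample_patches(np.random.rand(10, 10), np.random.rand(10, 10))
print(coords[:5])
```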

13.
A coherent computational approach to model bottom-up visual attention
Visual attention is a mechanism that filters out redundant visual information and detects the most relevant parts of our visual field. Automatic determination of the most visually relevant areas would be useful in many applications such as image and video coding, watermarking, video browsing, and quality assessment. Many research groups are currently investigating computational modeling of the visual attention system. The first published computational models were based on some basic and well-understood human visual system (HVS) properties; they feature a single perceptual layer that simulates only one aspect of the visual system. More recent models integrate complex features of the HVS and simulate hierarchical perceptual representation of the visual input. The bottom-up mechanism is the most common feature in modern models; it refers to involuntary attention, i.e., salient spatial visual features that effortlessly or involuntarily attract our attention. This paper presents a coherent computational approach to modeling bottom-up visual attention, based mainly on the current understanding of HVS behavior. Contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions are some of the features implemented in this model. The performance of the algorithm is assessed using natural images and experimental measurements from an eye-tracking system. Two well-known metrics (the correlation coefficient and the Kullback-Leibler divergence) are used to validate the model, and a further metric is also defined. The results from this model are finally compared to those from a reference bottom-up model.
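The two validation metrics mentioned above are standard; a compact sketch of both, comparing a model saliency map against a fixation-derived map, is given below. The direction of the KL divergence is a common convention, not necessarily the one used in the paper.

```python
import numpy as np

def correlation_coefficient(sal_model, sal_human):
    """Pearson correlation between two saliency maps of the same size."""
    a = (sal_model - sal_model.mean()) / (sal_model.std() + 1e-12)
    b = (sal_human - sal_human.mean()) / (sal_human.std() + 1e-12)
    return float(np.mean(a * b))

def kl_divergence(sal_model, sal_human, eps=1e-12):
    """KL divergence D(human || model), treating both maps as probability distributions."""
    p = sal_human.ravel() / (sal_human.sum() + eps)
    q = sal_model.ravel() / (sal_model.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

m, h = np.random.rand(48, 64), np.random.rand(48, 64)
print(correlation_coefficient(m, h), kl_divergence(m, h))
```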

14.
This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up saliency and top-down saliency, respectively, and the two saliency maps are fused after a normalization operation. In the bottom-up attention model, we use the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we utilize high-level stimuli in news video, such as faces, persons, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested using three popular evaluation metrics over two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method for saliency detection in news videos compared to several state-of-the-art methods.
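The "normalization operation" before fusion is not specified in the abstract; the sketch below uses the widely known Itti-style peak-difference normalization as one plausible choice before a weighted fusion of the two maps. The operator, window size, and fusion weight are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def peak_normalize(sal):
    """Itti-style normalization: scale a map by (global max - mean of other local maxima)^2.

    One common choice for a pre-fusion normalization; the paper's exact operator may differ.
    """
    m = sal - sal.min()
    if m.max() > 0:
        m = m / m.max()
    is_local_max = (maximum_filter(m, size=15) == m) & (m > 0.1)
    others = m[is_local_max]
    if others.size > 1:
        non_global = others[others < others.max()]
        mean_other = non_global.mean() if non_global.size else 0.0
    else:
        mean_other = 0.0
    return m * (1.0 - mean_other) ** 2

def fuse(bottom_up, top_down, w=0.5):
    """Fuse normalized bottom-up and top-down saliency maps into one map."""
    return (1 - w) * peak_normalize(bottom_up) + w * peak_normalize(top_down)

print(fuse(np.random.rand(36, 64), np.random.rand(36, 64)).shape)
```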

15.
王凤娇  田媚  黄雅平  艾丽华 《计算机科学》2016,43(1):85-88, 115
Visual attention is an important part of the human visual system. Most existing visual attention models emphasize bottom-up attention, rarely consider top-down semantics, and seldom build attention models specific to different image categories. Eye tracking can capture subjects' foci of attention objectively and accurately, but it is still rarely applied in visual attention modeling. Therefore, a category-specific visual attention model, CMVA, combining bottom-up and top-down attention is proposed: for each image category, a classification-based attention model is trained on eye-movement data to predict visual saliency. Experimental results show that the proposed model outperforms eight existing visual attention models.

16.
Both bottom-up and top-down approaches have been proposed for hypothetical reasoning. However, each has complementary merits and demerits, so hypothetical reasoners combining the two approaches are promising. Here, we are concerned with a bottom-up hypothetical reasoner that incorporates top-down information. To simulate top-down reasoning on a bottom-up reasoner, one can apply the upside-down meta-interpretation method, which is similar to the Magic Set and Alexander methods, by transforming a set of Horn clauses into a program that incorporates goal information. Unfortunately, this alone does not achieve speedups for bottom-up hypothetical reasoning, because checking the consistency of solutions against negative clauses must be evaluated globally. This paper presents a new method to reduce the cost of consistency checking for bottom-up hypothetical reasoning based on upside-down meta-interpretation. In the transformation algorithm, logical dependencies between a goal and the negative clauses are analyzed to find irrelevant negative clauses, so that consistency checking can be restricted to the relevant ones.
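A toy sketch of the dependency idea (not the paper's transformation algorithm): build a predicate dependency graph from the Horn clauses and keep only those negative clauses that mention a predicate reachable from the goal, so consistency checking can be restricted to them. The clause encoding is a simplification invented for illustration.

```python
from collections import defaultdict, deque

def relevant_negative_clauses(horn_clauses, negative_clauses, goal_pred):
    """Keep only negative clauses whose predicates depend on the goal predicate.

    horn_clauses     : list of (head_pred, [body_preds]) pairs
    negative_clauses : list of lists of predicates appearing in each negative clause
    (Toy dependency analysis; the paper's algorithm is more involved.)
    """
    # head -> body edges: a predicate depends on the predicates used to derive it
    deps = defaultdict(set)
    for head, body in horn_clauses:
        deps[head].update(body)
    # predicates reachable from the goal through the dependency graph
    reachable, queue = {goal_pred}, deque([goal_pred])
    while queue:
        p = queue.popleft()
        for q in deps[p]:
            if q not in reachable:
                reachable.add(q)
                queue.append(q)
    return [nc for nc in negative_clauses if any(p in reachable for p in nc)]

clauses = [("goal", ["a", "b"]), ("a", ["c"]), ("x", ["y"])]
negs = [["c", "b"], ["y"]]
print(relevant_negative_clauses(clauses, negs, "goal"))   # only ['c', 'b'] is relevant
```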

17.
This paper studies the design and application of a novel visual attention model designed to compute the user's gaze position automatically, i.e., without using a gaze-tracking system. The proposed model is specifically designed for real-time first-person exploration of 3D virtual environments. It is the first model adapted to this context that can compute, in real time, a continuous gaze-point position rather than a set of 3D objects potentially observed by the user. To do so, contrary to previous models that use a mesh-based representation of visual objects, we introduce a representation based on surface elements. Our model also simulates visual reflexes and the cognitive processes that take place in the brain, such as the gaze behavior associated with first-person navigation in the virtual environment. The model combines bottom-up and top-down components to compute a continuous gaze-point position on screen that ideally matches the user's. We conducted an experiment to study and compare the performance of our method with a state-of-the-art approach; our results are significantly better, in some cases with more than 100 percent accuracy gained. This suggests that computing a gaze point in a 3D virtual environment in real time is possible and is a valid approach compared to object-based approaches. Finally, we describe different applications of the model when exploring virtual environments, presenting algorithms that improve or adapt the visual feedback of virtual environments based on gaze information. We first propose a level-of-detail approach that relies heavily on multiple-texture sampling, and show that the gaze information from our visual attention model can be used to increase visual quality where the user is looking while maintaining a high refresh rate. Second, we introduce the use of the visual attention model in three visual effects inspired by the human visual system: depth-of-field blur, camera motions, and dynamic luminance. All these effects are computed from the simulated gaze of the user and are meant to improve the user's sensations in future virtual reality applications.

18.
Research on moving object detection based on visual attention computation
To detect moving objects more accurately in video scenes with global motion, a moving object detection method combining bottom-up motion attention and top-down particle filtering is proposed. The motion vector field (MVF) is estimated by multi-scale variable-block motion estimation, a motion attention model is built to obtain a motion attention saliency map, and from this map the initial distribution of motion attention is derived. A top-down particle filter based on the target's color information then adjusts the attention distribution, concentrating attention on the target and extracting the moving object to be detected. Experimental results show that the method detects objects more accurately in scenes with global motion.

19.
This paper presents a computational method of feature evaluation for modeling saliency in visual scenes. This is highly relevant to visual search studies, since visual saliency underlies the deployment of visual attention. Visual saliency can also be important in computer vision applications, as it can reduce computational requirements by permitting processing only in those regions of a scene that contain relevant information. The method is based on Bayesian theory to describe the interaction between top-down and bottom-up information. Unlike other approaches, it evaluates and selects visual features before saliency estimation, which can reduce complexity and potentially improve the accuracy of the saliency computation. To this end, we present an algorithm for feature evaluation and selection. A two-color conjunction search experiment is used to illustrate the theoretical framework of the proposed model, and the practical value of the method is demonstrated with video segmentation of instruments in a laparoscopic cholecystectomy operation.
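One way to picture evaluating features before saliency estimation is to score each candidate feature by how well its target (top-down) distribution separates from the background (bottom-up) distribution, then keep the top-ranked features. The divergence-based criterion below is an illustrative stand-in, not the paper's Bayesian formulation.

```python
import numpy as np

def feature_scores(target_samples, background_samples, n_bins=32):
    """Score each feature by the divergence between its target and background histograms.

    target_samples, background_samples : arrays of shape (n_samples, n_features)
    (Illustrative selection criterion; the paper's Bayesian formulation differs in detail.)
    """
    n_features = target_samples.shape[1]
    scores = np.zeros(n_features)
    for k in range(n_features):
        lo = min(target_samples[:, k].min(), background_samples[:, k].min())
        hi = max(target_samples[:, k].max(), background_samples[:, k].max())
        p, _ = np.histogram(target_samples[:, k], bins=n_bins, range=(lo, hi))
        q, _ = np.histogram(background_samples[:, k], bins=n_bins, range=(lo, hi))
        p, q = p + 1e-9, q + 1e-9
        p, q = p / p.sum(), q / q.sum()
        scores[k] = np.sum(p * np.log(p / q))          # KL(target || background)
    return scores

rng = np.random.default_rng(1)
tgt = rng.normal(loc=[2.0, 0.0, 0.0], size=(500, 3))   # feature 0 is the discriminative one
bg = rng.normal(loc=[0.0, 0.0, 0.0], size=(500, 3))
print(np.argsort(feature_scores(tgt, bg))[::-1])        # feature 0 should rank first
```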

20.
暴林超  蔡超  肖洁  周成平 《计算机工程》2011,37(13):17-19,25
To rapidly locate objects with complex structure in natural scene images, a new visual attention model is proposed. Salient blobs are learned and extracted from the target; the feature information of these blobs and the relative spatial relations between heterogeneous blobs are introduced into the visual attention process, and a graph-matching-based blob search strategy merges heterogeneous blobs whose features resemble the target, yielding the focus of attention. Experimental comparison with a bottom-up visual attention model shows that the proposed model can incorporate the feature and structural information of complex-structured targets, reduce the number of invalid attention shifts, and improve the efficiency of visual attention.
