A Computational Model of Visual Selective Attention Based on Synergetic Perception   Cited by: 1 (self-citations: 0, by others: 1)
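Task-related visual attention requires a task-based saliency map to guide attention. This paper therefore uses synergetic perception theory, which is close to the human cognitive process, to study a task-based computational model of visual attention. First, synergetic recognition theory is used to study the visual perception of ambiguous (two-meaning and multi-meaning) patterns, yielding a synergetic theory of visual perception. Next, the patterns in synergetic visual perception are matched to the low-level visual features extracted by a visual attention model, and the properties of the bias matrix are used to compute the task-induced biases among those low-level features; these biases and the low-level features then generate the task-based saliency map. Finally, a computational model of visual selective attention based on synergetic perception theory is proposed. Experiments on task-based visual search show that the algorithm is effective and cognitively plausible.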

We study the problem of recovering temporal parameters which act as predictive operators, generalize time-to-collision and have direct interpretation for navigational purposes for piecewise arbitrarily smooth (polynomial) motion. A result stating that, for monocular observers undergoing arbitrary polynomial laws, these parameters are visually observable, is presented in the first part of this paper. This property suggests an alternate temporal representation of visual looming information. The second part of this paper is concerned with algorithmic approaches for environments with maneuvering agents. A method addressing model order determination, collision detection, and temporal parameter estimation is proposed. Experimental results are reported.
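As background for the time-to-collision quantity that this abstract generalizes, the following is a minimal sketch of the classical constant-velocity case only, not the paper's polynomial-motion method. It assumes the image size s of an approaching object is inversely proportional to its distance, so that s / (ds/dt) equals the remaining time to contact. The function name `time_to_collision` and the backward finite difference are illustrative assumptions.

```python
def time_to_collision(sizes, dt):
    """First-order time-to-collision estimate tau = s / (ds/dt).

    sizes: image sizes of the object at consecutive frames (hypothetical input).
    dt:    time between frames, in seconds.
    Assumes constant approach velocity; this is NOT the polynomial-motion
    estimator described in the abstract.
    """
    s_prev, s_curr = sizes[-2], sizes[-1]
    ds_dt = (s_curr - s_prev) / dt  # backward finite difference
    if ds_dt <= 0:
        # The object is not looming (constant or shrinking image size).
        return float("inf")
    return s_curr / ds_dt


# Object at distance 10 m approaching at 1 m/s, sampled 10 ms apart.
sizes = [1.0 / 10.0, 1.0 / 9.99]
tau = time_to_collision(sizes, dt=0.01)
```

With a backward difference the estimate lags the true value by roughly one frame interval, which is why real systems usually smooth over several frames.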

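Image captioning is an important research area in machine learning and computer vision, but existing methods do not sufficiently explore the semantic associations between visual features and the model architecture. This paper proposes an attention model architecture based on user tags and visual features that effectively combines social-image features with the user tags attached to an image to generate more accurate captions. Experiments on the MSCOCO dataset verify the algorithm's performance; the results show that the proposed attention model based on user tags and visual features clearly outperforms traditional methods.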

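Recognizing irregular text in natural scenes remains a challenging problem. For arbitrarily shaped and low-quality scene text, this paper proposes a multimodal network model that fuses a visual attention module with a semantic perception module. The visual attention module uses parallel attention combined with position-aware encoding to extract visual features from the image. The semantic perception module, based on weakly supervised learning, learns linguistic information to compensate for deficiencies in the visual features; it uses a Transformer variant and is trained by randomly masking one character in a word, which improves the model's contextual semantic reasoning. The visual-semantic fusion module uses a gating mechanism to let information from the different modalities interact, producing robust features for character prediction. Extensive experiments show that the proposed method effectively recognizes arbitrarily shaped and low-quality scene text and achieves competitive results on several benchmark datasets. In particular, on the SVT and SVTP datasets, which contain low-quality text, recognition accuracy reaches 93.6% and 86.2%, respectively. Compared with a model that uses only the visual module, accuracy improves by 3.5% and 3.9%, respectively, which demonstrates the importance of semantic information for text recognition.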

A Survey of Action Recognition and Behavior Understanding   Cited by: 4 (self-citations: 1, by others: 4)
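With the rise of "human-centered computing" and the new applications that keep emerging in daily life, action recognition and behavior understanding have become research hotspots in computer vision. This survey analyzes the state of research on action recognition and behavior understanding mainly from the perspective of visual processing. It analyzes and compares current work in three respects: the definition of behavior, motion feature extraction and action representation, and inference methods for behavior understanding. It also points out the open problems this work faces and directions for future research.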

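In engineering, operators often face complex information interfaces with unevenly distributed stimuli and must perform the associated interaction tasks. Operators' allocation of visual attention has been shown to be closely related to task performance, but how multi-priority stimuli, under different information allocation strategies in a complex interface, relate to operators' visual attention allocation and task performance still needs study. This paper therefore uses a multi-priority attention allocation experiment to investigate how allocation strategy affects operators' task performance and visual behavior under different workload conditions. The results show that differentiated allocation strategies and information priority levels improved task performance, and that visual behavior differed significantly across allocation strategies and priority levels and was affected by mental workload. These findings can inform the design and optimization of human-computer interaction interfaces and thereby improve operators' task performance.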

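Image captioning is currently a research hotspot in image understanding. To address the low quality of Chinese caption sentences, this paper proposes a Chinese image caption generation method that fuses dual attention with multiple labels. The method first extracts the visual features and multi-label text of the input image, then uses the multi-label text to strengthen the association between the decoder's hidden state and the visual features, assigns attention weights to the visual features according to the decoder's hidden state, and decodes the weighted visual features into words...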

We discuss a coordinate-free approach to the geometry of computer vision problems. The technique we use to analyse the three-dimensional transformations involved will be that of geometric algebra: a framework based on the algebras of Clifford and Grassmann. This is not a system designed specifically for the task in hand, but rather a framework for all mathematical physics. Central to the power of this approach is the way in which the formalism deals with rotations; for example, if we have two arbitrary sets of vectors, known to be related via a 3D rotation, the rotation is easily recoverable if the vectors are given. Extracting the rotation by conventional means is not as straightforward. The calculus associated with geometric algebra is particularly powerful, enabling one, in a very natural way, to take derivatives with respect to any multivector (general element of the algebra). What this means in practice is that we can minimize with respect to rotors representing rotations, vectors representing translations, or any other relevant geometric quantity. This has important implications for many of the least-squares problems in computer vision where one attempts to find optimal rotations, translations etc., given observed vector quantities. We will illustrate this by analysing the problem of estimating motion from a pair of images, looking particularly at the more difficult case in which we have available only 2D information and no information on range. While this problem has already been much discussed in the literature, we believe the present formulation to be the only one in which least-squares estimates of the motion and structure are derived simultaneously using analytic derivatives.

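SVM-based image segmentation is sensitive to noise and outliers, which leads to poor segmentation results and weak noise immunity. To address this, a color image segmentation method is proposed that is based on visual attention and a fuzzy SVM with an improved membership function (modified fuzzy SVM, MFSVM). The method takes the human visual saliency detection mechanism into account and also improves the standard fuzzy SVM algorithm: the new membership function considers both the distance of each sample from its class center and the local density of the samples, so that noise points are effectively penalized and the role of the support vectors is strengthened. Validation on color image segmentation shows that, compared with the standard SVM and with an FSVM whose membership is based on sample density, the proposed method segments color images of complex scenes effectively and shows good noise immunity.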
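To make the role of a fuzzy membership concrete, here is a minimal sketch of the classical distance-to-class-center membership used in standard fuzzy SVMs. It is not the modified membership of the abstract, which additionally accounts for sample density and whose exact form is not given here. The function name `distance_memberships` and the small constant `delta` are assumptions for illustration.

```python
import math


def distance_memberships(samples, delta=1e-6):
    """Classical fuzzy-SVM membership for the samples of ONE class.

    Each sample gets s_i = 1 - d_i / (r + delta), where d_i is its distance
    to the class center and r is the largest such distance. Samples far from
    the center (likely noise or outliers) receive memberships near 0 and so
    contribute less to the SVM training penalty.
    """
    n = len(samples)
    dim = len(samples[0])
    center = [sum(s[k] for s in samples) / n for k in range(dim)]
    dists = [math.dist(s, center) for s in samples]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


# Four clustered points and one outlier at (10, 10).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
memberships = distance_memberships(points)
```

In a fuzzy SVM these memberships scale each sample's slack penalty, so the outlier above has almost no influence on the decision boundary.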

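Human action recognition is one of the key research problems in computer vision. Compared with object recognition in static images, action recognition is more concerned with perceiving the spatio-temporal motion of the target of interest across an image sequence. Extending visual behavior from two-dimensional space to three-dimensional space-time greatly increases the complexity of representing actions and of the subsequent recognition task, and it also gives vision researchers more room to try different solutions and techniques. In recent years, work on human action recognition has appeared in rapid succession and the topic has become a hot direction in computer vision research. This survey organizes, categorizes, and summarizes, in chronological order, the visual action recognition methods that have appeared over roughly the 15 years since the beginning of the 21st century. Unlike other surveys, it takes the evolution of human action recognition datasets over time as its thread, introducing the key research questions and main research approaches of each period, which shows the development of action recognition research more clearly and intuitively. Presenting action recognition in the order of dataset evolution also fits the big-data-driven research approach that is receiving growing attention in the vision community. Based on this review, the survey also discusses future directions for action recognition research, in the hope of helping researchers choose their direction.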

Abstract.  We explore an important phase of information systems design (ISD), namely task redesign, and especially how different viewpoints enter into the discussions. We study how one particular visual representation, a process diagram, is interpreted and how alternative, even competing, representations are produced verbally. To tie the visual and verbal representations and the representational practices to wider social practices, we develop and use the Extended Three-dimensional Model of discourse. Visual representations emerged as focal in bringing in the different viewpoints and as reference points for discussions. Our model provided a focused and powerful means to unveil for the outside researchers how the planned changes in tasks and authority relationships instigated a social struggle. The IS designer was an outsider to the client organization and therefore considered only the information system, not the social system in which it was intended to operate. Other participants did not recognize this and therefore saw the designer as furthering managerial interests. Seeing task redesign in the social context of a client organization can help IS designers and researchers to understand what the users see naturally, that is, the ISD as a dynamic, enabling but socially constrained process where different viewpoints are represented.

Jin Xiating, Wang Yaonan, Zhang Hui, Liu Li, Zhong Hang, He Zhendong. Acta Automatica Sinica, 2019, 45(12): 2312-2327
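For complex and varied rail scenes, this paper extends DeepLab v3+, a state-of-the-art deep-learning semantic segmentation framework, to a new lightweight, scalable Bayesian version, DeeperLab, which performs probabilistic segmentation of surface defects. Specifically, dropout is incorporated into an improved Xception network so that Monte Carlo samples can be drawn from the posterior distribution. Next, a multi-scale, multi-rate atrous spatial pyramid pooling (ASPP) module is proposed to extract dense feature maps at arbitrary resolution. A simpler and more effective decoder refines object boundaries, and the mean and variance of the softmax probabilities are computed as the segmentation prediction and its uncertainty. To address class imbalance, a loss attention network (LAN), based on the idea of online foreground-background mining, is proposed; it locates defects in order to compute penalty coefficients, which compensate and suppress DeeperLab's foreground and background losses and provide auxiliary supervision during training. Experiments show that the algorithm achieves 91.46% segmentation accuracy at 0.18 s per frame and is faster and more robust than other methods.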
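The step of turning Monte Carlo dropout samples into a prediction and an uncertainty can be sketched for a single pixel as follows. This is a simplified illustration under stated assumptions, not the paper's network: the logits are taken as given from T stochastic forward passes, the function names are hypothetical, and using the variance of the predicted class as the uncertainty score is one possible choice.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_pixel(logit_samples):
    """Summarize T stochastic forward passes for one pixel.

    logit_samples: list of T logit vectors, one per pass with dropout active.
    Returns (label, mean_probs, uncertainty), where the label is the argmax of
    the mean softmax probability and the uncertainty is the variance of the
    predicted class's probability across passes (an assumed choice).
    """
    probs = [softmax(logits) for logits in logit_samples]
    t = len(probs)
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / t for c in range(n_classes)]
    var = [sum((p[c] - mean[c]) ** 2 for p in probs) / t for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean, var[label]


# Three passes that agree on class 0, then three passes that disagree.
confident = mc_dropout_pixel([[3.0, 0.0], [2.8, 0.1], [3.1, -0.2]])
ambiguous = mc_dropout_pixel([[3.0, 0.0], [0.0, 3.0], [1.0, 1.2]])
```

Passes that disagree produce a larger variance, which is what allows a segmentation map to flag pixels whose defect label should not be trusted.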

Learning spatial models from sensor data raises the challenging data association problem of relating model parameters to individual measurements. This paper proposes an EM-based algorithm, which solves the model learning and the data association problem in parallel. The algorithm is developed in the context of the structure from motion problem, which is the problem of estimating a 3D scene model from a collection of image data. To accommodate the spatial constraints in this domain, we compute virtual measurements as sufficient statistics to be used in the M-step. We develop an efficient Markov chain Monte Carlo sampling method called chain flipping, to calculate these statistics in the E-step. Experimental results show that we can solve hard data association problems when learning models of 3D scenes, and that we can do so efficiently. We conjecture that this approach can be applied to a broad range of model learning problems from sensor data, such as the robot mapping problem.

An experiment examined the effects of visual signalling to relevant information in multiple external representations and the visual presence of an animated pedagogical agent (APA). Students learned electric circuit analysis using a computer‐based learning environment that included Cartesian graphs, equations and electric circuit diagrams. The experiment was a 2 (visual signalling, no visual signalling) × 2 (visual APA presence, no visual APA presence) between‐subjects design, resulting in four experimental conditions: visual signalling with APA presence (APA + S), visual signalling without APA presence (S), no visual signalling with APA presence (APA) and no visual signalling without APA presence (C). Signalling was provided via gestures of the APA in the APA + S condition and via dynamic arrows in the S condition. To investigate potential moderating effects of prior knowledge on APA presence and visual signalling factors, middle school students were grouped into low prior knowledge (LPK) and high prior knowledge (HPK) groups using scores on a domain pre‐test. Results revealed that LPK students had higher post‐test scores after learning with visual signalling, resulting in equivalent post‐test performance to their HPK counterparts. LPK students also had higher post‐test scores, higher ratings of graphics understanding and lower perceived difficulty ratings in conditions that included the visual image of the APA. Conversely, HPK students had better post‐test scores after learning without the APA. These results indicate that the effectiveness of visual signalling techniques and the visual presence of an APA is dependent on learner characteristics, including prior domain knowledge.

In Part I of this paper we developed the theory and algorithms for performing Shape-From-Silhouette (SFS) across time. In this second part, we show how our temporal SFS algorithms can be used in the applications of human modeling and markerless motion tracking. First we build a system to acquire human kinematic models consisting of precise shape (constructed using the temporal SFS algorithm for rigid objects), joint locations, and body part segmentation (estimated using the temporal SFS algorithm for articulated objects). Once the kinematic models have been built, we show how they can be used to track the motion of the person in new video sequences. This marker-less tracking algorithm is based on the Visual Hull alignment algorithm used in both temporal SFS algorithms and utilizes both geometric (silhouette) and photometric (color) information.

The logarithmic image processing (LIP) model is a mathematical framework based on abstract linear mathematics which provides a set of specific algebraic and functional operations that can be applied to the processing of intensity images valued in a bounded range. The LIP model has been proved to be physically justified in the setting of transmitted light and to be consistent with several laws and characteristics of the human visual system. Successful application examples have also been reported in several image processing areas, e.g., image enhancement, image restoration, three-dimensional image reconstruction, edge detection and image segmentation. The aim of this article is to show that the LIP model is a tractable mathematical framework for image processing which is consistent with several laws and characteristics of human brightness perception. This is a survey article in the sense that it presents (almost exclusively) previously published results in a revised, refined and self-contained form. First, an introduction to the LIP model is given. Emphasis is placed especially on the initial motivation and goal, and on the scope of the model. Then, an introductory summary of the mathematical fundamentals of the LIP model is detailed. Next, the article surveys the connections of the LIP model with several laws and characteristics of human brightness perception, namely the brightness scale inversion, saturation characteristic, Weber's and Fechner's laws, and the psychophysical contrast notion. Finally, it is shown that the LIP model is a powerful and tractable framework for handling the contrast notion. This is done through a survey of several LIP-model-based contrast estimators associated with special subparts (point, pair of points, boundary, region) of intensity images, that are justified both from a physical and mathematical point of view.
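The "specific algebraic operations" mentioned above can be illustrated with the LIP addition, scalar multiplication, and subtraction as they are commonly stated in the LIP literature for gray tones in the bounded range [0, M). The sketch below applies them to single gray-tone values; the bound M = 256 and the function names are assumptions chosen for illustration, and the abstract itself does not spell out these formulas.

```python
M = 256.0  # assumed upper bound of the gray-tone range [0, M)


def lip_add(f, g):
    """LIP addition: f (+) g = f + g - f*g/M. The result stays below M."""
    return f + g - f * g / M


def lip_scalar_mul(lam, f):
    """LIP scalar multiplication: lam (x) f = M - M * (1 - f/M)**lam."""
    return M - M * (1.0 - f / M) ** lam


def lip_sub(f, g):
    """LIP subtraction: f (-) g = M * (f - g) / (M - g), the inverse of lip_add."""
    return M * (f - g) / (M - g)


f, g = 100.0, 180.0
total = lip_add(f, g)          # bounded: never reaches M for f, g in [0, M)
doubled = lip_scalar_mul(2, f)  # agrees with lip_add(f, f)
recovered = lip_sub(total, g)   # undoes the addition and returns f
```

Unlike ordinary addition, which can overflow the valid intensity range, these operations are closed on [0, M), which is the property that makes the model suitable for bounded-range images.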

This research assessed how emotive animated agents in a simulation‐based training affect the performance outcomes and perceptions of the individuals interacting in real time with the training application. A total of 56 participants consented to complete the study. The material for this investigation included a nursing simulation in which participants interacted with three animated agents. The results of this investigation indicated that both experienced and novice participants focused more visual attention time on the body of the animated agent than the other defined areas of interest in the simulated environment. The results also indicated that novice participants conveyed more neutral facial expressions during the interaction with the animated agents than experienced participants. The results of the simulation performance scores indicated that novice participants achieved higher simulation performance scores on the simulation task than experienced participants. Lastly, the results of the agent persona instrument showed that experienced and novice participants perceived the animated agents as facilitators of learning, credible, human‐like and engaging.

In this paper we study the problem of recovering the 3D shape, reflectance, and non-rigid motion properties of a dynamic 3D scene. Because these properties are completely unknown and because the scene's shape and motion may be non-smooth, our approach uses multiple views to build a piecewise-continuous geometric and radiometric representation of the scene's trace in space-time. A basic primitive of this representation is the dynamic surfel, which (1) encodes the instantaneous local shape, reflectance, and motion of a small and bounded region in the scene, and (2) enables accurate prediction of the region's dynamic appearance under known illumination conditions. We show that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time. Experimental results with the Phong reflectance model and complex real scenes (clothing, shiny objects, skin) illustrate our method's ability to explain pixels and pixel variations in terms of their underlying causes: shape, reflectance, motion, illumination, and visibility.

The human electroencephalographic (EEG) power spectra when viewing visual stimuli of a real motion image and of motion images with 60 frames/s (fps) and 240 fps were investigated. The EEG spectra in response to the 240 fps motion image stimuli were more similar to those of the real motion image stimuli than those of the 60 fps stimuli. This high frame rate (240 fps) motion image is considered to have a possibility of providing perceptions of motion image quality that are close to the impression upon looking at real world scenes.

