Found 20 similar documents (search time: 15 ms)
Chen Tianyou, Xiao Jin, Hu Xiaoguang, Zhang Guofeng, Wang Shaojie 《Neural computing & applications》2022,34(19):16861-16877
Neural Computing and Applications - There has been increasing interest in video salient object detection (VSOD) in the computer vision field. Different from image salient object...

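For dynamic environments that contain moving objects, a salient feature extraction method based on dynamic object filtering is proposed and used to perform scene recognition from local image features. The paper first briefly introduces scene recognition based on local salient image features, then proposes a salient feature extraction framework with dynamic object filtering and discusses in detail how moving objects are detected and extracted. Experimental results and analysis show that the method effectively filters out moving objects in the environment and improves scene recognition accuracy.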

Simultaneous tracking and action recognition for single actor human actions   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents an approach to simultaneously tracking the pose and recognizing human actions in a video. This is achieved by combining a Dynamic Bayesian Action Network (DBAN) with 2D body part models. The existing DBAN implementation relies on fairly weak observation features, which affects the recognition accuracy. In this work, we use a 2D body part model for accurate pose alignment, which in turn improves both pose estimation and action recognition accuracy. To compensate for the additional time required for alignment, we use an action entropy-based scheme to determine the minimum number of states to be maintained in each frame while avoiding sample impoverishment. In addition, we present an approach to automating the keypose selection task for learning 3D action models from a few annotations. We demonstrate our approach on a hand gesture dataset with 500 action sequences, and we show that compared to DBAN our algorithm achieves a 6% improvement in accuracy.

In this study, a new approach is presented for recognizing everyday human actions with a fixed camera. The originality of the method consists in characterizing sequences by a temporal succession of semi-global features, which are extracted from “space-time micro-volumes”. The advantage of this approach lies in the use of robust features (estimated over several frames) combined with the ability to handle actions of variable duration and to easily segment the sequences with algorithms that are specific to time-varying data. Each action is characterized by a temporal sequence that constitutes the input of a Hidden Markov Model system for recognition. Results on 1,614 sequences performed by several persons validate the proposed approach.

The ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. This requires segmentation of items in the field of view, tracking of moving objects, identifying the importance of each object, determining the current role of each important object individually and in collaboration with other objects, relating these objects into a predefined scenario, assessing the selected scenario with the information retrieved, and finally adjusting the scenario to better fit the data. This is all accomplished with great accuracy in less than a few seconds. The intelligence of current computer algorithms has not reached this level of complexity with the accuracy and time constraints that humans and animals have, but several research efforts are working towards this by identifying new algorithms for solving parts of this problem. This survey paper lists several of these efforts that rely mainly on understanding the image processing and classification of a limited number of actions. It divides the activities into several groups and ends with a discussion of future needs.

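Objective: Micro-expressions are spontaneous facial muscle movements that can reveal the genuine emotions a person is trying to conceal, and they have potential applications in security, suspect interrogation, and psychological testing. To mitigate the low recognition accuracy caused by the small amplitude and short duration of the facial muscle changes in micro-expressions, this paper proposes a spatiotemporal attention network (STANet) for micro-expression recognition. Method: STANet contains a spatial attention module and a temporal attention module. The spatial attention module first focuses the model's attention on the regions where the micro-expression is more intense; the temporal attention module then assigns greater weight to frames in which the micro-expression changes more and which are therefore more discriminative. Results: On three public micro-expression datasets (The Chinese Academy of Sciences microexpression, CASME; CASME II; spontaneous microexpression database-high speed camera, SMIC-HS), comparative experiments against eight other algorithms were conducted using leave-one-out cross-validation. The results show that on the CASME dataset, the classification accuracy of STANet is 1.78% higher than that of the second-best model, Sparse MDMO (sparse main directional mean optical flow); on the CASME II dataset, it is 1.90% higher than that of the second-best model, HIGO (histogram of image gradient orientation); and on the SMIC-HS dataset, the classification accuracy reaches 68.90%. Conclusion: Because micro-expressions involve small muscle amplitude, small active regions, and short duration, this paper applies the attention mechanism to micro-expression recognition and proposes the STANet model, which focuses the model's attention on regions with larger micro-expression amplitude and on segments with larger changes between adjacent frames.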
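
The idea of giving more weight to more discriminative frames can be illustrated with a generic softmax-weighted pooling over frames. This is only a minimal sketch and not the STANet temporal attention module, whose details the abstract does not give; the function names `softmax` and `temporal_attention_pool`, and the assumption that each frame already has a scalar relevance score, are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors into one vector (hypothetical helper).

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores: one scalar relevance score per frame (assumed given here;
    in a real network these would be produced by a learned layer).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

With equal scores the pooling reduces to a plain average over frames; a frame with a higher score contributes proportionally more to the pooled vector.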

Achieving joint segmentation and recognition of continuous actions in a long-term video is a challenging task due to the varying durations of actions and the complex transitions of multiple actions. In this paper, a novel discriminative structural model is proposed for splitting a long-term video into segments and annotating the action label of each segment. A set of state variables is introduced into the model to explore discriminative semantic concepts shared among different actions. To exploit the statistical dependences among segments, temporal context is captured at both the action level and the semantic concept level. The state variables are treated as latent information in the discriminative structural model and inferred during both training and testing. Experiments on multi-view IXMAS and realistic Hollywood datasets demonstrate the effectiveness of the proposed method.

For the real-time recognition of unspecified gestures by an arbitrary person, a comprehensive framework is presented that addresses two important problems in gesture recognition systems: selective attention and processing frame rate. To address the first problem, we propose the Quadruple Visual Interest Point Strategy. No assumptions are made with regard to scale or rotation of visual features, which are computed from dynamically changing regions of interest in a given image sequence. In this paper, each of the visual features is referred to as a visual interest point, to which a probability density function is assigned, and the selection is carried out. To address the second problem, we developed a selective control method to equip the recognition system with self-load monitoring and controlling functionality. Through evaluation experiments, we show that our approach provides robust recognition with respect to such factors as type of clothing, type of gesture, extent of motion trajectories, and individual differences in motion characteristics. In order to indicate the real-time performance and utility aspects of our approach, a gesture video system is developed that demonstrates full video-rate interaction with displayed image objects.

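Wu Feng, Wang Ying 《计算机应用》 (Journal of Computer Applications) 2017,37(8):2240-2243
Visual dictionary construction based on information gain in the bag-of-words (BoW) model does not consider the effect of word frequency on action recognition. To improve action recognition accuracy, a method for building a visual dictionary based on improved information gain is proposed. First, spatiotemporal interest points are extracted from human action videos using 3D Harris, and an initial visual dictionary is built by K-means clustering. Then, within-class word-frequency concentration and between-class word-frequency dispersion are introduced to improve information gain; the improved information gain of each word in the initial dictionary is computed, and the visual words with large improved information gain are selected to build a new visual dictionary. Finally, human action recognition is performed with a support vector machine (SVM) using the visual dictionary built with improved information gain. The KTH and Weizmann human action databases were used for experimental validation. Compared with traditional information gain, the visual dictionaries built with improved information gain raised the action recognition accuracy on the two databases by 1.67% and 3.45%, respectively. The experimental results show that the proposed method selects visual words with strong discriminative power for action recognition and improves action recognition accuracy.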
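
For orientation, the classical information gain that the abstract takes as its baseline can be sketched as below. This is a minimal sketch of the textbook definition only: it does not include the within-class concentration or between-class dispersion terms of the improved measure, whose formulas the abstract does not state. The function names and the representation of a word by per-class video counts are assumptions made for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(present_per_class, total_per_class):
    """Classical information gain of one visual word (illustrative helper).

    present_per_class[i]: number of class-i videos that contain the word.
    total_per_class[i]:   number of videos in class i.
    """
    absent_per_class = [t - p for p, t in zip(present_per_class, total_per_class)]
    n = sum(total_per_class)
    n_present = sum(present_per_class)
    n_absent = n - n_present
    # H(C) minus the expected class entropy after observing word presence.
    conditional = (n_present / n) * entropy(present_per_class) \
        + (n_absent / n) * entropy(absent_per_class)
    return entropy(total_per_class) - conditional


def select_words(word_stats, total_per_class, k):
    """Keep the k words with the largest information gain.

    word_stats: dict mapping a word id to its present_per_class list.
    """
    ranked = sorted(
        word_stats,
        key=lambda w: information_gain(word_stats[w], total_per_class),
        reverse=True,
    )
    return ranked[:k]
```

A word that appears in every video of one class and in none of the other separates two equally sized classes perfectly and receives a gain of 1 bit, whereas a word spread evenly across both classes receives a gain of 0.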

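Yang Xue 《微型机与应用》 (Microcomputer & Its Applications) 2015,(2):47-48,51
Based on the Itti model, an improved model is proposed for extracting salient image regions. The Itti method is used to extract saliency maps for the intensity and orientation features of the image; on this basis, frequency-domain features of the image are incorporated into the extraction of color features, and contour feature extraction is added, which avoids the lack of clear contour boundaries in features extracted by the Itti model. In the saliency map merging stage, a local iteration method replaces direct summation. Compared with the Itti model, the saliency maps extracted by this model are more pronounced.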

This paper presents a human action recognition framework based on the theory of nonlinear dynamical systems. The ultimate aim of our method is to recognize actions from multi-view video. We estimate and represent human motion by means of a virtual skeleton model providing the basis for a view-invariant representation of human actions. Actions are modeled as a set of weighted dynamical systems associated with different model variables. We use time-delay embeddings on the time series resulting from the evolution of model variables over time to reconstruct phase portraits of appropriate dimensions. These phase portraits characterize the underlying dynamical systems. We propose a distance to compare trajectories within the reconstructed phase portraits. These distances are used to train SVM models for action recognition. Additionally, we propose an efficient method to learn a set of weights reflecting the discriminative power of a given model variable in a given action class. Our approach behaves well on noisy data, even in cases where action sequences last for just a few frames. Experiments with marker-based and markerless motion capture data show the effectiveness of the proposed method. To the best of our knowledge, this contribution is the first to apply time-delay embeddings to data obtained from multi-view video.
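
A time-delay embedding, as mentioned in the abstract above, maps a scalar time series to points in a higher-dimensional space by stacking delayed samples. The following is a minimal sketch of the standard construction; the function name `delay_embed` and the parameter names `dim` and `tau` are chosen here for illustration, and the abstract does not say how the authors select the embedding dimension or the delay.

```python
def delay_embed(series, dim, tau):
    """Return the delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau]).

    series: sequence of scalar samples, e.g. one joint angle over time.
    dim:    embedding dimension (number of coordinates per point).
    tau:    delay between consecutive coordinates, in samples.
    """
    n_points = len(series) - (dim - 1) * tau
    if dim < 1 or tau < 1 or n_points <= 0:
        raise ValueError("series too short for the requested dim and tau")
    return [
        tuple(series[i + j * tau] for j in range(dim))
        for i in range(n_points)
    ]


# Example: six samples, three coordinates, delay of two samples.
points = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2)
# points == [(0, 2, 4), (1, 3, 5)]
```

The resulting list of points traces a trajectory in the reconstructed phase space, which is the kind of object the abstract compares with a trajectory distance.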

Neural Computing and Applications - Recognition of human actions from visual contents is a budding field of computer vision and image understanding. The problem with such a recognition system is...

In this paper, we introduce a shape matching method by matching sequences of salient contour points that are characterized by Voronoi region features. The proposed approach is summarized as follows: (1) a sequence of salient contour points is selected using the Voronoi diagram of the contour point set, (2) the features of the salient points are computed based on the interior and exterior regions of the Voronoi diagram, and (3) a cyclic edit distance is used to match two shapes. Tests on the MPEG-7 and ETH-80 datasets demonstrated the effectiveness and efficiency of the proposed method.
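
Step (3) relies on a cyclic edit distance, which is needed because a closed contour has no canonical starting point. A brute-force sketch is shown below: it takes the minimum Levenshtein distance over all rotations of the first sequence. It assumes the contour points have already been quantized into discrete symbols with unit edit costs, which is a simplification made here; the abstract does not specify the cost function or the algorithm the authors use, and faster algorithms than this quadratic-per-rotation version exist.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        prev = cur
    return prev[-1]


def cyclic_edit_distance(a, b):
    """Minimum edit distance between b and any rotation of a (brute force)."""
    if not a:
        return len(b)
    return min(edit_distance(a[k:] + a[:k], b) for k in range(len(a)))
```

For example, the symbol sequences `"abcd"` and `"cdab"` describe the same closed contour traversed from different starting points, so their cyclic distance is 0 even though their ordinary edit distance is not.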

A novel framework for context modeling based on the probability of co-occurrence of objects and scenes is proposed. The modeling is quite simple, and builds upon the availability of robust appearance classifiers. Images are represented by their posterior probabilities with respect to a set of contextual models, built upon the bag-of-features image representation, through two layers of probabilistic modeling. The first layer represents the image in a semantic space, where each dimension encodes an appearance-based posterior probability with respect to a concept. Due to the inherent ambiguity of classifying image patches, this representation suffers from a certain amount of contextual noise. The second layer enables robust inference in the presence of this noise by modeling the distribution of each concept in the semantic space. A thorough and systematic experimental evaluation of the proposed context modeling is presented. It is shown that it captures the contextual “gist” of natural images. Scene classification experiments show that contextual classifiers outperform their appearance-based counterparts, irrespective of the precise choice and accuracy of the latter. The effectiveness of the proposed approach to context modeling is further demonstrated through a comparison to existing approaches on scene classification and image retrieval, on benchmark data sets. In all cases, the proposed approach achieves superior results.

Three-dimensional (3-D) geometrical models provide the best representations for 3-D objects. Not all representation schemes are suitable, however, for computer-based visual recognition. This survey analyses the historical development of recognition-oriented models from points and lines, to surfaces and volumes. It also considers those aspects of the models that successfully promoted recognition, and suggests likely areas for future development.

In this paper we propose a hypothetical scheme for recognizing alphanumeric characters. The scheme is based on the known physiological structure of the visual cortex and the concept of a short line extractor neuron (SLEN). We assume four basic types of such units for extracting vertical, horizontal, right-inclined and left-inclined straight line segments. The patterns reconstructed from the scheme show perfect agreement with the test patterns. The model indicates that the recognition of the letters T and H requires extraction of the largest number of features.
