1.

Spatiotemporal context-aware network for video salient object detection

Chen Tianyou Xiao Jin Hu Xiaoguang Zhang Guofeng Wang Shaojie 《Neural computing & applications》2022,34(19):16861-16877

Neural Computing and Applications - It has been witnessed that there is an increasing interest in video salient object detection (VSOD) in the computer vision field. Different from image salient object...

2.

Dynamic scene recognition combining salient features with moving-object filtering

王璐 张道光 蔡自兴 《计算机工程与应用》2010,46(26):152-154

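For dynamic environments that contain moving objects, this paper proposes a salient feature extraction method built on the idea of filtering out dynamic objects, and uses it to perform scene recognition based on local image features. It first briefly introduces scene recognition based on local salient image features, then proposes a salient feature extraction framework that incorporates dynamic-object filtering, and discusses in detail how moving-object detection and extraction are implemented. Experimental results and analysis show that the method effectively filters out moving objects in the environment and improves scene recognition accuracy.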

3.

Unsupervised human action retrieval using salient points in 3D mesh sequences

Veinidis Christos Pratikakis Ioannis Theoharis Theoharis 《Multimedia Tools and Applications》2019,78(3):2789-2814

4.

Simultaneous tracking and action recognition for single actor human actions (total citations: 1; self-citations: 0; citations by others: 1)

Vivek Kumar Singh Ram Nevatia 《The Visual computer》2011,27(12):1115-1123

This paper presents an approach to simultaneously tracking the pose and recognizing human actions in a video. This is achieved by combining a Dynamic Bayesian Action Network (DBAN) with 2D body part models. The existing DBAN implementation relies on fairly weak observation features, which affects the recognition accuracy. In this work, we use a 2D body part model for accurate pose alignment, which in turn improves both pose estimation and action recognition accuracy. To compensate for the additional time required for alignment, we use an action entropy-based scheme to determine the minimum number of states to be maintained in each frame while avoiding sample impoverishment. In addition, we also present an approach to automating the keypose selection task for learning 3D action models from a few annotations. We demonstrate our approach on a hand gesture dataset with 500 action sequences, and we show that compared to DBAN our algorithm achieves a 6% improvement in accuracy.

5.

A novel approach for recognition of human actions with semi-global features

Catherine Achard Xingtai Qu Arash Mokhber Maurice Milgram 《Machine Vision and Applications》2008,19(1):27-34

In this study a new approach is presented for the recognition of human actions of everyday life with a fixed camera. The originality of the presented method consists in characterizing sequences by a temporal succession of semi-global features, which are extracted from “space-time micro-volumes”. The advantage of this approach lies in the use of robust features (estimated on several frames) associated with the ability to manage actions with variable durations and easily segment the sequences with algorithms that are specific to time-varying data. Each action is actually characterized by a temporal sequence that constitutes the input of a Hidden Markov Model system for the recognition. Results on 1,614 sequences performed by several persons validate the proposed approach.

6.

Survey on classifying human actions through visual sensors

Michael S. Del Rose Christian C. Wagner 《Artificial Intelligence Review》2012,37(4):301-311

The ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. This requires segmentation of items in the field of view, tracking of moving objects, identifying the importance of each object, determining the current role of each important object individually and in collaboration with other objects, relating these objects into a predefined scenario, assessing the selected scenario with the information retrieved, and finally adjusting the scenario to better fit the data. This is all accomplished with great accuracy in less than a few seconds. The intelligence of current computer algorithms has not reached this level of complexity with the accuracy and time constraints that humans and animals have, but there are several research efforts that are working towards this by identifying new algorithms for solving parts of this problem. This survey paper lists several of these efforts that rely mainly on understanding the image processing and classification of a limited number of actions. It divides the activities up into several groups and ends with a discussion of future needs.

7.

Facial micro-expression recognition using a spatiotemporal attention mechanism

李国豪 袁一帆 贲晛烨 张军平 《中国图象图形学报》2020,25(11):2380-2390

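Objective: Micro-expressions are spontaneous facial muscle movements that can reveal the true emotions a person is trying to conceal, with potential applications in security, suspect interrogation, and psychological testing. To alleviate the low recognition accuracy caused by the small amplitude and short duration of the facial muscle changes in micro-expressions, this paper proposes a spatiotemporal attention network (STANet) for micro-expression recognition. Method: STANet contains a spatial attention module and a temporal attention module. The spatial attention module first focuses the model's attention on regions where the micro-expression is more intense; the temporal attention module then assigns larger weights to frames in which the micro-expression changes more and which are therefore more discriminative. Result: On three public micro-expression datasets (The Chinese Academy of Sciences microexpression, CASME; CASME II; spontaneous microexpression database-high speed camera, SMIC-HS), comparative experiments against eight other algorithms were conducted using leave-one-out cross-validation. The results show that, on CASME, the classification accuracy of STANet is 1.78% higher than that of the second-best model, Sparse MDMO (sparse main directional mean optical flow); on CASME II, it is 1.90% higher than that of the second-best model, HIGO (histogram of image gradient orientation); and on SMIC-HS, the classification accuracy reaches 68.90%. Conclusion: Given that micro-expressions involve small muscle amplitudes, small active regions, and short durations, this paper applies the attention mechanism to micro-expression recognition and proposes the STANet model, which focuses attention on regions where the micro-expression amplitude is larger and on segments where the change between adjacent frames is larger.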

8.

A discriminative structural model for joint segmentation and recognition of human actions

Cuiwei Liu Jingyi Hou Xinxiao Wu Yunde Jia 《Multimedia Tools and Applications》2018,77(24):31627-31645

Achieving joint segmentation and recognition of continuous actions in a long-term video is a challenging task due to the varying durations of actions and the complex transitions of multiple actions. In this paper, a novel discriminative structural model is proposed for splitting a long-term video into segments and annotating the action label of each segment. A set of state variables is introduced into the model to explore discriminative semantic concepts shared among different actions. To exploit the statistical dependences among segments, temporal context is captured at both the action level and the semantic concept level. The state variables are treated as latent information in the discriminative structural model and inferred during both training and testing. Experiments on multi-view IXMAS and realistic Hollywood datasets demonstrate the effectiveness of the proposed method.

9.

A hierarchical Bayesian network for event recognition of human actions and interactions (total citations: 3; self-citations: 0; citations by others: 3)

Sangho Park J. K. Aggarwal 《Multimedia Systems》2004,10(2):164-179

10.

Real-time gesture recognition by learning and selective control of visual interest points (total citations: 3; self-citations: 0; citations by others: 3)

Kirishima T Sato K Chihara K 《IEEE transactions on pattern analysis and machine intelligence》2005,27(3):351-364

For the real-time recognition of unspecified gestures by an arbitrary person, a comprehensive framework is presented that addresses two important problems in gesture recognition systems: selective attention and processing frame rate. To address the first problem, we propose the Quadruple Visual Interest Point Strategy. No assumptions are made with regard to scale or rotation of visual features, which are computed from dynamically changing regions of interest in a given image sequence. In this paper, each of the visual features is referred to as a visual interest point, to which a probability density function is assigned, and the selection is carried out. To address the second problem, we developed a selective control method to equip the recognition system with self-load monitoring and controlling functionality. Through evaluation experiments, we show that our approach provides robust recognition with respect to such factors as type of clothing, type of gesture, extent of motion trajectories, and individual differences in motion characteristics. In order to indicate the real-time performance and utility aspects of our approach, a gesture video system is developed that demonstrates full video-rate interaction with displayed image objects.

11.

Building a visual dictionary for human action recognition based on improved information gain

吴峰 王颖 《计算机应用》2017,37(8):2240-2243

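The bag-of-words (BoW) approach that builds a visual dictionary from information gain does not consider the effect of word frequency on action recognition. To improve action recognition accuracy, a method for building the visual dictionary based on improved information gain is proposed. First, spatiotemporal interest points are extracted from human action videos using 3D Harris, and an initial visual dictionary is built by K-means clustering. Then, intra-class word-frequency concentration and inter-class word-frequency dispersion are introduced to improve information gain; the improved information gain of each word in the initial dictionary is computed, and the visual words with large improved information gain are selected to build a new visual dictionary. Finally, human action recognition is performed with a support vector machine (SVM) using the visual dictionary built with improved information gain. The method is validated on the KTH and Weizmann human action databases. Compared with traditional information gain, the visual dictionaries built with improved information gain raise action recognition accuracy on the two databases by 1.67% and 3.45%, respectively. The experimental results show that the proposed method can select visual words with strong action-discriminating power and improve recognition accuracy.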
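
For orientation, the baseline that this abstract improves on, plain information gain of a visual word with respect to the action classes, can be sketched as follows. This is a minimal illustration of the textbook definition only: the abstract does not give the formulas for the intra-class concentration and inter-class dispersion terms, so they are not modeled here, and the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(word_present, labels):
    """Standard information gain of a binary word-presence feature.

    word_present[i] says whether the visual word occurs in video i;
    labels[i] is the action class of video i. This is the textbook
    criterion, not the improved variant proposed in the paper.
    """
    with_word = [c for p, c in zip(word_present, labels) if p]
    without_word = [c for p, c in zip(word_present, labels) if not p]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional
```

A word that occurs in exactly the videos of one class gets the maximum gain, while a word spread evenly over all classes gets zero, which is why ranking by this score can be used to prune a dictionary.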

12.

Extraction of salient image regions based on visual perception

杨雪 《微型机与应用》2015,(2):47-48,51

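Building on the Itti model, an improved model is proposed for extracting salient image regions. The Itti method is used to extract the intensity and orientation feature saliency maps; on this basis, the frequency-domain features of the image are incorporated into the extraction of its color features, and contour feature extraction is added, which avoids the lack of clear contour boundaries in features extracted by the Itti model. In the saliency-map merging stage, a local iteration method replaces direct summation. Compared with the Itti model, the saliency maps extracted by this model are more pronounced.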

13.

Model-based recognition of human actions by trajectory matching in phase spaces

Adolfo López-Méndez Josep R. Casas 《Image and vision computing》2012

This paper presents a human action recognition framework based on the theory of nonlinear dynamical systems. The ultimate aim of our method is to recognize actions from multi-view video. We estimate and represent human motion by means of a virtual skeleton model providing the basis for a view-invariant representation of human actions. Actions are modeled as a set of weighted dynamical systems associated with different model variables. We use time-delay embeddings on the time series resulting from the evolution of model variables over time to reconstruct phase portraits of appropriate dimensions. These phase portraits characterize the underlying dynamical systems. We propose a distance to compare trajectories within the reconstructed phase portraits. These distances are used to train SVM models for action recognition. Additionally, we propose an efficient method to learn a set of weights reflecting the discriminative power of a given model variable in a given action class. Our approach behaves well on noisy data, even in cases where action sequences last just for a few frames. Experiments with marker-based and markerless motion capture data show the effectiveness of the proposed method. To the best of our knowledge, this contribution is the first to apply time-delay embeddings on data obtained from multi-view video.
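
The time-delay embedding mentioned in this abstract can be illustrated with a short sketch. It shows only the generic construction of delay vectors from a scalar time series; the embedding dimension `dim`, the delay `tau`, and the function name `delay_embed` are assumptions for illustration, and the paper's own choice of parameters and its trajectory distance are not reproduced.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau]).

    Each vector is one point of the reconstructed phase portrait.
    Returns an empty list when the series is too short for the
    requested dimension and delay.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # index distance between first and last coordinate
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(len(series) - span)
    ]
```

For example, a joint-angle series sampled per frame would be passed as `series`; consecutive delay vectors then trace a trajectory whose shape reflects the dynamics of that model variable.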

14.

Fixed partitioning and salient points with MPEG-7 cluster correlograms for image categorization

Azizi Abdullah Remco C. Veltkamp 《Pattern recognition》2010,43(3):650-662

15.

CGA: a new feature selection model for visual human action recognition

Guha Ritam Khan Ali Hussain Singh Pawan Kumar Sarkar Ram Bhattacharjee Debotosh 《Neural computing & applications》2021,33(10):5267-5286

Neural Computing and Applications - Recognition of human actions from visual contents is a budding field of computer vision and image understanding. The problem with such a recognition system is...

16.

Matching sequences of salient contour points characterized by Voronoi region features

Yuqing Song Shuyuan Jin 《The Visual computer》2012,28(5):475-491

In this paper, we introduce a shape matching method by matching sequences of salient contour points that are characterized by Voronoi region features. The proposed approach is summarized as follows: (1) a sequence of salient contour points is selected using the Voronoi diagram of the contour point set, (2) the features of the salient points are computed based on the interior and exterior regions of the Voronoi diagram, and (3) a cyclic edit distance is used to match two shapes. Tests on the MPEG-7 and ETH-80 datasets demonstrated the effectiveness and efficiency of the proposed method.
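
Step (3) of this abstract relies on a cyclic edit distance, which compares two closed sequences regardless of their starting point. A naive sketch, assuming unit insertion, deletion, and substitution costs over plain symbols, is given below; the paper's actual cost function over Voronoi region features is not specified in the abstract, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def cyclic_edit_distance(a, b):
    """Minimum edit distance between a and every rotation of b."""
    if not b:
        return len(a)
    return min(edit_distance(a, b[k:] + b[:k]) for k in range(len(b)))
```

This brute-force version tries every rotation and so costs one full edit-distance computation per rotation; faster cyclic string-matching algorithms exist, but the simple form is enough to show why a contour traversed from a different start point still matches.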

17.

Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos

《Image and vision computing》2017

18.

Holistic context models for visual recognition

Rasiwasia N Vasconcelos N 《IEEE transactions on pattern analysis and machine intelligence》2012,34(5):902-917

A novel framework for context modeling based on the probability of co-occurrence of objects and scenes is proposed. The modeling is quite simple, and builds upon the availability of robust appearance classifiers. Images are represented by their posterior probabilities with respect to a set of contextual models, built upon the bag-of-features image representation, through two layers of probabilistic modeling. The first layer represents the image in a semantic space, where each dimension encodes an appearance-based posterior probability with respect to a concept. Due to the inherent ambiguity of classifying image patches, this representation suffers from a certain amount of contextual noise. The second layer enables robust inference in the presence of this noise by modeling the distribution of each concept in the semantic space. A thorough and systematic experimental evaluation of the proposed context modeling is presented. It is shown that it captures the contextual “gist” of natural images. Scene classification experiments show that contextual classifiers outperform their appearance-based counterparts, irrespective of the precise choice and accuracy of the latter. The effectiveness of the proposed approach to context modeling is further demonstrated through a comparison to existing approaches on scene classification and image retrieval, on benchmark data sets. In all cases, the proposed approach achieves superior results.

19.

Representing three-dimensional structures for visual recognition

R. B. Fisher 《Artificial Intelligence Review》1987,1(3):183-200

Three-dimensional (3-D) geometrical models provide the best representations for 3-D objects. Not all representation schemes are suitable, however, for computer-based visual recognition. This survey analyses the historical development of recognition-oriented models from points and lines, to surfaces and volumes. It also considers those aspects of the models that successfully promoted recognition, and suggests likely areas for future development.

20.

A model for visual recognition of alphanumerics

K. JAYARAM UDUPA I. S. N. MURTY 《International journal of systems science》2013,44(6):575-603

In this paper we propose a hypothetical scheme for recognizing alphanumerics. The scheme is based on the known physiological structure of the visual cortex and the concept of a short line extractor neuron (SLEN). We assume four basic types of such units for extracting vertical, horizontal, right and left inclined straight line segments. The patterns reconstructed from the scheme show perfect agreement with the test patterns. The model indicates that the recognition of letters T and H requires extraction of the largest number of features.