Similar Documents
20 similar documents found.
1.

In recent years, significant progress has been achieved in the field of visual saliency modeling. Our research focuses on video saliency, which differs substantially from image saliency and can be detected more accurately by adding gaze information captured from eye movements while people watch the video. In this paper we propose a novel gaze-based saliency method to predict video attention, inspired by the widespread use of mobile smart devices with cameras. It is a non-contact method for predicting visual attention, and it imposes no extra burden on the hardware. Our method first extracts bottom-up saliency maps from the video frames, then constructs a mapping from eye images, captured by the camera in synchronization with the video frames, to the screen region. Finally, the top-down gaze information and the bottom-up saliency maps are combined by point-wise multiplication to predict video attention. The proposed approach is validated on two datasets, the public MIT dataset and a dataset we collected, against four other common methods, and the experimental results show that our method achieves state-of-the-art performance.
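A minimal sketch of the fusion step described in this abstract, assuming a per-frame bottom-up saliency map is already available and the gaze point has been mapped to screen coordinates; the Gaussian gaze prior and all array sizes are illustrative placeholders, not the authors' implementation:

```python
import numpy as np

def combine_saliency(bottom_up, gaze_map, eps=1e-8):
    """Fuse a bottom-up saliency map with a top-down gaze map by
    point-wise multiplication, then rescale to [0, 1]."""
    fused = bottom_up * gaze_map
    return (fused - fused.min()) / (fused.max() - fused.min() + eps)

def gaussian_gaze_map(h, w, cx, cy, sigma=40.0):
    """A simple Gaussian prior centred on the gaze point (cx, cy)
    after mapping the eye image to screen coordinates (assumed form)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

bottom_up = np.random.rand(240, 320)           # stand-in for one frame's saliency map
gaze = gaussian_gaze_map(240, 320, 160, 120)   # gaze mapped to the screen region
attention = combine_saliency(bottom_up, gaze)  # predicted video attention map
```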


2.
Multimedia Tools and Applications - Action recognition in still images is an interesting subject in computer vision. One of the most important problems in still image-based action recognition is...

3.
In a recent study, we introduced the problem of identifying cell-phones from recorded speech and showed that speech signals convey information about the source device, making it possible to identify the source with some accuracy. In this paper, we consider recognizing source cell-phone microphones using the non-speech segments of recorded speech. Taking an information-theoretic approach, we use Gaussian Mixture Models (GMMs) trained with maximum mutual information (MMI) to represent device-specific features. Experimental results using Mel-frequency and linear-frequency cepstral coefficients (MFCC and LFCC) show that features extracted from the non-speech segments contain higher mutual information and yield higher recognition rates than those from the speech portions or the whole utterance. The identification rate improves from 96.42% to 98.39% and the equal error rate (EER) drops from 1.20% to 0.47% when non-speech parts are used to extract features. Recognition results are provided for classical GMMs trained with both maximum likelihood (ML) and MMI criteria, as well as for support vector machines (SVMs). Identification under additive noise is also considered, and it is shown that identification rates drop dramatically in that case.
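A hedged sketch of a GMM-based device identification pipeline of this kind; note that scikit-learn offers only maximum-likelihood EM training, not the paper's MMI criterion, and the voice-activity step that isolates non-speech segments is omitted. The input mapping train_files_by_device is an assumed structure, not the paper's API:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(wav_path, sr=16000, n_mfcc=20):
    """MFCCs from an audio file; in the paper, frames would first be
    restricted to non-speech segments (that VAD step is omitted here)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_device_models(train_files_by_device, n_components=32):
    """One ML-trained diagonal-covariance GMM per device label."""
    models = {}
    for device, files in train_files_by_device.items():
        X = np.vstack([mfcc_features(f) for f in files])
        models[device] = GaussianMixture(
            n_components=n_components, covariance_type="diag").fit(X)
    return models

def identify(models, wav_path):
    """Pick the device whose GMM gives the highest average log-likelihood."""
    X = mfcc_features(wav_path)
    return max(models, key=lambda d: models[d].score(X))
```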

4.
F. Cutzu, M. Tarr. Neural Computation, 1999, 11(6): 1331-1348
We present an algorithm for computing the relative perceptual saliencies of the features of a three-dimensional object using either goodness-of-view scores measured at several viewpoints or perceptual similarities among several object views. This technique addresses the inverse, ill-posed version of the direct problem of predicting goodness-of-view scores or viewpoint similarities when the object features are known. On the basis of a linear model for the direct problem, we solve the inverse problem using the method of regularization. The critical assumption we make to regularize the solution is that perceptual salience varies slowly on the surface of the object. The salient regions derived using this assumption empirically indicate what object structures are important in human three-dimensional object perception, a domain where theories typically have been based on somewhat ad hoc features.
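The regularized inverse problem has a standard Tikhonov closed form. The sketch below assumes a known linear map A from per-feature saliencies to per-view goodness scores and uses a crude first-difference operator as a stand-in for the paper's surface-smoothness regularizer; all matrices are toy placeholders:

```python
import numpy as np

def regularized_salience(A, g, L, lam=1.0):
    """Tikhonov-regularized solution of the ill-posed inverse problem:
    minimize ||A s - g||^2 + lam * ||L s||^2, where A maps per-feature
    saliencies s to per-view goodness scores g, and L penalizes salience
    that varies quickly over the object surface.
    Closed form: s = (A^T A + lam L^T L)^{-1} A^T g."""
    return np.linalg.solve(A.T @ A + lam * (L.T @ L), A.T @ g)

# Toy sizes: 12 viewpoints, 30 surface features.
rng = np.random.default_rng(0)
A = rng.random((12, 30))
g = rng.random(12)
L = np.eye(30) - np.eye(30, k=1)   # crude first-difference smoothness operator
s = regularized_salience(A, g, L)  # recovered per-feature saliencies
```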

5.
6.
7.
8.
Action recognition using 3D DAISY descriptor

9.
In this paper, we propose a hierarchical discriminative approach to human action recognition. It consists of feature extraction with mutual motion-pattern analysis and discriminative action modeling in a hierarchical manifold space. A Hierarchical Gaussian Process Latent Variable Model (HGPLVM) is employed to learn the hierarchical manifold space in which motion patterns are extracted. A cascade CRF estimates the motion patterns in the corresponding manifold subspace, and a trained SVM classifier predicts the action label for the current observation. Using motion-capture data, we test our method and evaluate how individual body parts affect human action recognition. Results on our test set of synthetic images are also presented to demonstrate robustness.
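The final classification stage can be illustrated with a plain SVM over manifold coordinates; the HGPLVM and cascade-CRF stages are not reproduced here, and the feature dimensionality and class count below are invented placeholders:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 8))     # 8-D manifold coordinates (placeholder)
y_train = rng.integers(0, 5, size=200)  # 5 action classes (placeholder)

clf = SVC(kernel="rbf").fit(X_train, y_train)  # final discriminative stage
print(clf.predict(rng.normal(size=(10, 8))))   # predicted action labels
```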

10.
For vehicle-type recognition in video images with complex backgrounds, we propose a recognition method based on scale saliency. Because scale saliency is invariant to uniform brightness changes, scaling, rotation, and noise, a scale-saliency algorithm is introduced to extract classification features from vehicle images. An RBF network classifier is then used to verify the method's recognition of multiple vehicle types. Experimental results show that scale-saliency features can effectively identify vehicle types.
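An RBF network of the kind used for the final classification can be sketched as k-means centres with Gaussian hidden units and a least-squares read-out; the scale-saliency feature extraction itself is not reproduced, and all data below are toy placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

class RBFNet:
    """Minimal RBF network: k-means centres, Gaussian hidden units,
    linear read-out fitted by least squares."""
    def __init__(self, n_centers=20, gamma=1.0):
        self.n_centers, self.gamma = n_centers, gamma

    def _phi(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        self.centers = KMeans(n_clusters=self.n_centers,
                              n_init=10).fit(X).cluster_centers_
        Y = np.eye(y.max() + 1)[y]   # one-hot class targets
        self.W, *_ = np.linalg.lstsq(self._phi(X), Y, rcond=None)
        return self

    def predict(self, X):
        return np.argmax(self._phi(X) @ self.W, axis=1)

X = np.random.rand(120, 10)           # scale-saliency feature vectors (toy)
y = np.random.randint(0, 4, 120)      # four vehicle types (assumed)
pred = RBFNet().fit(X, y).predict(X)  # predicted vehicle types
```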

11.
In recent years, application scenarios based on human action recognition have become increasingly widespread. To achieve better recognition performance, this paper proposes an action recognition method based on 3D human skeleton joints. Three-dimensional joint data are acquired with devices such as Kinect, and a body coordinate system is re-established with the hip as the origin; data for key skeleton joints are extracted to define human action feature vectors; action sequences are then built from the action expressions with a behavior tree to carry out recognition. Comparative experiments on five defined actions against other algorithms show that the proposed method achieves a high recognition rate and generalizes well.
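A sketch of the hip-centred feature construction, assuming a Kinect-style joint array; the joint index table and the choice of key joints are illustrative, not the paper's exact definition:

```python
import numpy as np

# Assumed Kinect-style joint order; only a few joints shown for illustration.
JOINTS = {"hip": 0, "head": 1, "l_hand": 2, "r_hand": 3, "l_foot": 4, "r_foot": 5}

def hip_centered_features(skeleton):
    """skeleton: (n_joints, 3) array of 3-D joint positions.
    Re-express all joints in a body coordinate system with the hip at the
    origin, then concatenate key-joint offsets into one feature vector."""
    centered = skeleton - skeleton[JOINTS["hip"]]
    keys = ["head", "l_hand", "r_hand", "l_foot", "r_foot"]
    return np.concatenate([centered[JOINTS[k]] for k in keys])

frame = np.random.rand(6, 3)          # one captured frame (toy data)
feat = hip_centered_features(frame)   # 15-D action feature vector
```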

12.
13.
Zhifei Li, Zhonglong Zheng, Feilong Lin, Howard Leung, Qing Li. Multimedia Tools and Applications, 2019, 78(14): 19587-19601
Multimedia Tools and Applications - This paper presents a method for human action recognition from depth sequences captured by the depth camera. The main idea of the method is the action mapping...

14.
Human action classification is a fundamental technology for robots that must interpret a human's intended actions and respond appropriately, as they will have to do if they are to be integrated into our daily lives. Improved measurement of human motion, using an optical motion-capture system or a depth sensor, allows robots to recognize human actions from superficial motion data, such as camera images containing human actions or the positions of human bodies. But existing motion-recognition technology does not handle the contact force that always exists between the human and the environment being acted upon. More specifically, humans perform feasible actions by controlling not only their posture but also the contact forces, and these contact forces require appropriate muscle tensions in the full body. These muscle tensions, or activities, are expected to be useful for robots observing human actions to estimate the human's somatosensory state and consequently understand the intended action. This paper proposes a novel approach to classifying human actions using only the activities of all the muscles in the human body. The continuous spatio-temporal activity of each individual muscle is encoded into a discrete hidden Markov model (HMM), and the set of HMMs over all the muscles forms a classifier for the specific action. Our classifiers were tested on muscle activities estimated from captured human motions, electromyography data, and reaction forces. The results demonstrate their superiority over commonly used HMM-based classifiers.
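A hedged sketch of a per-muscle discrete-HMM classifier of this kind using hmmlearn (CategoricalHMM; named MultinomialHMM in older versions); the quantization scheme, symbol and state counts are assumptions, and scoring simply sums per-muscle log-likelihoods:

```python
import numpy as np
from hmmlearn.hmm import CategoricalHMM

def quantize(signal, n_symbols=8):
    """Discretize one muscle's continuous activity into integer symbols."""
    edges = np.quantile(signal, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(signal, edges).reshape(-1, 1)

def train_action_model(sequences_per_muscle, n_states=4):
    """One discrete HMM per muscle; the set of per-muscle HMMs forms
    the classifier for a single action."""
    models = []
    for seqs in sequences_per_muscle:  # seqs: quantized training sequences
        X = np.vstack(seqs)
        models.append(CategoricalHMM(n_components=n_states)
                      .fit(X, lengths=[len(s) for s in seqs]))
    return models

def classify(models_by_action, test_per_muscle):
    """Sum per-muscle log-likelihoods and pick the best-scoring action."""
    def total(models):
        return sum(m.score(x) for m, x in zip(models, test_per_muscle))
    return max(models_by_action, key=lambda a: total(models_by_action[a]))
```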

15.
Silhouette-based human action recognition using SAX-Shapes
Human action recognition is an important problem in computer vision. Although most existing solutions achieve good accuracy, the methods are often overly complex and computationally expensive, hindering practical applications. In this regard, we introduce the combination of a time-series representation of the silhouette with Symbolic Aggregate approXimation (SAX), which we refer to as SAX-Shapes, to address human action recognition. Given an action sequence, the silhouettes of the actor extracted from every frame are transformed into time series. Each of these time series is then efficiently converted into a symbolic SAX vector, and the set of all these SAX vectors (the SAX-Shape) represents the action. We propose a rotation-invariant distance function, used by a random forest algorithm, to perform the recognition. Requiring only actor silhouettes, the proposed method is validated on two public datasets; its accuracy is comparable to related work, and it performs well even under varying rotation.
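A sketch of the SAX encoding and a rotation-invariant distance of the kind described; segment and alphabet sizes are assumptions, and the random forest stage is omitted:

```python
import numpy as np
from scipy.stats import norm

def sax(series, n_segments=32, alphabet=8):
    """Symbolic Aggregate approXimation: z-normalize, piecewise-aggregate
    into n_segments means, then map each mean to a symbol index using
    Gaussian breakpoints (the per-frame silhouette encoding in SAX-Shapes)."""
    z = (series - series.mean()) / (series.std() + 1e-8)
    paa = z[: len(z) // n_segments * n_segments].reshape(n_segments, -1).mean(1)
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet + 1)[1:-1])
    return np.digitize(paa, breakpoints)

def rotation_invariant_dist(a, b):
    """Minimum distance over all circular shifts of one SAX vector,
    making the comparison insensitive to the silhouette's start angle."""
    return min(np.abs(np.roll(a, k) - b).sum() for k in range(len(a)))

s1 = sax(np.random.rand(512))  # silhouette contour as a time series (toy)
s2 = sax(np.random.rand(512))
print(rotation_invariant_dist(s1, s2))
```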

16.
Human action recognition is an important branch of study in both human perception and computer vision. Along with the development of artificial intelligence, deep learning techniques have gained a remarkable reputation for image categorization tasks (e.g., object detection and classification). However, since human actions normally take the form of sequential image frames, analyzing human action data with deep learning requires significantly more computational power than still images. This challenge has been the bottleneck for migrating learning-based image representation techniques to action sequences, so old-fashioned handcrafted human action representations are still widely used for action recognition tasks. On the other hand, since handcrafted representations are usually ad hoc and overfit to specific data, they cannot be generalized to various realistic scenarios. Consequently, turning to deep-learning action representations for human action recognition is eventually a natural option. In this work, we provide a detailed overview of recent advances in human action representations. As the first survey to cover both handcrafted and learning-based action representations, we explicitly discuss the strengths and limitations of existing techniques of both kinds. The ultimate goal of this survey is to provide comprehensive analysis and comparison between learning-based and handcrafted action representations, so as to inspire action recognition researchers to study both kinds of representation techniques.

17.

Saliency is the quality of being important, noticeable, or attention-worthy. Finding salient regions in images has important applications in automatic image cropping, image compression, and advertising. The salient regions for an individual in an image change according to their gender, race, culture, likes, dislikes, and experiences. Universal saliency maps point out the overall general salient regions without any consideration of the subject's personal traits; personalized saliency maps are therefore required for better, more individualized predictions of salient regions. In this study, using the RGB (Red, Green, Blue), CMYK (Cyan, Magenta, Yellow, Key), HSV (Hue, Saturation, Value), and HSL (Hue, Saturation, Lightness) fixation patterns of individuals, we propose a Gradient Boosted Tree Regression model to extract a personalized saliency map from the universal saliency map, with an average accuracy of 80% (AUC-Judd metric). We also discuss why some images and subjects yield better saliency map predictions than others.
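A minimal sketch of the regression step, assuming per-pixel feature rows built from the universal saliency value and colour channels; all data below are random placeholders, not the study's fixation data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy stand-in: per-pixel rows of [universal_saliency, R, G, B, H, S, V, ...];
# the target is that subject's personalized saliency at the same pixel.
rng = np.random.default_rng(2)
X = rng.random((5000, 8))   # features from universal map + colour channels
y = rng.random(5000)        # personalized fixation density (assumed target)

model = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)
personal = model.predict(X[:10])  # personalized saliency for 10 pixels
```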


18.
19.
Dense-trajectory human action recognition samples every frame densely over the whole image, which yields high feature dimensionality, heavy computation, and irrelevant background information. We propose a human action recognition method based on saliency detection and dense trajectories. First, multi-scale static saliency detection is applied to the video frames to locate the action subject, and the result is linearly fused with the dynamic saliency detection result to obtain the subject's action region; the original algorithm is improved by extracting dense trajectories only within this region. A Fisher Vector then replaces the bag-of-words model to encode the features more expressively, and a support vector machine performs the recognition. Simulations on the KTH and UCF Sports datasets show that the improved algorithm achieves higher recognition accuracy than the original.
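A sketch of the two saliency-related steps, linear fusion of static and dynamic maps and restriction of dense sampling to the fused salient region; the threshold, fusion weight, and grid stride are assumptions, and the trajectory tracking, Fisher Vector, and SVM stages are omitted:

```python
import numpy as np

def fuse_saliency(static_map, dynamic_map, alpha=0.5):
    """Linear fusion of static and dynamic saliency maps, rescaled to [0, 1]."""
    fused = alpha * static_map + (1 - alpha) * dynamic_map
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)

def mask_dense_samples(points, saliency, thresh=0.5):
    """Keep only densely sampled points inside the salient (actor) region."""
    keep = [p for p in points if saliency[p[1], p[0]] > thresh]
    return np.array(keep)

H, W = 120, 160
sal = fuse_saliency(np.random.rand(H, W), np.random.rand(H, W))
grid = [(x, y) for y in range(0, H, 5) for x in range(0, W, 5)]  # dense grid
samples = mask_dense_samples(grid, sal)  # points to seed trajectories from
```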

20.
Predefined sequences of eye movements, or 'gaze gestures', can be consciously performed by humans and monitored non-invasively using remote video oculography. Gaze gestures hold great potential in human-computer interaction (HCI) as long as they can be easily assimilated by potential users, monitored using low-cost gaze tracking equipment, and distinguished by machine learning algorithms, in their spatio-temporal structure, from the typical gaze activity performed during standard HCI. In this work, the performance of a bioinspired Bayesian pattern recognition algorithm known as Hierarchical Temporal Memory (HTM) on real-time recognition of gaze gestures is evaluated through a user study. To improve the performance of traditional HTM during real-time recognition, an extension of the algorithm is proposed to adapt HTM to the temporal structure of gaze gestures. The extension consists of an additional top node in the HTM topology that stores and compares sequences of input data by sequence alignment using dynamic programming; the spatio-temporal codification of a gesture as a sequence serves to handle the temporal evolution of gaze gesture instances. The extended HTM allows reliable discrimination of intentional gaze gestures from otherwise standard human-machine gaze interaction, reaching up to 98% recognition accuracy on a data set of 10 categories of gaze gestures, with acceptable completion speeds and a low rate of false positives during standard gaze-computer interaction. These positive results, despite the low-cost hardware employed, support the notion of gaze gestures as a new HCI paradigm for accessibility and for interaction with smartphones, tablets, projected displays, and traditional desktop computers.
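The proposed top node's dynamic-programming comparison can be illustrated with a Needleman-Wunsch-style global alignment over gaze symbols; the 3x3 screen-grid coding, the scoring constants, and the gesture templates below are invented for illustration:

```python
import numpy as np

def align_score(seq, template, gap=-1, match=2, mismatch=-1):
    """Global alignment score between an observed gaze symbol sequence
    and a gesture template, computed by dynamic programming."""
    n, m = len(seq), len(template)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)
    D[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq[i - 1] == template[j - 1] else mismatch
            D[i, j] = max(D[i - 1, j - 1] + s,
                          D[i - 1, j] + gap,
                          D[i, j - 1] + gap)
    return D[n, m]

# Gaze positions quantized into screen-region symbols, e.g. a 3x3 grid.
observed = [0, 1, 2, 5, 8]  # an "L"-shaped sweep (assumed coding)
templates = {"L_shape": [0, 1, 2, 5, 8], "Z_shape": [0, 1, 2, 4, 6, 7, 8]}
best = max(templates, key=lambda g: align_score(observed, templates[g]))
```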
