Found 20 similar documents; search took 15 ms.
1.
In this paper, we propose an approach that infers the labels of unlabeled consumer videos and, at the same time, recognizes the key segments of the videos by learning from Web image sets for video annotation. The key segments are recognized automatically by transferring knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method to adapt classifiers pre-learned on Web image sets into an optimal target classifier, where the locations of the key segments are modeled as latent variables because ground-truth key segments are not available. We use a limited number of labeled videos and abundant labeled Web images to train the annotation models, which significantly alleviates the time-consuming and labor-intensive collection of large numbers of labeled training videos. Experiments on two challenging datasets, Columbia's Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014), show that our method outperforms state-of-the-art methods.
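The latent-variable idea above can be illustrated with a toy sketch: given per-segment scores from a pre-learned classifier, the key segment is treated as a latent variable and chosen jointly with the label by maximizing the score. The label set, function names, and scoring function below are illustrative stand-ins, not the paper's implementation.

```python
# Sketch of the latent-variable idea: the key segment of a video is the
# segment on which a pre-learned (Web-image) classifier scores highest.
# All names and the scoring function are illustrative, not the paper's code.

def predict_with_latent_segment(segment_features, classifier_score):
    """Return (best_label, best_segment_index) by maximizing the
    classifier score jointly over labels and latent segment positions."""
    best = None  # (score, label, segment_index)
    for label in ("event", "background"):  # hypothetical two-class setting
        for i, feat in enumerate(segment_features):
            s = classifier_score(feat, label)
            if best is None or s > best[0]:
                best = (s, label, i)
    return best[1], best[2]

# Toy scorer: a segment supports "event" in proportion to its feature value.
def toy_score(feat, label):
    return feat if label == "event" else 1.0 - feat

label, seg = predict_with_latent_segment([0.1, 0.95, 0.3], toy_score)
```

Because the key-segment location is never annotated, the argmax over segments plays the role of latent-variable inference during both training and prediction.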
2.
Multimedia Tools and Applications - Human action recognition in realistic videos is an important and challenging task. Recent studies demonstrate that multi-feature fusion can significantly improve...
3.
Gutoski, Matheus; Lazzaretti, André Eugênio; Lopes, Heitor Silvério. Neural Computing & Applications, 2021, 33(4):1207-1220
Neural Computing and Applications - Human action recognition (HAR) is a topic widely studied in computer vision and pattern recognition. Despite the success of recent models for this issue, most of...
4.
Nowadays, owing to the evolution of high-speed computers, the old Human–Computer Interface (HCI) legacies based on mouse and keyboard are slowly becoming obsolete: they are not accurate enough and cannot respond in a timely manner to today's flow of information. New ways of communicating with the computer therefore have to be researched, the most natural one being the use of gestures. In this paper, a two-level architecture for recognizing human gestures from video frames is proposed. The architecture uses several feed-forward neural networks to recognize gestures from Haar-like features of the body, hand, and fingers, together with a stochastic context-free grammar that captures the mutual context between body pose and hand movement. Trained and tested on 10 gestures (Swipe Right, Swipe Left, Swipe Up, Swipe Down, Horizontal Wave, Vertical Wave, Circle, Point, Palm Up, and Fist), the system's accuracy of over 94% surpasses the current state of the art; compared with a system that ignores the mutual context between body position and hand movement, the proposed architecture improves accuracy by up to 7%.
5.
6.
Multimedia Tools and Applications - One of the most challenging tasks in computer vision is human action recognition. The recent development of depth sensors has created new opportunities in this...
7.
8.
Multimedia Tools and Applications - Dynamic Adaptive Streaming over HTTP (MPEG-DASH) ensures that online videos are displayed with good quality and without interruption. It provides an adequate streaming...
9.
10.
Along with the emerging focus on community-contributed videos on the web, there is a strong demand for a well-designed web video benchmark for research on social-network-based video content analysis. Existing video datasets fall short in two respects: (1) as data resources, most are narrowed to a specific task, either focusing on one content analysis task at limited scale, or focusing on pure social network analysis without downloading video content; (2) as evaluation platforms, few pay attention to the potential bias introduced by their sampling criteria and therefore cannot measure task performance fairly. In this paper, we release a large-scale web video benchmark named MCG-WEBV 2.0, comprising 248,887 crawled YouTube videos and the corresponding social network structure over 123,063 video contributors. MCG-WEBV 2.0 can be used to explore the fusion of content and network for several web video analysis tasks. Based on MCG-WEBV 2.0, we further explore the sampling bias inherent in web video benchmark construction. Since sampling a completely unbiased video benchmark from a million-scale collection is impractical, we propose a task-dependent measurement of such bias, which minimizes the correlation between the potential video sampling bias and the corresponding content analysis task when the bias is unavoidable. Following this principle, we show several exemplar application scenarios on MCG-WEBV 2.0.
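One minimal way to realize a task-dependent bias measure of the kind described is to compute the correlation between a sampling attribute and the task labels, treating values near zero as low bias. The attribute (video popularity at crawl time) and all data below are hypothetical.

```python
# Illustrative bias measure: absolute Pearson correlation between a sampling
# attribute (e.g. popularity at crawl time) and the task label. Names and
# numbers are invented for illustration, not taken from MCG-WEBV 2.0.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

def sampling_bias(attribute, task_labels):
    """A sample is less biased for the task when this value is near 0."""
    return abs(pearson(attribute, task_labels))

# Popularity uncorrelated with the task label -> low bias score.
low = sampling_bias([1, 2, 3, 4], [1, 0, 0, 1])
# Popularity aligned with the task label -> high bias score.
high = sampling_bias([1, 2, 3, 4], [0, 0, 1, 1])
```

A benchmark builder could then prefer, among feasible sampling criteria, the one whose bias score for the target task is smallest.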
11.
Information as to identity appears to be concentrated in and around certain areas of shapes and objects whereas other parts are more-or-less redundant. The experiments described in this paper were intended to establish which particular features of human faces, both in isolation and combination, convey most information for recognition. The results indicate the relative importance to recognition of different facial features. However, certain faces appear to embody idiosyncratic cues while some are confused consistently with others whom they appear to resemble.
12.
Pillai Karthik Ganesh, Ramaswamy Radhakrishnan, Kanthavel Ramakrishnan, Dhaya, Yesudhas Harold Robinson, Eanoch Golden Julie, Kumar Raghvendra, Long Hoang Viet, Son Le Hoang. Multimedia Tools and Applications, 2021, 80(5):7077-7101
Multimedia Tools and Applications - The detection and clustering of commercial advertisements plays an important role in multimedia indexing as well as in the creation of personalized user content. In...
13.
Numerous web videos associated with rich metadata are available on the Internet today. While metadata such as video tags bring convenience and opportunities for video search and multimedia content understanding, challenges also arise because these tags are usually annotated at the video level, while many of them actually describe only parts of the video content. Localizing the parts or frames of a web video relevant to given tags is therefore key to many applications and research tasks. In this paper, we propose combining a topic model with relevance filtering to localize relevant frames. Our method proceeds in three steps. First, we apply relevance filtering to assign relevance scores to video frames and obtain a raw relevant frame set by selecting the top-ranked frames. Then, we group the frames into topics by mining the underlying semantics with latent Dirichlet allocation, using the raw relevant set as a validation set to select relevant topics. Finally, the topical relevances are used to refine the raw relevant frame set and produce the final results. Experimental results on two real web video databases validate the effectiveness of the proposed approach.
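The three-step pipeline can be sketched in miniature, with a pre-assigned topic id per frame standing in for latent Dirichlet allocation and a plain score list standing in for the relevance filter; all function and variable names are made up for illustration.

```python
# Toy sketch of the three-step pipeline (the tiny "topic model" here is an
# illustrative stand-in for LDA; scores stand in for relevance filtering).

def localize_frames(frame_scores, frame_topics, top_k=3):
    """frame_scores: per-frame relevance to the tag; frame_topics: topic id
    assigned to each frame (in the paper, by latent Dirichlet allocation)."""
    # Step 1: raw relevant set = top-k frames by relevance score.
    ranked = sorted(range(len(frame_scores)), key=lambda i: -frame_scores[i])
    raw = set(ranked[:top_k])
    # Step 2: a topic counts as relevant if it appears in the raw set,
    # which acts as the validation set.
    relevant_topics = {frame_topics[i] for i in raw}
    # Step 3: refine -- keep every frame whose topic is relevant.
    return sorted(i for i, t in enumerate(frame_topics) if t in relevant_topics)

frames = localize_frames(
    frame_scores=[0.9, 0.1, 0.8, 0.2, 0.7],
    frame_topics=["goal", "crowd", "goal", "crowd", "goal"],
)
```

The refinement step can both recover relevant frames the filter missed (same topic, low score) and reject top-scored outliers whose topic never dominates the raw set.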
14.
Recognizing action units for facial expression analysis (cited 16 times: 0 self-citations, 16 by others)
Tian, Y.-I.; Kanade, T.; Cohn, J.F. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(2):97-115
15.
Applied Soft Computing, 2004, 4(1):35-47
In this paper, we describe the development of a mobile robot that performs unsupervised learning to recognize an environment from action sequences. We call this novel recognition approach action-based environment modeling (AEM). Most studies on environment recognition have tried to build precise geometric maps using highly sensitive, global sensors. However, such precise, global information is hard to obtain in a real environment and may be unnecessary for recognizing it. Furthermore, unsupervised learning is needed to recognize an unknown environment without the help of a teacher. We therefore build a mobile robot that learns without supervision to recognize environments using low-sensitivity, local sensors. The robot is behavior-based and performs wall-following in enclosures (called rooms). The sequences of actions executed in each room are transformed into environment vectors for self-organizing maps. Learning proceeds without a teacher, and the robot becomes able to identify rooms. Moreover, we develop a method to identify environments independently of the starting point, using a partial action sequence. We fully implemented the system on a real mobile robot and conducted evaluation experiments. The results show that environment recognition worked well and that our method is robust to noisy environments.
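A toy rendition of the AEM front end, under strong simplifying assumptions: the environment vector is a plain action histogram, and the self-organizing map is reduced to two units with a fixed learning rate and no neighborhood function. None of these names or numbers come from the paper.

```python
# Minimal AEM-style sketch: action sequence -> fixed-length environment
# vector (an action histogram) -> tiny self-organizing map over rooms.
ACTIONS = ["forward", "left", "right"]

def environment_vector(actions):
    """Normalized histogram of the actions executed while wall-following."""
    return [actions.count(a) / len(actions) for a in ACTIONS]

def train_som(vectors, epochs=50, lr=0.3, seed_units=None):
    # Two units; neighborhood shrinking omitted for brevity.
    units_w = seed_units or [[0.9, 0.05, 0.05], [0.1, 0.45, 0.45]]
    for _ in range(epochs):
        for v in vectors:
            bmu = min(range(len(units_w)),
                      key=lambda u: sum((a - b) ** 2
                                        for a, b in zip(units_w[u], v)))
            # Pull the best-matching unit toward the input vector.
            units_w[bmu] = [w + lr * (x - w) for w, x in zip(units_w[bmu], v)]
    return units_w

def identify(v, units_w):
    """Identify a room by its best-matching unit."""
    return min(range(len(units_w)),
               key=lambda u: sum((a - b) ** 2 for a, b in zip(units_w[u], v)))

corridor = environment_vector(["forward"] * 8 + ["left", "right"])
small_room = environment_vector(["left", "right"] * 4 + ["forward", "forward"])
som = train_som([corridor, small_room])
```

The point of the histogram representation is that it is cheap, local, and start-point independent, which is exactly what low-sensitivity sensors can support.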
16.
Most research on text categorization has focused on classifying text documents into a set of categories with no structural relationships among them (flat classification). However, in many information repositories documents are organized in a hierarchy of categories to support thematic search by browsing topics of interest. Considering the hierarchical relationships among categories raises several additional issues in the development of methods for automated document classification. Questions concern the representation of documents, the learning process, the classification process, and the evaluation criteria for experimental results. They are systematically investigated in this paper, whose main contribution is a general hierarchical text categorization framework in which the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning, and classification of a new document. An automated threshold determination method for classification scores is embedded in the proposed framework; it can be applied to any classifier that returns a degree of membership of a document in a category. Three learning methods are considered for constructing document classifiers: centroid-based, naïve Bayes, and SVM. The proposed framework has been implemented in the system WebClassIII and tested on three datasets (Yahoo, DMOZ, RCV1) that present a variety of hierarchical structures. Experimental results are reported, and several conclusions are drawn on the comparison of the flat vs. the hierarchical approach as well as on the comparison of different hierarchical classifiers. The paper concludes with a review of related work and a discussion of previous findings vs. ours.
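The embedded threshold-determination step can be illustrated generically: for any classifier that returns membership scores, pick the per-category threshold that maximizes F1 on validation data. This is a common stand-in for such methods, not necessarily the paper's exact criterion.

```python
# Generic automated threshold choice for one category: scan the distinct
# validation scores and keep the cut with the best F1. Data is invented.

def best_threshold(scores, labels):
    """scores: membership scores from any classifier; labels: 1 if the
    document truly belongs to the category. Returns the best-F1 cut."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

t = best_threshold([0.9, 0.8, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0])
```

In a hierarchical setting the same routine would run once per node, so each category gets a threshold tuned to its own score distribution.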
17.
In this paper, we present a user-based event detection method for social web videos. Previous research on event detection has focused on content-based techniques, such as pattern recognition algorithms that attempt to understand the contents of a video. Few user-centric approaches have considered either search keywords or external data such as comments, tags, and annotations, and some of them imposed extra effort on users in order to capture the required information. In this research, we describe a method for analyzing users' implicit interactions with a web video player, such as pause, play, and thirty-second skip or rewind. The results of our experiments indicate that even the simple heuristic of local maxima in user activity can effectively detect the same video events as those indicated manually. Notably, the proposed technique was more accurate at detecting events of short duration, because those events motivated increased user interaction at video hot-spots. The findings of this research provide evidence that we may be able to infer semantics about a piece of unstructured data simply from the way people actually use it.
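The local-maxima heuristic is straightforward to sketch: bin the interaction timestamps per stretch of video time and report bins that dominate both neighbors. The bin size and interaction data below are invented for illustration.

```python
# Minimal local-maxima event detector over implicit player interactions
# (pause/play/skip timestamps, in seconds). Bin size and data are made up.

def detect_events(interaction_times, duration, bin_size=10):
    n_bins = duration // bin_size
    counts = [0] * n_bins
    for t in interaction_times:
        counts[min(t // bin_size, n_bins - 1)] += 1
    events = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < n_bins - 1 else -1
        if c > left and c > right:  # strict local maximum of activity
            events.append(i * bin_size)  # event start time in seconds
    return events

# Interactions cluster around t≈25 s and t≈55 s.
events = detect_events([24, 25, 26, 27, 54, 55, 56], duration=60)
```

Short events naturally produce sharp, narrow peaks in such a histogram, which matches the paper's observation that they are the easiest to detect.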
18.
This paper focuses on human behavior recognition, where the main problem is to bridge the semantic gap between the analogue observations of the real world and the symbolic world of human interpretation. To that end, a fusion architecture based on the Transferable Belief Model framework is proposed and applied to action recognition of an athlete in video sequences of athletics meetings filmed with a moving camera. Relevant features are extracted from the videos, based on both camera motion analysis and the tracking of particular points on the athlete's silhouette. Models of interpretation link the numerical features to the symbols to be recognized, namely the running, jumping, and falling actions. A Temporal Belief Filter is then used to improve the robustness of action recognition. The proposed approach demonstrates good performance when tested on real athletics videos (high jumps, pole vaults, triple jumps, and long jumps) acquired by a moving camera from different viewing angles. The proposed system is also compared to Bayesian networks.
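The fusion step in the Transferable Belief Model rests on the unnormalized conjunctive combination of mass functions, which can be sketched directly; the two sources and their masses below are toy values, not the paper's interpretation models.

```python
# Unnormalized conjunctive combination of two mass functions over subsets of
# the frame {running, jumping, falling}, as used in the Transferable Belief
# Model. Frozensets stand for subsets; the numbers are toy values.

def conjunctive_combine(m1, m2):
    out = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b  # mass on the empty set models conflict in TBM
            out[inter] = out.get(inter, 0.0) + va * vb
    return out

R, J = frozenset({"running"}), frozenset({"jumping"})
m_camera = {R: 0.6, R | J: 0.4}   # hypothetical camera-motion source
m_points = {J: 0.5, R | J: 0.5}   # hypothetical silhouette-point source
fused = conjunctive_combine(m_camera, m_points)
```

Unlike Dempster's rule, the combination is left unnormalized, so the mass assigned to the empty set directly measures the conflict between the two sources.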
Emmanuel Ramasso is currently pursuing a PhD at GIPSA-lab, Department of Images and Signal located in Grenoble, France. He received both his BS degree in Electrical Engineering and Control Theory and his MS degree in Computer Science in 2004 from Ecole Polytechnique de Savoie (Annecy, France). His research interests include Sequential Data Analysis, Transferable Belief Model, Fusion, Image and Videos Analysis and Human Motion Analysis. Costas Panagiotakis was born in Heraklion, Crete, Greece in 1979. He received the BS and the MS degrees in Computer Science from University of Crete in 2001 and 2003, respectively. Currently, he is a PhD candidate in Computer Science at University of Crete. His research interests include computer vision, image and video analysis, motion analysis and synthesis, computer graphics, computational geometry and signal processing. Denis Pellerin received the Engineering degree in Electrical Engineering in 1984 and the PhD degree in 1988 from the Institut National des Sciences Appliquées, Lyon, France. He is currently a full Professor at the Université Joseph Fourier, Grenoble, France. His research interests include visual perception, motion analysis in image sequences, video analysis, and indexing. Michèle Rombaut is currently a full Professor at the Université Joseph Fourier, Grenoble, France. Her research interests include Data Fusion, Sequential Data Analysis, High Level Interpretation, Image and Video Analysis.
19.
Objective: To meet application demands for real-time, accurate, and robust human motion analysis, and starting from the feature extraction and motion modeling problems of motion analysis, this paper presents an example-based learning method for human motion analysis. Method: On the basis of a human-pose exemplar database, we first apply motion detection to obtain the human silhouette in each video frame; second, using shape-context silhouette matching, we retrieve a candidate pose set for each frame from the exemplar database; finally, human motion analysis is performed through statistical modeling and transition-probability modeling. Results: In experiments on walking, running, and jumping test videos, the contour-based shape-context representation and matching showed good expressive power; the motion analysis results of our method achieve an average joint-angle error of about 5°, effectively improving analysis accuracy compared with other algorithms. Conclusion: The proposed example-based learning method effectively analyzes human motion in monocular video, overcomes the depth ambiguity of the mapping, is robust to viewpoint changes, and offers good computational efficiency and accuracy.
20.
Objective: Human action recognition in video actively promotes intelligence in fields such as security surveillance, human–robot collaboration, and assistance for the elderly and disabled, and has broad application prospects. However, existing recognition methods still make poor use of the spatio-temporal features of human actions, and recognition accuracy remains to be improved. This paper therefore proposes a method that uses a deep learning network to extract the key semantic information of human actions in the spatial domain and links it for analysis in the temporal domain, so as to accurately recognize human actions in video. Method: Based on the video content, repeated and redundant action information is discarded and the key frames that best express changes in the human action are extracted. A deep learning network is designed and built to analyze image semantics and extract the key semantic regions of each image, effectively describing the spatial information of the action. A Siamese neural network computes the correlation between key semantic regions across frames; regions with similar semantics are linked into key semantic region chains, whose deep features are computed and fused into a feature describing the human action in the video, and a classifier is trained to perform recognition. Results: The method was validated on the challenging UCF50 (University of Central Florida) action recognition dataset, achieving an accuracy of 94.3%, a significant improvement over existing methods. Ablation experiments show that the proposed computation of key semantic regions and of their inter-frame correlation effectively improves recognition accuracy. Conclusion: The experimental results show that the proposed method makes effective use of the spatio-temporal information of human actions in video and significantly improves recognition accuracy.
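The key-frame extraction step (discarding repeated and redundant frames) can be illustrated with a simple frame-difference filter; frames are toy feature vectors and the threshold is arbitrary, not the paper's actual criterion.

```python
# Illustrative key-frame selection: drop frames too similar to the last kept
# frame. Frames are toy feature vectors; the threshold is arbitrary.

def key_frames(frames, threshold=0.5):
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last = frames[kept[-1]]
        dist = sum(abs(a - b) for a, b in zip(frames[i], last))
        if dist > threshold:  # enough change -> a new key frame
            kept.append(i)
    return kept

# Frames 0-2 are near-duplicates; frame 3 shows a clearly different pose.
ks = key_frames([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]])
```

Only the surviving key frames would then be fed to the spatial network, keeping the frames that best express changes in the action while cutting redundant computation.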