Found 20 similar documents; search took 15 ms
1.
Multimedia Tools and Applications - Human action recognition in realistic videos is an important and challenging task. Recent studies demonstrate that multi-feature fusion can significantly improve...
2.
Gutoski Matheus, Lazzaretti André Eugênio, Lopes Heitor Silvério 《Neural computing & applications》2021,33(4):1207-1220
Neural Computing and Applications - Human action recognition (HAR) is a topic widely studied in computer vision and pattern recognition. Despite the success of recent models for this issue, most of...
3.
Content-based video retrieval is an increasingly popular research field, in large part due to the quickly growing catalogue of multimedia data to be found online. Even though a large portion of this data concerns humans, retrieval of human actions has received relatively little attention. Presented in this paper is a video retrieval system that can be used to perform a content-based query on a large database of videos very efficiently. Furthermore, it is shown that by using ABRS-SVM, a technique for incorporating relevance feedback (RF) on the search results, it is possible to quickly achieve useful results even when dealing with very complex human action queries, such as in Hollywood movies.
4.
5.
This paper describes a fully automatic content-based approach for browsing and retrieval of MPEG-2 compressed video. The first step of the approach is the detection of shot boundaries based on motion vectors available from the compressed video stream. The next step involves the construction of a scene tree from the shots obtained earlier. The scene tree is shown to capture some semantic information as well as to provide a construct for hierarchical browsing of compressed videos. Finally, we build a new model for video similarity based on global as well as local motion associated with each node in the scene tree. To this end, we propose new approaches to camera motion and object motion estimation. The experimental results demonstrate that the integration of the above techniques results in an efficient framework for browsing and searching large video databases.
6.
Action recognition on large categories of unconstrained videos taken from the web is a very challenging problem compared to datasets like KTH (6 actions), IXMAS (13 actions), and Weizmann (10 actions). Challenges like camera motion, different viewpoints, large interclass variations, cluttered background, occlusions, bad illumination conditions, and poor quality of web videos cause the majority of the state-of-the-art action recognition approaches to fail. Also, an increased number of categories and the inclusion of actions with high confusion add to the challenges. In this paper, we propose using the scene context information obtained from moving and stationary pixels in the key frames, in conjunction with motion features, to solve the action recognition problem on a large (50 actions) dataset with videos from the web. We perform a combination of early and late fusion on multiple features to handle the very large number of categories. We demonstrate that scene context is a very important feature to perform action recognition on very large datasets. The proposed method does not require any kind of video stabilization, person detection, or tracking and pruning of features. Our approach gives good performance on a large number of action categories; it has been tested on the UCF50 dataset with 50 action categories, which is an extension of the UCF YouTube Action (UCF11) dataset containing 11 action categories. We also tested our approach on the KTH and HMDB51 datasets for comparison.
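The abstract above mentions combining early and late fusion but does not say how either is computed. As a rough illustration only, the sketch below assumes early fusion means concatenating per-descriptor feature vectors before classification, and late fusion means a weighted average of per-classifier class scores. The function names, the averaging rule, and the example numbers are all assumptions and are not taken from the paper.

```python
def early_fusion(feature_vectors):
    """Concatenate per-descriptor feature vectors into one vector (assumed scheme)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(score_lists, weights=None):
    """Weighted average of per-classifier class scores (assumed scheme).

    score_lists holds one list of class scores per classifier; all lists
    must have the same length. Weights default to a uniform average.
    """
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    n_classes = len(score_lists[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_lists))
        for c in range(n_classes)
    ]


# Hypothetical descriptors: a motion feature vector and a scene-context vector.
motion_features = [0.1, 0.7]
scene_features = [0.3]
combined_vector = early_fusion([motion_features, scene_features])

# Hypothetical scores for two classes from two separately trained classifiers.
final_scores = late_fusion([[0.2, 0.8], [0.6, 0.4]])
predicted_class = max(range(len(final_scores)), key=final_scores.__getitem__)
```

In this sketch, early fusion lets a single classifier see all descriptors at once, while late fusion keeps one classifier per descriptor and merges only their outputs.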
7.
Multimedia Tools and Applications - One of the most challenging tasks in computer vision is human action recognition. The recent development of depth sensors has created new opportunities in this...
8.
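Objective: To meet the demand for real-time, accurate, and robust human motion analysis, and starting from the problems of feature extraction and motion modeling, this paper proposes an example-based learning method for human motion analysis. Method: After building an example database of human poses, the method first uses motion detection to obtain the human silhouette in each video frame; second, using shape-context-based silhouette matching, it retrieves a set of candidate poses for each frame from the example database; finally, it performs human motion analysis through statistical modeling and transition-probability modeling. Results: In experiments on test videos of walking, running, and jumping, the silhouette-based shape context representation and matching method shows good expressive power; the average joint-angle error of the proposed method is about 5°, an improvement in motion analysis accuracy over other algorithms. Conclusion: The proposed example-based learning method can effectively analyze human motion in monocular video, overcomes the depth ambiguity of the mapping, is robust to changes in the viewpoint of the motion, and offers good computational efficiency and accuracy.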
9.
10.
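Skeleton-based human action recognition aims to correctly identify the action categories in an input skeleton sequence containing one or more actions, and is one of the active research topics in computer vision. Compared with image-based methods, skeleton-based methods are not affected by interfering factors such as background and human appearance, and offer higher accuracy, robustness, and computational efficiency. Given the importance and frontier status of skeleton-based human action recognition, a comprehensive and systematic summary and analysis of the field is of great significance. This paper first reviews nine widely used skeleton action recognition datasets, divides them into single-view and multi-view datasets according to the viewpoint from which the data were collected, and discusses the characteristics and usage of each. Second, according to the backbone network used, it divides skeleton-based action recognition methods into those based on handcrafted features, recurrent neural networks, convolutional neural networks, graph convolutional networks, and Transformers, and analyzes the principles, strengths, and weaknesses of each. Among these, graph convolutional methods are currently the most widely used because of their strong ability to capture spatial relationships. Using a new categorization scheme, the paper gives a comprehensive survey of graph convolutional methods, with the aim of offering researchers more ideas and approaches. Finally, it summarizes the problems of existing methods from eight perspectives and proposes corresponding directions for future work.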
11.
Waheed Moomina, Hussain Shahid, Khan Arif Ali, Ahmed Mansoor, Ahmad Bashir 《Multimedia Tools and Applications》2020,79(33-34):24347-24365
Multimedia Tools and Applications - In the context of video-based image classification, image annotation plays a vital role in improving the image classification decision based on its...
12.
Visual appearance-based person retrieval is a challenging problem in surveillance. It uses attributes like height, cloth color, cloth type and gender to describe a human. Such attributes are known as soft biometrics. This paper proposes person retrieval from surveillance video using height, torso cloth type, torso cloth color and gender. The approach introduces an adaptive torso patch extraction and bounding box regression to improve the retrieval. The algorithm uses fine-tuned Mask R-CNN and DenseNet-169 for person detection and attribute classification respectively. The performance is analyzed on the AVSS 2018 challenge II dataset, where it achieves an 11.35% improvement over the state of the art on the average Intersection over Union measure.
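For readers unfamiliar with the measure named above, the following is a minimal sketch of Intersection over Union for two axis-aligned bounding boxes, averaged over paired predictions and ground truths. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration; this is not the challenge's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union area is the sum of both areas minus the doubly counted overlap.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def average_iou(predicted, ground_truth):
    """Mean IoU over paired predicted and ground-truth boxes."""
    scores = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has an overlap of area 1 and a union of area 7, so it evaluates to 1/7.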
13.
14.
Giannakakis Giorgos, Koujan Mohammad Rami, Roussos Anastasios, Marias Kostas 《Pattern Analysis & Applications》2022,25(3):521-535
Pattern Analysis and Applications - Stress conditions are manifested in different human body’s physiological processes and the human face. Facial expressions are modelled consistently through...
15.
16.
17.
18.
《Journal of Network and Computer Applications》2002,25(2):109-127
The increasing use of multimedia streams nowadays necessitates the development of efficient and effective methodologies for manipulating databases storing these streams. Moreover, content-based access to multimedia databases first requires parsing the video stream into separate shots and then applying a method to summarize the huge amount of data involved in each shot. This work proposes a new paradigm capable of robustly and effectively analyzing the compressed MPEG video data. First, an abstract representation of the compressed MPEG video stream is extracted and used as input to a neural network module (NNM) that performs the shot detection task. Second, we propose two adaptive algorithms to effectively select key frames from segmented video shots produced by the segmentation stage. Both algorithms apply a two-level adaptation mechanism in which the first level is based on the dimension of the input video file while the second level is performed on a shot-by-shot basis in order to account for the fact that different shots have different levels of activity. Experimental results show the efficiency and robustness of the proposed system in detecting shot boundaries and flashlights occurring within shots and in selecting the near-optimal set of key frames (KFs) required to represent each shot.
19.
Rajkumar Kannan, Frederic Andres, Christian Guetl 《Multimedia Tools and Applications》2010,46(2-3):545-572
Well-annotated dance media is an essential part of a nation’s identity, transcending cultural and language barriers. Many dance video archives suffer from problems concerning authoring and access, because of the complex spatio-temporal relationships that exist between the dancers in terms of movements of their body parts and the emotions expressed by them in a dance. This paper presents a system named DanVideo for semi-automatic authoring and access to dance archives. DanVideo provides methods of annotation and authoring and retrieval tools for choreographers, dancers, and students. We demonstrate how dance media can be semantically annotated and how this information can be used for the retrieval of the dance video semantics. In particular, DanVideo offers an MPEG-7 based semi-automatic authoring tool that takes dance video annotations generated by dance experts and produces MPEG-7 metadata. DanVideo also has a search engine that takes users’ queries and retrieves dance semantics from metadata arranged using a tree-embedding technique and based on spatial, temporal and spatio-temporal features of dancers. The search engine also leverages a domain-specific ontology to process knowledge-based queries. We have assessed the dance-video queries and semantic annotations in terms of precision, recall, and fidelity.
20.
This paper focuses on human behavior recognition, where the main problem is to bridge the semantic gap between the analogue observations of the real world and the symbolic world of human interpretation. To that end, a fusion architecture based on the Transferable Belief Model framework is proposed and applied to recognizing the actions of an athlete in video sequences of athletics meetings filmed with a moving camera. Relevant features are extracted from the videos, based on both camera motion analysis and the tracking of particular points on the athlete’s silhouette. Interpretation models link the numerical features to the symbols to be recognized, namely running, jumping and falling actions. A Temporal Belief Filter is then used to improve the robustness of action recognition. The proposed approach performs well on real athletics videos (high jumps, pole vaults, triple jumps and long jumps) acquired by a moving camera from different view angles. The proposed system is also compared to Bayesian Networks.
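As background on the kind of fusion the Transferable Belief Model supports, the sketch below shows the unnormalized conjunctive combination of two mass functions over the frame {running, jumping, falling}. The two sources, their mass values, and the function name are hypothetical and chosen only for illustration; the paper's actual interpretation models and its Temporal Belief Filter are not reproduced here.

```python
from itertools import product


def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive combination of two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1].
    Mass that ends up on the empty set measures the conflict between sources.
    """
    combined = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        focal = a & b  # the two sources agree only on the intersection
        combined[focal] = combined.get(focal, 0.0) + mass_a * mass_b
    return combined


FRAME = frozenset({"running", "jumping", "falling"})

# Hypothetical belief from a camera-motion feature: running or jumping, partly unsure.
camera_motion = {frozenset({"running", "jumping"}): 0.7, FRAME: 0.3}

# Hypothetical belief from a silhouette feature: mostly jumping, maybe falling.
silhouette = {
    frozenset({"jumping"}): 0.6,
    frozenset({"falling"}): 0.2,
    FRAME: 0.2,
}

fused = conjunctive_combination(camera_motion, silhouette)
conflict = fused.get(frozenset(), 0.0)
```

With these made-up inputs, most of the fused mass lands on {jumping}, and the mass on the empty set records how strongly the two sources disagree, because no normalization is applied.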
Emmanuel Ramasso is currently pursuing a PhD at GIPSA-lab, Department of Images and Signal located in Grenoble, France. He received both his BS degree in Electrical Engineering and Control Theory and his MS degree in Computer Science in 2004 from Ecole Polytechnique de Savoie (Annecy, France). His research interests include Sequential Data Analysis, Transferable Belief Model, Fusion, Image and Video Analysis and Human Motion Analysis. Costas Panagiotakis was born in Heraklion, Crete, Greece in 1979. He received the BS and the MS degrees in Computer Science from University of Crete in 2001 and 2003, respectively. Currently, he is a PhD candidate in Computer Science at University of Crete. His research interests include computer vision, image and video analysis, motion analysis and synthesis, computer graphics, computational geometry and signal processing. Denis Pellerin received the Engineering degree in Electrical Engineering in 1984 and the PhD degree in 1988 from the Institut National des Sciences Appliquées, Lyon, France. He is currently a full Professor at the Université Joseph Fourier, Grenoble, France. His research interests include visual perception, motion analysis in image sequences, video analysis, and indexing. Michèle Rombaut is currently a full Professor at the Université Joseph Fourier, Grenoble, France. Her research interests include Data Fusion, Sequential Data Analysis, High Level Interpretation, Image and Video Analysis.