20 similar documents retrieved; search time: 906 ms
1.
2.
Jinyoung Moon, Junho Jin, Yongjin Kwon, Kyuchang Kang, Jongyoul Park, Kyoung Park 《ETRI Journal》2017,39(4):502-513
For video understanding, namely analyzing who did what in a video, actions along with objects are the primary elements. Most studies on actions have addressed recognition for well-trimmed videos and focused on improving classification performance. However, action detection, which includes localization as well as recognition, is required because actions generally intersect in time and space. In addition, most studies have not considered extensibility to a newly added action that has not been previously trained. This paper therefore proposes an extensible hierarchical method for detecting generic actions, which combine object movements and spatial relations between two objects, and inherited actions, which are determined from the related objects through an ontology- and rule-based methodology. The hierarchical design enables the method to detect any interactive action based on the spatial relations between two objects. Using object information, the method achieves an F-measure of 90.27%. Moreover, the paper demonstrates the extensibility of the method to a new action in a video from a domain different from that of the dataset used.
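The two-object spatial relations on which the abstract's generic actions are built can be illustrated with a minimal sketch; the bounding-box format, the `near_thresh` parameter, and the toy rule set below are assumptions for illustration, not the paper's ontology:

```python
def relation(box_a, box_b, near_thresh=20):
    """Classify the spatial relation between two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap requires intersection on both axes.
    if ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2:
        return "overlap"
    # Gap between the closest edges on each axis (0 if projections intersect).
    dx = max(bx1 - ax2, ax1 - bx2, 0)
    dy = max(by1 - ay2, ay1 - by2, 0)
    return "near" if max(dx, dy) <= near_thresh else "disjoint"

def detect_generic_action(relations):
    """Toy rule: 'touch' when the relation sequence ends in overlap,
    'approach' when it moves from disjoint to near."""
    if relations[-1] == "overlap":
        return "touch"
    if "disjoint" in relations and relations[-1] == "near":
        return "approach"
    return "none"
```

A rule-based layer like this is what makes the hierarchy extensible: a new interactive action only needs a new rule over the same relation vocabulary.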
3.
4.
The RCC theory of qualitative spatial reasoning can express the topological relations among spatial objects fairly comprehensively. However, because it does not support mixed-dimensional spatial objects, RCC cannot be applied directly to spatial queries. This paper extends RCC and builds MRCC, a spatial-relation model that can express mixed-dimensional spatial relations and is better suited to spatial queries. The model adopts data structures commonly used in GIS, supports full mereotopological relations among mixed-dimensional spatial objects, and, based on the characteristics of mixed-dimensional objects, establishes unified direction and distance relations that are independent of dimension. Using this model, the relational algebra of standard SQL is extended and MQS-SQL, a mixed-dimensional qualitative spatial query language, is implemented.
5.
Jinchang Ren, Jianmin Jiang, Yue Feng 《Journal of Visual Communication and Image Representation》2010,21(8):930-938
In this paper, we present a novel method for content adaptation and video summarization implemented fully in the compressed domain. First, summarization of generic videos is modeled as the process of extracting human objects under various activities/events. Accordingly, frames are classified via fuzzy decision into five categories, including shot changes (cut and gradual transitions) and motion activities (camera motion and object motion), using two inter-frame measurements. Second, human objects are detected using Haar-like features. From the detected human objects and the frame categories, an activity level is determined for each frame to adapt to the video content. Continuous frames belonging to the same category are grouped into one activity entry as content of interest (COI), which converts the original video into a series of activities. An overall adjustable quota controls the size of the generated summary for efficient streaming. Given this quota, the frames selected for summarization are determined by evenly sampling the accumulated activity levels. Quantitative evaluations demonstrate the effectiveness and efficiency of the proposed approach, which provides a more flexible and general solution because domain-specific tasks such as accurate object recognition can be avoided.
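The quota-driven selection step, evenly sampling the accumulated activity levels so that high-activity segments contribute proportionally more frames, can be sketched as follows; the function name and the half-step sampling offsets are illustrative assumptions:

```python
def summarize(activity_levels, quota):
    """Pick `quota` frame indices by evenly sampling the cumulative
    activity curve: segments with more activity receive more frames."""
    # Cumulative activity; the total mass is cum[-1].
    cum, total = [], 0.0
    for a in activity_levels:
        total += a
        cum.append(total)
    # Evenly spaced target positions on the cumulative-activity axis.
    targets = [(k + 0.5) * total / quota for k in range(quota)]
    picked, i = [], 0
    for t in targets:
        while cum[i] < t:
            i += 1
        picked.append(i)
    return picked
```

On a toy sequence where only frames 2 and 3 carry activity, a quota of two selects exactly those frames.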
6.
《Journal of Visual Communication and Image Representation》2014,25(5):1018-1030
For a variety of applications such as video surveillance and event annotation, the spatial–temporal boundaries between video objects are required for annotating visual content with high-level semantics. In this paper, we define spatial–temporal sampling as a unified process of extracting video objects and computing their spatial–temporal boundaries using a learnt video object model. We first provide a computational approach for learning an optimal key-object codebook sequence from a set of training video clips to characterize the semantics of the detected video objects. Then, dynamic programming with the learnt codebook sequence is used to locate the video objects with spatial–temporal boundaries in a test video clip. To verify the performance of the proposed method, a human action detection and recognition system is constructed. Experimental results show that the proposed method gives good performance on several publicly available datasets in terms of detection accuracy and recognition rate.
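The dynamic-programming step that matches a learnt codebook sequence against the frames of a test clip can be sketched as a monotone alignment: frames are assigned, in order, to codebook entries used in sequence, minimizing total dissimilarity. The cost function and backtracking details below are assumptions, not the paper's exact formulation:

```python
def align(frames, codebook, dist):
    """Monotone DP alignment: each frame is labeled with one codebook
    entry, entries appear in order, and total distance is minimized."""
    n, m = len(frames), len(codebook)
    INF = float("inf")
    dp = [[INF] * n for _ in range(m)]
    for j in range(n):  # all frames so far assigned to the first entry
        dp[0][j] = dist(frames[j], codebook[0]) + (dp[0][j - 1] if j else 0)
    for i in range(1, m):
        for j in range(i, n):  # entry i needs at least i preceding frames
            dp[i][j] = dist(frames[j], codebook[i]) + min(dp[i][j - 1], dp[i - 1][j - 1])
    # Backtrack to recover the codebook label of each frame.
    labels, i = [0] * n, m - 1
    for j in range(n - 1, 0, -1):
        labels[j] = i
        if i > 0 and dp[i - 1][j - 1] <= dp[i][j - 1]:
            i -= 1
    labels[0] = 0
    return labels
```

With a two-entry codebook [1, 9] and frames [1, 1, 9, 9], the alignment labels the first two frames with entry 0 and the last two with entry 1, giving the temporal boundary between them.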
7.
《Signal Processing: Image Communication》2007,22(7-8):669-679
In conventional video production, logotypes are used to convey information about the content originator or the actual video content. Logotypes carry information that is critical for inferring genre, class, and other important semantic features of video. This paper presents a framework to support semantic-based video classification and annotation. The backbone of the proposed framework is a technique for logotype extraction and recognition. The method consists of two main processing stages. The first stage performs temporal and spatial segmentation by calculating the minimal luminance variance region (MLVR) for a set of frames. Non-linear diffusion filters (NLDF) are used at this stage to reduce noise in the shape of the logotype. In the second stage, logotype classification and recognition are performed. The earth mover's distance (EMD) is used as a metric to decide whether a detected MLVR belongs to one of two logotype categories: learned or candidate. Learned logos are semantically annotated shapes available in the database; their semantic characterization is obtained through an iterative learning process. Candidate logos are non-annotated shapes extracted during the first stage. They are assigned to clusters that group different instances of logos of similar shape. Using these clusters, false logotypes are removed and different instances of the same logo are averaged to obtain a unique prototype representing the underlying noisy cluster. Experiments involving several hours of MPEG video and around 1,000 candidate logotypes have been carried out to demonstrate the robustness of both the detection and classification processes.
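For 1D histograms with unit ground distance between adjacent bins, the earth mover's distance used here for shape matching reduces to the accumulated difference of the two cumulative histograms; a minimal sketch under that simplifying assumption (real logo descriptors would be multi-dimensional):

```python
def emd_1d(hist_a, hist_b):
    """EMD between two mass-normalized 1D histograms with unit distance
    between adjacent bins: the sum of absolute CDF differences."""
    sa, sb = sum(hist_a), sum(hist_b)
    ca = cb = flow = 0.0
    for a, b in zip(hist_a, hist_b):
        ca += a / sa  # running CDF of A
        cb += b / sb  # running CDF of B
        flow += abs(ca - cb)  # mass that must cross this bin boundary
    return flow
```

Moving all mass across two bins costs 2.0, while identical histograms cost 0, which is the cross-bin sensitivity that makes EMD preferable to bin-wise distances for noisy logo shapes.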
8.
9.
A self-organizing approach to background subtraction for visual surveillance applications. (Cited by 15: 0 self-citations, 15 by others)
Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of segmenting video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient. We propose an approach based on self-organization through artificial neural networks, a mechanism widely studied in human image processing systems and, more generally, in cognitive science. The proposed approach can handle scenes containing moving backgrounds, gradual illumination variations, and camouflage; has no bootstrapping limitations; can include in the background model shadows cast by moving objects; and achieves robust detection for different types of videos taken with stationary cameras. We compare our method with other modeling techniques and report experimental results, in terms of both detection accuracy and processing speed, for color video sequences representing situations typically critical for video surveillance systems.
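The paper's model is a self-organizing neural map; as a much simpler baseline illustrating the same selective-update idea (the background model adapts only where a pixel is classified as background, so foreground objects do not corrupt it), one can sketch a per-pixel running average on grayscale frames. The `alpha` and `thresh` parameters are illustrative assumptions:

```python
def subtract_background(frames, alpha=0.1, thresh=30):
    """Baseline background subtraction: a per-pixel running average,
    selectively updated, yields a binary foreground mask per frame."""
    bg = [row[:] for row in frames[0]]  # initialize the model from the first frame
    masks = []
    for frame in frames[1:]:
        mask = []
        for y, row in enumerate(frame):
            mrow = []
            for x, v in enumerate(row):
                fg = abs(v - bg[y][x]) > thresh
                if not fg:  # selective update: adapt on background pixels only
                    bg[y][x] = (1 - alpha) * bg[y][x] + alpha * v
                mrow.append(1 if fg else 0)
            mask.append(mrow)
        masks.append(mask)
    return masks
```

Unlike this baseline, the self-organizing model keeps multiple weight vectors per pixel, which is what lets it absorb moving backgrounds and cast shadows.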
10.
《Signal Processing: Image Communication》2005,20(3):205-218
A VideoGIS system aims at combining geo-referenced video information with traditional geographic information to provide a more comprehensive understanding of a spatial location. Video data have been combined with geographic information in several projects to facilitate a better understanding of the spatial objects of interest. This paper presents an ongoing VideoGIS project in which scalable geo-referenced video and geographic information (GI) are transmitted to GPS-guided vehicles. The hypermedia, which contain cross-referenced video and GI, are organized in a scalable (layered) fashion. Remote users can request, through 3G mobile devices, the abundant information related to the objects of interest, while adapting to heterogeneous network conditions and local CPU usage. An available-bandwidth estimation technique is used for adaptive video transmission.
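Adapting a layered (scalable) stream to an estimated bandwidth can be sketched as choosing the largest prefix of layers, base layer first, whose cumulative rate fits the estimate; the function name and kbps units are assumptions for illustration:

```python
def select_layers(layer_rates, bandwidth_kbps):
    """Pick the largest prefix of scalable layers (base layer first)
    whose cumulative bit rate fits the estimated bandwidth."""
    chosen, used = 0, 0.0
    for rate in layer_rates:
        if used + rate > bandwidth_kbps:
            break
        used += rate
        chosen += 1
    return chosen
```

With layers of 64, 128, and 256 kbps and a 200 kbps estimate, only the first two layers are streamed; dropping to 50 kbps drops even the base layer, signaling that transmission should pause.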
11.
12.
Driver activity recognition by learning spatiotemporal features of pose and human object interaction
Detecting hazardous activity during driving can be useful in curbing roadside accidents. Existing techniques that encode such activity with image-based features can misclassify crucial scenarios. One particular line of work by Zhao et al. (2013 [1], 2013 [2], 2011 [3]) suggests an image-based feature set that encodes the driver's pose, which is categorized into one of four activities. We bring more clarity to understanding the activity by proposing a richer, video-based feature set that exploits the spatiotemporal information of the driver. Our feature set encodes the driver's pose, crucial variations in pose, and interactions with objects within the vehicle. The feature set is tested on our newly created dataset, since the datasets used in the literature are not publicly available. The proposed feature set captures a larger number of activities and, with standard classifiers and benchmarks, shows significant improvements over existing feature sets.
13.
14.
Zhongjie Zhu, Yuer Wang 《AEUE-International Journal of Electronics and Communications》2012,66(3):249-254
Segmentation of moving objects in video sequences is a basic task in many applications. However, it remains challenging because of the semantic gap between low-level visual features and the high-level human interpretation of video semantics. Compared with segmenting fast-moving objects, accurate and perceptually consistent segmentation of slowly moving objects is more difficult. In this paper, a novel hybrid algorithm is proposed for segmenting slowly moving objects in video sequences, aiming at perceptually consistent results. First, the temporal information in the differences among multiple frames is employed to detect initial moving regions. Then, a Gaussian mixture model (GMM) with an improved expectation-maximization (EM) algorithm is used to segment a spatial image into homogeneous regions. Finally, the results of motion detection and spatial segmentation are fused to extract the final moving objects. Experiments provide convincing results.
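The GMM/EM spatial segmentation step can be illustrated in one dimension with a textbook two-component EM loop (this is the standard algorithm, not the paper's improved EM variant): the E-step computes posterior responsibilities, and the M-step re-estimates weights, means, and variances from them.

```python
import math

def em_gmm_1d(xs, iters=50):
    """EM for a two-component 1D Gaussian mixture over samples xs."""
    mu = [min(xs), max(xs)]   # crude initialization at the data extremes
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return mu, var, w
```

In the segmentation setting, each component would model the intensity of one homogeneous region, and pixels are labeled by their highest-responsibility component.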
15.
16.
17.
In video-based action recognition, viewpoint differences among action videos captured by fixed-view cameras reduce recognition accuracy. Using multi-view video is one way to improve accuracy. This paper proposes a multi-view human action recognition algorithm based on a 3D Residual Network (3D ResNet) and a Long Short-Term Memory (LSTM) network: the 3D ResNet learns fused spatiotemporal features of the action sequences from each view, and a multi-layer LSTM network then learns long-term activity representations of the video stream and mines the temporal information between video frames. Experiments on the NTU RGB+D 120 dataset show that the model reaches an accuracy of 83.2% for multi-view action recognition.
18.
《Journal of Visual Communication and Image Representation》2014,25(4):719-726
We propose a framework, consisting of several algorithms, to recognize human activities that involve manipulating objects. The proposed algorithm identifies the objects being manipulated and models the high-level tasks being performed accordingly. Realistic settings for such tasks pose several problems for computer vision, including sporadic occlusion by subjects, non-frontal poses, and objects with few local features. We show how size and segmentation information derived from depth data can address these challenges using simple and fast techniques. In particular, we show how to robustly and without supervision find the manipulating hand, properly detect and recognize objects, and use temporal information to fill in the gaps between sporadically detected objects, all through careful inclusion of depth cues. We evaluate our approach on a challenging dataset of 12 kitchen tasks involving 24 objects performed by 2 subjects. The entire framework yields 82%/84% precision (74%/83% recall) for task/object recognition. Our techniques significantly outperform the state of the art in activity/object recognition.
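Filling the gaps between sporadically detected objects can be sketched as linear interpolation between the nearest frames where the object was detected; the paper relies on depth cues for this step, so the position-only interpolation below is an illustrative stand-in:

```python
def fill_gaps(track):
    """Linearly interpolate missing detections (None) between the two
    nearest frames where the object was detected; leading and trailing
    gaps are left unfilled."""
    filled = list(track)
    known = [i for i, d in enumerate(filled) if d is not None]
    for a, b in zip(known, known[1:]):
        for j in range(a + 1, b):
            t = (j - a) / (b - a)  # fractional position inside the gap
            filled[j] = tuple((1 - t) * p + t * q
                              for p, q in zip(filled[a], filled[b]))
    return filled
```

For a track detected at frames 0 and 3, the two missing frames are placed a third and two thirds of the way along the straight line between the detections.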
19.
We implement a video object segmentation system that integrates the novel concept of Voronoi Order with existing surface optimization techniques to support the MPEG-4 functionality of object-addressable video content in the form of video objects. A major enabling technology for the MPEG-4 standard is video object segmentation, i.e., the extraction of video objects from a given video sequence. Our surface optimization formulation expresses the video object segmentation problem as an energy function that integrates many visual processing techniques. By optimizing this surface, we balance visual information against the predictions of models with a priori information and extract video objects from a video sequence. Since the global optimization of such an energy function is still an open problem, we use Voronoi Order to decompose the formulation into a tractable optimization via dynamic programming within an iterative framework. Finally, we show the results of the system on the MPEG-4 test sequences, introduce a novel objective measure, and compare the results against those hand-segmented by the MPEG-4 committee.
20.