Similar Articles
 20 similar articles found (search time: 31 ms)
1.
Spatio-temporal composition and indexing for large multimedia applications   Cited by: 1 (self-citations: 0, others: 1)
Multimedia applications usually involve a large number of multimedia objects (texts, images, sounds, etc.). An important issue in this context is the specification of spatial and temporal relationships among these objects. In this paper we define such a model, based on a set of spatial and temporal relationships between objects participating in multimedia applications. Our work exploits existing approaches for spatial and temporal relationships. We extend these relationships to cover the specific requirements of multimedia applications, and we integrate the results in a uniform framework for spatio-temporal composition representation. Another issue is the efficient handling of queries related to the spatio-temporal relationships among the objects during the authoring process. Such queries may be very costly, and appropriate indexing schemes are needed to handle them efficiently. We propose such efficient schemes, based on multidimensional (spatial) data structures, for large multimedia applications that involve thousands of objects. Evaluation models of the proposed schemes are also presented, as well as hints for selecting the most appropriate one according to the multimedia author's requirements.
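As a concrete illustration of the kind of temporal relationships such a composition model must represent, here is a minimal Python sketch (not the paper's model) classifying a subset of Allen-style interval relations between two objects' presentation intervals:

```python
# Hedged sketch: classify the temporal relation of interval A to interval B.
# Covers only a subset of Allen's 13 interval relations, for illustration.

def temporal_relation(a_start, a_end, b_start, b_end):
    """Classify interval A relative to interval B."""
    if a_end < b_start:
        return "before"        # A finishes before B starts
    if a_end == b_start:
        return "meets"         # A ends exactly where B begins
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start >= b_start and a_end <= b_end:
        return "during"        # A lies entirely inside B
    if a_start < b_start and a_end > b_end:
        return "contains"
    if a_start < b_start <= a_end <= b_end:
        return "overlaps"
    return "other"

# e.g. an audio clip playing entirely while a video clip is on screen:
print(temporal_relation(2, 5, 0, 10))  # during
```

A spatio-temporal composition model pairs such temporal relations with analogous spatial ones (per axis) for each pair of objects in a presentation.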

2.
The geographic application domain includes important information such as design plans, record drawings, photographs, and video data records. The corresponding geographic information systems (GISs) should maintain a specific model for each geographic data modality, such as a geographic video model for video records. Real-time 3-D geographic information systems provide a comprehensive interface to complex and dynamic databases and a truly immersive capability for visualizing geographic data. In cases where information about the location of geographic objects is needed at different moments in time, a GIS should process video data that is directly manipulated and retrieved through a representation of its spatio-temporal characteristics. In this context, digital video, the most advanced multimedia form, finds efficient application in GIS for the versatile specification of geographic data. In this paper, a model for spatial data evolving with time is introduced in the context of video data manipulation. We designed a model that represents the spatio-temporal continuum among geographic objects in geographic video sequences, or digital video. The model developed here was motivated by the requirements for manipulating, managing, and analyzing geographic data for the necessities of infrastructure management, urban and regional planning, hazard prevention and management, transportation networks, vehicle routing, etc. This model allows the important GIS issues of adjacency (what is next to what), containment (what is enclosed by what), and proximity (how close one geographic object is to another) to be determined. Our model describes the spatial relationships among objects for each key frame in a given video scene, and the temporal relationships of the temporal intervals measuring the validity duration of the spatial relationships spanning the given key frame. Distance estimation, one of the main GIS issues, is solved because quantitative metrics of geographic objects in digital video are easily and precisely specified. This model is a basis for annotating raw video for subsequent use in geographic video databases and digital libraries that provide access to and efficient storage of large volumes of geographic data.

3.
Video databases have become popular in various areas due to recent advances in technology. Video archive systems need user-friendly interfaces to retrieve video frames. In this paper, a natural language processing (NLP) based user interface to a video database system is described. The video database is based on a content-based spatio-temporal video data model. The data model is focused on the semantic content, which includes objects, activities, and spatial properties of objects. Spatio-temporal relationships between video objects, as well as trajectories of moving objects, can be queried with this data model. In this video database system, a natural language interface enables flexible querying. The queries, which are given as English sentences, are parsed using a link parser. The semantic representations of the queries are extracted from their syntactic structures using information extraction techniques. The extracted semantic representations are used to call the related parts of the underlying video database system to return the results of the queries. Not only exact matches but also similar objects and activities are returned from the database with the help of the conceptual ontology module. This module is implemented using a distance-based method of semantic similarity search on the domain-independent semantic ontology WordNet.
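The distance-based similarity idea behind such an ontology module can be sketched as follows. The tiny hand-built `TOY_ONTOLOGY` and the scoring formula are illustrative assumptions; the actual system uses WordNet:

```python
# Hedged sketch: path-based semantic similarity over a toy is-a hierarchy.
# The ontology and the 1/(1 + path length) formula are assumptions for
# illustration, not the paper's WordNet-based implementation.

TOY_ONTOLOGY = {            # child -> parent (an is-a hierarchy)
    "car": "vehicle", "truck": "vehicle", "vehicle": "object",
    "dog": "animal", "cat": "animal", "animal": "object",
}

def path_to_root(node):
    path = [node]
    while node in TOY_ONTOLOGY:
        node = TOY_ONTOLOGY[node]
        path.append(node)
    return path

def path_similarity(a, b):
    """1 / (1 + path length through the closest common ancestor)."""
    pa, pb = path_to_root(a), path_to_root(b)
    for i, n in enumerate(pa):
        if n in pb:
            return 1.0 / (1 + i + pb.index(n))
    return 0.0

print(path_similarity("car", "truck"))  # ~0.33, via "vehicle"
print(path_similarity("car", "dog"))    # 0.2, only via the root "object"
```

A query for "vehicle" could then also return "car" and "truck" objects whose similarity exceeds a threshold, which is the flexible-matching behavior the abstract describes.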

4.
A video data model that supports spatio-temporal querying in videos is presented. The data model is focused on the semantic content of video streams. Objects, events, activities, and spatial properties of objects are main interests of the model. The data model enables the user to query fuzzy spatio-temporal relationships between video objects and also trajectories of moving objects. A prototype of the proposed model has been implemented.

5.
This paper presents a symbolic formalism for modeling and retrieving video data via the moving objects contained in the video images. The model integrates the representations of individual moving objects in a scene with the time-varying relationships between them by incorporating both the notions of object tracks and temporal sequences of PIRs (projection interval relationships). The model is supported by a set of operations which form the basis of a moving object algebra. This algebra allows one to retrieve scenes and information from scenes by specifying both spatial and temporal properties of the objects involved. It also provides operations to create new scenes from existing ones. A prototype implementation is described which allows queries to be specified either via an animation sketch or using the moving object algebra.
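The notion of projection interval relationships evolving over time can be sketched as follows; the `x_relation` function and toy tracks are simplifications for one axis, not the paper's full algebra:

```python
# Hedged sketch: project each object's bounding box on the x-axis per frame
# and record how the relation between two moving objects evolves over time.
# Boxes are (x1, y1, x2, y2); only the x-projection is compared here.

def x_relation(box_a, box_b):
    (ax1, _, ax2, _), (bx1, _, bx2, _) = box_a, box_b
    if ax2 < bx1:
        return "left-of"
    if bx2 < ax1:
        return "right-of"
    return "x-overlap"

track_a = [(0, 0, 2, 2), (3, 0, 5, 2), (7, 0, 9, 2)]   # moves rightward
track_b = [(4, 0, 6, 2)] * 3                            # stationary
print([x_relation(a, b) for a, b in zip(track_a, track_b)])
# ['left-of', 'x-overlap', 'right-of']
```

A temporal sequence of such per-axis relations is precisely the kind of track-plus-relation information the described algebra queries and combines.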

6.
Namitha K., Athi Narayanan. Multimedia Tools and Applications, 2020, 79(43-44): 32331-32360

Video synopsis is an effective solution for fast browsing and retrieval of long surveillance videos. It aims to shorten a long video sequence into an equivalent compact representation by rearranging the video events in the temporal and/or spatial domain. Conventional video synopsis methods focus on reducing collisions between tubes and maintaining their chronological order, which may alter the original interactions between tubes due to improper tube rearrangement. In this paper, we present an approach that preserves the relationships among tubes (tracks of moving objects) of the original video in the synopsis video. First, a recursive tube-grouping algorithm is proposed to determine the behavior interactions among tubes in a video and group related tubes together into tube sets. Second, to preserve the discovered relationships, a spatio-temporal cube voting algorithm is proposed. This cube voting method optimally rearranges the tube sets in the synopsis video, minimizing false collisions between tubes. Third, a method to estimate the duration of the synopsis video is proposed based on an entropy measure of tube collisions. Extensive experimental results demonstrate that the proposed video synopsis framework condenses videos while preserving the original tube interactions and reducing false tube collisions.
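The collision notion at the heart of such rearrangement can be sketched in a few lines. This is an assumption-laden simplification, not the paper's cube-voting algorithm:

```python
# Hedged sketch: count collisions between two tubes after temporal shifting.
# A tube is a track of per-frame bounding boxes (x1, y1, x2, y2); two tubes
# collide at a synopsis frame if their boxes overlap spatially there.

def boxes_overlap(b1, b2):
    x1, y1, X1, Y1 = b1
    x2, y2, X2, Y2 = b2
    return x1 < X2 and x2 < X1 and y1 < Y2 and y2 < Y1

def collision_count(tube_a, shift_a, tube_b, shift_b):
    """tube = list of boxes, one per frame; shift = synopsis start frame."""
    frames_a = {shift_a + i: box for i, box in enumerate(tube_a)}
    count = 0
    for i, box in enumerate(tube_b):
        t = shift_b + i
        if t in frames_a and boxes_overlap(frames_a[t], box):
            count += 1
    return count

tube1 = [(0, 0, 10, 10)] * 5      # stays near the origin for 5 frames
tube2 = [(5, 5, 15, 15)] * 5      # spatially overlapping region
print(collision_count(tube1, 0, tube2, 0))  # 5: co-occur in every frame
print(collision_count(tube1, 0, tube2, 5))  # 0: shifted apart in time
```

A synopsis optimizer searches over the shifts to minimize such collision counts while respecting the grouping constraints that preserve real interactions.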

7.
A query optimizer requires cost models to calculate the costs of various access plans for a query. An effective method to estimate the number of disk (or page) accesses for spatio-temporal queries has not yet been proposed. The TPR-tree is an efficient index that supports spatio-temporal queries for moving objects. Existing cost models for spatial indexes such as the R-tree do not accurately estimate the number of disk accesses for spatio-temporal queries using the TPR-tree, because they do not consider the future locations of moving objects, which change continuously as time passes. In this paper, we propose an efficient cost model for spatio-temporal queries to solve this problem. We present analytical formulas that accurately calculate the number of disk accesses for spatio-temporal queries. Extensive experimental results show that our proposed method accurately estimates the number of disk accesses over various queries on spatio-temporal data combining real-life spatial data and synthetic temporal data. To evaluate the effectiveness of our method, we compared our spatio-temporal cost model (STCM) with an existing spatial cost model (SCM). The existing SCM yields average error ratios between 52% and 93%, whereas our STCM yields average error ratios between 11% and 32%.
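The average error ratio quoted above can be computed as sketched below; the sample numbers are invented for illustration:

```python
# Hedged sketch: average error ratio |estimated - actual| / actual of a
# cost model's disk-access predictions. The data below is made up.

def average_error_ratio(estimated, actual):
    ratios = [abs(e - a) / a for e, a in zip(estimated, actual)]
    return sum(ratios) / len(ratios)

actual_disk_accesses = [100, 200, 400]     # measured, one value per query
estimated_by_model   = [110, 180, 440]     # cost-model predictions
print(average_error_ratio(estimated_by_model, actual_disk_accesses))  # ~0.1
```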

8.
In this paper, we propose a multi-level abstraction mechanism for capturing the spatial and temporal semantics associated with various objects in an input image or in a sequence of video frames. This abstraction can manifest itself effectively in conceptualizing events and views in multimedia data as perceived by individual users. The objective is to provide an efficient mechanism for handling content-based queries, with the minimum amount of processing performed on raw data during query evaluation. We introduce a multi-level architecture for video data management at different levels of abstraction. The architecture facilitates a multi-level indexing/searching mechanism. At the finest level of granularity, video data can be indexed based on mere appearance of objects and faces. For management of information at higher levels of abstractions, an object-oriented paradigm is proposed which is capable of supporting domain specific views.

9.
Motion, as a feature of video that changes across temporal sequences, is crucial to visual understanding. Powerful video representation and extraction models are typically able to focus attention on motion features in challenging dynamic environments to complete more complex video understanding tasks. However, previous approaches discriminate mainly based on similar features in the spatial or temporal domain, ignoring the interdependence of consecutive video frames. In this paper, we propose the motion-sensitive self-supervised collaborative network, a video representation learning framework that exploits a pretext task to assist feature comparison and strengthen the spatiotemporal discrimination power of the model. Specifically, we first propose the motion-aware module, which extracts consecutive motion features from spatial regions by frame difference. The global-local contrastive module is then introduced, with context and enhanced video snippets defined as appropriate positive samples for a broader feature similarity comparison. Finally, we introduce the snippet operation prediction module, which further assists contrastive learning in obtaining more reliable global semantics by sensing changes in continuous frame features. Experimental results demonstrate that our work can effectively extract robust motion features and achieve competitive performance compared with other state-of-the-art self-supervised methods on downstream action recognition and video retrieval tasks.
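The frame-difference idea behind the motion-aware module can be sketched on toy 1-D "frames" (an illustrative simplification, not the module itself, whose real inputs are 2-D image tensors):

```python
# Hedged sketch: absolute difference between consecutive frames yields a
# crude motion map, the raw cue a motion-aware module builds on.

def frame_difference_features(frames):
    """Per-pixel absolute difference between consecutive frames."""
    return [
        [abs(c - p) for p, c in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]

frames = [[0, 0, 0], [0, 5, 0], [0, 5, 5]]   # a "pixel" lights up, then spreads
print(frame_difference_features(frames))
# [[0, 5, 0], [0, 0, 5]]
```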

10.
In video database systems, one of the most important methods for discriminating videos is using the objects and the perception of the spatial and temporal relations that exist between objects in the desired videos. In this paper, we propose a new spatio-temporal knowledge representation called 3D C-string. The knowledge structure of 3D C-string, extended from the 2D C+-string, uses the projections of objects to represent spatial and temporal relations between the objects in a video. Moreover, it can keep track of the motions and size changes of the objects in a video. The string generation and video reconstruction algorithms for the 3D C-string representation of video objects are also developed. By introducing the concepts of template objects and nearest former objects, the string generated by the string generation algorithm is unique for a given video, and the video reconstructed from a given 3D C-string is likewise unique. This approach provides an easy and efficient way to retrieve, visualize, and manipulate video objects in video database systems. Finally, some experiments are performed to show the performance of the proposed algorithms.

11.
Several salient-object-based data models have been proposed to model video data. However, none of them addresses the development of an index structure to efficiently handle salient-object-based queries. There are several indexing schemes that have been proposed for spatiotemporal relationships among objects, and they are used to optimize timestamp and interval queries, which are rarely used in video databases. Moreover, these index structures are designed without consideration of the granularity levels of constraints on salient objects and the characteristics of video data. In this paper, we propose a multilevel index structure (MINDEX) to efficiently handle the salient-object-based queries with different levels of constraints. We present experimental results showing the performance of different methods of MINDEX construction.

12.
Spatio-temporal segmentation based on region merging   Cited by: 2 (self-citations: 0, others: 2)
This paper proposes a technique for spatio-temporal segmentation to identify the objects present in the scene represented in a video sequence. This technique processes two consecutive frames at a time. A region-merging approach is used to identify the objects in the scene. Starting from an oversegmentation of the current frame, the objects are formed by iteratively merging regions together. Regions are merged based on their mutual spatio-temporal similarity. We propose a modified Kolmogorov-Smirnov test for estimating the temporal similarity. The region-merging process is based on a weighted, directed graph. Two complementary graph-based clustering rules are proposed, namely, the strong rule and the weak rule. These rules take advantage of the natural structures present in the graph. Experimental results on different types of scenes demonstrate the ability of the proposed technique to automatically partition the scene into its constituent objects.
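The plain (unmodified) two-sample Kolmogorov-Smirnov statistic underlying such a temporal similarity test can be sketched in pure Python; the paper's modified variant differs from this textbook form:

```python
# Hedged sketch: two-sample KS statistic, the maximum gap between the
# empirical CDFs of two samples (e.g. per-region temporal intensity changes).

def ks_statistic(sample1, sample2):
    s1, s2 = sorted(sample1), sorted(sample2)
    points = sorted(set(s1) | set(s2))
    def ecdf(sorted_sample, x):
        return sum(1 for v in sorted_sample if v <= x) / len(sorted_sample)
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in points)

# Identical distributions -> 0; disjoint distributions -> 1.
print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))      # 0.0
print(ks_statistic([1, 2, 3, 4], [10, 11, 12, 13]))  # 1.0
```

Two regions whose temporal statistics give a small KS value would be candidates for merging into one object.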

13.
Hierarchical database for a multi-camera surveillance system   Cited by: 1 (self-citations: 0, others: 1)
This paper presents a framework for event detection and video content analysis for visual surveillance applications. The system is able to coordinate the tracking of objects between multiple camera views, which may be overlapping or non-overlapping. The key novelty of our approach is that we can automatically learn a semantic scene model for a surveillance region, and have defined data models to support the storage of tracking data with different layers of abstraction into a surveillance database. The surveillance database provides a mechanism to generate video content summaries of objects detected by the system across the entire surveillance region in terms of the semantic scene model. In addition, the surveillance database supports spatio-temporal queries, which can be applied for event detection and notification applications.

14.
Deep-learning-based video super-resolution methods focus mainly on the spatio-temporal relationships within and between video frames, but previous methods suffer from imprecise motion estimation and insufficient feature fusion when aligning and fusing frame features. To address these problems, a video super-resolution model based on an attention fusion network (AFN) is constructed using the back-projection principle combined with multiple attention mechanisms and fusion strategies. First, in the feature extraction stage, a back-projection structure is adopted to obtain error feedback on motion information, in order to handle the various motions between neighboring frames and the reference frame. Then, temporal, spatial, and channel attention fusion modules perform multi-dimensional feature mining and fusion. Finally, in the reconstruction stage, the resulting high-dimensional features are passed through convolutions to reconstruct high-resolution video frames. By learning different weights for intra-frame and inter-frame features, the correlations between video frames are fully exploited, and an iterative network structure processes the extracted features progressively from coarse to fine. Experimental results on two public benchmark datasets show that AFN effectively handles videos containing multiple motions and occlusions, with large gains in quantitative metrics over several mainstream methods. For example, on the 4x reconstruction task, the peak signal-to-noise ratio (PSNR) of frames produced by AFN is 13.2% higher than that of the frame-recurrent video super-resolution network (FRVSR) on the Vid4 dataset, and 15.3% higher than that of the video super-resolution network with dynamic upsampling filters (VSR-DUF) on the SPMCS dataset.
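PSNR, the quality metric used in the comparison above, can be computed as in this toy sketch (1-D "frames" for brevity; real evaluations use full 2-D frames):

```python
# Hedged sketch: PSNR for 8-bit pixel data, 10 * log10(MAX^2 / MSE).

import math

def psnr(frame_ref, frame_test, max_val=255.0):
    mse = sum((a - b) ** 2 for a, b in zip(frame_ref, frame_test)) / len(frame_ref)
    if mse == 0:
        return float("inf")    # identical frames
    return 10 * math.log10(max_val ** 2 / mse)

ref  = [100, 120, 140, 160]
test = [101, 119, 141, 159]    # off by 1 everywhere -> MSE = 1
print(psnr(ref, test))         # ~48.13 dB
```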

15.
In many application areas there is a need to represent human-like knowledge related to spatio-temporal relations among multiple moving objects. This type of knowledge is usually imprecise, vague and fuzzy, while the reasoning about spatio-temporal relations is intuitive. In this paper we present a model of fuzzy spatio-temporal knowledge representation and reasoning based on high-level Petri nets. The model should be suitable for the design of a knowledge base for real-time, multi-agent-based intelligent systems that include expert or user human-like knowledge. The central part of the model is the knowledge representation scheme called FuSpaT, which supports representation and reasoning for domains that include imprecise and fuzzy spatial, temporal and spatio-temporal relationships. The scheme is based on the high-level Petri nets called Petri nets with fuzzy spatio-temporal tokens (PeNeFuST). The FuSpaT scheme integrates the theory of the PeNeFuST and 117 spatio-temporal relations. The reasoning in the proposed model is a spatio-temporal data-driven process based on the dynamical properties of the scheme, i.e., the execution of the Petri nets with fuzzy spatio-temporal tokens. An illustrative example of spatio-temporal reasoning for two agents in a simplified robot-soccer scene is given.

16.
To enable content-based functionalities in video processing algorithms, decomposition of scenes into semantic objects is necessary. A semi-automatic Markov random field based multiresolution algorithm is presented for video object extraction in complex scenes. In the first frame, spatial segmentation and user intervention determine the objects of interest. The specified objects are subsequently tracked in successive frames, and newly appearing objects/regions are also detected. The video object extraction algorithm includes discrete wavelet transform decomposition, multiresolution Markov random field (MRF)-based spatial segmentation with emphasis on border smoothness at different resolutions, and an MRF-based backward region classification that determines the tracked objects in the scene. Finally, a motion constraint, embedded in the region classifier, detects the newly appearing objects/regions and completes the proposed approach as an efficient video segmentation algorithm. The results are applicable to generic segmentation applications; in particular, the proposed multiresolution video segmentation algorithm supports scalable object-based wavelet coding. Moreover, compared to traditional object extraction algorithms, it produces smoother and more visually pleasing shape masks at different resolutions. The proposed multiresolution video object extraction method allows for larger motion, better noise tolerance, and lower computational complexity.

17.
Objective: 3D video, with its stereoscopic effect and high-end realism, is attracting growing attention from academia and industry, with broad application prospects in 3D film and television, machine vision, telemedicine, and military and aerospace fields. Object-based 3D video is an important development trend for future 3D video technology, and efficient shape coding is a key problem in object-based 3D video applications. Existing shape coding methods, however, mainly target image and video objects; shape coding algorithms for 3D video remain scarce. Motivated by the application requirements of object-based 3D video, this paper proposes an efficient multi-mode 3D video shape coding method based on contour and chain-code representation. Method: After extracting and preprocessing the object contours frame by frame from a given 3D video shape sequence, contour activity is analyzed to classify the shape images into intra-mode coded images and inter-prediction-mode coded images. Intra-coded images are coded efficiently based on chain-code direction constraints within the contour and linear features. Inter-coded images are coded using several chain-code-based modes, including contour-based motion-compensated prediction, disparity-compensated prediction, and joint motion and disparity compensated prediction, to fully exploit the inter-frame temporal correlation of object contours within a view and the spatial correlation of object contours between views, thereby achieving efficient coding. Results: Simulation results show that the proposed algorithm outperforms classic and state-of-the-art methods of the same kind, improving compression efficiency by 9.3% to 64.8% on average. Conclusion: The proposed multi-mode 3D video shape coding method effectively removes inter-frame and inter-view redundancy of object contours, achieves efficient compression, outperforms existing comparable methods, and can be widely applied in object-based coding, object-based retrieval, and object-based content analysis and understanding.

18.
This study presents a hybrid network for no-reference (NR) video quality assessment (VQA). Besides spatial cues, the network accounts for the temporal motion effect and the temporal hysteresis effect in visual quality estimation, and two modules are embedded. One module is dedicated to incorporating short-term spatio-temporal features based on spatial quality maps and temporal quality maps, and the follow-up module employs a graph convolutional network to quantify the relationships between image frames in a sequence. The proposed network and several popular models are evaluated on three video quality databases (CSIQ, LIVE, and KoNViD-1K). Experimental results indicate that the network outperforms the other NR models involved, and its performance is close to that of state-of-the-art full-reference VQA models. In conclusion, short-term spatio-temporal feature fusion benefits the modeling of the interaction between spatial and temporal cues in VQA tasks, long-term sequence fusion further improves performance, and a strong correlation with human subjective judgment is achieved.
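The temporal hysteresis effect is often modeled by pooling each frame's quality with the worst recent score, since viewers punish quality drops more than they reward recoveries. The sketch below is a generic simplification with invented `memory` and `weight` parameters, not this paper's network:

```python
# Hedged sketch: hysteresis-style temporal pooling of per-frame quality
# scores. A quality dip drags down the pooled score of later frames too.

def hysteresis_pool(frame_scores, memory=2, weight=0.8):
    pooled = []
    for t, s in enumerate(frame_scores):
        recent_worst = min(frame_scores[max(0, t - memory):t + 1])
        pooled.append(weight * recent_worst + (1 - weight) * s)
    return sum(pooled) / len(pooled)

steady = [80, 80, 80, 80]
dipped = [80, 40, 80, 80]            # one bad frame
print(hysteresis_pool(steady))       # ~80.0
print(hysteresis_pool(dipped))       # well below 80, not just the plain mean
```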

19.
In this work, we propose a general method for computing the distance between video frames or sequences. Unlike conventional appearance-based methods, we first extract motion fields from the original videos. To avoid the huge memory requirement demanded by previous approaches, we utilize the "bag of motion vectors" model and select a Gaussian mixture model as a compact representation. Thus, estimating the distance between two frames is equivalent to calculating the distance between their corresponding Gaussian mixture models, which is solved via the earth mover's distance (EMD) in this paper. On the basis of the inter-frame distance, we further develop distance measures for full video sequences. Our main contribution is four-fold. Firstly, we operate on a tangent vector field of the spatio-temporal 2D surface manifold generated by video motions, rather than the intensity gradient space; we argue that the former space is more fundamental. Secondly, the correlations between frames are explicitly exploited using a generative model named dynamic conditional random fields (DCRF). Under this framework, motion fields are estimated by Markov volumetric regression, which is more robust and may avoid the rank deficiency problem. Thirdly, our definition of video distance is in accord with human intuition and makes a better tradeoff between frame dissimilarity and chronological ordering. Lastly, our definition of frame distance allows for partial distance.
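A minimal sketch of the earth mover's distance in its simple 1-D, equal-mass form follows; the paper applies EMD between Gaussian mixture models, so this toy version only conveys the underlying idea of mass transport:

```python
# Hedged sketch: 1-D EMD between two same-length, same-mass histograms.
# With unit ground distance between adjacent bins, EMD equals the total
# mass "carried" past each bin boundary.

def emd_1d(hist_a, hist_b):
    carried, total = 0.0, 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b        # surplus that must flow to the next bin
        total += abs(carried)
    return total

print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0: one unit moved two bins
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1.0: one unit moved one bin
```

EMD is popular for comparing distributions because, unlike bin-by-bin measures, it accounts for how far mass must move, which matches perceptual similarity.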

20.
Query by video clip   Cited by: 15 (self-citations: 0, others: 15)
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (a basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as queries and a different basketball video as the database show the effectiveness of the feature representation and matching schemes.
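Scheme (ii), matching sub-sampled frames, might be sketched as below; the 4-bin histograms and the histogram-intersection similarity are illustrative assumptions (the paper matches color and texture features):

```python
# Hedged sketch: retrieval by uniformly sub-sampled frames, comparing
# pre-computed, normalized color histograms with histogram intersection.

def histogram_intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def subsample(frames, step):
    return frames[::step]

# Toy "frames" represented directly as 4-bin normalized color histograms:
query_frames    = [[0.4, 0.3, 0.2, 0.1]] * 10
database_frames = [[0.4, 0.3, 0.2, 0.1]] * 50

q = subsample(query_frames, 5)       # 2 sub-sampled query frames
d = subsample(database_frames, 5)    # 10 sub-sampled database frames
score = sum(histogram_intersection(a, b) for a, b in zip(q, d)) / len(q)
print(score)  # ~1.0: identical histograms match perfectly
```

In a real system the query window would slide along the sub-sampled database frames and the best-scoring positions would be returned as retrieved clips.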


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号