Similar Documents
Found 20 similar documents (search time: 49 ms).
1.
This paper proposes a human-centered interactive framework for automatically mining and retrieving semantic events in videos. After preprocessing, the object trajectories and event models are fed into the core components of the framework for learning and retrieval. As trajectories are spatiotemporal in nature, the learning component is designed to analyze time-series data. Human feedback on the retrieval results provides progressive guidance for the retrieval component in the framework. For user convenience, the retrieval results take the form of video sequences rather than the trajectories they contain. The trajectories are therefore not directly labeled by the feedback, as the training algorithm requires. To solve this problem, a mapping is established between semantic video retrieval and multiple instance learning (MIL). The effectiveness of the algorithm is demonstrated by experiments on real-life transportation surveillance videos.
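The MIL mapping described above can be sketched as follows: each retrieved video sequence is a bag, its trajectories are the instances, and a bag is scored by its best-scoring instance, so sequence-level feedback can supervise trajectory-level learning. The trajectory scorer below (mean-velocity similarity to a query direction) is a toy stand-in, not the paper's model.

```python
import numpy as np

def bag_scores(bags, instance_scorer):
    """Score each bag (video sequence) by its best instance (trajectory):
    a sequence is relevant if at least one trajectory in it is."""
    return [max(instance_scorer(t) for t in trajectories)
            for trajectories in bags]

# Toy instance scorer: cosine similarity of a trajectory's mean
# frame-to-frame motion with a hypothetical query direction.
query_velocity = np.array([1.0, 0.0])

def score(traj):
    v = np.diff(traj, axis=0).mean(axis=0)   # mean displacement per frame
    return float(v @ query_velocity) / (
        np.linalg.norm(v) * np.linalg.norm(query_velocity) + 1e-9)

bags = [
    [np.array([[0, 0], [1, 0], [2, 0]])],    # moves along +x: matches query
    [np.array([[0, 0], [0, 1], [0, 2]])],    # moves along +y: does not
]
scores = bag_scores(bags, score)
```

In a real system the instance scorer would be the learned time-series model, and the max-over-instances rule is what lets sequence-level relevance labels propagate down to trajectories.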

2.
3.
4.
Models for motion-based video indexing and retrieval (cited 9 times: 0 self-citations, 9 by others)
With the rapid proliferation of multimedia applications that require video data management, it is becoming more desirable to provide proper video data indexing techniques capable of representing the rich semantics in video data. In real-time applications, the need for efficient query processing is another reason for the use of such techniques. We present models that use object motion information to characterize events and allow subsequent retrieval. Algorithms for different spatiotemporal search cases, in terms of spatial and temporal translation and scale invariance, have been developed using various signal and image processing techniques. We have developed a prototype video search engine, PICTURESQUE (pictorial information and content transformation unified retrieval engine for spatiotemporal queries), to verify the proposed methods. Development of such technology will enable true multimedia search engines capable of indexing and searching digital video data based on its true content.

5.
Digital video now plays an important role in medical education, health care, telemedicine and other medical applications. Several content-based video retrieval (CBVR) systems have been proposed in the past, but they still suffer from the following challenging problems: the semantic gap, semantic video concept modeling, semantic video classification, and concept-oriented video database indexing and access. In this paper, we propose a novel framework that makes advances toward the ultimate goal of solving these problems. Specifically, the framework includes: 1) a semantic-sensitive video content representation framework that uses principal video shots to enhance the quality of features; 2) semantic video concept interpretation using a flexible mixture model to bridge the semantic gap; 3) a novel semantic video-classifier training framework that integrates feature selection, parameter estimation, and model selection seamlessly in a single algorithm; and 4) a concept-oriented video database organization technique built on a domain-dependent concept hierarchy to enable semantic-sensitive video retrieval and browsing.

6.
Automatic indexing and retrieval of digital data pose major challenges. The main problem arises from the ever-increasing mass of digital media and the lack of efficient methods for indexing and retrieving such data based on semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years, research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system, Dynamic REtrieval Analysis and semantic metadata Management (DREAM), designed to automatically and intelligently index huge repositories of special-effects video clips based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM demonstrator has been evaluated as deployed in the film post-production phase, supporting the storage, indexing and retrieval of large data sets of special-effects video clips as an exemplar application domain. This paper provides its performance and usability results and highlights the scope for future enhancements of the DREAM architecture, which has proven successful in its first, and possibly most challenging, proving ground, namely film production, where it is already in routine use within our test-bed partners' creative processes.

7.
Most semantic video search methods use text-keyword queries or example video clips and images. But such methods have limitations. To address the problems of example-based video search approaches and avoid the use of specialized models, we conduct semantic video searches using a reranking method that automatically reorders the initial text search results based on visual cues and associated context. We developed two general reranking methods that explore the recurrent visual patterns in many contexts, such as the returned images or video shots from initial text queries, and video stories from multiple channels.

8.
Real-time video-shot detection for scene surveillance applications (cited 11 times: 0 self-citations, 11 by others)
A surveillance system with automatic video-shot detection and indexing capabilities is presented. The proposed system aims at detecting the presence of abandoned objects in a guarded environment and at automatically performing online semantic video segmentation in order to facilitate the human operator's task of retrieving the cause of an alarm. The former task is performed by operating image segmentation based on temporal rank-order filtering, followed by classification in order to reduce false alarms. The latter task is performed by operating temporal video segmentation when an alarm is detected. In the clips of interest, the key frame is the one depicting a person leaving a dangerous object, and is determined on the basis of a feature indicating the movement around the dangerous region. Experimental results are reported in terms of static region detection, classification, clip and key-frame detection errors versus different levels of complexity of the guarded environment, in order to establish the performance that can be expected from the system in different situations.
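A minimal sketch of the temporal rank-order filtering step: take a per-pixel order statistic over a buffer of frames as the background estimate, then flag pixels in the newest frame that differ from it. The choice of the median as the order statistic and the fixed difference threshold are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def detect_static_regions(frames, rank=0.5, thresh=30):
    """Temporal rank-order filtering: the per-pixel order statistic
    (here the median, rank=0.5) over the frame buffer serves as the
    background; pixels of the newest frame that deviate strongly from
    it are flagged as candidate static/abandoned regions."""
    stack = np.stack(frames).astype(np.int16)            # (T, H, W)
    background = np.percentile(stack, rank * 100, axis=0)
    diff = np.abs(stack[-1] - background)                # newest frame vs background
    return diff > thresh                                 # boolean candidate mask

rng = np.random.default_rng(0)
frames = [rng.integers(100, 110, (8, 8)) for _ in range(9)]  # noisy background
frames[-1] = frames[-1].copy()
frames[-1][2:4, 2:4] = 250                               # an object appears
mask = detect_static_regions(frames)
```

In the full system this mask would feed the classification stage that suppresses false alarms; the rank-order statistic makes the background estimate robust to transient motion, which a plain mean would not be.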

9.
张良  周长胜 《电子科技》2011,24(10):111-114
分析了视频数据与文本数据的差异,以及视频数据在视频分析检索方面存在的问题。从视频内容分析领域的研究热点出发,分别对视频语义库、与视频分析相关的视频低层特征、视频对象划分与识别、视频信息描述与编码等方面的技术进行了分析和对比。并提出了一个视频语义分析的框架和分析流程。  相似文献   

10.
A content-based image retrieval mechanism to support complex similarity queries is presented. The image content is defined by three kinds of features: quantifiable features describing the visual information, nonquantifiable features describing the semantic information, and keywords describing more abstract semantic information. In correspondence with these feature sets, we construct three types of indexes: visual indexes, semantic indexes, and keyword indexes. Index structures are elaborated to provide effective and efficient retrieval of images based on their contents. The underlying index structure used for all indexes is the HG-tree. In addition to the HG-tree, the signature file and hashing technique are also employed to index keywords and semantic features. The proposed indexing scheme combines and extends the HG-tree, the signature file, and the hashing scheme to support complex similarity queries. We also propose a new evaluation strategy to process the complex similarity queries. Experiments have been carried out on large image collections to demonstrate the effectiveness of the proposed retrieval mechanism.

11.
Video retrieval methods have generally been developed for a single query; the multi-query video retrieval problem has received little attention. In this study, an efficient and fast multi-query video retrieval framework is developed. Query videos are assumed to be related to more than one semantic concept, and the framework supports an arbitrary number of video queries. The method is built on binary video hash codes; as a result, it is fast and requires less storage space. Database and query hash codes are generated by a deep hashing method that not only produces hash codes but also predicts query labels when queries come from outside the database. Retrieval is based on the Pareto-front multi-objective optimization method. Re-ranking the retrieved videos using non-binary deep features increases the retrieval accuracy considerably. Simulations on two multi-label video databases show that the proposed method performs well in both retrieval accuracy and retrieval time.
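The Pareto-front selection over several queries can be sketched with integer hash codes and Hamming distances: each database item gets one distance per query, and an item survives if no other item is at least as close to every query and strictly closer to one. The 8-bit codes below are made-up examples, far shorter than real deep hash codes.

```python
def hamming(a, b):
    """Hamming distance between two integer hash codes."""
    return bin(a ^ b).count("1")

def pareto_front(db_codes, query_codes):
    """Indices of database items not dominated in per-query Hamming distance."""
    dists = [[hamming(c, q) for q in query_codes] for c in db_codes]
    k_range = range(len(query_codes))
    def dominated(i):
        return any(
            all(dists[j][k] <= dists[i][k] for k in k_range)
            and any(dists[j][k] < dists[i][k] for k in k_range)
            for j in range(len(db_codes)) if j != i)
    return [i for i in range(len(db_codes)) if not dominated(i)]

queries = [0b11110000, 0b00111100]          # two query hash codes
database = [
    0b11110000,   # matches query 0 exactly
    0b00111100,   # matches query 1 exactly
    0b11111100,   # a compromise, close to both
    0b00000011,   # far from both: dominated by the compromise
]
front = pareto_front(database, queries)
```

The non-binary deep features mentioned in the abstract would then re-rank only the items on this front, which is what keeps the method fast.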

12.
Modern surveillance networks are large collections of computational sensor nodes, where each node can be programmed to capture, prioritize, segment salient objects, and transmit them to central repositories for indexing. Visual data from such networks grow exponentially and present many challenges concerning their transmission, storage, and retrieval. Searching for particular surveillance objects is a common but challenging task. In this paper, we present an efficient feature extraction framework which utilizes an optimal subset of kernels from the first layer of a convolutional neural network pre-trained on the ImageNet dataset for object-based surveillance image search. The input image is convolved with the set of kernels to generate feature maps, which are aggregated into a single feature map using a novel spatial maximal activator pooling approach. A low-dimensional feature vector is computed to represent surveillance objects. The proposed system provides improvements in both performance and efficiency over other similar approaches for surveillance datasets.
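One plausible reading of this pipeline, convolving with a small kernel subset and keeping the strongest per-position response before reducing to a vector, can be sketched as below. The hand-made edge kernels and the element-wise-max reading of "spatial maximal activator pooling" are simplifying assumptions, not the paper's trained ImageNet kernels or its exact pooling.

```python
import numpy as np

def feature_maps(image, kernels):
    """Cross-correlate (what deep-learning 'convolutions' actually
    compute) the image with each kernel, valid mode."""
    kh, kw = kernels[0].shape
    H, W = image.shape
    maps = []
    for k in kernels:
        fmap = np.empty((H - kh + 1, W - kw + 1))
        for i in range(fmap.shape[0]):
            for j in range(fmap.shape[1]):
                fmap[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
        maps.append(fmap)
    return maps

def aggregate_max(maps):
    """Element-wise maximum across feature maps: at each spatial position,
    keep only the strongest kernel response."""
    return np.maximum.reduce(maps)

edge_h = np.array([[-1, -1], [1, 1]], float)   # dark-above / bright-below
edge_v = np.array([[1, -1], [1, -1]], float)   # bright-left / dark-right
img = np.zeros((6, 6)); img[3:, :] = 1.0       # horizontal step edge
agg = aggregate_max(feature_maps(img, [edge_h, edge_v]))
descriptor = agg.max(axis=1)                   # low-dimensional feature vector
```

The descriptor peaks at the row where the edge sits, illustrating how a single aggregated map can be compressed into a small vector for indexing.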

13.
We present a geometry-based indexing approach for the retrieval of video databases. It consists of two modules: 3D object shape inferencing from video data and geometric modeling from the reconstructed shape structure. A motion-based segmentation algorithm employing feature block tracking and principal component split is used for multi-moving-object motion classification and segmentation. After segmentation, feature blocks from each individual object are used to reconstruct its motion and structure through a factorization method. The estimated shape structure and motion parameters are used to generate the implicit polynomial model for the object. The video data is retrieved using the geometric structure of objects and their spatial relationship. We generalize the 2D string to 3D to compactly encode the spatial relationship of objects.

14.
Automatic semantic video object extraction is an important step toward content-based video coding, indexing and retrieval. However, it is very difficult to design a generic semantic video object extraction technique that can extract different semantic video objects with a single generic function. Since the presence and absence of persons in an image sequence provide important clues about video content, automatic face detection and human object generation are very attractive for content-based video database applications. For this reason, we propose a novel face detection and semantic human object generation algorithm. Homogeneous image regions with accurate boundaries are first obtained by integrating the results of color edge detection and region-growing procedures. Human faces are detected from these homogeneous image regions by using skin-color segmentation and facial filters. These detected faces are then used as object seeds for semantic human object generation. The correspondences of the detected faces and semantic human objects along the time axis are further exploited by a contour-based temporal tracking procedure.
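The skin-color segmentation step might look like the common RGB rule sketched below. This is a well-known heuristic, not the paper's actual color model, and the facial filters and region-growing stages are omitted.

```python
import numpy as np

def skin_mask(rgb):
    """Per-pixel skin classification by a simple RGB heuristic:
    red-dominant pixels with enough brightness and chromatic spread."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return ((r > 95) & (g > 40) & (b > 20)
            & (r > g) & (r > b)
            & (r - np.minimum(g, b) > 15))

img = np.zeros((2, 2, 3), np.uint8)
img[0, 0] = (200, 120, 90)   # skin-like pixel
img[1, 1] = (60, 120, 60)    # greenish background pixel
mask = skin_mask(img)
```

In the described algorithm, such a mask would be intersected with the homogeneous regions from edge detection and region growing before the facial filters run.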

15.
16.
Video text information plays an important role in semantic-based video analysis, indexing and retrieval, as video texts are closely related to the content of a video. Usually, the fundamental steps of text-based video analysis, browsing and retrieval consist of video text detection, localization, tracking, segmentation and recognition. Video sequences are commonly stored in compressed formats, where MPEG coding techniques are often adopted. In this paper, a unified framework for text detection, localization, and tracking in compressed videos using discrete cosine transform (DCT) coefficients is proposed. A coarse-to-fine text detection method is used to find text blocks in terms of the block DCT texture intensity information. The DCT texture intensity of an 8×8 block of an intra-frame is approximately represented by seven AC coefficients. The candidate text block regions are further verified and refined. Text block region localization and tracking are carried out by means of the horizontal and vertical block texture intensity projection profiles. The appearing and disappearing frames of each text line are determined by the text tracking. Experimental results show the effectiveness of the proposed methods.
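A sketch of the block-DCT texture-intensity measure: transform an 8×8 block with an (unnormalized) DCT-II and sum the magnitudes of a few low-order AC coefficients. Which seven AC coefficients the paper actually uses is not stated here, so the set below is an assumption.

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block (unnormalized; fine for an energy measure)."""
    N = block.shape[0]
    n = np.arange(N)
    C = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / N)  # C[k, n]
    return C @ block @ C.T

def texture_intensity(block):
    """Approximate block texture energy from low-order AC magnitudes
    (the specific seven coefficients are an illustrative choice)."""
    d = dct2(block)
    acs = [d[0, 1], d[0, 2], d[0, 3], d[1, 0], d[2, 0], d[3, 0], d[1, 1]]
    return sum(abs(a) for a in acs)

flat = np.full((8, 8), 128.0)                          # smooth background block
stripes = np.tile(np.array([0.0, 255.0] * 4), (8, 1))  # text-like high contrast
is_text = texture_intensity(stripes) > texture_intensity(flat)
```

Flat blocks concentrate all energy in the DC coefficient, so their AC-based intensity is essentially zero, which is exactly why thresholding this measure separates text-like blocks from smooth background.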

17.
Understanding the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic content of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
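Two classic low-level audio features such a system might compute, short-time energy and zero-crossing rate, are sketched here on synthetic clips. The paper's full feature set is larger; the frame length and the tone/noise test signals are illustrative choices.

```python
import numpy as np

def clip_features(signal, frame_len=256):
    """Mean short-time energy and mean zero-crossing rate over a clip."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)                       # per-frame energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, # sign changes
                  axis=1)
    return float(energy.mean()), float(zcr.mean())

t = np.linspace(0, 1, 8192, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 200 * t)                    # low-frequency tone
noise = np.random.default_rng(1).normal(0, 0.5, 8192)       # broadband noise
_, zcr_tone = clip_features(tone)
_, zcr_noise = clip_features(noise)
```

Broadband content (crowd noise, applause) produces a much higher zero-crossing rate than tonal or voiced content, which is one reason such simple features already separate program types fairly well.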

18.
BilVideo: a video database management system (cited once: 0 self-citations, 1 by others)
The BilVideo video database management system provides integrated support for spatiotemporal and semantic queries for video. A knowledge base, consisting of a fact base and a comprehensive rule set implemented in Prolog, handles spatiotemporal queries. These queries contain any combination of conditions related to direction, topology, 3D relationships, object appearance, trajectory projection, and similarity-based object trajectories. The rules in the knowledge base significantly reduce the number of facts representing the spatiotemporal relations that the system needs to store. A feature database stored in an object-relational database management system handles semantic queries. To respond to user queries containing both spatiotemporal and semantic conditions, a query processor interacts with the knowledge base and object-relational database and integrates the results returned from these two system components. Because of space limitations, we only discuss the Web-based visual query interface and its fact-extractor and video-annotator tools. These tools populate the system's fact base and feature database to support both query types.

19.
20.
Most current content-based image retrieval systems are still incapable of providing users with their desired results. The major difficulty lies in the gap between low-level image features and high-level image semantics. To address the problem, this study reports a framework for effective image retrieval by employing a novel idea of memory learning. It forms a knowledge memory model to store the semantic information by simply accumulating user-provided interactions. A learning strategy is then applied to predict the semantic relationships among images according to the memorized knowledge. Image queries are finally performed based on a seamless combination of low-level features and learned semantics. One important advantage of our framework is its ability to efficiently annotate images and also propagate the keyword annotation from the labeled images to unlabeled images. The presented algorithm has been integrated into a practical image retrieval system. Experiments on a collection of 10,000 general-purpose images demonstrate the effectiveness of the proposed framework.
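The accumulate-then-predict idea can be sketched as a co-relevance count matrix blended with low-level similarity: images marked relevant in the same feedback session get their pairwise counts bumped, and later queries mix that memorized semantic signal with visual similarity. The mixing weight `alpha` and the max-normalization are assumptions, not the paper's learning strategy.

```python
import numpy as np

def update_memory(M, relevant_ids):
    """Accumulate user feedback: images judged relevant together in one
    session are treated as semantically related."""
    for i in relevant_ids:
        for j in relevant_ids:
            if i != j:
                M[i, j] += 1
    return M

def combined_score(query, low_level_sim, M, alpha=0.5):
    """Blend low-level similarity with memorized semantic correlation."""
    sem = M[query] / (M[query].max() or 1)     # normalize accumulated counts
    return alpha * low_level_sim + (1 - alpha) * sem

n = 4
M = np.zeros((n, n))
update_memory(M, [0, 2])     # two sessions both mark images 0 and 2 relevant
update_memory(M, [0, 2])
low_sim = np.array([1.0, 0.6, 0.1, 0.5])   # image 2 looks visually unlike image 0
scores = combined_score(0, low_sim, M)
```

Image 2 now outranks the visually closer images 1 and 3 for queries on image 0, which is the point of memory learning: accumulated feedback overrides a misleading low-level similarity.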
