Similar Articles (20 results)
1.
A multimedia application involves information in the form of video, images, audio, text, and graphics that needs to be stored, retrieved, and manipulated in large databases. In this paper, we propose an object-oriented database schema that supports multimedia documents and their temporal, spatial, and logical structures. We present a document example and show how the schema can address all the structures described. We also present a multimedia query specification language that can be used to describe a multimedia content portion to be retrieved from the database. The language provides means by which the user can specify the information on the media as well as the temporal and spatial relationships among these media.
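To make the three structures concrete, here is a minimal sketch of such a schema in Python dataclasses; all class and field names are illustrative assumptions, not the paper's actual object-oriented schema.

```python
# A minimal sketch, assuming illustrative names: a document whose media
# components carry spatial (layout), temporal (timing), and logical
# (composition) structure, as the abstract describes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SpatialLayout:
    x: int
    y: int
    width: int
    height: int          # placement of the component on the canvas

@dataclass
class TemporalSpec:
    start_ms: int
    duration_ms: int     # when the component starts and how long it plays

@dataclass
class MediaComponent:
    media_type: str      # "video", "image", "audio", "text", "graphics"
    content_ref: str     # reference into the underlying media store
    layout: SpatialLayout
    timing: TemporalSpec

@dataclass
class MultimediaDocument:
    title: str
    components: List[MediaComponent] = field(default_factory=list)  # logical structure
```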

2.
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics, images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented our classification scheme using decision tree classifiers and self-organizing maps. Received June 15, 2000 / Revised November 15, 2000
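A minimal sketch of the supervised decision-tree variant using scikit-learn; the feature files and their contents (text/non-text percentages, column counts, font-size statistics, content density, connected-component statistics) are assumptions standing in for the paper's feature-extraction front end.

```python
# Sketch only: train a decision tree on precomputed, class-independent
# layout features and evaluate on a held-out split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = np.load("layout_features.npy")   # assumed: one row of layout features per page
y = np.load("genre_labels.npy")      # assumed: class labels from the ranking study

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=8).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```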

3.
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational multimedia database management system so that symbolic images can be stored and retrieved based on their content are described. Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented. Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of). Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion time, storage space, retrieval accuracy, and retrieval time. Received June 12, 1998 / Accepted October 13, 1998
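A toy sketch contrasting the two query modes, assuming a simple in-memory list of images whose objects carry either a semantic label with a certainty factor or a feature vector; all field names are illustrative.

```python
# Sketch only: the classification approach matches on attached labels,
# the abstraction approach ranks by feature-vector distance.
import numpy as np

def query_by_classification(db, wanted_class, min_certainty=0.8):
    """Return images containing an object whose semantic label matches
    the query label with sufficient certainty."""
    return [img for img in db
            if any(o["label"] == wanted_class and o["certainty"] >= min_certainty
                   for o in img["objects"])]

def query_by_abstraction(db, query_vec, top_k=10):
    """Rank images by similarity between the query feature vector
    (shape, genus, ...) and their stored object vectors."""
    def best_dist(img):
        return min(np.linalg.norm(np.asarray(o["features"]) - query_vec)
                   for o in img["objects"])
    return sorted(db, key=best_dist)[:top_k]
```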

4.
Recent advances in computer technologies have made it feasible to provide multimedia services, such as news distribution and entertainment, via high-bandwidth networks. The storage and retrieval of large multimedia objects (e.g., video) becomes a major design issue of the multimedia information system. While most other works on multimedia storage servers assume an on-line disk storage system, we consider a two-tier storage architecture with a robotic tape library as the vast near-line storage and an on-line disk system as the front-line storage. Magnetic tapes are cheaper, more robust, and have a larger capacity; hence, they are more cost effective for large scale storage systems (e.g., video-on-demand (VOD) systems may store tens of thousands of videos). We study in detail the design issues of the tape subsystem and propose some novel tape-scheduling algorithms which give faster response and require less disk buffer space. We also study the disk-striping policy and the data layout on the tape cartridge in order to fully utilize the throughput of the robotic tape system and to minimize the on-line disk storage space.

5.
The automatic extraction and recognition of news captions and annotations can be of great help in locating topics of interest in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader), which detects, extracts, and reads text areas in digital video data. In this paper, we address the problems involved, describe the method by which Video OCR operates, and suggest applications for its use in digital news archives. To solve two problems of character recognition for videos, namely low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multi-frame integration, and character extraction filters. Character segmentation is performed by a recognition-based segmentation method, and intermediate character recognition results are used to improve the segmentation. We also include a method for locating text areas using text-like properties and the use of a language-based postprocessing technique to increase word recognition rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.
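As a rough illustration of the interpolation and multi-frame integration steps, here is a sketch using OpenCV; it assumes bright, stationary captions over a changing background, and the 4x scale factor is an arbitrary choice, not the paper's parameter.

```python
# Sketch only: upsample consecutive frames, then take the per-pixel
# minimum. Bright, stationary characters keep their value while the
# moving background is driven toward its darkest values, raising the
# contrast between text and background.
import cv2
import numpy as np

def enhance_caption(frames, scale=4):
    """frames: consecutive grayscale frames showing the same caption.
    Returns an enhanced, upsampled image of the caption area."""
    up = [cv2.resize(f, None, fx=scale, fy=scale,
                     interpolation=cv2.INTER_CUBIC)
          for f in frames]
    return np.min(np.stack(up), axis=0).astype(np.uint8)
```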

6.
Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration. It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution. Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics and provides users with feedback, including the strictness and reformulation alternatives of each query condition as well as estimated numbers of matches. With query verification, the system increases the efficiency of multimedia database exploration for both users and the system. Such statistical information is also utilized to support progressive query processing and query relaxation. Received: 9 June 1998 / Accepted: 21 July 2000 / Published online: 4 May 2001
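A toy sketch of the query-verification idea, estimating match counts from precomputed statistics before any expensive media matching is run; the histogram files and the dominant-hue condition are illustrative assumptions.

```python
# Sketch only: answer "how many images would this condition match?"
# from a stored histogram, so the user can tighten or relax the
# condition before the real matcher runs.
import numpy as np

color_hist = np.load("hue_hist.npy")   # assumed precomputed statistics
bin_edges = np.load("hue_bins.npy")    # assumed histogram bin edges

def estimate_matches(lo, hi):
    """Estimated number of images whose dominant hue falls in [lo, hi)."""
    mask = (bin_edges[:-1] >= lo) & (bin_edges[:-1] < hi)
    return int(color_hist[mask].sum())
```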

7.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font, face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection and specificity of the lexicon. Received May 1, 1998 / Revised October 20, 1998
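The following sketch shows one plausible character-to-shape-code mapping; the code alphabet and letter assignments here are illustrative assumptions, not the authors' published table.

```python
# Sketch only: map each character to a coarse shape class, so that a
# word image can be matched against a shape lexicon before any
# character-level recognition.
ASCENDERS = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
DESCENDERS = set("gjpqy")

def shape_code(word):
    out = []
    for ch in word:
        if ch in ASCENDERS:
            out.append("A")   # extends above the x-height
        elif ch in DESCENDERS:
            out.append("g")   # extends below the baseline
        elif ch.isalpha():
            out.append("x")   # plain x-height letter
        else:
            out.append("#")   # punctuation handled separately
    return "".join(out)

# Words sharing a code form an ambiguity class the shape lexicon must
# resolve, e.g. shape_code("the") == shape_code("Ile") == "AAx".
```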

8.
Providing a customized result set based upon a user preference is the ultimate objective of many content-based image retrieval systems. There are two main challenges in meeting this objective: First, there is a gap between the physical characteristics of digital images and the semantic meaning of the images. Second, different people may have different perceptions of the same set of images. To address both these challenges, we propose a model, named Yoda, that conceptualizes content-based querying as the task of soft classifying images into classes. These classes can overlap, and their members are different for different users. The “soft” classification is hence performed for each and every image feature, including both physical and semantic features. Subsequently, each image will be ranked based on the weighted aggregation of its classification memberships. The weights are user-dependent, and hence different users would obtain different result sets for the same query. Yoda employs a fuzzy-logic-based aggregation function for ranking images. We show that, in addition to some performance benefits, fuzzy aggregation is less sensitive to noise and can support disjunctive queries as compared to the weighted-average aggregation used by other content-based image retrieval systems. Finally, since Yoda relies heavily on user-dependent weights (i.e., user profiles) for the aggregation task, we utilize the users' relevance feedback to improve the profiles using genetic algorithms (GA). Our learning mechanism requires fewer user interactions and results in a faster convergence to the user's preferences as compared to other learning techniques. Correspondence to: Y.-S. Chen (E-mail: yishinc@usc.edu). This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, NIH-NLM R01-LM07061, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash gifts from NCR, Microsoft, and Okawa Foundation.
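As a rough illustration of why fuzzy aggregation handles disjunctive queries better than a weighted average, consider the sketch below; min/max are one standard choice of fuzzy operators, and Yoda's exact operators and profile weights are not reproduced here.

```python
# Sketch only: aggregate per-condition membership scores either by a
# weighted average or by fuzzy AND/OR operators.
def weighted_average(memberships, weights):
    return sum(m * w for m, w in zip(memberships, weights)) / sum(weights)

def fuzzy_and(memberships):
    return min(memberships)   # conjunctive query: every condition must hold

def fuzzy_or(memberships):
    return max(memberships)   # disjunctive query: any condition may hold

# Example: an image scoring 0.9 on "sunset" and 0.2 on "beach". For the
# disjunctive query "sunset OR beach", fuzzy_or gives 0.9, whereas an
# equal-weight average dilutes it to 0.55 and may rank it below images
# that only weakly satisfy both conditions.
```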

9.
Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we can achieve a high average accuracy of 94.2% in the five-class audio classification task.
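A sketch of a coarse-to-fine SVM binary tree over the five classes; the split order chosen here (speech present or not, then finer distinctions) is an illustrative assumption, and each node would be trained on the relevant subset of clips with APD/RTPD among the input features.

```python
# Sketch only: a binary tree of SVMs routes a clip from coarse to fine
# classes. Each node must be fitted on its own subset before use.
from sklearn.svm import SVC

class SVMBinaryTree:
    def __init__(self):
        self.has_speech = SVC()          # speech-bearing clips vs the rest
        self.music_vs_env = SVC()        # music vs environment sound
        self.pure_vs_mixed = SVC()       # pure speech vs speech + background
        self.music_bg_vs_env_bg = SVC()  # speech+music vs speech+environment

    def predict_one(self, x):            # x: feature row of shape (1, n)
        if self.has_speech.predict(x)[0] == 0:
            return ("music" if self.music_vs_env.predict(x)[0] == 0
                    else "environment sound")
        if self.pure_vs_mixed.predict(x)[0] == 0:
            return "pure speech"
        return ("speech with music" if self.music_bg_vs_env_bg.predict(x)[0] == 0
                else "speech with environment sound")
```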

10.
Raj, Chahat; Meel, Priyanka. Applied Intelligence, 2021, 51(11): 8132-8148

A surge of false information circulates on the internet. Social media and websites are flooded with unverified news posts comprising text, images, audio, and videos. A system is therefore needed that can detect fake content across multiple data modalities. There has been considerable research on classification techniques for textual fake news detection, while frameworks dedicated to visual fake news detection are very few. We explored the state-of-the-art methods using deep networks such as CNNs and RNNs for multi-modal online information credibility analysis. They show rapid improvement in classification tasks without requiring pre-processing. To aid the ongoing research on fake news detection using CNN models, we build textual and visual modules to analyze their performance over multi-modal datasets. We exploit latent features present inside text and images using layers of convolutions. We examine how well these convolutional neural networks perform classification when provided with only latent features, and analyze what types of images need to be fed to perform efficient fake news detection. We propose a multi-modal Coupled ConvNet architecture that fuses both data modules and efficiently classifies online news depending on its textual and visual content. We then offer a comparative analysis of the results of all the models over three datasets. The proposed architecture outperforms various state-of-the-art methods for fake news detection with considerably high accuracy.
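A minimal PyTorch sketch of a two-branch coupled network in this spirit, with a text CNN over word embeddings and an image CNN fused before a shared classifier; all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch only: latent text and image features are extracted by separate
# convolutional branches, concatenated, and classified jointly.
import torch
import torch.nn as nn

class CoupledConvNet(nn.Module):
    def __init__(self, vocab_size, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_conv = nn.Sequential(          # 1D convolutions over words
            nn.Conv1d(emb_dim, 64, kernel_size=5), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        self.img_conv = nn.Sequential(           # 2D convolutions over pixels
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Sequential(
            nn.Linear(64 + 64, 64), nn.ReLU(),
            nn.Linear(64, 2))                    # fake vs real

    def forward(self, tokens, image):
        t = self.text_conv(self.embed(tokens).transpose(1, 2)).squeeze(-1)
        v = self.img_conv(image).flatten(1)
        return self.classifier(torch.cat([t, v], dim=1))
```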

  相似文献   

11.
Mixing synthetic and video images of an outdoor urban environment
Mixing video and computer-generated images is a new and promising area of research for enhancing reality. It can be used in all situations where a complete simulation would not be easy to implement. Past work on the subject has relied for a large part on human intervention at key moments of the composition. In this paper, we show that if enough geometric information about the environment is available, then efficient tools developed in the computer vision literature can be used to build a highly automated augmented reality loop. We focus on outdoor urban environments and present an application for the visual assessment of a new lighting project for the bridges of Paris. We present a fully augmented 300-image sequence of a specific bridge, the Pont Neuf. Emphasis is put on the robust calculation of the camera position. We also detail the techniques used for matching 2D and 3D primitives and for tracking features over the sequence. Our system overcomes two major difficulties. First, it is capable of handling poor-quality images, resulting from the fact that images were shot at night since the goal was to simulate a new lighting system. Second, it can deal with important changes in viewpoint position and in appearance along the sequence. Throughout the paper, many results are shown to illustrate the different steps and difficulties encountered. Received: 28 July 1997 / Accepted: 30 August 1998
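The core pose computation in such a pipeline can be sketched with OpenCV; robust PnP over 2D-3D point correspondences is a modern stand-in for the robust camera-position calculation described, not the authors' original implementation.

```python
# Sketch only: recover the camera pose from matches between known 3D
# scene points and their 2D image observations, with RANSAC rejecting
# bad matches (important for the poor-quality night images described).
import cv2
import numpy as np

def camera_pose(points_3d, points_2d, K):
    """points_3d: Nx3 scene coordinates, points_2d: Nx2 image pixels,
    K: 3x3 intrinsic matrix. Returns rotation and translation vectors."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, np.float64),
        np.asarray(points_2d, np.float64),
        K, distCoeffs=None)
    if not ok:
        raise RuntimeError("pose estimation failed")
    return rvec, tvec
```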

12.
This paper proposes a new technique for the classification of indoor and outdoor images based on analysing edge straightness. We propose that indoor images have a greater proportion of straight edges than outdoor images, and use multi-resolution estimates of edge straightness to improve our results. We also consider the method's possible applications in a real-time system. We compare our proposed technique with a number of other published approaches to indoor/outdoor image classification and show convincingly on a large database that our method achieves much higher accuracy.
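A sketch of a single-resolution straight-edge ratio using OpenCV; the Canny and Hough thresholds are illustrative, and the paper's multi-resolution refinement is omitted.

```python
# Sketch only: detect edges, find straight segments with a probabilistic
# Hough transform, and compare straight-segment length to the total
# number of edge pixels. A higher ratio suggests an indoor scene.
import cv2
import numpy as np

def straight_edge_ratio(gray):
    edges = cv2.Canny(gray, 50, 150)
    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                               minLineLength=30, maxLineGap=3)
    if segments is None:
        return 0.0
    straight_len = sum(np.hypot(x2 - x1, y2 - y1)
                       for x1, y1, x2, y2 in segments[:, 0])
    edge_pixels = max(int(edges.sum() // 255), 1)
    return float(straight_len) / edge_pixels
```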

13.
First Person Indoor/Outdoor Augmented Reality Application: ARQuake
This paper presents a first person outdoor/indoor augmented reality application, ARQuake, that we have developed. ARQuake is an extension of the desktop game Quake, and as such we are investigating how to convert a desktop first person application into an outdoor/indoor mobile augmented reality application. We present an architecture for a low cost, moderately accurate six degrees of freedom tracking system based on GPS, digital compass, and fiducial vision-based tracking. Usability issues such as monster selection, colour, input devices, and multi-person collaboration are discussed.

14.
Chemical accident news data contains information such as the news content, the title, and the news source, and the news text is strongly dependent on its context. To extract text features more accurately and improve the accuracy of chemical accident classification, this paper proposes a bidirectional LSTM neural network model with an attention mechanism (BLSTM-Attention) to extract features from chemical accident news text and perform text classification. The BLSTM-Attention model combines the contextual semantic information of the text and extracts features of accident news from both the forward and backward directions; considering that different words contribute differently to the text, the attention mechanism assigns different weights to different words and sentences. Finally, the proposed classification method is compared experimentally with Naive-Bayes, CNN, RNN, and BLSTM on the same chemical accident news dataset. The results show that the proposed BLSTM-Attention model outperforms the other classification models on the chemical dataset.
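A minimal PyTorch sketch of a BLSTM-with-attention text classifier of the kind described; layer sizes are illustrative assumptions.

```python
# Sketch only: a bidirectional LSTM over word embeddings with a learned
# attention pooling, so that informative words receive larger weights.
import torch
import torch.nn as nn

class BLSTMAttention(nn.Module):
    def __init__(self, vocab_size, n_classes, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)          # scores each time step
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        h, _ = self.blstm(self.embed(tokens))        # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.att(h), dim=1)  # per-word attention weights
        context = (weights * h).sum(dim=1)           # weighted sum over words
        return self.out(context)
```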

15.
The next generation of interactive multimedia documents can contain both static media, e.g., text, graphics, and images, and continuous media, e.g., audio and video, and can provide user interactions in distributed environments. However, the temporal information of multimedia documents cannot be described using traditional document structures, e.g., Open Document Architecture (ODA) and Standard Generalized Mark-up Language (SGML); the continuous transmission of media units also raises new synchronization problems, not encountered before, in processing user interactions. Thus, developing a distributed interactive multimedia document system requires resolving the issues of document model, presentation control architecture, and control scheme. In this paper, we (i) propose a new multimedia document model that contains the logical structure, the layout structure, and the temporal structure to formally describe multimedia documents, and (ii) point out the main interaction-based synchronization problems and propose a control architecture and a token-based control scheme to solve them. Based on the proposed document model, control architecture, and control scheme, a distributed interactive multimedia document development mechanism, called MING-I, is developed on SUN workstations.

16.
Prior research in scene classification has focused on mapping a set of classic low-level vision features to semantically meaningful categories using a classifier engine. In this paper, we propose improving the established paradigm by using a simplified low-level feature set to predict multiple semantic scene attributes that are integrated probabilistically to obtain a final indoor/outdoor scene classification. An initial indoor/outdoor prediction is obtained by classifying computationally efficient, low-dimensional color and wavelet texture features using support vector machines. Similar low-level features can also be used to explicitly predict the presence of semantic features including grass and sky. The semantic scene attributes are then integrated using a Bayesian network designed for improved indoor/outdoor scene classification.
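The probabilistic integration step can be sketched as a naive-Bayes style odds update over the SVM score and the attribute detectors; the likelihood values below are illustrative assumptions, not the paper's learned Bayesian network.

```python
# Sketch only: start from the low-level SVM's indoor/outdoor estimate,
# then update the odds with each semantic attribute (grass, sky),
# assuming conditional independence of the attributes.
def posterior_outdoor(p_outdoor_svm, grass_detected, sky_detected,
                      p_grass=(0.40, 0.02), p_sky=(0.45, 0.01)):
    """p_* = (P(attribute | outdoor), P(attribute | indoor)).
    Returns P(outdoor | all evidence)."""
    odds = p_outdoor_svm / (1.0 - p_outdoor_svm)
    for detected, (p_out, p_in) in ((grass_detected, p_grass),
                                    (sky_detected, p_sky)):
        if detected:
            odds *= p_out / p_in
        else:
            odds *= (1 - p_out) / (1 - p_in)
    return odds / (1.0 + odds)
```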

17.
Stop word location and identification for adaptive text recognition
We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given page. The strategy uses a linguistic observation that over half of all words in a typical English passage are contained in a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the most likely candidates for those words, using only widths of the word images. The identity of each word is determined using a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on 90% of all the pages. These can serve as useful seeds to bootstrap font learning. Received October 8, 1999 / Revised March 29, 2000
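A sketch of the width-only candidate search; the stop-word list fragment and relative character widths are illustrative assumptions standing in for a font-adaptive width model.

```python
# Sketch only: propose stop words whose expected width (sum of relative
# character widths) is close to the observed word-image width.
STOP_WORDS = ["the", "of", "and", "to", "in", "that", "is", "was", "he", "for"]

# Assumed relative character widths, in arbitrary units.
CHAR_UNITS = {c: 1.0 for c in "abcdeghknopqsuvxyz"}
CHAR_UNITS.update({"i": 0.5, "j": 0.5, "l": 0.5, "t": 0.6, "f": 0.6,
                   "r": 0.7, "m": 1.5, "w": 1.5})

def candidates(word_width_px, units_per_px, tol=0.15):
    """Stop words whose estimated width is within tol of the observed width."""
    observed = word_width_px * units_per_px
    out = []
    for w in STOP_WORDS:
        expected = sum(CHAR_UNITS[c] for c in w)
        if abs(expected - observed) / expected <= tol:
            out.append(w)
    return out
```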

18.
Automatic character recognition and image understanding of a given paper document are the main objectives of the computer vision field. For these problems, a basic step is to isolate characters and group words from these isolated characters. In this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation, and density) of characters and propose a characteristic value for classification using the run-length frequency of the image component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D space according to the area of the bounding box and positional information from the document. We conducted tests with more than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental documents. Received August 3, 2001 / Accepted August 8, 2001

19.
To improve the discrimination power of color-indexing techniques, we encode a minimal amount of spatial information in the index. We tessellate each image with five partially overlapping, fuzzy regions. In the index, for each region in an image, we store its average color and the covariance matrix of the color distribution. A similarity function of these color features is used to match query images with images in the database. In addition, we propose two measures to evaluate the performance of image-indexing techniques. We present experimental results using an image database which contains more than 11,600 color images.
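A sketch of the per-region features, with hard rectangular regions standing in for the paper's fuzzy regions; the 60% region size and the five-region layout (four corners plus center) are illustrative choices.

```python
# Sketch only: for each of five overlapping regions, store the mean RGB
# color and the 3x3 covariance matrix of the color distribution.
import numpy as np

def region_features(img):
    """img: HxWx3 array. Returns {region: (mean_rgb, 3x3 covariance)}."""
    h, w = img.shape[:2]
    rh, rw = int(h * 0.6), int(w * 0.6)   # each region spans ~60% per axis
    regions = {
        "top_left": img[:rh, :rw], "top_right": img[:rh, -rw:],
        "bottom_left": img[-rh:, :rw], "bottom_right": img[-rh:, -rw:],
        "center": img[(h - rh)//2:(h + rh)//2, (w - rw)//2:(w + rw)//2],
    }
    feats = {}
    for name, r in regions.items():
        pixels = r.reshape(-1, 3).astype(np.float64)
        feats[name] = (pixels.mean(axis=0), np.cov(pixels, rowvar=False))
    return feats
```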

20.
The existing skew estimation techniques usually assume that the input image is of high resolution and that the detectable angle range is limited. We present a more generic solution for this task that overcomes these restrictions. Our method is based on determination of the first eigenvector of the data covariance matrix. The solution comprises image resolution reduction, connected component analysis, component classification using a fuzzy approach, and skew estimation. Experiments on a large set of various document images and performance comparison with two Hough transform-based methods show a good accuracy and robustness for our method. Received October 10, 1998 / Revised version September 9, 1999
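The core estimate reduces to a small eigen-decomposition; this sketch omits the resolution reduction and fuzzy component-classification steps.

```python
# Sketch only: treat foreground (ink) pixel coordinates as 2D samples,
# compute their covariance matrix, and read the skew angle off the
# first (dominant) eigenvector.
import numpy as np

def estimate_skew_deg(binary):
    """binary: 2D array, nonzero where ink. Returns skew angle in degrees."""
    ys, xs = np.nonzero(binary)
    pts = np.stack([xs, ys], axis=0).astype(np.float64)
    cov = np.cov(pts)                        # 2x2 covariance of (x, y)
    vals, vecs = np.linalg.eigh(cov)
    principal = vecs[:, np.argmax(vals)]     # first eigenvector
    return float(np.degrees(np.arctan2(principal[1], principal[0])))
```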
