20 matching records found (search time: 31 ms)
1.
A multimedia application involves information that may take the form of video, images, audio, text, and graphics and that needs to be
stored, retrieved, and manipulated in large databases. In this paper, we propose an object-oriented database schema that supports
multimedia documents and their temporal, spatial, and logical structures. We present a document example and show how the schema
can address all the structures described. We also present a multimedia query specification language that can be used to describe
a multimedia content portion to be retrieved from the database. The language provides means by which the user can specify
the information on the media as well as the temporal and spatial relationships among these media.
2.
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout
of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific
models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building
a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics,
images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and
statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels
for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative
page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented
our classification scheme using decision tree classifiers and self-organizing maps.
Received June 15, 2000 / Revised November 15, 2000
3.
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include
maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches
for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach
preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object
that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting
of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries
are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that
have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on
the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational
multimedia database management system so that symbolic images can be stored and retrieved based on their content are described.
Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented.
Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of).
Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs
are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion
time, storage space, retrieval accuracy, and retrieval time.
Received June 12, 1998 / Accepted October 13, 1998
4.
Recent advances in computer technologies have made it feasible to provide multimedia services, such as news distribution
and entertainment, via high-bandwidth networks. The storage and retrieval of large multimedia objects (e.g., video) becomes
a major design issue of the multimedia information system. While most other works on multimedia storage servers assume an
on-line disk storage system, we consider a two-tier storage architecture with a robotic tape library as the vast near-line
storage and an on-line disk system as the front-line storage. Magnetic tapes are cheaper, more robust, and have a larger
capacity; hence, they are more cost effective for large scale storage systems (e.g., video-on-demand (VOD) systems may
store tens of thousands of videos). We study in detail the design issues of the tape subsystem and propose some novel tape-scheduling
algorithms which give faster response and require less disk buffer space. We also study the disk-striping policy and the
data layout on the tape cartridge in order to fully utilize the throughput of the robotic tape system and to minimize the
on-line disk storage space.
5.
The automatic extraction and recognition of news captions and annotations can be of great help in locating topics of interest
in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader),
which detects, extracts, and reads text areas in digital video data. In this paper, we address problems, describe the method
by which Video OCR operates, and suggest applications for its use in digital news archives. To solve two problems of character
recognition for videos, low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multi-frame
integration and character extraction filters. Character segmentation is performed by a recognition-based segmentation method,
and intermediate character recognition results are used to improve the segmentation. We also include a method for locating
text areas using text-like properties and the use of a language-based postprocessing technique to increase word recognition
rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining
its results with other video understanding techniques will improve the overall understanding of the news video content.
6.
Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration.
It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution.
Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not
overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information
of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics, and provides users with feedback, including the strictness and reformulation alternatives
of each query condition as well as estimated numbers of matches. Query verification makes multimedia database exploration
more efficient for both the user and the system. Such statistical information is also utilized to support
progressive query processing and query relaxation.
Received: 9 June 1998 / Accepted: 21 July 2000 / Published online: 4 May 2001
7.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content
of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images
into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure
of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is
reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font,
face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the
performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate
lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection
of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection
and specificity of the lexicon.
Received May 1, 1998 / Revised October 20, 1998
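The transformation into character shape codes described above can be illustrated with a toy sketch. The coarse classes below (ascender, descender, x-height) are a common simplification of such schemes and are an assumption; the paper's actual code alphabet, and its image-side computation, are not reproduced here. Known characters are mapped directly, purely to illustrate the code space.

```python
# Hypothetical shape-code mapping: letters with ascenders (and capitals/digits)
# become "A", letters with descenders become "D", plain x-height letters "x",
# and anything else "?". These classes are illustrative, not the paper's.

ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape_code(word):
    """Map each character of a word to a coarse shape class."""
    out = []
    for ch in word:
        low = ch.lower()
        if ch.isupper() or ch.isdigit() or low in ASCENDERS:
            out.append("A")   # reaches above the x-height band
        elif low in DESCENDERS:
            out.append("D")   # dips below the baseline
        elif low.isalpha():
            out.append("x")   # stays within the x-height band
        else:
            out.append("?")   # punctuation or other symbol
    return "".join(out)
```

Because many distinct words collapse to the same code, a shape lexicon like the one the abstract describes is needed to resolve the remaining ambiguity.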
8.
Providing a customized result set based upon a user preference is the ultimate objective of many content-based image retrieval
systems. There are two main challenges in meeting this objective: first, there is a gap between the physical characteristics
of digital images and the semantic meaning of the images; second, different people may have different perceptions of the
same set of images. To address both these challenges, we propose a model, named Yoda, that conceptualizes content-based querying
as the task of soft classifying images into classes. These classes can overlap, and their members are different for different
users. The “soft” classification is hence performed for each and every image feature, including both physical and semantic
features. Subsequently, each image will be ranked based on the weighted aggregation of its classification memberships. The
weights are user-dependent, and hence different users would obtain different result sets for the same query. Yoda employs
a fuzzy-logic based aggregation function for ranking images. We show that, in addition to some performance benefits, fuzzy
aggregation is less sensitive to noise and can support disjunctive queries as compared to weighted-average aggregation used
by other content-based image retrieval systems. Finally, since Yoda heavily relies on user-dependent weights (i.e., user profiles)
for the aggregation task, we utilize the users' relevance feedback to improve the profiles using genetic algorithms (GA).
Our learning mechanism requires fewer user interactions, and results in a faster convergence to the user's preferences as
compared to other learning techniques.
Correspondence to: Y.-S. Chen (E-mail: yishinc@usc.edu)
This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, NIH-NLM R01-LM07061, DARPA and
USAF under agreement nr. F30602-99-1-0524, and unrestricted cash gifts from NCR, Microsoft, and Okawa Foundation.
9.
Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such
as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news,
a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based
features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to
hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech
with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class
classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features
are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT
approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features,
we can achieve a high average accuracy of 94.2% in the five-class audio classification task.
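The coarse-to-fine SVM-BT routing described above can be sketched as a binary decision tree over the two pitch-density features. The thresholds below are illustrative placeholders standing in for the trained binary SVMs at each node; the actual split order and decision boundaries are not given in the abstract.

```python
# Sketch of an SVM binary-tree (SVM-BT) control flow for five audio classes.
# Each branch would be a trained binary SVM in the real system; here we use
# assumed thresholds on the APD and RTPD features purely for illustration.

def classify_clip(apd, rtpd):
    """Route an audio clip through a coarse-to-fine binary decision tree."""
    # Level 1: non-speech vs. speech-like (placeholder APD split).
    if apd < 0.3:
        # Level 2: music vs. environment sound (placeholder RTPD split).
        return "music" if rtpd > 0.5 else "environment sound"
    # Level 2: pure speech vs. speech mixed with a background.
    if rtpd < 0.2:
        return "pure speech"
    # Level 3: which background accompanies the speech.
    return "speech with music" if rtpd > 0.5 else "speech with environment sound"
```

The tree needs only a handful of binary decisions per clip, which is why the cascade can be both accurate and efficient for multi-class classification.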
10.
An upsurge of false information is circulating on the internet. Social media and websites are flooded with unverified news posts comprising text, images, audio, and video. A system is therefore needed that detects fake content across multiple data modalities. There has been a considerable amount of research on classification techniques for textual fake news detection, whereas frameworks dedicated to visual fake news detection are few. We explored the state-of-the-art methods using deep networks such as CNNs and RNNs for multi-modal online information credibility analysis; they show rapid improvement in classification tasks without requiring pre-processing. To aid the ongoing research on fake news detection using CNN models, we build textual and visual modules and analyze their performance over multi-modal datasets. We exploit latent features present in text and images using layers of convolutions, examine how well these convolutional neural networks perform classification when provided with only latent features, and analyze what types of images need to be fed in to perform efficient fake news detection. We propose a multi-modal Coupled ConvNet architecture that fuses both data modules and efficiently classifies online news based on its textual and visual content. We then offer a comparative analysis of the results of all the models over three datasets. The proposed architecture outperforms various state-of-the-art methods for fake news detection with considerably high accuracy.
11.
Mixing video and computer-generated images is a new and promising area of research for enhancing reality. It can be used
in all situations where a complete simulation would not be easy to implement. Past work on the subject has relied for a
large part on human intervention at key moments of the composition. In this paper, we show that if enough geometric information
about the environment is available, then efficient tools developed in the computer vision literature can be used to build
a highly automated augmented reality loop. We focus on outdoor urban environments and present an application for the visual
assessment of a new lighting project of the bridges of Paris. We present a fully augmented 300-image sequence of a specific
bridge, the Pont Neuf. Emphasis is put on the robust calculation of the camera position. We also detail the techniques used
for matching 2D and 3D primitives and for tracking features over the sequence. Our system overcomes two major difficulties.
First, it is capable of handling poor-quality images, resulting from the fact that images were shot at night since the goal
was to simulate a new lighting system. Second, it can deal with important changes in viewpoint position and in appearance
along the sequence. Throughout the paper, many results are shown to illustrate the different steps and difficulties encountered.
Received: 28 July 1997 / Accepted: 30 August 1998
12.
This paper proposes a new technique for the classification of indoor and outdoor images based on edge analysis. Our technique is based on analysing edge straightness in images. We make the original proposal that indoor images have a greater proportion of straight edges than outdoor images, and use multi-resolution estimates of edge straightness to improve our results. We also consider this method's possible applications in a real-time system. We compare our proposed technique with a number of other published approaches to indoor/outdoor classification of images and show convincingly, on a large database, that our method achieves much higher accuracy.
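The edge-straightness idea described above can be sketched as follows: an edge (a traced chain of pixel coordinates) is "straight" when the chord between its endpoints is nearly as long as the traced path. The 0.95 straightness cut-off and the 0.5 decision threshold below are assumptions for illustration, not values from the paper, and the multi-resolution refinement is omitted.

```python
import math

def straightness(points):
    """Ratio of endpoint chord length to traced path length (1.0 = straight)."""
    chord = math.dist(points[0], points[-1])
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return chord / path if path else 1.0

def classify_scene(edges, straight_cutoff=0.95, indoor_threshold=0.5):
    """Label a scene 'indoor' when the proportion of straight edges is high.

    Both thresholds are illustrative placeholders, not the paper's values.
    """
    straight = sum(1 for e in edges if straightness(e) >= straight_cutoff)
    return "indoor" if straight / len(edges) >= indoor_threshold else "outdoor"
```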
13.
This paper presents ARQuake, a first-person outdoor/indoor augmented reality application that we have developed. ARQuake is
an extension of the desktop game Quake; as such, we are investigating how to convert a desktop first-person application
into an outdoor/indoor mobile augmented reality application. We present an architecture for a low-cost, moderately accurate
six degrees of freedom tracking system based on GPS, digital compass, and fiducial vision-based tracking. Usability issues
such as monster selection, colour, input devices, and multi-person collaboration are discussed.
14.
Chemical accident news data includes the news content, the headline, and the news source, and the text of the news content depends strongly on its context. To extract text features more accurately and improve the accuracy of chemical accident classification, this paper proposes a bidirectional LSTM neural network model with an attention mechanism (BLSTM-Attention) to extract text features from chemical news texts and perform text classification. The BLSTM-Attention model combines the contextual semantic information of the text, extracting features of accident news from both the forward and backward directions; since different words contribute differently to a text, the attention mechanism assigns different weights to different words and sentences. Finally, the proposed classification method is compared experimentally with Naive-Bayes, CNN, RNN, and BLSTM classifiers on the same chemical accident news dataset. The experimental results show that the proposed BLSTM-Attention model outperforms the other classification models on the chemical dataset.
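The attention step described above can be reduced to a minimal sketch: each word's encoder output is scored, the scores are softmax-normalized into weights, and the sentence representation is the weighted sum. Real BLSTM hidden states are replaced here by toy vectors, and the scoring network is left abstract; this shows only the pooling arithmetic, not the trained model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(word_vectors, scores):
    """Attention-weighted sum of word vectors.

    word_vectors: list of equal-length lists (stand-ins for BLSTM outputs)
    scores: one relevance score per word (stand-ins for a learned scorer)
    """
    weights = softmax(scores)
    dim = len(word_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, word_vectors))
            for i in range(dim)]
```

With equal scores every word contributes equally; a dominant score makes the pooled vector collapse onto that word, which is how attention emphasizes the words that matter most for classification.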
15.
The next generation of interactive multimedia documents can contain both static media (e.g., text, graphics, and images) and continuous
media (e.g., audio and video), and can provide user interactions in distributed environments. However, the temporal information
of multimedia documents cannot be described using traditional document structures, e.g., Open Document Architecture (ODA)
and Standard Generalized Mark-up Language (SGML); the continuous transmission of media units also raises new synchronization
problems, not encountered before, in processing user interactions. Thus, developing a distributed interactive multimedia
document system should resolve the issues of document model, presentation control architecture, and control scheme. In this
paper, we (i) propose a new multimedia document model that contains the logical structure, the layout structure, and the temporal
structure to formally describe multimedia documents, and (ii) point out main interaction-based synchronization problems, and
propose a control architecture and a token-based control scheme to solve these interaction-based synchronization problems.
Based on the proposed document model, control architecture, and control scheme, a distributed interactive multimedia document
development mechanism, called MING-I, is implemented on SUN workstations.
16.
Prior research in scene classification has focused on mapping a set of classic low-level vision features to semantically meaningful categories using a classifier engine. In this paper, we propose improving the established paradigm by using a simplified low-level feature set to predict multiple semantic scene attributes that are integrated probabilistically to obtain a final indoor/outdoor scene classification. An initial indoor/outdoor prediction is obtained by classifying computationally efficient, low-dimensional color and wavelet texture features using support vector machines. Similar low-level features can also be used to explicitly predict the presence of semantic features including grass and sky. The semantic scene attributes are then integrated using a Bayesian network designed for improved indoor/outdoor scene classification.
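The probabilistic integration described above can be sketched with a naive-Bayes-style update: an initial outdoor probability is revised by the detected semantic attributes. The likelihood values below are made-up illustrative numbers, and independence between attributes is assumed, which is simpler than the full Bayesian network the abstract refers to.

```python
# Assumed P(attribute detected | class) values, for illustration only.
LIKELIHOODS = {
    "grass": {"indoor": 0.02, "outdoor": 0.40},
    "sky":   {"indoor": 0.01, "outdoor": 0.55},
}

def posterior_outdoor(p_outdoor_initial, detected_attributes):
    """Update the initial outdoor probability with detected attributes.

    Treats the initial SVM-based prediction as a prior and each detected
    attribute as independent evidence (a naive-Bayes simplification).
    """
    p_out = p_outdoor_initial
    p_in = 1.0 - p_outdoor_initial
    for attr in detected_attributes:
        p_out *= LIKELIHOODS[attr]["outdoor"]
        p_in *= LIKELIHOODS[attr]["indoor"]
    return p_out / (p_out + p_in)
```

Even with an uncertain initial prediction, detecting sky or grass pushes the posterior strongly toward "outdoor", which is the intuition behind combining the low-level classifier with semantic attributes.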
17.
We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given
page. The strategy uses the linguistic observation that over half of all words in a typical English passage are contained in
a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary
text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the
most likely candidates for those words, using only widths of the word images. The identity of each word is determined using
a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using
a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page
images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on
90% of all the pages. These can serve as useful seeds to bootstrap font learning.
Received October 8, 1999 / Revised March 29, 2000
18.
Automatic character recognition and image understanding of a given paper document are the main objectives of the computer
vision field. For these problems, a basic step is to isolate characters and group words from these isolated characters. In
this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm
for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation,
and density) of characters and propose a characteristic value for classification using the run-length frequency of the image
component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal
or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved
lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification
and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D
space according to the area of the bounding box and positional information from the document. We conducted tests with more
than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental
results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental
documents.
Received August 3, 2001 / Accepted August 8, 2001
19.
To improve the discrimination power of color-indexing techniques, we encode a minimal amount of spatial information in the
index. We tesselate each image with five partially overlapping, fuzzy regions. In the index, for each region in an image,
we store its average color and the covariance matrix of the color distribution. A similarity function of these color features
is used to match query images with images in the database. In addition, we propose two measures to evaluate the performance
of image-indexing techniques. We present experimental results using an image database which contains more than 11,600 color
images.
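The per-region descriptor described above (average color plus color covariance) can be computed as a weighted statistic over a fuzzy membership mask. The sketch below handles one region given an arbitrary mask; the paper's specific five-region overlapping tessellation and its similarity function are not reproduced.

```python
import numpy as np

def region_descriptor(image, weights):
    """Weighted mean color (3,) and color covariance (3, 3) for one region.

    image:   H x W x 3 float array of pixel colors
    weights: H x W fuzzy membership mask in [0, 1]
    """
    pixels = image.reshape(-1, 3)
    w = weights.reshape(-1).astype(float)
    w = w / w.sum()                              # normalize memberships
    mean = w @ pixels                            # weighted average color
    centered = pixels - mean
    cov = (centered * w[:, None]).T @ centered   # weighted color covariance
    return mean, cov
```

Because the regions overlap softly, a pixel near a region boundary contributes to several descriptors at once, which makes the index robust to small shifts in image content.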
20.
The existing skew estimation techniques usually assume that the input image is of high resolution and that the detectable
angle range is limited. We present a more generic solution for this task that overcomes these restrictions. Our method is
based on determination of the first eigenvector of the data covariance matrix. The solution comprises image resolution reduction,
connected component analysis, component classification using a fuzzy approach, and skew estimation. Experiments on a large
set of various document images and performance comparison with two Hough transform-based methods show a good accuracy and
robustness for our method.
Received October 10, 1998 / Revised version September 9, 1999
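The core eigenvector step described above can be sketched directly: treat foreground-pixel coordinates as 2-D samples, take the dominant eigenvector of their covariance matrix, and read the skew angle from its direction. The resolution reduction, connected-component analysis, and fuzzy component classification of the full method are omitted here.

```python
import numpy as np

def estimate_skew_degrees(points):
    """Estimate document skew from foreground coordinates.

    points: N x 2 array of (x, y) foreground-pixel positions. Returns the
    angle of the dominant data direction, folded into (-90, 90] degrees.
    """
    cov = np.cov(np.asarray(points, dtype=float).T)   # 2 x 2 covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]           # first eigenvector
    angle = np.degrees(np.arctan2(vy, vx))
    return (angle + 90.0) % 180.0 - 90.0              # fold sign ambiguity
```

Since text lines dominate the pixel distribution of a document page, the first eigenvector aligns with the text direction regardless of how large the skew angle is, which is why this approach avoids the limited angle range of conventional techniques.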