Found 20 similar documents; search took 15 ms
1.
The comparison of digital images to determine their degree of similarity is one of the fundamental problems of computer vision.
Many techniques exist which accomplish this with a certain level of success, most of which involve either the analysis of
pixel-level features or the segmentation of images into sub-objects that can be geometrically compared. In this paper we develop
and evaluate a new variation of the pixel feature and analysis technique known as the color correlogram in the context of
a content-based image retrieval system. Our approach is to extend the autocorrelogram by adding multiple image features in
addition to color. We compare the performance of each index scheme with our method for image retrieval on a large database
of images. The experiment shows that our proposed method gives a significant improvement over histogram or color correlogram
indexing, and it is also memory-efficient.
Peter Yoon
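The plain color autocorrelogram that the abstract above builds on can be sketched as follows. This is a minimal illustration, not the authors' method: it covers only the basic color autocorrelogram (the probability that a pixel at a given distance from a pixel of color c also has color c) and does not include the additional image features the paper adds. The function names, the use of the Chebyshev (L-infinity) distance, the handling of image borders, and the L1 comparison are assumptions made for this example.

```python
from collections import Counter


def ring_offsets(k):
    """Offsets (dy, dx) whose Chebyshev (L-infinity) distance from the origin is exactly k."""
    return [(dy, dx)
            for dy in range(-k, k + 1)
            for dx in range(-k, k + 1)
            if max(abs(dy), abs(dx)) == k]


def autocorrelogram(image, distances):
    """Return {(color, k): P(pixel at distance k has the same color | pixel has color)}.

    `image` is a 2D list of quantized color indices (hypothetical input format).
    Neighbors that fall outside the image are ignored (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    result = {}
    for k in distances:
        same = Counter()   # same-color neighbor pairs, per color
        total = Counter()  # all in-bounds neighbor pairs, per color
        offsets = ring_offsets(k)
        for y in range(height):
            for x in range(width):
                color = image[y][x]
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total[color] += 1
                        if image[ny][nx] == color:
                            same[color] += 1
        for color, count in total.items():
            result[(color, k)] = same[color] / count
    return result


def l1_distance(a, b):
    """L1 distance between two autocorrelograms; missing entries count as 0."""
    keys = set(a) | set(b)
    return sum(abs(a.get(key, 0.0) - b.get(key, 0.0)) for key in keys)
```

In a retrieval setting, each database image would be indexed by its autocorrelogram and ranked by `l1_distance` to the query; the choice of distance measure here is illustrative only.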
2.
Automatic audio content recognition has attracted increasing attention in the development of multimedia systems, for which the most popular approaches combine frame-based features with statistical models or discriminative classifiers. Existing methods are effective for clean single-source event detection but may not perform well on unstructured environmental sounds, which have a broad, noise-like flat spectrum and a diverse variety of compositions. We present an automatic acoustic scene understanding framework that detects audio events through two hierarchies, acoustic scene recognition and audio event recognition, in which the former proceeds by following dominant audio sources and in turn helps infer non-dominant audio events within the same scene by modeling their occurrence correlations. In the scene recognition hierarchy, we perform adaptive segmentation and feature extraction for every input acoustic scene stream through Eigen-audiospace and an optimized feature subspace, respectively. After background filtering, scene streams are recognized by modeling the observation density of dominant features using a two-level hidden Markov model. In the audio event recognition hierarchy, scene knowledge is characterized by an audio context model that describes the occurrence correlations of dominant and non-dominant audio events within the scene. Monte Carlo integration and gradient descent techniques are employed to maximize the likelihood and correctly tag each audio event. To the best of our knowledge, this is the first work that models event correlations as scene context for robust audio event detection in complex and noisy environments. Note that, according to a recent report, the mean accuracy of human listeners on the acoustic scene classification task is only around 71% on the data collected in office environments from the DCASE dataset.
None of the existing methods performs well on all scene categories, and the average accuracy of the best performances of 11 recent methods is 53.8%. The proposed method achieves an average accuracy of 62.3% on the same dataset. Additionally, we create a 10-CASE dataset by manually collecting 5,250 audio clips of 10 scene types and 21 event categories. Our experimental results on 10-CASE show that the proposed method achieves an improved average performance of 78.3%, and that the average accuracy of audio event recognition can be effectively improved by capturing dominant audio sources and reasoning about non-dominant events from the dominant ones through acoustic context modeling. In future work, exploring the interactions between acoustic scene recognition and audio event detection and incorporating other modalities to improve accuracy are required to further advance the proposed framework.
3.
4.
Mahesh Viswanathan, Homayoon S.M. Beigi, Satya Dharanipragada, Fereydoun Maali, Alain Tritschler 《International Journal on Document Analysis and Recognition》2000,2(4):147-162
Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text- and speaker-based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.
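The combination of text-based and speaker-based search results described above can be illustrated with a short sketch. The index layout (dictionaries mapping a word or a speaker label to audio segment ids), the function names, and the supported operators are all hypothetical; the abstract does not specify how the system stores its indexes or combines results.

```python
def lookup(index, key):
    """Return the set of segment ids stored under `key`, or an empty set."""
    return set(index.get(key, ()))


def combined_query(text_index, speaker_index, word, speaker, operator="AND"):
    """Combine a text search and a speaker search with a Boolean operator.

    Both indexes are assumed to map a key to an iterable of segment ids.
    """
    text_hits = lookup(text_index, word.lower())
    speaker_hits = lookup(speaker_index, speaker)
    if operator == "AND":
        hits = text_hits & speaker_hits   # word spoken by this speaker
    elif operator == "OR":
        hits = text_hits | speaker_hits   # either condition holds
    elif operator == "NOT":
        hits = text_hits - speaker_hits   # word spoken by someone else
    else:
        raise ValueError(f"unsupported operator: {operator!r}")
    return sorted(hits)


# Hypothetical indexes built from transcribed, speaker-labeled segments.
text_index = {"election": [1, 2, 5], "weather": [3]}
speaker_index = {"anchor_a": [2, 3, 5], "reporter_b": [1]}

print(combined_query(text_index, speaker_index, "Election", "anchor_a"))
```

Set operations keep the two searches independent, so either index can be rebuilt or replaced without touching the other.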
5.
6.
7.
Face recognition by generalized two-dimensional FLD method and multi-class support vector machines (cited by: 2; self-citations: 0; citations by others: 2)
Shiladitya Chowdhury, Jamuna Kanta Sing, Dipak Kumar Basu, Mita Nasipuri 《Applied Soft Computing》2011,11(7):4282-4292
This paper presents a novel scheme for feature extraction, namely the generalized two-dimensional Fisher's linear discriminant (G-2DFLD) method, and its use for face recognition with multi-class support vector machines as the classifier. The G-2DFLD method is an extension of the 2DFLD method for feature extraction. Like the 2DFLD method, the G-2DFLD method is based on the original 2D image matrix. However, unlike the 2DFLD method, which maximizes class separability in either the row or the column direction, the G-2DFLD method maximizes class separability in both the row and column directions simultaneously. To realize this, two alternative Fisher's criteria are defined, corresponding to the row-wise and column-wise projection directions. Unlike in the 2DFLD method, the principal components extracted from an image matrix in the G-2DFLD method are scalars, yielding a much smaller image feature matrix. The proposed G-2DFLD method was evaluated on two popular face recognition databases, the AT&T (formerly ORL) and the UMIST face databases. The experimental results using different experimental strategies show that the new G-2DFLD scheme outperforms the PCA, 2DPCA, FLD and 2DFLD schemes, not only in terms of computation time but also on the task of face recognition using multi-class support vector machines (SVM) as the classifier. The proposed method also outperforms some of the neural-network and other SVM-based methods for face recognition reported in the literature.
8.
9.
Antonio Adán 《Pattern recognition letters》2011,32(9):1337-1353
The intention of the strategy proposed in this paper is to solve the object retrieval problem in highly complex scenes using 3D information. In the worst-case scenario, the scene includes several objects with irregular or free-form shapes, viewed from any direction, which are self-occluded or partially occluded by other objects with which they are in contact and whose appearance is uniform in intensity/color. This paper introduces and analyzes a new 3D recognition/pose strategy based on DGI (Depth Gradient Images) models. After comparing it with current representative techniques, we can affirm that DGI has very interesting prospects. The DGI representation synthesizes both surface and contour information, thus avoiding restrictions concerning the layout and visibility of the objects in the scene. This paper first explains the key concepts of the DGI representation and shows the main properties of this method in comparison to a set of known techniques. The performance of this strategy in real scenes is then reported. Details are also presented of a wide set of experimental tests, including results under occlusion, performance with injected noise and experiments with cluttered scenes of a high level of complexity.
10.
Yoshihisa Shinagawa 《IEEE transactions on pattern analysis and machine intelligence》2008,30(11):1891-1901
This paper presents novel homotopic image pseudo-invariants for face recognition based on pixelwise analysis. An exemplar face and test images are matched, and the most similar image is determined first. The homotopic image pseudo-invariants are calculated next to judge whether the most similar image shows the same person as the exemplar. The proposed method can be applied to open-set recognition. The recognition task can be performed with or without face databases, although the recognition rate is higher when a database is available. This fact facilitates the recognition of faces and various other objects on the Internet. We benchmark the method using FERET as well as images downloaded from the Internet.
11.
We present a ubiquitous system that combines context information, security mechanisms and a transport infrastructure to provide
authentication and secure transport of works of art. Authentication is provided for both auctions and exhibitions, where users
can use their own mobile devices to authenticate works of art. Transport is provided by a secure protocol that makes use of
position–time information and wireless sensors providing context information. The system has been used in several real case
studies in the context of the CUSPIS project and continues to be used as a commercial product for the transportation and exhibition
of cultural assets in Italy.
12.
13.
Ioannis K. Brilakis, Lucio Soibelman, Yoshihisa Shinagawa 《Advanced Engineering Informatics》2006,20(4):443-452
The capability to automatically identify shapes, objects and materials from image content through direct and indirect methodologies has enabled the development of several civil-engineering-related applications that assist in the design, construction and maintenance of construction projects. This capability is a product of the technological breakthroughs in the area of image processing that have allowed for the development of a large number of digital imaging applications in all industries. In this paper, an automated, content-based construction site image retrieval method is presented. This method is based on image retrieval techniques, specifically those related to material and object identification, and matches known material samples with material clusters within the image content. The results demonstrate the suitability of this method for construction site image retrieval purposes and reveal the capability of existing image processing technologies to accurately identify a wealth of materials from construction site images.
14.
15.
16.
A generalized neural reflectance (GNR) model for enhancing face recognition under variations in illumination and posture is
presented in this paper. Our work is based on training with a number of synthesized images of each face, taken under a single lighting
direction with a frontal/posture view. This way of synthesizing images can be used to build training cases for each face under
different known illumination conditions, from which face recognition can be significantly improved. However, reconstructing
the face shape may not be easy, and human face images usually have a highly complex structure that suffers from
strong specular and unknown reflective conditions. In this paper, these limitations are addressed following Cho and Chow (IEEE Trans
Neural Netw 12(5):1204–1214, 2002). Face surfaces are recovered by the GNR model, and face images in different poses are synthesized
to create a database for training. Our training algorithm recognizes the face identity using a similarity measure on
face features that are first extracted by principal component analysis (PCA) and then further processed by Fisher's
discriminant analysis (FDA) to obtain lower-dimensional patterns. Experimental results on the Yale Face Database
B show that lower classification and recognition error rates are achieved under different variations in lighting and pose,
and that the performance significantly exceeds that of recognition without the proposed GNR model.
17.
18.
19.
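Taking into account the respective advantages of the traditional gray-level co-occurrence matrix method and the method based on the gray-level co-occurrence matrix of the generalized image, an improved image retrieval method based on the generalized-image gray-level co-occurrence matrix is proposed. The new method constructs gray-level co-occurrence matrices of the generalized image in four directions and extracts texture parameters from the four co-occurrence matrices for retrieval. Experimental results show that the new method achieves better retrieval performance under image rotation and changes in image size.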
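The four-direction co-occurrence step can be illustrated with a small sketch. It computes the conventional gray-level co-occurrence matrix (GLCM) on a single image in the 0°, 45°, 90° and 135° directions and derives four commonly used texture parameters from each matrix. The construction of the generalized image is not described in the abstract and is not reproduced here; the function names, the pixel distance of 1, the symmetric counting, and the particular texture parameters (energy, contrast, entropy, homogeneity) are assumptions for this example.

```python
import math

# (dy, dx) offsets for the four conventional directions at distance 1.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
PARAMETER_NAMES = ("energy", "contrast", "entropy", "homogeneity")


def glcm(image, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    `image` is a 2D list of gray levels in range(levels).
    """
    dy, dx = offset
    height, width = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both orders
    total = sum(map(sum, counts))
    if total == 0:  # image too small for this offset
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def texture_parameters(p):
    """Texture parameters of a normalized co-occurrence matrix `p`."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


def texture_feature_vector(image, levels):
    """Concatenate the texture parameters of all four directional matrices."""
    features = []
    for angle in sorted(DIRECTIONS):
        params = texture_parameters(glcm(image, DIRECTIONS[angle], levels))
        features.extend(params[name] for name in PARAMETER_NAMES)
    return features
```

Two images could then be compared by a distance between their feature vectors; which distance the paper uses is not stated in the abstract.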
20.