Comparing digital images to determine their degree of similarity is one of the fundamental problems of computer vision. Many techniques accomplish this with some success; most involve either the analysis of pixel-level features or the segmentation of images into sub-objects that can be compared geometrically. In this paper we develop and evaluate a new variation of the pixel-feature analysis technique known as the color correlogram, in the context of a content-based image retrieval system. Our approach extends the autocorrelogram by adding multiple image features in addition to color. We compare the retrieval performance of each indexing scheme with that of our method on a large database of images. The experiments show that the proposed method gives a significant improvement over histogram or color correlogram indexing, and that it is also memory-efficient.
Peter Yoon
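For readers unfamiliar with the baseline, the following is a minimal sketch of a plain color autocorrelogram, not the extended multi-feature version the abstract proposes. It assumes the image has already been quantized to integer color indices; the function name `autocorrelogram`, the choice of distances, and the use of the L-infinity ring as the neighborhood are illustrative assumptions.

```python
import numpy as np

def autocorrelogram(labels, n_colors, distances=(1, 3)):
    """Estimate P(neighbor at distance d has color c | pixel has color c).

    labels: 2-D integer array of quantized color indices in [0, n_colors).
    Returns an array of shape (n_colors, len(distances)).
    """
    h, w = labels.shape
    result = np.zeros((n_colors, len(distances)))
    for k, d in enumerate(distances):
        same = np.zeros(n_colors)   # pairs at distance d that share a color
        total = np.zeros(n_colors)  # all pairs at distance d, by source color
        # Offsets on the L-infinity ring of radius d (assumed neighborhood).
        offsets = [(dy, dx)
                   for dy in range(-d, d + 1)
                   for dx in range(-d, d + 1)
                   if max(abs(dy), abs(dx)) == d]
        for dy, dx in offsets:
            if abs(dy) >= h or abs(dx) >= w:
                continue  # offset does not fit inside the image
            # a holds source pixels, b holds their neighbors at (dy, dx).
            a = labels[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            b = labels[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
            total += np.bincount(a.ravel(), minlength=n_colors)
            same += np.bincount(a[a == b], minlength=n_colors)
        result[:, k] = np.divide(same, total,
                                 out=np.zeros(n_colors), where=total > 0)
    return result
```

Two images could then be compared by, for example, an L1 distance between their autocorrelogram arrays; that choice of distance is likewise an assumption rather than something the abstract specifies.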

Automatic audio content recognition has attracted increasing attention in the development of multimedia systems; the most popular approaches combine frame-based features with statistical models or discriminative classifiers. Existing methods are effective for clean single-source event detection but may not perform well on unstructured environmental sounds, which have a broad, noise-like flat spectrum and a diverse variety of compositions. We present an automatic acoustic scene understanding framework that detects audio events through two hierarchies, acoustic scene recognition and audio event recognition. The former proceeds by following dominant audio sources and in turn helps infer non-dominant audio events within the same scene by modeling their occurrence correlations. On the scene recognition hierarchy, we perform adaptive segmentation and feature extraction for every input acoustic scene stream through Eigen-audiospace and an optimized feature subspace, respectively. After background filtering, scene streams are recognized by modeling the observation density of dominant features with a two-level hidden Markov model. On the audio event recognition hierarchy, scene knowledge is characterized by an audio context model that describes the occurrence correlations of dominant and non-dominant audio events within the scene. Monte Carlo integration and gradient descent are employed to maximize the likelihood and correctly tag each audio event. To the best of our knowledge, this is the first work that models event correlations as scene context for robust audio event detection in complex and noisy environments. Note that, according to a recent report, the mean accuracy of human listeners on the acoustic scene classification task is only around 71% on the data collected in office environments from the DCASE dataset. None of the existing methods performs well on all scene categories, and the average of the best accuracies of 11 recent methods is 53.8%. The proposed method achieves an average accuracy of 62.3% on the same dataset. Additionally, we create a 10-CASE dataset by manually collecting 5,250 audio clips of 10 scene types and 21 event categories. Our experimental results on 10-CASE show that the proposed method achieves an improved average accuracy of 78.3%, and that the average accuracy of audio event recognition can be effectively improved by capturing dominant audio sources and reasoning about non-dominant events from the dominant ones through acoustic context modeling. In future work, exploring the interactions between acoustic scene recognition and audio event detection, and incorporating other modalities to improve accuracy, are required to further advance the proposed framework.

With the development of digital devices and pressure sensing equipment, research into freehand sketches from touch-screen interfaces has increased significantly in recent years. As such, we provide the first comprehensive survey of recognition tasks based on sketch generation, freehand sketch classification, sketch-based image retrieval (SBIR), fine-grained sketch-based image retrieval (FG-SBIR), and sketch-based 3D shape image retrieval. Specifically, SBIR and FG-SBIR were the main focus of the survey. Primary technologies and benchmark datasets related to all sketch-based recognition topics are also discussed, along with future trends for this promising technology.

Yang Yang, Ping Xijian. 《计算机应用》 (Journal of Computer Applications), 2006, 26(2): 419-420
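To meet the need for recognizing and retrieving icons and graphics in automatic text-image processing, this paper proposes an icon recognition and retrieval method based on projection features. Projection histograms represent the horizontal and vertical features of an icon, and the correlation distance between histograms measures the similarity of images, which overcomes the effect of mirror-symmetry transformations on the projection histograms. Experiments show that the projection features describe the spatial distribution of an icon in the horizontal and vertical directions and support effective recognition and retrieval.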
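The projection-feature idea can be illustrated with a short sketch. It assumes binary icons of equal size; the abstract does not say how mirror symmetry is handled, so comparing against both a histogram and its reversal is an assumption made here for illustration, as are the function names.

```python
import numpy as np

def projections(binary):
    """Return (row sums, column sums) of a binary icon image."""
    binary = np.asarray(binary, dtype=float)
    return binary.sum(axis=1), binary.sum(axis=0)

def correlation_distance(p, q):
    """1 minus the Pearson correlation of two equal-length histograms."""
    p = p - p.mean()
    q = q - q.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    if denom == 0:
        return 1.0  # a flat histogram carries no shape information
    return 1.0 - float(p @ q) / denom

def icon_distance(a, b):
    """Average projection distance, tolerant of mirror flips (assumed scheme)."""
    rows_a, cols_a = projections(a)
    rows_b, cols_b = projections(b)
    # Reversing a histogram corresponds to mirroring the icon along that axis.
    d_rows = min(correlation_distance(rows_a, rows_b),
                 correlation_distance(rows_a, rows_b[::-1]))
    d_cols = min(correlation_distance(cols_a, cols_b),
                 correlation_distance(cols_a, cols_b[::-1]))
    return 0.5 * (d_rows + d_cols)
```

Under this scheme an icon and its left-right mirror image have a distance of approximately zero, while icons with different spatial layouts score higher.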

Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition to convert the audio to text, together with concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics and automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text- and speaker-based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.
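A combined text- and speaker-based Boolean query can be sketched with two inverted indexes over audio segments. This is only an illustration of the idea of intersecting the two result sets; the segment identifiers, speaker names, transcripts, and function names below are hypothetical and do not come from the described system.

```python
from collections import defaultdict

def build_indexes(segments):
    """Build word -> segment ids and speaker -> segment ids indexes.

    segments: dict mapping a segment id to a (speaker, transcript) pair.
    """
    text_index = defaultdict(set)
    speaker_index = defaultdict(set)
    for seg_id, (speaker, transcript) in segments.items():
        speaker_index[speaker].add(seg_id)
        for word in transcript.lower().split():
            text_index[word].add(seg_id)
    return text_index, speaker_index

def query_and(text_index, speaker_index, word, speaker):
    """Segments where `speaker` is talking AND the transcript contains `word`."""
    by_text = text_index.get(word.lower(), set())
    by_speaker = speaker_index.get(speaker, set())
    return by_text & by_speaker

# Hypothetical broadcast-news segments for demonstration only.
segments = {
    1: ("anchor_a", "markets rallied today"),
    2: ("reporter_b", "markets fell sharply"),
    3: ("anchor_a", "weather remains mild"),
}
text_index, speaker_index = build_indexes(segments)
matches = query_and(text_index, speaker_index, "markets", "anchor_a")
```

An OR query would use set union in place of intersection, and a NOT query set difference.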

This paper presents a novel scheme for feature extraction, namely the generalized two-dimensional Fisher's linear discriminant (G-2DFLD) method, and its use for face recognition with multi-class support vector machines (SVM) as the classifier. The G-2DFLD method is an extension of the 2DFLD method for feature extraction. Like the 2DFLD method, the G-2DFLD method is based on the original 2D image matrix. However, unlike the 2DFLD method, which maximizes class separability from either the row or the column direction, the G-2DFLD method maximizes class separability from both the row and column directions simultaneously. To realize this, two alternative Fisher's criteria are defined, corresponding to the row-wise and column-wise projection directions. Unlike in the 2DFLD method, the principal components extracted from an image matrix in the G-2DFLD method are scalars, yielding a much smaller image feature matrix. The proposed G-2DFLD method was evaluated on two popular face recognition databases, the AT&T (formerly ORL) and the UMIST face databases. The experimental results, obtained with different experimental strategies, show that the new G-2DFLD scheme outperforms the PCA, 2DPCA, FLD and 2DFLD schemes, not only in computation time but also in face recognition accuracy with multi-class SVM as the classifier. The proposed method also outperforms some of the neural-network and other SVM-based methods for face recognition reported in the literature.

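Content-based image retrieval (CBIR) has become a research hotspot in image processing because of its high theoretical and practical value. Extracting and matching image features is the principal means of CBIR, yet extracting effective image features is extremely difficult. Drawing on the properties of the HSV color space and the regularities of human color perception, this paper proposes a color recognition method. The method is applied to classify the pixels of an image in a structure-preserving way, and structural features are extracted within each class. Feature matching is then carried out between pixel sets of the same class, which reduces the complexity of feature extraction and matching. Experiments show that the proposed image retrieval method performs well.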
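The kind of HSV-based pixel classification the abstract describes might look like the following sketch. The class names, the saturation and value thresholds, and the hue boundaries are all assumptions chosen for illustration; the abstract does not give the actual rules.

```python
import colorsys

def color_class(r, g, b):
    """Assign an 8-bit RGB pixel to a coarse perceptual color class."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Very dark pixels look black regardless of hue (assumed threshold).
    if v < 0.2:
        return "black"
    # Weakly saturated pixels look achromatic (assumed threshold).
    if s < 0.15:
        return "white" if v > 0.8 else "gray"
    hue_deg = h * 360
    # Upper hue bounds in degrees for each chromatic class (assumed).
    bounds = [(30, "red"), (90, "yellow"), (150, "green"),
              (210, "cyan"), (270, "blue"), (330, "magenta")]
    for upper, name in bounds:
        if hue_deg < upper:
            return name
    return "red"  # hues from 330 to 360 degrees wrap around to red
```

Grouping pixels by such a class label gives the per-class pixel sets between which features would then be matched.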

The strategy proposed in this paper aims to solve the object retrieval problem in highly complex scenes using 3D information. In the worst-case scenario, the scene includes several objects with irregular or free-form shapes, viewed from any direction, which are self-occluded or partially occluded by other objects with which they are in contact and whose appearance is uniform in intensity or color. This paper introduces and analyzes a new 3D recognition and pose strategy based on DGI (Depth Gradient Images) models. After comparing it with current representative techniques, we can affirm that DGI has very interesting prospects. The DGI representation synthesizes both surface and contour information, thus avoiding restrictions concerning the layout and visibility of the objects in the scene. This paper first explains the key concepts of the DGI representation and shows the main properties of this method in comparison with a set of known techniques. The performance of this strategy in real scenes is then reported. Details are also presented of a wide set of experimental tests, including results under occlusion, performance with injected noise, and experiments with cluttered scenes of a high level of complexity.

This paper presents novel homotopic image pseudo-invariants for face recognition based on pixelwise analysis. An exemplar face and test images are matched, and the most similar image is determined first. The homotopic image pseudo-invariants are then calculated to judge whether the most similar image shows the same person as the exemplar. The proposed method can be applied to open-set recognition. The recognition task can be performed with or without face databases, although the recognition rate is higher when a database is available. This facilitates the recognition of faces and various other objects on the Internet. We benchmark the method using FERET as well as images downloaded from the Internet.

We present a ubiquitous system that combines context information, security mechanisms and a transport infrastructure to provide authentication and secure transport of works of art. Authentication is provided for both auctions and exhibitions, where users can use their own mobile devices to authenticate works of art. Transport is provided by a secure protocol that makes use of position–time information and wireless sensors providing context information. The system has been used in several real case studies in the context of the CUSPIS project and continues to be used as a commercial product for the transportation and exhibition of cultural assets in Italy.

The capability to automatically identify shapes, objects and materials from image content through direct and indirect methodologies has enabled the development of several civil engineering applications that assist in the design, construction and maintenance of construction projects. This capability is a product of technological breakthroughs in image processing that have allowed a large number of digital imaging applications to be developed in all industries. In this paper, an automated, content-based construction site image retrieval method is presented. The method is based on image retrieval techniques, specifically those related to material and object identification, and matches known material samples with material clusters within the image content. The results demonstrate the suitability of this method for construction site image retrieval and reveal the capability of existing image processing technologies to accurately identify a wealth of materials from construction site images.

Zhang Jian, Li Fang. 《中文信息学报》 (Journal of Chinese Information Processing), 2015, 29(2): 179-189
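Automatically mining semantic information and evolutionary relations from large-scale corpora has attracted wide attention from researchers in recent years. A topic is regarded as latent semantic information in a document collection, and topic evolution studies how topic content changes over time. This paper proposes a context-based method for topic evolution and topic relation extraction. Analysis shows that a topic often co-occurs with certain other topics across multiple documents; this co-occurrence information is called the context of the topic. Context information can be used to compute semantic relations between topics in the same time period and to identify topics with the same semantics in different time periods. Experiments were conducted on the 2008-2012 reports of the Two Sessions (两会) and on NIPS scientific papers from 2007-2011. Manual analysis shows that using topic context information not only improves the accuracy of topic evolution but also uncovers semantic relations between topics and, on the basis of topic evolution, reveals how topic relations evolve.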
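The notion of topic context as co-occurrence can be sketched as follows. The sketch assumes each document has already been assigned a set of topic labels, and it uses cosine similarity between co-occurrence vectors to relate topics; both the similarity measure and the example topic labels are assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def topic_contexts(doc_topics):
    """Map each topic to a Counter of the topics it co-occurs with."""
    contexts = {}
    for topics in doc_topics:
        # Count each unordered pair of distinct topics once per document.
        for a, b in combinations(sorted(set(topics)), 2):
            contexts.setdefault(a, Counter())[b] += 1
            contexts.setdefault(b, Counter())[a] += 1
    return contexts

def context_similarity(c1, c2):
    """Cosine similarity between two co-occurrence context vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical per-document topic assignments for demonstration only.
docs = [
    ["economy", "housing"],
    ["economy", "education"],
    ["finance", "housing"],
    ["finance", "education"],
]
ctx = topic_contexts(docs)
similar = context_similarity(ctx["economy"], ctx["finance"])
unrelated = context_similarity(ctx["economy"], ctx["housing"])
```

Here "economy" and "finance" never appear in the same document, yet they share the same context and so receive a high similarity, which is the kind of signal that could link topics with the same semantics across time periods.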
