Similar literature
20 similar documents found (search time: 15 ms)
1.
With the rapid growth of remotely sensed imagery, there is high demand for effective and efficient image retrieval tools to manage and exploit such data. In this letter, we present a novel content-based remote sensing image retrieval (RSIR) method based on a triplet deep metric learning convolutional neural network (CNN). By constructing a triplet network with a metric learning objective function, we extract representative features of the images in a semantic space in which images from the same class are close to each other while those from different classes are far apart. In such a semantic space, simple metrics such as Euclidean distance can be used directly to compare the similarity of images and effectively retrieve images of the same class. We also investigate a supervised and an unsupervised learning method for reducing the dimensionality of the learned semantic features. We present comprehensive experimental results on two public RSIR datasets and show that our method significantly outperforms the state of the art.
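
The triplet objective and distance-based retrieval described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming the standard hinge-style triplet loss; the abstract does not state the exact objective, and the function names `triplet_loss` and `retrieve`, the `margin` value, and the toy embeddings are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, database, top_k=3):
    """Rank (name, embedding) pairs by Euclidean distance to the query embedding."""
    ranked = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for learned CNN features.
    database = [("a", [3, 4]), ("b", [0, 1]), ("c", [1, 1])]
    print(triplet_loss([0, 0], [0, 1], [3, 4]))
    print(retrieve([0, 0], database, top_k=2))
```

In a trained network the embeddings would come from the CNN; here they are hand-written so that the ranking step can be followed by eye.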

2.
In content-based image retrieval (CBIR), relevance feedback has proven to be a powerful tool for bridging the gap between low-level visual features and high-level semantic concepts. Traditionally, relevance-feedback-driven CBIR is treated as a supervised learning problem in which the user-provided feedback is used to learn a distance metric or classification function. However, CBIR is intrinsically a semi-supervised learning problem in which the test samples (the images in the database) are present during learning. Moreover, when feedback is insufficient, these methods may overfit. In this paper, we propose a novel neighborhood preserving regression algorithm that makes efficient use of both labeled and unlabeled images. By using the unlabeled images, the geometrical structure of the image space can be incorporated into the learning system through a regularizer. Specifically, from all the functions that minimize the empirical loss on the labeled images, we select the one that best preserves the local neighborhood structure of the image space. In this way, our method obtains a regression function that respects both the semantic and the geometrical structure of the image database. We present experimental evidence suggesting that our algorithm is able to use unlabeled data effectively for image retrieval.

3.
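An image retrieval method based on sparse canonical correlation analysis   Cited by: 1 (self-citations: 0; citations by others: 1)
庄凌, 庄越挺, 吴江琴, 叶振超, 吴飞. Journal of Software (《软件学报》), 2012, 23(5): 1295-1304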
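A key problem in semantic image retrieval is finding the association between low-level image features and semantics. Because text is an effective means of expressing semantics, this paper proposes studying the relationship between the text and image modalities in order to build an effective model that reflects the latent semantic association between them. With this model, retrieval intent can be expressed in natural language (text sentences) and the relevant images are ultimately retrieved. The model is based on sparse canonical correlation analysis (sparse CCA) and is trained in the following steps: first, a text semantic space is constructed using latent semantic analysis; then, the images corresponding to the text are represented with a bag of visual words; finally, the sparse CCA algorithm finds a semantically correlated space that maps between text semantics and visual words. Using a sparse correlation analysis method improves the interpretability of the model and keeps the retrieval results stable. The experimental results verify the effectiveness of the sparse CCA method and confirm the feasibility of the proposed semantic image retrieval method.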

4.
Owing to the rapid growth of digital technologies, huge volumes of image data are created and shared on social media sites. User-provided tags attached to social images are widely recognized as a bridge across the semantic gap between low-level image features and high-level concepts. Hence, combining images with their corresponding tags is useful for intelligent retrieval systems, which are designed to gain a high-level understanding of images and to facilitate semantic search. In practice, however, user-provided tags are usually incomplete and noisy, which may degrade retrieval performance. To tackle this problem, we present a novel retrieval framework that automatically associates visual content with textual tags and enables effective image search. To this end, we first propose a probabilistic topic model learned on social images to discover latent topics from the co-occurrence of tags and image features. Moreover, our topic model is built by exploiting expert knowledge about the correlation between tags and visual content and about the relationship among image features, formulated in terms of spatial location and color distribution. The discovered topics then help to predict the missing tags of an unseen image as well as those of partially labeled images in the database. These predicted tags greatly facilitate a reliable measure of semantic similarity between the query and the database images. We therefore further present a scoring scheme that estimates this similarity by fusing textual tags and visual representation. Extensive experiments on three benchmark datasets show that our topic model provides accurate annotation despite noisy and incomplete tags. Using our generalized scoring scheme, which is advantageous for many types of queries, the proposed approach also outperforms state-of-the-art approaches in terms of retrieval accuracy.

5.
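The key to image retrieval methods that combine manifold learning with relevance feedback is to learn user semantics from a small amount of user feedback, together with low-level visual information, in order to obtain the manifold of the semantic subspace. To obtain a more faithful semantic subspace, this paper treats low-level visual information and user feedback differently and, guided by the low-level visual information, selectively learns the intra-class and inter-class relations in the feedback, yielding a selective relation embedding algorithm for image retrieval. The method preserves a more faithful semantic manifold structure and thereby improves retrieval precision in the low-dimensional space. The experimental results show that the method can map images into a wider range of low-dimensional spaces and that, after two feedback iterations, retrieval precision improves by up to 16.3%.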

6.
We propose to combine short-term block-based fuzzy support vector machine (FSVM) learning and long-term dynamic semantic clustering (DSC) learning to bridge the semantic gap in content-based image retrieval. The short-term learning addresses the small sample problem by incorporating additional image blocks to enlarge the training set. Specifically, it applies the nearest neighbor mechanism to choose additional similar blocks. A fuzzy metric is computed to measure the fidelity of the actual class information of the additional blocks. The FSVM is finally applied to the enlarged training set to learn a more accurate decision boundary for classifying images. The long-term learning addresses the large storage problem by building dynamic semantic clusters to remember the semantics learned during all query sessions. Specifically, it applies a cluster-image weighting algorithm to find the images most semantically related to the query. It then applies a DSC technique to adaptively learn and update the semantic categories. Our extensive experimental results demonstrate that the proposed short-term, long-term, and collaborative learning methods outperform their peer methods when erroneous feedback is involved, whether it results from the inherent subjectivity of judging relevance, user laziness, or maliciousness. The collaborative learning system achieves better retrieval precision and requires significantly less storage space than its peers. © 2011 Wiley Periodicals, Inc.

7.
With the rapid development of location-based social networks (LBSNs), users upload more and more media data. The asynchrony between visual and textual information makes it extremely difficult to manage multimodal information for retrieval and personalized recommendation without manual annotation. Consequently, automated discovery of image semantics in location-related multimedia user-generated content (UGC) has become essential to the user experience. Most of the literature leverages single-modality data or correlated multimedia data for image semantic detection. However, the intrinsically heterogeneous UGC in LBSNs is usually independent and uncorrelated, so it is hard to build a correlation between textual and visual information. In this paper, we propose a cross-domain semantic modeling method for the automatic annotation of images from social network platforms. First, we extract a set of hot topics from the collected textual information to prepare the image dataset. Then the proposed noisy sample filtering is applied to remove low-relevance photos. Finally, we leverage cross-domain datasets to discover the common knowledge of each semantic concept from UGC and boost the performance of semantic annotation by semantic transfer. Comparison experiments on cross-domain datasets demonstrate the superiority of the proposed method.

8.
In content-based image retrieval (CBIR), relevant images are identified based on their similarity to query images. Most CBIR algorithms are hindered by the semantic gap between the low-level image features used for computing image similarity and the high-level semantic concepts conveyed in images. One way to reduce the semantic gap is to utilize the log data of users' feedback that CBIR systems have collected over time, an approach also called "collaborative image retrieval." In this paper, we present a novel metric learning approach, named "regularized metric learning," for collaborative image retrieval, which learns a distance metric by exploring the correlation between low-level image features and the log data of users' relevance judgments. Compared to previous research, our algorithm uses a regularization mechanism to effectively prevent overfitting. We also formulate the proposed learning algorithm as a semidefinite programming problem, which can be solved very efficiently by existing software packages and scales with the size of the log data. An extensive set of experiments shows that the new algorithm can substantially improve the retrieval accuracy of a baseline CBIR system that uses the Euclidean distance metric, even with a modest amount of log data. The experiments also indicate that the new algorithm is more effective and more efficient than two alternative algorithms that exploit log data for image retrieval.

9.
Extending traditional purely statistical learning methods, we propose to augment statistical learning with an ontology and apply this idea to image attribute learning. To capture structural information among attributes, the graph-guided fused lasso model is adopted and improved with a new distance metric based on WordNet. The novelty of our method is that we find the semantic correlation within the ontology-guided attribute space and integrate inter-attribute similarity information into the learning model. The hierarchy of ImageNet is exploited to define the image attributes, and a dataset of over 30,000 images is collected from ImageNet. The experimental results show that this method both improves accuracy and accelerates the convergence of the algorithm. Moreover, the learned semantic correlation can be transferred to related applications.

10.
田枫, 沈旭昆. Journal of Software (《软件学报》), 2013, 24(10): 2405-2418
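Label noise is widespread in real-world datasets, and weak labeling has seriously hindered the practical adoption of semantic image annotation. To address the inaccurate and incomplete labels and the imbalanced semantic distribution found in weakly labeled datasets, this paper proposes a semantic image annotation method suited to such datasets. First, under a consistency constraint between visual content and label semantics, a label correlation constraint, and a semantic sparsity constraint, sample labels are filled in by transductive learning and an approximately semantically balanced neighborhood is built for each sample. Because the neighborhood contains noise, neighborhood maximum-margin learning with multi-label semantic embedding is used to make the distance metric consistent with image semantics, so that nearest neighbors lie in the same semantic subspace. Then, taking the nearest neighbors as a local coordinate basis, neighborhood non-negative sparse coding yields the partial correlations between the target image and its neighbors, and a locally semantically consistent neighborhood is constructed. Guided by the semantic neighbors in this neighborhood and combined with contextual information, iterative denoising and label prediction are performed. The experimental results demonstrate the effectiveness of the method.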

11.
Tag recommendation encourages users to add more tags, helping to bridge the semantic gap between human concepts and the features of media objects, and thus provides a feasible solution for content-based multimedia information retrieval. In this paper, we study personalized tag recommendation on a popular online photo sharing site, Flickr. Information on users' social relationships is collected to generate an online social network. From the perspective of network topology, we propose node topological potential to characterize a user's social influence. With this metric, we distinguish different social relations between users and identify those who really have influence on the target users. Tag recommendations are based on tagging history and on the latent personalized preference learned from those who have the most influence in a user's social network. We evaluate our method on large-scale real-world data. The experimental results demonstrate that our method outperforms the non-personalized global co-occurrence method and two other state-of-the-art personalized approaches that use social networks. We also analyze the further use of our approach for the cold-start problem of tag recommendation.

12.
13.
Social media networks contain both content and context-specific information. Most existing methods work with only one of the two for the purpose of multimedia mining and retrieval. In reality, both content and context are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only by combining the two. This paper proposes a new algorithm that mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithm. Compared to state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on the latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images, and illustrate the advantages of our approach over state-of-the-art multimedia retrieval techniques.

14.
Learning Social Tag Relevance by Neighbor Voting   Cited by: 2 (self-citations: 0; citations by others: 2)
Social image analysis and retrieval is important for helping people organize and access the increasing amount of user-tagged multimedia. Since user tagging is known to be uncontrolled, ambiguous, and overly personalized, a fundamental problem is how to interpret the relevance of a user-contributed tag with respect to the visual content the tag is describing. Intuitively, if different persons label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose in this paper a neighbor voting algorithm that accurately and efficiently learns tag relevance by accumulating votes from visual neighbors. Under a set of well-defined and realistic assumptions, we prove that our algorithm is a good tag relevance measurement for both image ranking and tag ranking. Three experiments on 3.5 million Flickr photos demonstrate the general applicability of our algorithm in both social image retrieval and image tag suggestion. Our tag relevance learning algorithm substantially improves upon baselines in all the experiments. The results suggest that the proposed algorithm is promising for real-world applications.
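
The neighbor-voting intuition in this abstract can be sketched as follows. This is only a rough illustration of vote counting over visual neighbors, assuming Euclidean distance on hand-written feature vectors; the function names `neighbor_votes` and `rank_tags`, the value of `k`, and the toy data are hypothetical, and any refinements the paper applies to the raw vote count are not reproduced here.

```python
import math
from collections import Counter


def neighbor_votes(query_features, tagged_images, k=3):
    """Count, for each tag, how many of the k visually nearest images carry it.

    `tagged_images` is a list of (feature_vector, tags) pairs.
    """
    neighbors = sorted(
        tagged_images, key=lambda item: math.dist(query_features, item[0])
    )[:k]
    votes = Counter()
    for _, tags in neighbors:
        votes.update(set(tags))  # each neighbor votes at most once per tag
    return votes


def rank_tags(tags, votes):
    """Order an image's own tags by the number of neighbor votes they received."""
    return sorted(tags, key=lambda tag: votes[tag], reverse=True)


if __name__ == "__main__":
    images = [
        ([0, 0], ["beach", "sea"]),
        ([0, 1], ["beach"]),
        ([1, 0], ["sea", "beach"]),
        ([10, 10], ["car"]),
    ]
    votes = neighbor_votes([0, 0.1], images, k=3)
    print(rank_tags(["sea", "car", "beach"], votes))
```

Tags that many visual neighbors share rise to the top, while a tag that no neighbor uses receives zero votes and sinks to the bottom.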

15.
Mining multi-tag association for image tagging   Cited by: 1 (self-citations: 0; citations by others: 1)
Automatic media tagging plays a critical role in modern tag-based media retrieval systems. Existing tagging schemes mostly perform tag assignment based on community-contributed media resources, where the tags are provided by users interactively. However, such social resources usually contain dirty and incomplete tags, which severely limit the performance of these tagging methods. In this paper, we propose a novel automatic image tagging method that aims to discover more complete tags, together with their information importance, for test images. Given an image dataset, all the near-duplicate clusters are discovered. For each near-duplicate cluster, all the tags occurring in the cluster form the cluster's "document". Given a test image, we first initialize the candidate tag set from the document of its near-duplicate cluster. The candidate tag set is then expanded by considering the implicit multi-tag associations mined from all the clusters' documents, where each cluster's document is regarded as a transaction. To further reduce noisy tags, a visual relevance score between each candidate tag and the test image is also computed based on a new tag model. Tags with very low scores can be removed from the final tag set. Extensive experiments conducted on a real-world web image dataset, NUS-WIDE, demonstrate the promising effectiveness of our approach.
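
The tag-expansion step in this abstract, which treats each cluster's document as a transaction, can be sketched as follows. This is a minimal illustration that assumes a plain association-rule confidence test; the abstract does not specify the mining algorithm, so the function name `expand_tags`, the `min_confidence` threshold, and the toy documents are hypothetical.

```python
def expand_tags(candidates, documents, min_confidence=0.6):
    """Expand a candidate tag set using co-occurrence across cluster documents.

    Each document (the tags of one near-duplicate cluster) is treated as a
    transaction. A tag is added when, among the documents that contain every
    candidate tag, the fraction that also contain it reaches `min_confidence`
    (an assumed rule-confidence criterion, not taken from the paper).
    """
    candidates = set(candidates)
    covering = [set(doc) for doc in documents if candidates <= set(doc)]
    if not covering:
        return candidates  # no transaction supports the candidate set
    expanded = set(candidates)
    for tag in set().union(*covering) - candidates:
        confidence = sum(tag in doc for doc in covering) / len(covering)
        if confidence >= min_confidence:
            expanded.add(tag)
    return expanded


if __name__ == "__main__":
    documents = [
        ["sky", "cloud", "sun"],
        ["sky", "cloud", "sun", "bird"],
        ["sky", "cloud"],
        ["car", "road"],
    ]
    print(sorted(expand_tags({"sky", "cloud"}, documents)))
```

The visual relevance scoring that the abstract uses to prune noisy tags afterwards is a separate step and is not covered by this sketch.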

16.
This paper presents an approach to image understanding from the perspective of unsupervised scene segmentation. With the goal of image understanding in mind, we consider "unsupervised scene segmentation" to be the task of dividing a given image into semantically meaningful regions without using annotation or other human-labeled information. We seek to investigate how well an algorithm can partition an image with limited human-involved learning procedures. Specifically, we are interested in developing an unsupervised segmentation algorithm that relies only on the contextual prior learned from a set of images. Our algorithm incorporates a small set of images that are similar to the input image in their scene structures. We use the sparse coding technique to analyze the appearance of this set of images; the effectiveness of sparse coding allows us to derive a priori the context of the scene from the set of images. Gaussian mixture models can then be constructed for different parts of the input image based on the sparse-coding contextual prior and combined into a Markov-random-field-based segmentation process. The experimental results show that our unsupervised segmentation algorithm is able to partition an image into semantic regions, such as buildings, roads, trees, and skies, without using human-annotated information. The semantic regions generated by our algorithm can be useful as pre-processed inputs for subsequent classification-based labeling algorithms in achieving automatic scene annotation and scene parsing.

17.
18.
Subspace and similarity metric learning are important issues for image and video analysis in both the computer vision and multimedia fields. Many real-world applications, such as image clustering/labeling and video indexing/retrieval, involve feature space dimensionality reduction as well as feature matching metric learning. However, the loss of information from dimensionality reduction may degrade the accuracy of similarity matching. In practice, these conflicting requirements for feature representation efficiency and similarity matching accuracy need to be addressed appropriately. In the style of "Thinking Globally and Fitting Locally", we develop Locally Embedded Analysis (LEA) based solutions for visual data clustering and retrieval. LEA reveals the essential low-dimensional manifold structure of the data by preserving the local nearest neighbor affinity and allowing a linear subspace embedding through solving a graph embedded eigenvalue decomposition problem. A visual data clustering algorithm, called Locally Embedded Clustering (LEC), and a local similarity metric learning algorithm for robust video retrieval, called Locally Adaptive Retrieval (LAR), are both built upon the LEA approach, with variations in local affinity graph modeling. For large database applications, instead of learning a global metric, we localize the metric learning space with a kd-tree partition to the localities identified by the indexing process. Simulation results demonstrate that the proposed solutions perform well in both accuracy and speed.

19.
张杰, 郭小川, 金城, 陆伟. Computer Engineering (《计算机工程》), 2011, 37(4): 230-231
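In content-based image retrieval and classification systems, a semantic gap exists between the low-level features of an image and its high-level semantics, and reducing this gap effectively is a problem that calls for extensive study. To this end, an image classification method based on a feature complementarity rate matrix is proposed. The method computes a complementarity rate matrix of visual features to guide the selection of the fused feature set, and uses a metric learning algorithm to obtain a suitable distance metric that reflects the similarity of high-level image semantics. The experimental results show that the method effectively improves image classification accuracy.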

20.
Multimodal image retrieval based on semantic learning   Cited by: 1 (self-citations: 0; citations by others: 1)
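To address the semantic gap, a multimodal image retrieval system is designed on the basis of semantic learning. The system combines three query modes for image retrieval. Query by visual features ranks results through feature extraction and similarity matching. Query by tags builds on automatic image annotation but generalizes poorly outside the semantic space. Query by semantic example largely overcomes this shortcoming: by querying on an explicit or implicit semantic space, it makes the retrieval results agree better with human perception. The experimental results show that, compared with texture-feature-based image retrieval, retrieval by semantic example achieves higher precision and recall.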
