Similar Documents
20 similar documents retrieved.
1.
A Cross-Media Retrieval Method Based on Content Correlation   Cited by: 12 (self-citations: 0, others: 12)
To overcome the single-modality restriction of traditional content-based multimedia retrieval, a new cross-media retrieval method is proposed. The method analyzes the canonical correlations, in the statistical sense, between the content features of different modalities and resolves the heterogeneity of the feature vectors through subspace mapping. Prior knowledge from relevance feedback is then used to adjust the topological structure of the multimedia datasets of different modalities within the subspace, yielding an accurate measure of cross-media correlation. Experiments on image and audio data verify the effectiveness of this correlation-learning-based cross-media retrieval method.
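As an illustration of the subspace mapping described above, the following minimal sketch uses canonical correlation analysis (via scikit-learn's CCA) to project paired image and audio features into a correlated subspace where cross-media similarity can be measured. The feature matrices, dimensionalities and variable names are assumptions for illustration only, not the paper's actual data or implementation.

```python
# Hedged sketch: CCA-based subspace mapping for cross-media correlation.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(200, 64))   # e.g. visual features, one row per paired sample
aud_feats = rng.normal(size=(200, 32))   # e.g. audio features for the same samples

# Learn a shared low-dimensional subspace in which the two modalities are maximally correlated.
cca = CCA(n_components=10)
img_sub, aud_sub = cca.fit_transform(img_feats, aud_feats)

# Cross-media retrieval: rank audio clips for an image query by cosine similarity in the subspace.
query = img_sub[0]
scores = aud_sub @ query / (np.linalg.norm(aud_sub, axis=1) * np.linalg.norm(query) + 1e-12)
ranking = np.argsort(-scores)
```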

2.
Qi  Jinwei  Huang  Xin  Peng  Yuxin 《Multimedia Tools and Applications》2017,76(23):25109-25127

As a prominent research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning a better shared representation and distance metric for multimedia data is important for boosting cross-media retrieval. Motivated by the strong ability of deep neural networks to learn feature representations and comparison functions, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric learning in a unified framework. First, we design a two-pathway deep network pretrained with a contrastive loss and fine-tuned with a double triplet similarity loss to learn the shared representation for each media type by modeling relative semantic similarity. Second, a metric network is designed to effectively calculate the cross-media similarity of the shared representations by modeling pairwise similar and dissimilar constraints. Compared to existing methods, which mostly ignore the dissimilar constraints and simply use a fixed sample distance metric such as Euclidean distance, our UNCSM approach unifies representation learning and distance metric learning to preserve relative similarity as well as to embrace more complex similarity functions, further improving cross-media retrieval accuracy. Experimental results show that our UNCSM approach outperforms 8 state-of-the-art methods on 4 widely used cross-media datasets.
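As a rough, non-authoritative sketch of the two components this abstract combines — modality-specific encoders trained with a triplet-style loss for the shared representation, and a small metric network that scores pairs instead of relying on a fixed Euclidean distance — the following PyTorch code may help. Layer sizes, feature dimensions and names are illustrative assumptions and do not reproduce the UNCSM architecture.

```python
# Hedged sketch: shared-representation encoders + learned pairwise metric.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    """Maps one media type's features into the shared representation space."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class MetricNet(nn.Module):
    """Scores a pair of shared representations instead of using a fixed metric."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)

img_enc, txt_enc, metric = SharedEncoder(4096), SharedEncoder(300), MetricNet()
triplet = nn.TripletMarginLoss(margin=0.2)

# Anchor images, semantically matching texts (positives), non-matching texts (negatives).
img, txt_pos, txt_neg = torch.randn(8, 4096), torch.randn(8, 300), torch.randn(8, 300)
loss = triplet(img_enc(img), txt_enc(txt_pos), txt_enc(txt_neg))
loss.backward()

# At retrieval time the metric network scores candidate pairs in the shared space.
scores = metric(img_enc(img), txt_enc(txt_pos))
```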


3.
Research on Cross-Media Correlation Reasoning and Retrieval   Cited by: 1 (self-citations: 0, others: 1)
To address the difficulty of measuring cross-media correlation between multimedia data of different modalities, a cross-media retrieval method based on correlation reasoning is proposed. The method first analyzes and quantifies both the intra-media similarity within a modality and the cross-media correlation between modalities, then constructs a cross-media correlation graph to express the learned similarities and correlations in a unified way, and performs cross-media retrieval based on shortest paths in this graph. A relevance feedback algorithm is further proposed to incorporate prior knowledge from user interaction into the correlation graph, effectively improving cross-media retrieval efficiency. The method can be applied to cross-modal retrieval systems in which the query example submitted by the user may be of any modality.
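A minimal sketch of retrieval over a cross-media correlation graph via shortest paths is given below, using networkx. The nodes, edge weights and modality prefixes are made-up assumptions; here a lower edge weight stands for a stronger correlation.

```python
# Hedged sketch: shortest-path retrieval on a cross-media correlation graph.
import networkx as nx

G = nx.Graph()
# Intra-media similarity edges and cross-media correlation edges live in the same graph,
# so shortest paths can cross modalities.
G.add_edge("img_1", "img_2", weight=0.3)   # intra-media similarity
G.add_edge("img_2", "aud_7", weight=0.5)   # cross-media correlation
G.add_edge("aud_7", "aud_9", weight=0.2)

# Rank candidate audio objects by shortest-path distance from an image query.
dist = nx.single_source_dijkstra_path_length(G, "img_1")
audio_ranking = sorted((n for n in dist if n.startswith("aud")), key=dist.get)
```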

4.
To exploit the information in multi-view features to improve classification performance in a semi-supervised setting, suitable edge weights are learned for a graph whose vertices are the input feature vectors by minimizing the local reconstruction error of each input vector, and the graph is then used for semi-supervised learning. Applying the manifold structure of the input data captured by this local-reconstruction criterion to semi-supervised learning helps improve the accuracy of label prediction. For the use of the multi-view features of the training images, an improved canonical correlation analysis technique is employed to learn more discriminative multi-view features, which are then effectively fused and used for image classification. Experimental results show that the method can fully mine the discriminative information in the multi-view feature representations of the training samples in a semi-supervised setting and perform the discrimination task effectively.
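The edge-weight learning step described above can be sketched as LLE-style local reconstruction weights: each sample is reconstructed from its nearest neighbors, and the reconstruction coefficients become the graph weights used for semi-supervised label propagation. The data, neighborhood size and regularization below are illustrative assumptions.

```python
# Hedged sketch: graph weights from minimizing local reconstruction error.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_reconstruction_weights(X, k=5, reg=1e-3):
    n = X.shape[0]
    W = np.zeros((n, n))
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    for i in range(n):
        neigh = idx[i, 1:]                    # exclude the point itself
        Z = X[neigh] - X[i]                   # shift neighbors to the origin
        C = Z @ Z.T + reg * np.eye(k)         # regularized local Gram matrix
        w = np.linalg.solve(C, np.ones(k))    # minimize ||x_i - sum_j w_j x_j||^2
        W[i, neigh] = w / w.sum()             # weights sum to one
    return W

X = np.random.default_rng(0).normal(size=(100, 20))
W = local_reconstruction_weights(X)           # feeds a graph-based semi-supervised learner
```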

5.
代刚  张鸿 《计算机应用》2018,38(9):2529-2534
To mine the intrinsic correlations between feature data of different modalities that share the same semantics, a cross-media retrieval algorithm based on semantic correlation and topological relations (SCTR) is proposed. On the one hand, the latent correlations among multimedia data with the same semantics are used to construct a multimedia semantic-correlation hypergraph; on the other hand, the topological relations of the multimedia data are mined to build a multimedia nearest-neighbor hypergraph. By combining the semantic correlations and topological relations of the multimedia data, an optimal projection matrix is learned for each media type, and the feature vectors of the multimedia data are then projected into a common space, enabling cross-media retrieval. On the XMedia dataset the algorithm achieves an average precision of 51.73% over a range of cross-media retrieval tasks, improving on joint graph regularized heterogeneous metric learning (JGRHML), cross-modal correlation propagation (CMCP), heterogeneous similarity based on nearest neighbors (HSNN), and joint representation learning (JRL) by 22.73, 15.23, 11.7, and 9.11 percentage points, respectively. The experimental results demonstrate from multiple perspectives that the algorithm effectively improves the average precision of cross-media retrieval.

6.
Cross-media heterogeneous transfer learning aims to transfer knowledge from a source media domain to a target media domain, thereby improving the performance of the model learned for the target domain. Existing cross-media heterogeneous transfer learning methods usually attempt to learn a latent feature space from a large amount of co-occurrence data. However, a significant challenge remains: domain over-adaption. In this paper, we propose Cross-Media Heterogeneous Transfer Learning for Preventing Over-adaption (CMHTL-PO) to address this challenge. The divergence between different media feature spaces is very large, and each media space contains weakly correlated features that have no semantically corresponding features in the other media. When co-occurrence data are insufficient, forcibly mapping these weakly correlated features onto common features in the latent space leads to over-adaption. CMHTL-PO therefore divides the features into strongly correlated and weakly correlated features, which are mapped respectively onto common features and media-specific (peculiar) features in the latent space. Extensive experiments on two benchmark datasets widely adopted in transfer learning verify the superiority of the proposed CMHTL-PO over existing state-of-the-art heterogeneous transfer learning methods.

7.
Although multimedia objects such as images, audio and text are of different modalities, there are a great many semantic correlations among them. In this paper, we propose a transductive learning method to mine the semantic correlations among media objects of different modalities so as to achieve cross-media retrieval. Cross-media retrieval is a new kind of search technology in which the query examples and the returned results can be of different modalities, e.g., querying images with an audio example. First, based on the media objects' features and their co-existence information, we construct a uniform cross-media correlation graph in which media objects of different modalities are represented uniformly. To perform cross-media retrieval, a positive score is assigned to the query example; the score spreads along the graph, and the media objects of the target modality (or multimedia documents, MMDs) with the highest scores are returned. To boost retrieval performance, we also propose long-term and short-term relevance feedback approaches to mine the information contained in positive and negative examples.
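As a rough illustration of the score-spreading step, the sketch below uses personalized PageRank as a stand-in for the propagation scheme: all initial mass is placed on the query node and spreads along the weighted correlation graph, after which the highest-scoring objects of the target modality are returned. The graph, node names and damping factor are illustrative assumptions, and the paper's actual propagation rule may differ.

```python
# Hedged sketch: spreading a positive query score over a cross-media correlation graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("aud_query", "img_3", 0.8), ("img_3", "img_5", 0.6),
    ("aud_query", "txt_2", 0.4), ("txt_2", "img_9", 0.7),
])

# Personalized PageRank with all initial mass on the query node.
scores = nx.pagerank(G, alpha=0.85, personalization={"aud_query": 1.0}, weight="weight")
images = sorted((n for n in scores if n.startswith("img")), key=scores.get, reverse=True)
print(images)   # images ranked by relevance to the audio query
```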

8.
Correlated information between multiple views can provide useful information for building robust classifiers. One way to extract correlated features from different views is canonical correlation analysis (CCA). However, CCA is an unsupervised method and cannot preserve discriminant information during feature extraction. In this paper, we first incorporate discriminant information into CCA by using random cross-view correlations between within-class examples. Thanks to this randomness, many feature extractors based on CCA and random correlation can be constructed. We then fuse these feature extractors and propose a novel method called random correlation ensemble (RCE) for multi-view ensemble learning. We compare RCE with existing multi-view feature extraction methods, including CCA and discriminant CCA (DCCA), which use all cross-view correlations between within-class examples, as well as with the trivial ensembles of CCA and DCCA that adopt standard bagging and boosting strategies for ensemble learning. Experimental results on several multi-view datasets validate the effectiveness of the proposed method.
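A rough sketch of the random correlation ensemble idea follows: each base extractor is a CCA fitted on randomly paired within-class samples from the two views, and the extracted features from all base extractors are concatenated. Shapes, labels and the number of base extractors are assumptions for illustration.

```python
# Hedged sketch: CCA ensemble built from random within-class cross-view pairings.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(120, 30)), rng.normal(size=(120, 40))   # two views of the same samples
y = rng.integers(0, 3, size=120)                                   # shared class labels

def random_within_class_pairs(y, rng):
    """For each sample in view 1, pick a random same-class sample in view 2."""
    partner = np.empty_like(y)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        partner[idx] = rng.permutation(idx)
    return partner

extractors = []
for _ in range(5):                               # 5 base extractors in the ensemble
    p = random_within_class_pairs(y, rng)
    extractors.append(CCA(n_components=5).fit(X1, X2[p]))

# Ensemble feature for view 1: concatenate projections from all base extractors.
Z1 = np.hstack([cca.transform(X1) for cca in extractors])
```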

9.
Cross-media retrieval returns heterogeneous multimedia data with the same semantics as a query object, and the key problem in cross-media retrieval is how to handle the correlations of heterogeneous multimedia data. Many works focus on mapping data of different modalities into an isomorphic space so that the similarities between them can be measured. Inspired by this idea, we propose a joint graph regularization based modality-dependent cross-media retrieval approach (JGRMDCR), which takes into account the one-to-one correspondence between data pairs of different modalities, the inter-modality similarities and the intra-modality similarities. Moreover, according to the modality of the query object, the method learns different projection matrices for different retrieval tasks. Experimental results on benchmark datasets show that the proposed approach outperforms other state-of-the-art algorithms.

10.
Person re-identification, one of the important research topics in computer vision, aims to accurately identify a target pedestrian across multiple non-overlapping cameras. If the pedestrian image from a given camera is regarded as one representation of the target pedestrian in that camera view, person re-identification can be treated as a multi-view learning problem. Person re-identification algorithms based on canonical correlation analysis built on this idea are merely linear dimensionality-reduction methods, and it is difficult for them to extract the effective high-level semantic information needed for re-identification from a complex re-identification setting (e.g., target pedestrian images affected by low resolution, illumination, and pose variation). This paper therefore proposes a sparsity-learning-based person re-identification algorithm (SLR). SLR first obtains a high-level semantic representation of the target pedestrian in each camera view through sparse learning, and then maps the high-level features into a common latent space so that feature distances across views become comparable. The advantage of SLR is that, by learning robust feature representations of pedestrian images, it obtains a more discriminative common latent space and thereby improves re-identification performance. Experimental results on the VIPeR and CUHK datasets demonstrate the effectiveness of the proposed algorithm.

11.
Application of Locality Preserving Projections on Riemannian Manifolds to Image Set Matching   Cited by: 1 (self-citations: 1, others: 0)
Objective: An image set matching method that preserves local structural features on a Riemannian manifold is proposed. Method: Each image set is modeled by a covariance matrix; the symmetric positive-definite, non-singular covariance matrices form a subspace of a Riemannian manifold, turning image set matching into the matching of points on the manifold. A kernel function based on covariance-matrix metric learning maps the covariance matrices from the Riemannian manifold into Euclidean space. Unlike other discriminant-analysis methods on Riemannian manifolds, the proposed method takes the local geometric structure of the sample distribution into account, introducing a locality-preserving discriminant analysis for image sets on the Riemannian manifold that improves sample separability while preserving the local neighborhood structure of the sample distribution. Results: The algorithm was tested on image-set-based object recognition tasks; object recognition and face recognition experiments on the ETH80 and YouTube Celebrities databases achieve recognition rates of 91.5% and 65.31%, respectively. Conclusion: The experimental results show that the method outperforms other image set matching algorithms.
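The covariance-matrix modeling of an image set and the mapping from the manifold of symmetric positive-definite (SPD) matrices to a Euclidean space can be sketched with a log-Euclidean distance between set covariances, as below. The feature matrices and regularization are illustrative assumptions; the paper's kernel and locality-preserving discriminant analysis go beyond this plain distance.

```python
# Hedged sketch: covariance representation of image sets + log-Euclidean distance.
import numpy as np

def spd_log(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def set_covariance(features, reg=1e-6):
    """Model an image set (n_frames x d feature matrix) by its covariance matrix."""
    C = np.cov(features, rowvar=False)
    return C + reg * np.eye(C.shape[0])          # keep it non-singular / SPD

def log_euclidean_distance(C1, C2):
    """Distance between SPD matrices after the matrix-log mapping."""
    return np.linalg.norm(spd_log(C1) - spd_log(C2), "fro")

rng = np.random.default_rng(0)
set_a = rng.normal(size=(50, 10))   # 50 frames, 10-dim features per frame
set_b = rng.normal(size=(40, 10))
d = log_euclidean_distance(set_covariance(set_a), set_covariance(set_b))
```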

12.
Objective: Cross-media retrieval aims to retrieve, for a query of any media type, related data of other media types, enabling semantic interoperability and cross retrieval among images, text and other media. However, the "heterogeneity gap" means that the feature representations of different media are inconsistent and hard to relate semantically, which poses a major challenge for cross-media retrieval. Different media data describing the same semantics exhibit semantic consistency, and the data contain rich fine-grained information, which provides an important basis for cross-media correlation learning. Existing methods consider only pairwise correlations between different media and ignore the contextual information among fine-grained local parts within the data, so they cannot fully exploit cross-media correlations. To address this, a cross-media retrieval method based on a hierarchical recurrent attention network is proposed. Method: First, an intra-media/inter-media two-level recurrent neural network is proposed, in which the lower-level networks model the fine-grained contextual information within each media type and the top-level network mines the contextual correlations between media through parameter sharing. Then an attention-based joint cross-media loss function is proposed, which learns joint inter-media attention to mine more precise fine-grained cross-media correlations and uses semantic category information to strengthen semantic discrimination during correlation learning, thereby improving cross-media retrieval accuracy. Results: The method is compared with 10 existing methods on two widely used cross-media datasets, with mean average precision (MAP) as the evaluation metric; it reaches MAP scores of 0.469 and 0.575 on the two datasets, exceeding all compared methods. Conclusion: By mining the fine-grained information of images and text, the proposed hierarchical recurrent attention network can fully learn precise cross-media correlations between images and text and effectively improves cross-media retrieval accuracy.

13.
Subspace and similarity metric learning are important issues for image and video analysis in both the computer vision and multimedia fields. Many real-world applications, such as image clustering/labeling and video indexing/retrieval, involve feature space dimensionality reduction as well as learning a feature matching metric. However, the loss of information from dimensionality reduction may degrade the accuracy of similarity matching. In practice, these fundamentally conflicting requirements of feature representation efficiency and similarity matching accuracy need to be addressed appropriately. In the style of "Thinking Globally and Fitting Locally", we develop Locally Embedded Analysis (LEA) based solutions for visual data clustering and retrieval. LEA reveals the essential low-dimensional manifold structure of the data by preserving local nearest-neighbor affinity, and it allows a linear subspace embedding to be obtained by solving a graph-embedded eigenvalue decomposition problem. A visual data clustering algorithm, called Locally Embedded Clustering (LEC), and a local similarity metric learning algorithm for robust video retrieval, called Locally Adaptive Retrieval (LAR), are both designed upon the LEA approach, with variations in local affinity graph modeling. For large databases, instead of learning a global metric, we localize the metric learning space with a kd-tree partition to the localities identified by the indexing process. Simulation results demonstrate the effectiveness of the proposed solutions in both accuracy and speed.
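A hedged sketch of the locality-preserving linear embedding idea behind LEA: build a nearest-neighbor affinity graph and solve the graph-embedded generalized eigenvalue problem for a linear projection. The data, neighborhood size and kernel width are illustrative, and LEA's actual affinity modeling and variants (LEC, LAR) are not reproduced here.

```python
# Hedged sketch: linear embedding from a graph-embedded generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                          # samples x features

W = kneighbors_graph(X, n_neighbors=8, mode="distance").toarray()
W = np.exp(-W**2 / 2.0) * (W > 0)                       # heat-kernel affinities on k-NN edges
W = np.maximum(W, W.T)                                  # symmetrize the affinity graph
D = np.diag(W.sum(axis=1))
L = D - W                                               # graph Laplacian

A = X.T @ L @ X + 1e-6 * np.eye(X.shape[1])             # regularized for numerical stability
B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])
vals, vecs = eigh(A, B)                                 # generalized eigenvalue problem
P = vecs[:, :10]                                        # 10 smallest eigenvectors -> projection
Z = X @ P                                               # low-dimensional embedding of the data
```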

14.
15.
At present, cross-media data, typified by web data, are growing explosively and exhibit complex cross-modal, cross-source correlations and dynamic evolution. Aimed at the needs of multi-modal information understanding, interaction and content management, cross-media analysis and reasoning technology builds cross-modal, cross-platform mechanisms for semantic alignment and unified representation, further realizes analysis and reasoning together with successive approximation of complex cognitive goals, establishes logical reasoning mechanisms at the semantic level, and ultimately achieves human-like cross-media intelligent reasoning...

16.
Zhang  Hong  Huang  Yu  Xu  Xin  Zhu  Ziqi  Deng  Chunhua 《Multimedia Tools and Applications》2018,77(3):3353-3368

Due to the rapid development of multimedia applications, cross-media semantics learning is becoming increasingly important. One of the most challenging issues in cross-media semantics understanding is how to mine semantic correlations between different modalities. Most traditional multimedia semantics analysis approaches are based on unimodal data and neglect the semantic consistency between modalities. In this paper, we propose a novel multimedia representation learning framework via latent semantic factorization (LSF). First, the posterior probability under the learned classifiers serves as the latent semantic representation for each modality. Moreover, we derive the semantic representation of a multimedia document, which consists of an image and text, by latent semantic factorization. In addition, two projection matrices are learned to project images and text into the same semantic space, which is more consistent with that of the multimedia document. Experiments on three real-world datasets for cross-media retrieval demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.
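The first step the abstract describes, using classifier posterior probabilities as a latent semantic representation shared across modalities, can be sketched as follows. The classifiers, feature dimensions and category count are illustrative assumptions, and the factorization and projection-learning steps of LSF are not reproduced here.

```python
# Hedged sketch: posterior probabilities as a shared latent semantic representation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
img_X, txt_X = rng.normal(size=(300, 128)), rng.normal(size=(300, 50))
y = rng.integers(0, 10, size=300)                 # shared semantic categories

img_clf = LogisticRegression(max_iter=1000).fit(img_X, y)
txt_clf = LogisticRegression(max_iter=1000).fit(txt_X, y)

# Posteriors over the same 10 categories give both modalities a common semantic space,
# so cross-media similarity reduces to a vector comparison.
img_sem = img_clf.predict_proba(img_X)            # shape (300, 10)
txt_sem = txt_clf.predict_proba(txt_X)            # shape (300, 10)
sim = img_sem[0] @ txt_sem.T                      # rank texts for the first image query
```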


17.
To effectively exploit multi-view data for supervised feature selection, a structured multi-view sparsity constraint is constructed, and a supervised feature selection method based on it, structured multi-view supervised feature selection (SMSFS), is proposed. During feature selection the method simultaneously considers the importance of features from different views and the importance of different features within the same view, thereby effectively combining multi-view information and improving supervised feature selection performance. The SMSFS objective function is non-convex, and an effective iterative algorithm is designed to solve it. The proposed method is applied to the image annotation task; experiments on the NUS-WIDE and MSRA-MM2.0 image databases, in comparison with other feature selection algorithms, show that the method can effectively combine multi-view information and improve feature selection performance.

18.
This paper reviews the development of cross-media intelligence, analyzes its new trends and practical bottlenecks, and looks ahead to its future. Cross-media intelligence aims to fuse multi-source, multi-modal data and to exploit the relationships among different media for high-level semantic understanding and logical reasoning. Existing cross-media algorithms mostly follow the paradigm of single-media representation followed by multimedia fusion, in which feature learning and logical reasoning are relatively separate processes; they cannot integrate multi-source, multi-level semantic information into a unified representation, which prevents reasoning and learning from mutually reinforcing and correcting each other. This paradigm lacks explicit knowledge accumulation and multi-level structural understanding, and it limits model credibility and robustness. Against this background, the paper turns to a new form of intelligent representation: visual knowledge. Cross-media intelligence driven by visual knowledge features multi-level modeling and knowledge reasoning and readily supports visual manipulation and reconstruction. The paper introduces the three basic elements of visual knowledge, namely visual concepts, visual relations and visual reasoning, and discusses and analyzes each in detail. Visual knowledge helps realize a unified framework driven by both data and knowledge, learn structured representations that are attributable and traceable, and promote cross-media knowledge association and intelligent reasoning. With its strong capability for abstract knowledge expression and its multiple complementary forms of knowledge, visual knowledge offers a powerful new fulcrum for the evolution of cross-media intelligence.

19.
Wu  Yue  Wang  Can  Zhang  Yue-qing  Bu  Jia-jun 《浙江大学学报:C卷英文版》2019,20(4):538-553

Feature selection has attracted a great deal of interest over the past decades. By selecting meaningful feature subsets, the performance of learning algorithms can be effectively improved. Because label information is expensive to obtain, unsupervised feature selection methods are more widely used than the supervised ones. The key to unsupervised feature selection is to find features that effectively reflect the underlying data distribution. However, due to the inevitable redundancies and noise in a dataset, the intrinsic data distribution is not best revealed when using all features. To address this issue, we propose a novel unsupervised feature selection algorithm via joint local learning and group sparse regression (JLLGSR). JLLGSR incorporates local learning based clustering with group sparsity regularized regression in a single formulation, and seeks features that respect both the manifold structure and group sparse structure in the data space. An iterative optimization method is developed in which the weights finally converge on the important features and the selected features are able to improve the clustering results. Experiments on multiple real-world datasets (images, voices, and web pages) demonstrate the effectiveness of JLLGSR.


20.
Multi-view learning studies how several views (different feature representations) of the same objects can best be utilized in learning. In other words, multi-view learning is the analysis of co-occurrence data, where the observations are co-occurrences of samples in the views. Standard multi-view learning, such as joint density modeling, cannot be done in the absence of co-occurrence, that is, when the views are observed separately and the identities of the objects are not known. As a practical example, joint analysis of mRNA and protein concentrations requires a mapping between genes and proteins. We introduce a data-driven approach for learning the correspondence between observations in the different views, in order to enable joint analysis even in the absence of known co-occurrence. The method finds a matching that maximizes the statistical dependency between the views, which makes it particularly suitable for multi-view methods such as canonical correlation analysis that share the same objective. We apply the method to translational metabolomics, to identify differences and commonalities in metabolic processes across species or tissues. The metabolite identities and roles in the different species are not generally known, so a matching must be searched for. In this paper we show, using different metabolomics measurement batches as the views so that the ground truth is known, that metabolite identities can be reliably matched by a consensus of several matching solutions.
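A hedged sketch of matching observations across two views by maximizing a simple dependency surrogate (pairwise correlation) with the Hungarian algorithm is shown below. The dependency measure, data and shapes are assumptions, and the paper's consensus over several matching solutions is not reproduced.

```python
# Hedged sketch: one-to-one matching of views by maximizing a dependency surrogate.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
view1 = rng.normal(size=(60, 12))           # e.g. metabolite profiles, batch 1
view2 = rng.normal(size=(60, 12))           # same metabolites, batch 2, order unknown

# Score every candidate pair by feature-wise correlation, then pick the assignment
# with maximum total score (the Hungarian algorithm minimizes cost, hence the minus).
z1 = (view1 - view1.mean(1, keepdims=True)) / view1.std(1, keepdims=True)
z2 = (view2 - view2.mean(1, keepdims=True)) / view2.std(1, keepdims=True)
score = z1 @ z2.T / view1.shape[1]          # pairwise correlation matrix
rows, cols = linear_sum_assignment(-score)  # negate to maximize dependency
matching = dict(zip(rows, cols))            # view1 index -> matched view2 index
```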
