Similar Articles
20 similar articles found (search time: 15 ms)
1.
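With the rapid growth of multimodal data, cross-modal retrieval has attracted broad attention from researchers. It uses data of one modality as the query to retrieve data of other modalities; for example, a user can retrieve images and/or videos with text. Because a query and its retrieval results are represented in different modalities, measuring similarity across modalities is the main challenge of cross-modal retrieval. As deep learning has spread and achieved notable results in computer vision, natural language processing, and other fields, researchers have proposed a series of deep-learning-based cross-modal retrieval methods that greatly ease this challenge; this paper calls them deep cross-modal retrieval. The paper surveys representative deep cross-modal retrieval work and, according to the cross-modal information provided, divides the methods into three categories: deep cross-modal retrieval based on one-to-one correspondence between cross-modal data, based on similarity between cross-modal data, and based on semantic annotations of cross-modal data. In general, the cross-modal information provided increases across these three categories in that order, and the more information is available for learning, the better the retrieval performance. The categories cover seven mainstream techniques: canonical correlation analysis, one-to-one correspondence preservation, metric learning, likelihood analysis, learning to rank, semantic prediction, and adversarial learning. Each category uses some of these key techniques, and the paper describes representative methods in detail. It also compares how the techniques differ when different cross-modal information is provided, to explain what each technique focuses on and how its use differs across levels of available information. To support the evaluation of cross-modal retrieval methods, it summarizes representative cross-modal retrieval datasets. Finally, it discusses open problems in deep cross-modal retrieval and future research directions.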
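Two ideas from this survey, a shared embedding space and metric learning, can be illustrated with a minimal sketch. It assumes text and image embeddings have already been projected into one common space (the vectors below are made up), ranks images for a text query by cosine similarity, and computes a triplet hinge loss of the kind used by metric-learning methods. The `margin` value is an arbitrary assumption, not taken from any surveyed method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Indices of gallery items, most similar to the query first."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def triplet_hinge_loss(anchor, positive, negative, margin=0.2):
    """Zero once the matched pair beats the mismatched pair by at least `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Hypothetical embeddings, assumed already projected into a shared space.
text_query = [1.0, 0.0, 0.5]
images = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.4], [0.5, 0.5, 0.5]]

print(rank_gallery(text_query, images))                      # [1, 2, 0]
print(triplet_hinge_loss(text_query, images[1], images[0]))  # 0.0
```

Training would push the loss toward zero for every matched text-image pair, which is what makes the cosine ranking meaningful across modalities.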

2.
This paper introduces the problem of searching for social network accounts, e.g., Twitter accounts, using the rich information available on the Web, e.g., people's names, attributes, and relationships to other people. For this purpose, we need to map Twitter accounts to Web entities. However, existing solutions built on naive textual matching inevitably suffer from low precision due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage “relational” evidence extracted from the Web corpus. We consider two types of evidence resources. First, web-scale entity relationship graphs, extracted from name co-occurrences crawled from the Web; this co-occurrence relationship can be interpreted as an “implicit” counterpart of Twitter follower relationships. Second, web-scale relational repositories, such as Freebase, which have complementary strengths. Using both textual and relational features obtained from these resources, we learn a ranking function that aggregates these features to order candidate matches accurately. Another key contribution of this paper is to formulate confidence scoring as a problem separate from relevance ranking. A baseline approach uses the relevance score of the top match itself as the confidence score. In contrast, we train a separate classifier that uses not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and we empirically validate that our approach outperforms the baseline. We evaluate the proposed system using real-life, internet-scale entity-relationship and social network graphs.
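The confidence-scoring idea above (a classifier over statistics of all candidates' relevance scores, rather than the top score alone) can be sketched as follows. The feature set, the logistic form, and the weights are hypothetical choices for illustration; the abstract does not specify them.

```python
import math
import statistics


def confidence_features(scores):
    """Hypothetical features from all candidates' relevance scores:
    top score, margin over the runner-up, mean, std, z-score of the top."""
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    mean = statistics.fmean(ranked)
    std = statistics.pstdev(ranked)
    z_top = (top - mean) / std if std > 0 else 0.0
    return [top, top - second, mean, std, z_top]


def confidence(scores, weights, bias):
    """Logistic model: estimated probability that the top-ranked match is correct."""
    logit = bias + sum(w * f for w, f in zip(weights, confidence_features(scores)))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights; in practice they would be learned from labeled matches.
weights = [1.0, 4.0, 0.0, 0.0, 0.5]
clear = confidence([0.9, 0.2, 0.1], weights, bias=-3.0)
ambiguous = confidence([0.9, 0.88, 0.1], weights, bias=-3.0)
print(clear > ambiguous)  # True
```

Both candidate lists share the same top score of 0.9, so the top-score baseline would give them equal confidence; the margin and z-score features are what let the classifier separate a clear winner from a near tie.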

3.
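Objective: Diabetic retinopathy (DR) is a diabetic complication with high incidence and high rates of blindness. In clinical practice, because differences between grades of retinal images are small and clinicians' experience varies, misdiagnoses and missed diagnoses occur; manual DR diagnosis and grading currently performs poorly and is time-consuming and labor-intensive. This paper therefore proposes an automatic DR image classification method that combines an attention mechanism with the high-efficiency network (EfficientNet) to diagnose lesion types accurately. Method: To address problems in the experimental DR dataset, images are filtered, denoised, augmented, and normalized. EfficientNet is used for feature extraction: with a transfer-learning strategy, it is trained on the DR dataset to extract deep features. To handle the small differences between lesions and prevent misclassification while the network learns features of diabetic retinal images, an attention mechanism is added to the EfficientNet output. The extracted features are then classified by a deep classifier that grades retinal images into five classes. Result: The method achieves a classification accuracy, sensitivity, specificity, and quadratic weighted kappa of 97.2%, 95.6%, 98.7%, and 0.84, respectively, showing good classification performance and robustness. Conclusion: The DR classification algorithm based on attention EfficientNet (A-EfficientNet) effectively improves DR screening efficiency, overcomes the limitations of manual feature extraction in manual grading, assists clinicians in diagnosis, and can help prevent the severe vision loss or even blindness caused by this serious eye disease.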
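The result above reports a quadratic weighted kappa of 0.84 for five-grade DR classification. A minimal sketch of the standard quadratic weighted kappa for ordinal grades 0 to 4 follows; it is the textbook definition, not the authors' evaluation code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Cohen's kappa with quadratic disagreement weights for grades 0..num_classes-1."""
    n = len(y_true)
    # Observed confusion matrix: rows = true grade, columns = predicted grade.
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Disagreement weight grows with the squared grade distance.
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count if true and predicted grades were independent.
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```

Because the weights are quadratic, confusing adjacent grades costs little: predicting grades [0, 1, 2, 3, 4] as [0, 1, 2, 4, 3] still gives a kappa of 0.9, while fully reversing them gives -1.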

4.
Rumor detection has become an emerging and active research field in recent years. Its core is to model the rumor characteristics inherent in rich information, such as propagation patterns in the social network and semantic patterns in post content, and to differentiate them from the truth. However, existing work on rumor detection falls short in modeling heterogeneous information: it either uses a single information source only (e.g., the social network or post content) or ignores the relations among multiple sources (e.g., fusing social and content features via simple concatenation). Therefore, it may fail to understand rumors comprehensively and detect them accurately. In this work, we explore contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. Technically, we supplement the main supervised detection task with an auxiliary self-supervised task, which enriches post representations via post self-discrimination. Specifically, given two heterogeneous views of a post (i.e., representations encoding social patterns and semantic patterns), the discrimination is done by maximizing the mutual information between different views of the same post compared with that of other posts. We devise cluster-wise and instance-wise approaches to generate the views and conduct the discrimination, considering different relations among information sources. We term this framework self-supervised rumor detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
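Post self-discrimination maximizes agreement between the social view and the semantic view of the same post relative to other posts. InfoNCE is one common contrastive estimator for this; the sketch below assumes it (the abstract does not name the estimator), along with cosine similarity and a `temperature` of 0.5. It covers only an instance-wise case with made-up view vectors.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(social_views, semantic_views, temperature=0.5):
    """Mean InfoNCE loss over posts: post i's social view should score its own
    semantic view (positive) above the semantic views of other posts (negatives)."""
    n = len(social_views)
    total = 0.0
    for i in range(n):
        logits = [cosine(social_views[i], semantic_views[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


# Hypothetical 2-D view vectors for three posts.
social = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.2]]
aligned = info_nce(social, social)                      # views agree post by post
mismatched = info_nce(social, social[1:] + social[:1])  # views paired with the wrong posts
print(aligned < mismatched)  # True
```

Minimizing this loss maximizes a lower bound on the mutual information between the two views, which is why it is a natural fit for the auxiliary task described above.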

5.
In the era of the social web, many people manage their social relationships through various online social networking services. It has been found that identifying the types of social relationships among users in online social networks facilitates the marketing of products via electronic “word of mouth.” However, it is a great challenge to identify the types of social relationships, given very limited information in a social network. In this article, we study how to identify the types of relationships across multiple heterogeneous social networks and examine if combining certain information from different social networks can help improve the identification accuracy. The main contribution of our research is that we develop a novel decision tree initiated random walk model, which takes into account both global network structure and local user behavior to bootstrap the performance of relationship identification. Experiments conducted based on two real‐world social networks, Sina Weibo and Jiepang, demonstrate that the proposed model achieves an average accuracy of 92.0%, significantly outperforming other baseline methods. Our experiments also confirm the effectiveness of combining information from multiple social networks. Moreover, our results reveal that human mobility features indicating location categories, coincidence, and check‐in patterns are among the most discriminative features for relationship identification.

6.
Distant supervision relation extraction (DSRE) trains a classifier on data labeled automatically by aligning triples in a knowledge base (KB) with large-scale corpora. Training data generated by distant supervision may contain many mislabeled instances, which harm classifier training. Some recent methods show that relevant background information in KBs, such as entity types (e.g., Organization and Book), can improve DSRE performance. However, these methods have three main problems. First, they are tailored to a single type of information, which benefits only some instances rather than all cases. Second, different kinds of background information are embedded independently, with no meaningful interaction between them. Third, previous methods do not consider the side effects of the noise that background information introduces. To address these issues, we leverage five types of background information instead of the single type used in previous work and propose a novel edge-reasoning hybrid graph (ER-HG) model that enables meaningful interaction between different kinds of information. In addition, we employ an attention mechanism in the ER-HG model to alleviate the side effects of noise. The ER-HG model integrates all types of information efficiently and is highly robust to noise in that information. We conduct experiments on two widely used datasets. The results demonstrate that our model significantly outperforms state-of-the-art methods on the held-out metric and in robustness tests.

7.
Sharp edges are important shape features, and their extraction has been extensively studied on both point clouds and surfaces. We consider the problem of extracting sharp edges from a sparse set of colour-and-depth (RGB-D) images. The noisy depth measurements are challenging for existing feature extraction methods that work solely in the geometric domain (e.g., points or meshes). By utilizing both colour and depth information, we propose a novel feature extraction method that produces much cleaner and more coherent feature lines. We make two technical contributions. First, we show that intensity edges can augment the depth map to improve normal estimation and feature localization from a single RGB-D image. Second, we design a novel algorithm for consolidating feature points obtained from multiple RGB-D images. By utilizing the normals and ridge/valley types associated with the feature points, our algorithm effectively suppresses noise without smearing nearby features.

8.
Recent research places more emphasis on analyzing multiple features to improve face recognition (FR) performance. One popular scheme extends the sparse-representation-based classification framework with various sparse constraints. Although these methods study multiple features jointly through the constraints, they still process each feature individually and thus overlook possible high-level relationships among different features. It is reasonable to assume that low-level features of facial images, such as edge information and smoothed/low-frequency images, can be fused into a more compact and more discriminative representation based on the latent high-level relationship. FR on the fused features is expected to perform better than FR on the original features, since the fused features have more favorable properties. With this in mind, we propose two strategies that start by fusing multiple features and then exploit the dictionary learning (DL) framework for better FR performance. The first strategy is a simple and efficient two-step model, which learns a fusion matrix from training face images to fuse multiple features and then learns class-specific dictionaries based on the fused features. The second is a more effective but more computationally expensive model that learns the fusion matrix and the class-specific dictionaries simultaneously within an iterative optimization procedure. In addition, the second model separates the shared common components from the class-specific dictionaries to enhance their discriminative power.
The proposed strategies, which integrate a multi-feature fusion process with a dictionary learning framework for FR, achieve the following goals: (1) exploiting multiple features of face images for better FR performance; (2) learning a fusion matrix to merge the features into a more compact and more discriminative representation; (3) learning class-specific dictionaries that account for common patterns, for better classification performance. We perform a series of experiments on publicly available databases to evaluate our methods, and the results demonstrate the effectiveness of the proposed models.

9.
With various emerging Social Networking Services (SNS), users can join multiple SNS to maintain social relationships with other users and to collect a large amount of information (e.g., statuses on Facebook and tweets on Twitter). However, these users face difficulties in managing all the data collected from multiple SNS, so matching social identities across the SNS is important. In this study, we propose a privacy-aware framework for a social identity matching (SIM) method across multiple SNS. The proposed approach protects user privacy because it uses only public information (e.g., usernames and users' social relationships) to find the best matches between social identities. Our evaluation shows that the proposed SIM method achieves an F-measure of about 60%.

10.
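Objective: Metric learning is a simple and effective approach to few-shot learning, and learning a rich, discriminative, and well-generalizing embedding space is the key to good classification performance in metric-learning methods. Starting from the features of the samples themselves and the distribution of those features in the embedding space, this paper combines global and local data augmentation to build a meta-cosine-loss method for few-shot image classification (a meta-cosine loss for few-shot image classification, AMCL-FSIC). Method: First, from the perspective of the data's own features, global and local data augmentation are combined so that local information provides more discriminative and transferable information, making the model focus more on the image foreground. An attention mechanism then combines global and local features to obtain richer and more discriminative features. Second, from the perspective of how sample features are distributed in the embedding space, a meta-cosine loss (MCL) function is proposed to optimize the few-shot classification model. It uses the difference in similarity between samples and class prototypes to adjust the prototypes of different classes and enlarge inter-class margins, so that classes are more clearly separated when the model is tested on new tasks, which improves generalization. Result: Experiments were conducted on five classic few-shot datasets. On FC100 (Few-shot Cifar100) and CUB (Caltech-UCSD Birds-200-2011), the method achieves state-of-the-art classification performance; on MiniImageNet, TieredImageNet, and Cifar100, its results are comparable with those of the compared models. Additional experiments on MiniImageNet, CUB, and Cifar100 verify the effectiveness of MCL, showing that it improves the classification performance of the cosine classifier. Conclusion: The method fully extracts image features in few-shot image classification tasks and effectively improves the accuracy of metric learning for few-shot image classification.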
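The cosine classifier that the meta-cosine loss builds on can be sketched as a prototype classifier: each class prototype is the mean of its support embeddings, and a query is scored by scaled cosine similarity to each prototype. The abstract does not give enough detail to reproduce the MCL prototype adjustment, so it is omitted; the `scale` value and the toy embeddings are assumptions.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def prototypes(support):
    """support: {label: [embedding, ...]} -> {label: mean embedding}."""
    return {
        label: [sum(col) / len(embs) for col in zip(*embs)]
        for label, embs in support.items()
    }


def cosine_logits(query, protos, scale=10.0):
    """Scaled cosine similarity between the query and every class prototype."""
    q = normalize(query)
    return {
        label: scale * sum(a * b for a, b in zip(q, normalize(p)))
        for label, p in protos.items()
    }


def classify(query, support):
    logits = cosine_logits(query, prototypes(support))
    return max(logits, key=logits.get)


# Hypothetical 2-way 2-shot episode with toy embeddings.
support = {"cat": [[1.0, 0.1], [0.9, 0.0]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
print(classify([0.8, 0.2], support))  # cat
```

A loss that widens the angular gap between prototypes, as MCL aims to do, directly increases the margin between these logits on unseen tasks.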

11.
12.
Image retrieval and categorization may need to consider several types of visual features and the spatial information between them (e.g., different points of view of an image). This paper presents a novel approach that extends the language modeling approach from information retrieval to graph-based image retrieval and categorization. Such a versatile graph model is needed to represent the multiple points of view of images. A language model is defined on these graphs to enable fast graph matching. We present experiments with several instances of the proposed model on two image collections: one of 3,849 touristic images and another of 3,633 images captured by a mobile robot. Experimental results show that the visual graph model (VGM) improves accuracy over the standard language model (LM) and outperforms the Support Vector Machine (SVM) method.
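The language-modeling approach that this work extends ranks documents by query likelihood. The sketch below shows only that underlying unigram idea with Dirichlet smoothing, applied to bags of hypothetical visual concept labels. The paper's visual graph model additionally scores spatial relations between concepts, which is not reproduced here, and `mu` is an assumed smoothing parameter.

```python
import math
from collections import Counter


def log_query_likelihood(query, doc, coll_counts, coll_size, mu):
    """log P(query | doc) with Dirichlet smoothing:
    P(c | doc) = (tf(c, doc) + mu * P(c | collection)) / (|doc| + mu)."""
    tf = Counter(doc)
    score = 0.0
    for concept in query:
        # Floor keeps the log finite for concepts unseen in the whole collection.
        p_coll = max(coll_counts[concept] / coll_size, 1e-9)
        score += math.log((tf[concept] + mu * p_coll) / (len(doc) + mu))
    return score


def rank_images(query, images, mu=100.0):
    """Indices of images (bags of visual concept labels), best match first."""
    coll_counts = Counter(c for img in images for c in img)
    coll_size = sum(coll_counts.values())
    scores = [log_query_likelihood(query, img, coll_counts, coll_size, mu) for img in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


# Hypothetical images described by visual concept labels.
images = [["sky", "sky", "tree"], ["road", "car", "car"], ["sky", "tree", "tree"]]
print(rank_images(["tree"], images))  # [2, 0, 1]
```

Smoothing against collection statistics is what lets an image that lacks a query concept still receive a finite score, which keeps matching fast and robust.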

13.
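In recent years, event recommendation in event-based social network (EBSN) services has been widely studied as a way to improve user experience. Based on a detailed analysis of user behavior data in EBSNs, this paper proposes a new latent factor model that fuses multiple data features. The model jointly considers two new types of data features in EBSNs: heterogeneous social relationship features (online plus offline social relationships) and the geographic locality of user participation behavior. Experiments on a real Meetup dataset show that our algorithm outperforms traditional algorithms on event recommendation.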

14.
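A Nearest-Neighbor Image Annotation Method Incorporating Semantic Distance   Citations: 1 total (0 self-citations, 1 by others)
Traditional nearest-neighbor image annotation methods perform poorly, mainly because much valuable information is lost when visual features are extracted. This paper proposes an improved nearest-neighbor classification model. First, a distance metric learning method is trained with the images' semantic category information to produce a new semantic distance; next, this distance is used to cluster the images of each class, producing multiple intra-class cluster centers; finally, a nearest-neighbor classification model is built by computing the semantic distance from an image to each cluster center. The learned semantic distance is used throughout the construction of the model, which effectively reduces the semantic gap caused by variation within an image class and similarity between different classes. Experiments on the ImageCLEF2012 image annotation dataset compare the method with traditional classification models and recent methods and confirm its effectiveness.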
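The three steps above (learn a semantic distance, cluster each class into several centers, then classify by the nearest center) can be sketched as follows. The abstract names neither the metric-learning method nor the clustering algorithm, so this assumes a Mahalanobis distance with a given matrix `M` standing in for the learned metric, and plain k-means with a deterministic initialization; the training data are made up.

```python
def mahalanobis_sq(x, y, M):
    """Squared learned distance (x - y)^T M (x - y); M assumed symmetric positive definite."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def kmeans(points, k, M, iters=10):
    """k-means under the learned distance. With positive definite M the mean is
    still the optimal center of a cluster, so the update step is unchanged."""
    centers = [list(p) for p in points[:k]]  # deterministic init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: mahalanobis_sq(p, centers[c], M))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_model(train, k, M):
    """train: {label: [feature vectors]} -> {label: [intra-class cluster centers]}."""
    return {label: kmeans(pts, min(k, len(pts)), M) for label, pts in train.items()}


def annotate(x, model, M):
    """Assign the label of the nearest intra-class cluster center."""
    best_label, best_dist = None, float("inf")
    for label, centers in model.items():
        for c in centers:
            d = mahalanobis_sq(x, c, M)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


M = [[1.0, 0.0], [0.0, 4.0]]  # hypothetical "learned" metric weighting dimension 2
train = {
    "beach": [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.2, 0.1]],
    "forest": [[0.0, 3.0], [0.1, 3.2], [5.0, 3.0], [5.1, 2.9]],
}
model = build_model(train, 2, M)
print(annotate([4.9, 0.3], model, M))  # beach
```

Several centers per class let one label cover visually distinct sub-groups (here two separated "beach" clusters) that a single class mean would blur together.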

15.
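Objective: Because breast tumor lesions are well hidden and metastasize easily, computer-aided diagnosis (CAD) is currently used to detect and diagnose tumors as early as possible. However, medical image data are scarce and expensive to annotate, so deep-learning-based X-ray breast tumor detection methods under full supervision perform very poorly and generalize weakly; in addition, domain shift caused by noise degrades tumor detection performance across different environments. To address these challenges, a single-domain generalization method for X-ray breast tumor detection is proposed. Method: A single-domain generalization model (SDGM) is proposed for X-ray breast tumor detection. It uses ResNet-50 (residual network-50) as the backbone feature extractor and includes a domain feature enhancement module (DFEM) that effectively fuses global information from upsampling and downsampling to suppress noise. An instance generalization module (IGM) is designed at the detection head; it regularizes and whitens the class-semantic information of each instance to improve the model's generalization. By learning from a small number of annotated medical images, unpredictable...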

16.
In hyperspectral image (HSI) processing, including both spectral and spatial features (e.g., morphological and shape features) has proven highly successful for classifying hyperspectral data. Nevertheless, two main issues remain: (1) the multiple features are often treated equally, so the complementary information among them is neglected; (2) the features are often degraded by a mixture of various kinds of noise, which decreases classification accuracy. To address these issues, a novel robust discriminative multiple features extraction (RDMFE) method for HSI classification is proposed. RDMFE projects the multiple features into a common low-rank subspace, where the specific contributions of different types of features are fully exploited. With the low-rank constraint, RDMFE can uncover the intrinsic low-dimensional subspace structure of the original data. To make the projected features more discriminative, the learned representations are optimized for classification. Because it preserves intrinsic information and is discriminative, the learned projection matrix works well in HSI classification tasks. Experimental results on three real hyperspectral datasets confirm the effectiveness of the proposed method.

17.
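Because mural images exhibit large intra-class variation, a grouping strategy is proposed that partitions the sample space into different subspaces, and a classifier model is trained on all the training samples in each subspace. At test time, the classification model is selected according to the subspace into which the test sample falls. When training the classifier for each subspace, to overcome the strong background noise in mural images, each mural image sample is treated as a collection of multiple instances, and the classifier is trained with multiple-instance learning. During training, latent variables are introduced to identify each instance; these latent variables make the classifier's optimization problem non-convex, so it cannot be solved directly by gradient descent. Instead, a Latent SVM is trained iteratively as the classifier for each subspace. Experiments show that the proposed classification model largely overcomes the effects of intra-class variation and background noise on mural image classification.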

18.
Over the past few years, making appropriate use of user communities or image groups in social networks (e.g., Flickr or Facebook) has drawn a great deal of attention. In this paper, we are particularly interested in recommending groups that users may prefer, based on auxiliary information. In the real world, images captured by mobile devices explicitly record a lot of contextual information (e.g., locations) about the users who generate them. Meanwhile, several words describe the particular theme of each group (e.g., the “Dogs for Fun Photos” image group on Flickr), and these words may mention particular entities as well as the categories they belong to (e.g., “Animal”). In fact, group recommendation can be conducted in heterogeneous information networks, where informative cues are generally multi-typed. Motivated by the assumption that auxiliary information in heterogeneous information networks (in this paper, visual features of images, mobile contextual information, and entity-category information of groups) will boost group recommendation performance, this paper proposes to combine auxiliary information with implicit user feedback for group recommendation. Specifically, group recommendation is formulated as a non-negative matrix factorization (NMF) method regularized with user–user similarity derived from visual features and heterogeneous information networks. Experiments show that our proposed approach outperforms other counterpart recommendation approaches.
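NMF regularized with user–user similarity is commonly implemented as graph-regularized NMF with multiplicative updates. The sketch below assumes that form: it factorizes a user-by-group implicit-feedback matrix `R` into non-negative `W` (users) and `H` (groups) while pulling the factors of similar users (similarity matrix `S`) together. The objective, update rules, `lam`, and the toy data are assumptions, not the paper's exact formulation.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def objective(R, W, H, S, lam):
    """||R - WH||_F^2 + lam * tr(W^T L W), with graph Laplacian L = D - S."""
    WH = matmul(W, H)
    recon = sum((r - p) ** 2 for rr, pr in zip(R, WH) for r, p in zip(rr, pr))
    # For symmetric S: tr(W^T L W) = 0.5 * sum_ij S_ij * ||w_i - w_j||^2
    smooth = 0.5 * sum(S[i][j] * sum((a - b) ** 2 for a, b in zip(W[i], W[j]))
                       for i in range(len(W)) for j in range(len(W)))
    return recon + lam * smooth


def graph_regularized_nmf(R, S, k, lam=0.1, iters=200, seed=0, eps=1e-12):
    rng = random.Random(seed)
    n_users, n_groups = len(R), len(R[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_users)]
    H = [[rng.random() + 0.1 for _ in range(n_groups)] for _ in range(k)]
    D = [sum(row) for row in S]  # degree of each user in the similarity graph
    for _ in range(iters):
        # Standard multiplicative update for H: H <- H * (W^T R) / (W^T W H).
        Wt = transpose(W)
        num_H = matmul(Wt, R)
        den_H = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num_H[a][j] / (den_H[a][j] + eps) for j in range(n_groups)]
             for a in range(k)]
        # W <- W * (R H^T + lam S W) / (W H H^T + lam D W): similar users pull together.
        Ht = transpose(H)
        RHt = matmul(R, Ht)
        SW = matmul(S, W)
        WHHt = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * (RHt[i][a] + lam * SW[i][a])
              / (WHHt[i][a] + lam * D[i] * W[i][a] + eps)
              for a in range(k)] for i in range(n_users)]
    return W, H


R = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]  # hypothetical user-by-group feedback
S = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # hypothetical user similarity
W, H = graph_regularized_nmf(R, S, k=2)
scores = matmul(W, H)  # predicted preference of each user for each group
```

Unobserved groups with the highest entries in `scores` would be the candidate recommendations; multiplicative updates keep every factor non-negative without an explicit projection step.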

19.
Network representation learning (NRL) aims to embed networks into low-dimensional, continuous, distributed vector spaces. Most existing representation learning methods learn representations purely from the network topology, i.e., the links between network nodes, but the nodes in many networks contain rich text features that benefit network analysis tasks such as node classification and link prediction. In this paper, we propose a novel network representation learning model, Text-Enhanced Network Representation Learning (TENR), which introduces the nodes' text features to learn more discriminative network representations; these representations are learned jointly from the network topology and the text features and capture influencing factors common to both. In the experiments, we evaluate the proposed method and baseline methods on node classification. The results demonstrate that our method outperforms the baselines on three real-world datasets.

20.
Crime is a central problem in modern society, affecting social stability, public safety, economic development, and residents' quality of life. Predicting where crimes will occur, promptly and with relatively high accuracy, is therefore an important research direction. With the rapid development of social media (e.g., Twitter), online information can serve as a strong supplement to offline information (crime records). In addition, geographic information and taxi flows between communities can model the spatial relationships between communities, which previous work has already shown to be effective. To solve the crime prediction problem efficiently, we propose a generalized deep multi-view representation learning framework for crime forecasting. Extensive experiments on a 4-month, city-wide dataset covering 77 communities and 22 crime types show that our model improves prediction accuracy for most crime types.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  Beijing ICP Registration No. 09084417