Similar literature
 Found 20 similar articles; search took 15 ms
1.
刘长红  曾胜  张斌  陈勇 《计算机应用》2022,42(10):3018-3024
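The difficulty of cross-modal image-text retrieval lies in effectively learning the semantic correlation between images and text. Most existing methods learn either the global semantic correlation between image region features and text features or the local semantic correlation between objects across modalities, while ignoring the relationships among objects within a modality and the association between object relationships across modalities. To address this problem, an image-text retrieval method based on a Cross-Modal Tensor Fusion Network with Semantic Relation Graph (CMTFN-SRG) is proposed. First, a Graph Convolutional Network (GCN) is used to learn the relationships among image regions, and a Bidirectional Gated Recurrent Unit (Bi-GRU) is used to build the relationships among text words. Then, the learned semantic relation graphs of image regions and text words are matched through a tensor fusion network to learn fine-grained semantic correlations between the two modalities. Meanwhile, a Gated Recurrent Unit (GRU) is used to learn global image features, and the global features of the image and the text are matched to capture the global semantic correlation between modalities. The proposed method was compared with the Multi-Modality Cross Attention (MMCA) method on two benchmark datasets, Flickr30K and MS-COCO. Experimental results show that, on the text-to-image retrieval task, the proposed method improves Recall@1 by 2.6%, 9.0% and 4.1% on the Flickr30K test set, the MS-COCO 1K test set and the MS-COCO 5K test set, respectively, and improves the mean recall (mR) by 0.4, 1.3 and 0.1 percentage points, respectively, showing that the method can effectively improve the accuracy of image-text retrieval.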
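The Recall@K and mean recall (mR) figures quoted in this abstract can be illustrated with a minimal sketch. It assumes each query has a single ground-truth match whose 1-based rank is already known, and that mR is the average of R@1, R@5 and R@10 over both retrieval directions; the abstract does not spell out that definition, and the function names are hypothetical.

```python
def recall_at_k(ranks, k):
    """Fraction of queries whose ground-truth item is ranked within the top k.

    `ranks` holds the 1-based rank of the correct item for each query.
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_recall(i2t_ranks, t2i_ranks, ks=(1, 5, 10)):
    """Assumed definition of mR: average of R@K (in percent) over both
    retrieval directions (image-to-text, text-to-image) and all K in `ks`."""
    values = [
        100.0 * recall_at_k(ranks, k)
        for ranks in (i2t_ranks, t2i_ranks)
        for k in ks
    ]
    return sum(values) / len(values)


# Toy ranks for four queries in each direction (illustrative data only).
toy_ranks = [1, 3, 7, 12]
print(recall_at_k(toy_ranks, 1))          # share of queries answered at rank 1
print(mean_recall(toy_ranks, toy_ranks))  # mean of the six R@K values
```

Under this definition, an improvement "in percentage points" refers to a difference between two such averaged values.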

2.
Zhang  Jin  He  Xiaohai  Qing  Linbo  Liu  Luping  Luo  Xiaodong 《Multimedia Tools and Applications》2022,81(9):12005-12027
Multimedia Tools and Applications - Cross-modal image-text matching has attracted considerable interest in both computer vision and natural language processing communities. The main issue of...

3.
Zhang  Baopeng  Qu  Yanyun  Peng  Jinye  Fan  Jianping 《Multimedia Tools and Applications》2017,76(20):21401-21421
Multimedia Tools and Applications - For reducing huge uncertainty on the relatedness between the web images and their auxiliary text terms, an automatic image-text alignment algorithm is developed...

4.
宫大汉    陈辉    陈仕江  包勇军  丁贵广   《智能系统学报》2021,16(6):1143-1150
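The task of cross-modal image-text retrieval is important for understanding the correspondence between vision and language. Most existing methods use different attention modules to mine region-to-word and word-to-region alignments in order to explore fine-grained cross-modal associations. However, existing methods do not consider that relying on dual attention can lead to inconsistent alignments. To this end, this paper proposes a consistency protocol matching method that uses consistent alignment to improve cross-modal retrieval performance. Attention is used to align cross-modal associations, and a cross-modal protocol based on competitive voting is designed on top of the alignment results. The protocol measures the consistency of cross-modal alignment and can effectively improve the performance of cross-modal image-text retrieval. Extensive experiments on two benchmark datasets, Flickr30K and MS COCO, demonstrate the effectiveness of the proposed method.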

5.
Liao  Wenxiong  Zeng  Bi  Liu  Jianqi  Wei  Pengfei  Fang  Jiongkun 《Applied Intelligence》2022,52(10):11184-11198

As social platforms develop rapidly, the volume of image-text content generated by users has grown quickly. Image-text sentiment analysis of social media has also attracted great interest from researchers in recent years. The main challenge of image-text sentiment analysis is how to construct a model that can promote the complementarity between image and text. In most previous studies, images and text were simply merged, while the interaction between them was not fully considered. This paper proposes an image-text interaction graph neural network for image-text sentiment analysis. A text-level graph neural network is used to extract the text features, and a pre-trained convolutional neural network is employed to extract the image features. Then, an image-text interaction graph network is constructed. The node features of the graph network are initialized with the text features and the image features, and are then updated based on the graph attention mechanism. Finally, an image-text aggregation layer is used to perform sentiment classification. The experimental results show that the presented method is more effective than existing methods. In addition, we built a large-scale Twitter image-text sentiment analysis dataset and used it in the experiments.


6.
Taking a picture of delicious food and sharing it on social media has become a popular trend. The ability to recommend recipes along with such pictures would benefit users who want to cook a particular dish, but this feature is not yet available. The challenge of recipe retrieval, nevertheless, comes from two aspects. First, current food recognition technology can only scale up to a few hundred categories, which is not yet practical for recognizing tens of thousands of food categories. Second, even a single food category can have recipe variants that differ in ingredient composition. Finding the best-match recipe requires knowledge of ingredients, which is a fine-grained recognition problem. In this paper, we consider the problem from the viewpoint of cross-modality analysis. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence between images and recipes. As learning happens at the regional level for images and at the ingredient level for recipes, the model has the ability to generalize recognition to unseen food categories. Furthermore, the embedded multi-modal ingredient feature sheds light on the retrieval of best-match recipes. On an in-house dataset, our model can double the retrieval performance of DeViSE, a popular cross-modality model that does not consider region information during learning.

7.
8.
In this paper, we study the (positive) graph relational calculus. The basis for this calculus was introduced by Curtis and Lowe in 1996 and some variants, motivated by their applications to semantics of programs and foundations of mathematics, appear scattered in the literature. No proper treatment of these ideas as a logical system seems to have been presented. Here, we give a formal presentation of the system, with precise formulation of syntax, semantics, and derivation rules. We show that the set of rules is sound and complete for the valid inclusions, and prove a finite model result as well as decidability. We also prove that the graph relational language has the same expressive power as a first-order positive fragment (both languages define the same binary relations), so our calculus may be regarded as a notational variant of the positive existential first-order logic of binary relations. The graph calculus, however, has a playful aspect, with rules easy to grasp and use. This opens a wide range of applications which we illustrate by applying our calculus to the positive relational calculus (whose set of valid inclusions is not finitely axiomatizable), obtaining an algorithm for deciding the valid inclusions and equalities of the latter.

9.
Graphs are ubiquitous in computer science. Moreover, in various application fields, graphs are equipped with attributes to express additional information such as names of entities or weights of relationships. Due to the pervasiveness of attributed graphs, it is highly important to have the means to express properties on attributed graphs to strengthen modeling capabilities and to enable analysis. Firstly, we introduce a new logic of attributed graph properties, where the graph part and attribution part are neatly separated. The graph part is equivalent to first-order logic on graphs as introduced by Courcelle. It employs graph morphisms to allow the specification of complex graph patterns. The attribution part is added to this graph part by reverting to the symbolic approach to graph attribution, where attributes are represented symbolically by variables whose possible values are specified by a set of constraints making use of algebraic specifications. Secondly, we extend our refutationally complete tableau-based reasoning method as well as our symbolic model generation approach for graph properties to attributed graph properties. Due to the new logic mentioned above, neatly separating the graph and attribution parts, and the categorical constructions employed only on a more abstract level, we can leave the graph part of the algorithms seemingly unchanged. For the integration of the attribution part into the algorithms, we use an oracle, allowing for flexible adoption of different available SMT solvers in the actual implementation. Finally, our automated reasoning approach for attributed graph properties is implemented in the tool AutoGraph integrating in particular the SMT solver Z3 for the attribute part of the properties. We motivate and illustrate our work with a particular application scenario on graph database query validation.

10.
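In cross-modal recipe retrieval, how to effectively represent the features of each modality is a much-studied problem. Current approaches generally use two independent neural networks to obtain image and recipe features separately and perform cross-modal retrieval through cross-modal alignment. However, these methods focus mainly on intra-modal feature information and ignore feature interaction between modalities, so some useful modal information is lost. To address this problem, a cross-modal recipe retrieval method is proposed that enhances modal semantics through a multimodal encoder. First, pre-trained models extract the initial semantic features of images and recipes, and an adversarial loss reduces the gap between modalities. Next, pairwise cross-modal attention lets the features of one modality repeatedly reinforce those of the other, extracting further useful information. Then, a self-attention mechanism models the internal features of each modality to capture rich modality-specific semantic information and latent associated knowledge. Finally, a triplet loss is introduced to minimize the distance between samples of the same class, enabling cross-modal retrieval learning. Experimental results on the Recipe 1M dataset show that the method outperforms current mainstream methods in median rank (MedR) and recall at top K (R@K), among other metrics, providing a strong solution for cross-modal retrieval tasks.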
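As a rough illustration of two ingredients named in this abstract, the sketch below shows a standard margin-based triplet loss and the median rank (MedR) metric. It is not the authors' implementation: the Euclidean distance, the margin value of 0.3 and all function names are assumptions made for the example.

```python
import math
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: zero once the positive is closer to the
    anchor than the negative by at least `margin` (margin is an assumption)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def median_rank(ranks):
    """MedR: median of the 1-based ranks of the correct items (lower is better)."""
    return median(ranks)


# Toy 2-D embeddings: an image anchor, its matching recipe, a non-matching recipe.
image = [0.0, 0.0]
matching_recipe = [0.0, 1.0]
other_recipe = [3.0, 4.0]
print(triplet_loss(image, matching_recipe, other_recipe))  # already well separated
print(median_rank([1, 2, 10]))
```

In training, the loss would be averaged over many sampled triplets; how the triplets are mined is not described in the abstract.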

11.
One of the major assumptions in case-based reasoning is that similar experiences can guide future reasoning, problem solving and learning. This assumption shows the importance of the method used for choosing the most suitable case, especially when dealing with the class of problems in which risk is a relevant concept in the case retrieval process. This paper argues that traditional similarity assessment methods are not sufficient to obtain the best case; an additional step with new information must be performed after applying similarity measures in the retrieval stage. When a case is recovered from the case base, one must take into account not only the specific value of the attribute but also whether the case solution is suitable for solving the problem, depending on the risk produced in the final decision. We introduce this risk as new information through a new concept called risk information, which is entirely different from the weight of the attributes. Our article presents this concept locally and measures it for each attribute independently.

12.
Commonsense question answering (CQA) requires understanding and reasoning over QA context and related commonsense knowledge, such as a structured Knowledge Graph (KG). Existing studies combine language models and graph neural networks to model inference. However, traditional knowledge graphs are mostly concept-based, ignoring the direct path evidence necessary for accurate reasoning. In this paper, we propose MRGNN (Meta-path Reasoning Graph Neural Network), a novel model that comprehensively captures sequential semantic information from concepts and paths. In MRGNN, meta-paths are introduced as direct inference evidence and an original graph neural network is adopted to aggregate features from both concepts and paths simultaneously. We conduct extensive experiments on the CommonsenseQA and OpenBookQA datasets, showing the effectiveness of MRGNN. We also conduct further ablation experiments and explain the reasoning behavior through a case study.

13.
As a new information sharing platform, microblogging has grown explosively in recent years and has become an important source for public opinion mining. A variety of information, such as reviews of brands and products or the trends of events, can be socially sensed from this kind of data. However, searching for relevant microblogs is still a challenging task, as user-generated content tends to be mixed with noise. Besides short text, images are becoming popular in microblogs because of their power to convey visual information. In this paper, we leverage textual and visual cues jointly and propose a general re-ranking approach for microblog retrieval via multi-graph semi-supervised learning. We argue that the different types of information in microblogs correspond to different relationships among microblogs and that each type of relationship can be represented as a similarity graph. We then integrate the different graphs into a unified framework and solve them simultaneously for microblog re-ranking. Extensive experiments on the recently published Brand-Social-Net dataset show the effectiveness of the proposed method, with marginal improvements in accuracy over the method based on a single-graph model.

14.
刘芳名  张鸿 《计算机应用》2021,41(8):2187-2192
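Most cross-modal hashing methods represent the degree of relevance with a binary matrix and therefore cannot capture the deeper semantic information among multi-label data; they also neglect preserving the semantic structure and the discriminability of data features. To address these problems, a discriminative cross-modal hashing retrieval algorithm based on multilevel semantics, ML-SDH, is proposed. The proposed algorithm uses a multilevel semantic similarity matrix to discover deep correlation information in cross-modal data, and at the same time uses equal guidance of the cross-modal hashing representation in semantic structure and discrim...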

15.
The aim of this paper is to present the principles and results of case-based reasoning adapted to real-time interactive simulations, more precisely concerning retrieval mechanisms. The article begins by introducing the constraints involved in interactive multiagent-based simulations. The second section presents a framework stemming from case-based reasoning by autonomous agents. Each agent uses a case base of local situations and, from this base, it can choose an action in order to interact with other autonomous agents or users’ avatars. We illustrate this framework with an example dedicated to the study of dynamic situations in football. We then go on to address the difficulties of conducting such simulations in real time and propose a model for cases and for the case base. Using generic agents and an adequate case base structure associated with a dedicated recall algorithm, we improve retrieval performance under time pressure compared to classic CBR techniques. We present some results relating to the performance of this solution. The article concludes by outlining the future development of our project.

16.
17.
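Objective: Existing research on visual question answering models mainly starts from attention mechanisms and multimodal fusion. It fails to explicitly model the semantic connections between objects in an image scene and rarely emphasizes the spatial position relationships of objects, resulting in poor spatial relation reasoning. For visual question answering problems that require spatial relation reasoning, this paper therefore proposes to model the image in a structured way using the spatial relation attributes between visual objects, and builds a question-guided spatial relation graph reasoning model for visual question answering. Method: Using saliency attention, Faster R-CNN (region-based convolutional neural network) extracts the salient visual objects and visual features in the image. The visual objects in the image and their spatial relations are modeled in a structured way as a spatial relation graph. Question-guided focused attention performs question-based spatial relation reasoning. Focused attention is divided into node attention and edge attention, which are used to find the visual objects and the spatial relations relevant to the question, respectively. The node attention and edge attention weights are used to construct a gated graph reasoning network; through its message passing mechanism and its control over the aggregation of feature information, the network obtains deep interaction information of the nodes and learns spatially aware visual feature representations, achieving question-based spatial relation reasoning. The image features with spatial relation awareness and the question features are fused multimodally to predict the correct answer. Results: The model on the VQA (visual question answering) v2...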

18.
A survey of explainable knowledge graph reasoning methods   Cited by: 2 (self-citations: 0, citations by others: 2)
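In recent years, artificial intelligence research based on deep learning models has made continual breakthroughs, but most of these models are black boxes, which hinders human cognition of the reasoning process, so high-performance complex algorithms, models and systems generally lack transparency and explainability in their decisions. In critical fields with strict explainability requirements, such as national defense, healthcare, and network and information security, the unexplainability of reasoning methods has a considerable impact on reasoning results and their traceability. Explainability therefore needs to be built into these algorithms and systems, with explicit explainable knowledge reasoning assisting the relevant prediction tasks to form a reliable mechanism for explaining behavior. As one of the most recent forms of knowledge representation, the knowledge graph models semantic networks and describes entities and relations in the objective world in a structured form, and is widely used in knowledge reasoning. Knowledge reasoning based on knowledge graphs builds on discrete symbolic representation and explains the reasoning process through auxiliary means such as reasoning paths and logical rules, providing an important route to explainable artificial intelligence. This paper gives a comprehensive survey of the field of explainable knowledge graph reasoning. It explains the concepts related to explainable artificial intelligence and knowledge reasoning, describes in detail the latest research progress in explainable knowledge graph reasoning methods, summarizes different knowledge graph reasoning methods from the perspective of the three research paradigms of artificial intelligence, and proposes research prospects and future research directions for explainable knowledge graph reasoning.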

19.
20.
Frequent subgraphs have proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particular, we restrict the pattern language to trees and relax the demand on the completeness of the mining algorithm, as well as on the correctness of the pattern matching operator, by replacing transaction and query graphs with small random samples of their spanning trees. In this way we consider only a random subset of frequent subtrees, called probabilistic frequent subtrees, that can be enumerated efficiently. Our extensive empirical evaluation on artificial and benchmark molecular graph datasets shows that probabilistic frequent subtrees can be listed in practically feasible time and that their predictive and retrieval performance is very close even to that of complete sets of frequent subgraphs. We also present different fast techniques for computing the embedding of unseen graphs into (probabilistic frequent) subtree feature spaces. These algorithms utilize the partial order on tree patterns induced by subgraph isomorphism and, as we show empirically, require far fewer evaluations of subtree isomorphism than the standard brute-force algorithm. We also consider partial embeddings, i.e., when only a part of the feature vector has to be calculated. In particular, we propose a highly effective practical algorithm that significantly reduces the number of pattern matching evaluations required by the classical min-hashing algorithm approximating Jaccard similarities.
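For readers unfamiliar with the classical min-hashing algorithm mentioned at the end of this abstract, the following is a minimal sketch of how it approximates Jaccard similarity between two plain sets. It is the textbook technique on sets of integers, not the paper's optimized algorithm over subtree patterns; the SHA-256-based hash family, the signature length of 256 and the function names are assumptions chosen for a deterministic example.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of `item` under a given seed (illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=256):
    """For each seeded hash function, keep the minimum hash value over the set.

    `items` must be non-empty.
    """
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Two overlapping feature sets (toy data): exact similarity is 50 / 150.
set_a = set(range(0, 100))
set_b = set(range(50, 150))
sig_a = minhash_signature(set_a)
sig_b = minhash_signature(set_b)
print(jaccard(set_a, set_b), estimate_jaccard(sig_a, sig_b))
```

Each signature position requires evaluating the minimum over the whole set, which in the paper's setting corresponds to pattern matching evaluations; that cost is what the abstract says the proposed algorithm reduces.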
