This paper investigates cross-domain action recognition. Specifically, we present a cross-domain action recognition framework that uses labeled data from other data sets as an auxiliary source domain. The task is challenging because data from different domains may have different feature distributions. To map data from different domains into the same abstract space and boost action recognition performance, we propose collective matrix factorization with graph Laplacian regularization (CMFGLR). Our approach builds on collective matrix factorization and simultaneously learns a common latent space, linear projection matrices for obtaining semantic representations, and an optimal linear classifier. Moreover, we exploit label consistency across domains and local geometric consistency within each domain to obtain a graph Laplacian regularization term that makes the learned features more discriminative. Experimental results verify that CMFGLR significantly outperforms several state-of-the-art methods.

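Li Wei, Wang Meng. Acta Automatica Sinica (《自动化学报》), 2022, 48(9): 2337-2351
To address the difficulty of obtaining manually annotated training samples for object detection, we propose an unsupervised cross-domain object detection method that performs domain adaptation progressively, first at the pixel level and then at the feature level. Existing pixel-level domain adaptation methods produce translated images with a single style and inconsistent content structure. We therefore decompose each input image into a domain-invariant content space and a domain-specific attribute space, combine the representations from the two spaces to translate images diversely, and preserve the spatial semantic structure of the image so that annotations can be transferred. For feature-level domain adaptation, to alleviate the source bias caused by a single source domain, the annotated, diversely translated images are used as a multi-source training set, and a multi-domain adversarial discrimination module is designed to obtain feature representations that are invariant across the domains. Finally, a self-training scheme iteratively generates pseudo-labels for the target-domain training set to further improve detection performance on the target domain. Experiments on the Cityscapes & Foggy Cityscapes and VOC07 & Clipart1k data sets show that the framework achieves better transfer detection performance than existing unsupervised cross-domain detection algorithms.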

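Cross-domain sentiment classification aims to use source-domain data rich in sentiment labels to analyze the sentiment polarity of target-domain data that lacks labels. This paper proposes a cross-domain aspect-level sentiment classification model based on adversarial distribution alignment. The model learns semantic associations through interactive attention between aspect terms and their context, and learns shared feature representations through a domain classifier built on a gradient reversal layer. Adversarial training is used to widen the alignment margin between domain distributions, which effectively alleviates misclassification caused by ambiguous features. Experiments on the Semeval-2014 and Twitter data sets show that the model performs well. Ablation experiments further show that capturing ambiguous features near the decision boundary and enlarging the distance between samples and the decision boundary improves classification performance.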

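The vector space model (VSM) cannot reveal the shared conceptual semantics hidden behind different feature words, cannot reflect the latent semantic relations in documents, and yields low precision in similarity computation. To address these problems, a domain ontology-based vector space model (DOBVSM) for documents is proposed. The model expands concepts in the domain ontology into document feature words and adjusts the feature-word weights according to the semantic relations between concepts, finally building a document DOBVSM that incorporates semantic relations. Experimental analysis shows that the document similarity values computed by DOBVSM are more widely spread and are the closest to expert ratings, so the model reflects the similarity between documents well.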
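The idea of expanding ontology concepts into feature words can be illustrated with a minimal sketch. The toy ontology, the `concept_weight` parameter, and the function names below are assumptions made for illustration; the abstract does not give DOBVSM's actual weighting scheme.

```python
import math
from collections import Counter

# Hypothetical toy ontology (term -> parent concept); not from the paper.
ONTOLOGY = {"sedan": "car", "hatchback": "car"}


def expand(tokens, concept_weight=0.5):
    """Build a term-frequency vector and add each term's parent concept
    as an extra, down-weighted feature (an assumed weighting)."""
    vec = Counter()
    for tok in tokens:
        vec[tok] += 1.0
        parent = ONTOLOGY.get(tok)
        if parent is not None:
            vec[parent] += concept_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = ["red", "sedan"]
doc2 = ["red", "hatchback"]

plain = cosine(Counter(doc1), Counter(doc2))   # plain VSM: only "red" overlaps
expanded = cosine(expand(doc1), expand(doc2))  # shared concept "car" also overlaps
print(plain, expanded)
```

In this toy case the two documents share only one surface word, so the plain VSM score is low; once both are mapped to the shared concept, the score rises, which is the effect the abstract describes.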

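Sentiment classification determines the sentiment polarity of data and is widely applied to data such as product reviews and microblog topics. Because labels are expensive, traditional sentiment classification methods struggle to classify data from different domains effectively, and cross-domain sentiment classification has therefore attracted wide attention. Most existing cross-domain sentiment classification methods extract lexical and syntactic features on the basis of co-occurrence and ignore the semantic relations between words. This paper therefore proposes WEEF (Cross-domain Classification based on Word Embedding Extension Feature), a word2vec-based cross-domain sentiment classification method. It selects high-quality features that co-occur across domains as a bridge and uses them as seeds; based on word-vector similarity, domain-specific features are then added to these seeds to form feature clusters, which reduces the difference between domains. Experiments on the SRAA and Amazon product review data sets demonstrate the effectiveness of the method, especially when the amount of data is large.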
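The seed-expansion step can be sketched as follows. The two-dimensional vectors, the similarity threshold, and `build_clusters` are hypothetical stand-ins; the abstract does not specify how WEEF selects seeds or which threshold it uses.

```python
import math

# Hypothetical 2-d word vectors standing in for trained word2vec embeddings.
VECTORS = {
    "good": (1.0, 0.1),       # seed shared by both domains
    "bad": (-1.0, 0.1),       # seed shared by both domains
    "durable": (0.9, 0.3),    # specific to an electronics domain
    "gripping": (0.8, 0.4),   # specific to a books domain
    "flimsy": (-0.9, 0.2),    # specific to an electronics domain
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_clusters(seeds, domain_words, threshold=0.8):
    """Attach each domain-specific word to its most similar seed,
    provided the similarity reaches the (assumed) threshold."""
    clusters = {seed: [seed] for seed in seeds}
    for word in domain_words:
        best = max(seeds, key=lambda s: cosine(VECTORS[word], VECTORS[s]))
        if cosine(VECTORS[word], VECTORS[best]) >= threshold:
            clusters[best].append(word)
    return clusters


clusters = build_clusters(["good", "bad"], ["durable", "gripping", "flimsy"])
print(clusters)
```

Each cluster can then be treated as a single feature, so that "durable" in one domain and "gripping" in another contribute to the same dimension.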


Sense representations have gone beyond word representations such as Word2Vec, GloVe and FastText and achieved strong performance on a wide range of natural language processing tasks. Although useful in many applications, traditional approaches to generating word embeddings have a serious drawback: they produce a single vector for a given word, ignoring the fact that ambiguous words can take on different meanings. In this paper, we explore unsupervised sense representations which, unlike traditional word embeddings, can induce different senses of a word by analyzing its contextual semantics in a text. The unsupervised sense representations investigated here are sense embeddings and deep neural language models. We present the first experiments on generating sense embeddings for Portuguese. Our experiments show that the sense embedding model (Sense2vec) outperformed traditional word embeddings on syntactic and semantic analogy tasks, indicating that the language resource generated here can improve the performance of NLP tasks in Portuguese. We also evaluated pre-trained deep neural language models (ELMo and BERT) under two transfer learning approaches, feature-based and fine-tuning, on the semantic textual similarity task. Our experiments indicate that the fine-tuned Multilingual and Portuguese BERT language models achieved better accuracy than the ELMo model and the baselines.


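Word embeddings play an important role in natural language processing and have attracted growing attention from researchers in recent years. However, traditional word embedding learning methods usually rely on large unannotated text corpora and ignore semantic information about words, such as the semantic relations between them. To make full use of existing domain knowledge bases, which contain rich word-level semantic information, this paper proposes KbEMF, a word embedding learning method that incorporates semantic information. The method adds a domain-knowledge constraint term to a matrix factorization model for learning word embeddings, so that word pairs with strong semantic relations obtain relatively similar embeddings. Results on word analogy reasoning and word similarity tasks on real data show that KbEMF clearly improves on existing models.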
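One generic way to write a matrix factorization objective with a knowledge constraint is shown below. This is an illustrative form only and may differ from the objective KbEMF actually uses. All symbols are assumptions: $M$ is a word-context association matrix, $W$ and $C$ hold the word and context embeddings, $w_i$ is the $i$-th row of $W$, $\mathcal{K}$ is the set of word pairs related in the knowledge base, $s_{ij}$ is the strength of the relation, and $\gamma$ balances the two terms.

```latex
\min_{W,\,C}\;
\bigl\lVert M - W C^{\top} \bigr\rVert_F^{2}
\;+\;
\gamma \sum_{(i,j)\in\mathcal{K}} s_{ij}\,
\bigl\lVert w_i - w_j \bigr\rVert_2^{2}
```

The first term fits the corpus statistics; the second penalizes distance between the embeddings of strongly related words, which is what pulls such pairs together.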

Domain ontologies facilitate the organization, sharing and reuse of domain knowledge, and enable various vertical domain applications to operate successfully. Most methods for automatically constructing ontologies focus on taxonomic relations, such as is-kind-of and is-part-of relations. However, much of the domain-specific semantics is ignored. This work proposes a semi-unsupervised approach for extracting semantic relations from domain-specific text documents. The approach effectively utilizes text mining and existing taxonomic relations in domain ontologies to discover candidate keywords that can represent semantic relations. A preliminary experiment on the natural science domain (Taiwan K9 education) indicates that the proposed method yields valuable recommendations. This work enriches domain ontologies by adding distilled semantics.

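Although research on sentiment analysis has made great progress in recent years, cross-domain aspect-level sentiment analysis remains a challenge. Existing methods focus mainly on the information shared by the source and target domains and ignore information specific to the target domain. Moreover, sentiment words are important information in a sentence: they reflect the sentiment polarity of an aspect and can be divided into shared sentiment words and domain-specific sentiment words. To exploit the target domain's specific information and sentiment words, this paper proposes a domain-specific sentiment word attention model (DSSW-ATT). The model sets up two independent subspaces, uses attention mechanisms to extract shared and domain-specific sentiment word features respectively, builds a corresponding shared-feature classifier and specific-feature classifier, and fuses the two kinds of features through co-training. The paper also constructs an aspect-level user review data set covering the hotel domain (source domain) and the mobile phone domain (target domain). Experimental results on this data set show that the method clearly outperforms the baseline methods.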

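Traditional recommender systems face several challenges, such as data sparsity and unexplainable recommendations. To address them, many studies mine the semantic information in review text to improve recommendation performance. However, these methods have problems in both text feature modeling and text interaction. For text modeling, they simply concatenate all reviews of a user or item into a single review, yet word- or phrase-level semantics may contradict the overall semantics of the review text. For text interaction, they defer interaction to the prediction layer and so cannot capture the complex correlations between users and items. To solve these problems, we propose a novel representation learning method based on hierarchical text interaction. The method models low-level word semantics and high-level review text hierarchically so that text information is mined at different granularities. To further capture complex user-item interactions, we mine semantic associations between users and items at different levels. At the word level, we propose an attention mechanism personalized for each user-item pair to capture the important words that represent each review. At the text level, we propagate textual semantic information between users and items in both directions and capture the reviews that are useful for the target task. Finally, we apply the method to rating prediction within a collaborative filtering framework; comparative experiments on public data sets show that it outperforms existing methods in rating prediction.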

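Recommender systems are widely deployed and constantly influence daily life. Training a good recommender system usually requires a large amount of user-item interaction data, but the data available in practice are often very sparse, which tends to make the trained model overfit and fall short of the desired recommendation quality. Cross-domain recommender systems emerged to address this problem. Most current work on cross-domain recommendation borrows from traditional domain adaptation: it uses feature alignment or adversarial learning to transfer domain-invariant user interests from a data-rich source domain to a sparse target domain, for example from Douban Movies to Douban Books. However, because network structures differ between recommendation platforms, the domain-invariant semantic information that existing methods extract by brute force is easily entangled with structural information, which leads to mismatches. Existing methods also ignore the noise inherent in graph data, which further degrades experimental results. To solve these problems, a causal data generation process for graph data is introduced, in which the domain latent variables, semantic latent variables, and noise latent variables are disentangled; recommendations are made with each node's semantic latent variables, which yields domain-invariant recommendation. The method is validated on several public data sets and achieves state-of-the-art results.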

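Objective: Existing image recognition methods perform well when the training and test data are drawn from the same distribution, but this assumption often does not hold in real scenarios, which lowers recognition accuracy. Domain adaptation is an effective way to address this problem; it deals with data from two domains that are related but differently distributed. Method: Based on an analysis of the data distributions, a joint balanced adaptation method based on attention transfer is proposed, which transfers image features extracted from labeled source-domain data to the unlabeled target domain. First, an attention transfer mechanism transfers the spatial category information of the labeled source-domain data to the unlabeled target domain; attention is defined for the convolutional neural network, and this attention information is used to improve recognition accuracy. Second, a prior distribution over the network parameters is introduced based on the target data set, and the network is given the ability to adjust automatically how strongly each domain alignment layer aligns features. Finally, a cross-domain bias describes the input distribution of each domain-specific feature alignment layer, quantifying the degree of domain adaptation learned at each layer. Results: The method achieves an average recognition accuracy of 77.6% on Office-31 and 90.7% on Office-Caltech, far ahead of traditional hand-crafted feature methods and comparable to the current best methods. Conclusion: The attention-transfer-based joint balanced domain adaptation method achieves high recognition accuracy and automatically learns the degree of feature alignment between domains; the results also confirm that transferring features between domains can improve network optimization.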

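Cross-domain classification aims to use labeled source-domain information to train an accurate classifier for an unlabeled target domain whose probability distribution differs. Most existing work represents features as text topics and uses shared topics to build a mapping between the topics unique to each domain, thereby achieving cross-domain learning. In practice, however, domains can be connected from multiple angles, and a mapping based on a single set of shared topics suffers from incomplete and biased semantic representation, which hurts cross-domain classification accuracy. This paper therefore proposes a cross-domain classification method based on multi-bridge mapping: it extracts multiple sets of shared topics and domain-specific topics, and uses the multiple shared topics as bridges to build multiple mappings between the domain-specific topics, thereby achieving cross-domain classification. Experiments on the 20Newsgroups and Reuters-21578 data sets show that the proposed algorithm achieves higher classification accuracy than comparable algorithms.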

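Word embeddings represent the meaning of words as vectors, and many recent natural language processing applications incorporate them as additional features or as direct input to improve system performance. However, most current word embedding training models rely on shallow textual information and do not fully exploit deeper dependency relations. The meaning of a word is reflected in its relations with other words, and a word relation has three attributes: the related unit, the relation type, and the relation direction. This paper therefore proposes a new neural-network-based word embedding training model with three top layers, one for each of the three attributes, so that word relations are used more appropriately in training. Using large-scale unlabeled text, the model trains word embeddings from both dependency relations and context relations. The resulting embeddings are evaluated on an analogy task and a protein relation extraction task to verify the effectiveness of the relation model. Experiments show that, compared with the skip-gram and CBOW models, the embeddings trained by the relation model express the semantic information of words more accurately.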

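How to obtain high-quality domain word embeddings on top of Chinese BERT character embeddings, for use in text analysis tasks that build on domain word segmentation, is a problem that urgently needs solving. A BERT-based method for generating domain word embeddings is proposed. A BERT-CRF domain word segmenter is built, which combines pre-trained BERT character embeddings with domain text for fine-tuning and for learning domain word segmentation; the domain word embeddings are then derived from the decoded segmentation results. Experiments show that, with only a small amount of domain text, the method learns a segmenter that meets the needs of domain tasks and produces higher-quality domain word embeddings than the original BERT.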

Sentence syntax is the basis for organizing semantic relations in TANKA, a project that aims to acquire knowledge from technical text. Other hallmarks include an absence of precoded domain-specific knowledge; significant use of public-domain generic linguistic information sources; involvement of the user as a judge and source of expertise; and learning from the meaning representations produced during processing. These elements shape the realization of the TANKA project: implementing a trainable text processing system to propose correct semantic interpretations to the user. A three-level model of sentence semantics, including a comprehensive Case system, provides the framework for TANKA's representations. Text is first processed by the DIPETT parser, which can handle a wide variety of unedited sentences. The semantic analysis module HAIKU then semi-automatically extracts semantic patterns from the parse trees and composes them into domain knowledge representations. HAIKU's dictionaries and main algorithm are described with the aid of examples and traces of user interaction. Encouraging experimental results are described and evaluated.

Data-driven conceptual design is rapidly emerging as a powerful approach to generating novel and meaningful ideas by leveraging external knowledge, especially in the early design phase. Most existing studies identify and explore design knowledge either by using common-sense resources or by building domain-specific ontology databases and semantic networks. However, the overwhelming majority of engineering knowledge is published as highly unstructured and heterogeneous text, which presents two main challenges for modern conceptual design: (a) how to capture highly contextual and complex knowledge relationships, and (b) how to efficiently retrieve meaningful and valuable implicit knowledge associations. To this end, we propose a new data-driven conceptual design approach that represents and retrieves cross-domain knowledge concepts to enhance design ideation. The methodology has three parts. First, engineering design knowledge from the massive body of scientific literature is learned as information-dense word embeddings, which encode complex and diverse engineering knowledge concepts into a common distributed vector space. Second, we develop a novel semantic association metric to quantify the strength of both explicit and implicit knowledge associations, which guides the construction of a novel large-scale design knowledge semantic network (DKSN). The resulting DKSN structures cross-domain engineering knowledge concepts into a weighted directed graph with interconnected nodes. Third, to automatically explore both explicit and implicit knowledge associations of design queries, we establish an intelligent retrieval framework by applying pathfinding algorithms on the DKSN. Validation results on three benchmarks, MTURK-771, TTR and MDEH, demonstrate that the constructed DKSN represents and associates engineering knowledge concepts better than existing state-of-the-art semantic networks. Finally, two case studies show the effectiveness and practicality of the proposed approach in real-world engineering conceptual design.
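Pathfinding over a weighted directed graph of concepts can be sketched with Dijkstra's algorithm. The abstract does not say which pathfinding algorithm or edge weighting the DKSN uses, so everything below is an assumption: association strengths are taken to lie in (0, 1], each edge costs `-log(strength)`, and the shortest path is therefore the chain of associations with the largest product of strengths. The graph and the name `strongest_path` are hypothetical.

```python
import heapq
import math


def strongest_path(graph, source, target):
    """Dijkstra over costs -log(strength); strengths must lie in (0, 1]
    so that every edge cost is non-negative."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, strength in graph.get(node, {}).items():
            new_d = d - math.log(strength)
            if new_d < dist.get(neighbor, math.inf):
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    if target not in dist:
        return None, 0.0
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Hypothetical concept graph: node -> {neighbor: association strength}.
graph = {
    "gear": {"torque": 0.9, "bearing": 0.3},
    "torque": {"motor": 0.8},
    "bearing": {"motor": 0.9},
    "motor": {},
}

path, strength = strongest_path(graph, "gear", "motor")
print(path, strength)
```

Here the route through "torque" (0.9 × 0.8) beats the route through "bearing" (0.3 × 0.9), so the intermediate concept on the stronger chain is the one surfaced for the query.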

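Feng Yanhong, Yu Hong, Sun Geng, Zhao Yujin. Journal of Computer Applications (《计算机应用》), 2016, 36(11): 3146-3151
Domain term recognition methods based on statistical features ignore the semantics and domain characteristics of terms, which degrades recognition results. To address this problem, a domain term recognition method based on word embeddings and conditional random fields (CRF) is proposed. The method exploits two properties: word embeddings express semantics well, and the similarity between a word and domain terms expresses domain membership well. On top of the statistical features, it adds the similarity between a word's embedding and the embeddings of domain terms, forming an embedding-based feature vector, and uses a CRF to combine these features for domain term recognition. Experiments on a domain corpus and the SogouCA corpus yield a precision, recall, and F-measure of 0.9855, 0.9439, and 0.9643, respectively, indicating that the proposed method performs well.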
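The three reported figures can be checked against each other. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, but the reported value is consistent with that assumption.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_measure(0.9855, 0.9439)
print(round(f1, 4))  # 0.9643, matching the reported F-measure
```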

Open Information Extraction (OIE) systems focus on identifying and extracting general relations from text. Most OIE systems use simple linguistic structure, such as part-of-speech or dependency features, to extract relations and arguments from a sentence. These approaches are simple and fast to implement, but suffer from two main drawbacks: (i) they are less effective at handling complex sentences with multiple relations and shared arguments, and (ii) they tend to extract overly specific relations. This paper proposes an approach to Information Extraction called SemIE, which addresses both drawbacks. SemIE identifies significant relations in domain-specific text by using a semantic structure that describes the domain of discourse. SemIE exploits the predicate-argument structure of a text, which lets it handle complex sentences. The semantics of the arguments are explicitly specified by mapping them to relevant concepts in the semantic structure. SemIE uses a semi-supervised learning approach to bootstrap training examples that cover all relations expressed in the semantic structure. SemIE takes pairs of structured documents as input and uses a Greedy Mapping module to bootstrap a full set of training examples. The training examples are then used to learn the extraction and mapping rules. We evaluated SemIE by comparing it with OLLIE, a state-of-the-art OIE system. We tested SemIE and OLLIE on extracting relations from text in the "movie" domain and found that, on average, SemIE outperforms OLLIE. We also examined how performance varies with sentence complexity and sentence length. The results demonstrate the effectiveness of SemIE in handling complex sentences.

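Because Chinese text has no delimiters between words, the boundaries of Chinese named entities are hard to identify. In addition, sufficient fully annotated corpora are hard to obtain in vertical domains such as medicine and finance. To overcome these shortcomings, a cross-domain Chinese named entity recognition model that dynamically transfers entity span information (TES-NER) is proposed. Through a dynamic fusion layer based on a gate mechanism, the model transfers entity span information shared across domains from a general domain with ample corpora (the source domain) to the Chinese named entity recognition model of a vertical domain (the target domain); entity span information denotes the extent of a Chinese named entity. TES-NER first builds a cross-domain shared entity span recognition module from a bidirectional long short-term memory network (BiLSTM) and a fully connected network (FCN), which identifies shared entity span information to determine the boundaries of Chinese named entities. It then builds a Chinese named entity recognition module from a separate character-based BiLSTM with a conditional random field (BiLSTM-CRF), which identifies domain-specific Chinese named entities. Finally, it builds the dynamic fusion layer, in which the gate mechanism dynamically decides how much of the shared entity span information extracted by the span recognition module is transferred to the domain-specific named entity recognition model. The general-domain (source) data set is the news data set MSRA, which has ample annotated data; the vertical-domain (target) data sets are the mixed-domain OntoNotes 5.0, the finance-domain Resume, and the medical-domain CCKS 2017, where OntoNotes 5.0 merges six different vertical domains. Experimental results show that on OntoNotes 5.0, Resume, and CCKS 2017 the proposed model's F1 score is 2.18%, 1.68%, and 0.99% higher, respectively, than that of the BiLSTM-CRF model.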
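A gate that decides how much span information to pass along can be sketched for a single character position. The abstract gives no formula for the fusion layer, so this is an assumed, simplified form: the gate is a sigmoid computed per dimension from both hidden states (with diagonal weights instead of full matrices), and the gated span features are added to the task features. All names and values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_task, h_span, w_task, w_span, bias):
    """Per-dimension gate g = sigmoid(w_task*h_task + w_span*h_span + bias);
    output is h_task + g * h_span (an assumed form of the fusion)."""
    fused = []
    for ht, hs, wt, ws, b in zip(h_task, h_span, w_task, w_span, bias):
        gate = sigmoid(wt * ht + ws * hs + b)
        fused.append(ht + gate * hs)
    return fused


h_task = [0.2, -0.4]   # hidden state from the domain-specific NER encoder
h_span = [1.0, 0.5]    # hidden state from the shared span encoder

# A strongly negative bias closes the gate; a strongly positive one opens it.
fused = gated_fusion(h_task, h_span, [0.0, 0.0], [0.0, 0.0], [-50.0, 50.0])
print(fused)
```

With the gate closed in the first dimension, the output keeps only the task feature; with it open in the second, the span feature is added in full. In a trained model the weights and bias would be learned so the gate varies by position.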
