期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

崔婉秋杜军平寇菲菲李志坚 Lee JangMyung 《计算机研究与发展》2018,55(8):1641-1652

充分挖掘微博短文本的语义以实现精准搜索是一项重要任务.由于微博文本内容具有稀疏性和语义局限性的特点,使得仅通过分析字面语义来进行短文本理解和相似性匹配的传统搜索方法受到了一定的限制.因此提出了一种社交与概念化语义结合的扩展搜索方法,通过挖掘社交网络独特的社交属性如#标签#、“@”和链接信息URL,对微博短文本实现进一步的社交语义扩展.该方法将文本字面分析获取的概念词语和社交关系中潜在的关联标签信息相结合,对短文本进行2种角度下的语义特征表示,实现了基于微博短文本语义充分理解的精准搜索.在微博数据集上的对比实验表明,与已有的扩展搜索方法相比所提方法能捕捉更多的语义特征,微博搜索的性能也得到了显著的提升. 相似文献

2.

A PHRASE GRAMMAR-BASED CONCEPTUAL INDEXING PARADIGM

P. C. Reghu Raj 《Applied Artificial Intelligence》2013,27(6):559-599

ABSTRACT

Present day information retrieval systems largely ignore the issues of lexical and compositional semantics, and rely mainly on some statistical measures for choosing or evolving an indexing scheme. This has been the reason for the decreasing precision in their responses, given an exponentially increasing number of Web pages. The work reported in this paper addresses this issue from a linguistic point of view. We show that the detection of domain-specific phrases can capture the task-specific semantics of documents. We introduce the notion of n*-gram formalism to characterize the domain-specific phrases and their variants, taking a few sample domains. A method to construct a phrase grammar from a small set of documents is proposed. A method of conceptual indexing based on the phrase grammar has also been proposed. In order to demonstrate the effectiveness of the proposed method, we have designed a versatile system that can perform concept-based retrieval, in addition to several document-processing tasks, such as text classification, extraction-based summarization, context tracking, and semantic tagging. Collectively, the system can address the semantic content of documents. Considering the fact that an average user prefers highly relevant results in the top-ranked subset to an exhaustively retrieved set, it is shown that the proposed system performs better in that it retrieves documents that are more conceptually relevant than those retrieved by Google, and at 95% confidence level. 相似文献

3.

混合词汇特征和LDA的语义相关度计算方法

肖宝李璞蒋运承《计算机工程与应用》2017,53(12):152-157

文本语义相关度计算在自然语言处理、语义信息检索等方面起着重要作用,以Wikipedia为知识库,基于词汇特征的ESA（Explicit Semantic Analysis）因简单有效的特点在这些领域中受到学术界的广泛关注和应用。然而其语义相关度计算因为有大量冗余概念的参与变成了一种高维度、低效率的计算方式,同时也忽略了文本所属主题因素对语义相关度计算的作用。引入LDA（Latent Dirichlet Allocation）主题模型,对ESA返回的相关度较高的概念转换为模型的主题概率向量,从而达到降低维度和提高效率的目的;将JSD距离（Jensen-Shannon Divergence）替换余弦距离的测量方法,使得文本语义相关度计算更加合理和有效。最后对不同层次的数据集进行算法的测试评估,结果表明混合词汇特征和主题模型的语义相关度计算方法的皮尔逊相关系数比ESA和LDA分别高出3%和9%以上。相似文献

4.

Domain structure-based transfer learning for cross-domain word representation

《Information Fusion》2021

Cross-domain word representation aims to learn high-quality semantic representations in an under-resourced domain by leveraging information in a resourceful domain. However, most existing methods mainly transfer the semantics of common words across domains, ignoring the semantic relations among domain-specific words. In this paper, we propose a domain structure-based transfer learning method to learn cross-domain representations by leveraging the relations among domain-specific words. To accomplish this, we first construct a semantic graph to capture the latent domain structure using domain-specific co-occurrence information. Then, in the domain adaptation process, beyond domain alignment, we employ Laplacian Eigenmaps to ensure the domain structure is consistently distributed in the learned embedding space. As such, the learned cross-domain word representations not only capture shared semantics across domains, but also maintain the latent domain structure. We performed extensive experiments on two tasks, namely sentiment analysis and query expansion. The experiment results show the effectiveness of our method for tasks in under-resourced domains. 相似文献

5.

Emphasizing Essential Words for Sentiment Classification Based on Recurrent Neural Networks

下载免费PDF全文

Fei Hu Li Li Zi-Li Zhang Jing-Yuan Wang Xiao-Fei Xu 《计算机科学技术学报》2017,32(4):785-795

With the explosion of online communication and publication, texts become obtainable via forums, chat messages, blogs, book reviews and movie reviews. Usually, these texts are much short and noisy without sufficient statistical signals and enough information for a good semantic analysis. Traditional natural language processing methods such as Bow-of-Word (BOW) based probabilistic latent semantic models fail to achieve high performance due to the short text environment. Recent researches have focused on the correlations between words, i.e., term dependencies, which could be helpful for mining latent semantics hidden in short texts and help people to understand them. Long short-term memory (LSTM) network can capture term dependencies and is able to remember the information for long periods of time. LSTM has been widely used and has obtained promising results in variants of problems of understanding latent semantics of texts. At the same time, by analyzing the texts, we find that a number of keywords contribute greatly to the semantics of the texts. In this paper, we establish a keyword vocabulary and propose an LSTM-based model that is sensitive to the words in the vocabulary; hence, the keywords leverage the semantics of the full document. The proposed model is evaluated in a short-text sentiment analysis task on two datasets: IMDB and SemEval-2016, respectively. Experimental results demonstrate that our model outperforms the baseline LSTM by 1%~2% in terms of accuracy and is effective with significant performance enhancement over several non-recurrent neural network latent semantic models (especially in dealing with short texts). We also incorporate the idea into a variant of LSTM named the gated recurrent unit (GRU) model and achieve good performance, which proves that our method is general enough to improve different deep learning models. 相似文献

6.

基于中文维基百科链接结构与分类体系的语义相关度计算

汪祥贾焰周斌丁兆云梁政《小型微型计算机系统》2011,32(11)

自然语言词汇的语义相关度的计算需要获取大量的背景知识,而维基百科是当前规模最大的百科全书,其不仅是一个规模巨大的语料库,而且还是一个包含了大量人类背景知识和语义关系的知识库,研究表明,其是进行语义计算的理想资源,本文提出了一种将维基百科的链接结构和分类体系相结合计算中文词汇语义相关度的算法,算法只利用了维基百科的链接结构和分类体系,无需进行复杂的文本处理,计算所需的开销较小.在多个人工评测的数据集上的实验结果显示,获得了比单独使用链接结构或分类体系的算法更好的效果,在最好的情况下,Spearman相关系数提高了30.96％. 相似文献

7.

使用概念描述的中文短文本分类算法

杨天平朱征宇《计算机应用》2012,32(12):3335-3338

针对短文本特征较少而导致使用传统文本分类算法进行分类效果并不理想的问题,提出了一种使用了概念描述的短文本分类算法,该方法首先构建出全局的语义概念词表;然后,使用概念词表分别对预测短文本和训练短文本概念化描述,使得预测短文本在训练集中找出拥有相似概念描述的训练短文本组合成预测长文本,同时将训练集内部的短文本也进行自组合形成训练长文本;最后,再使用传统的长文本分类算法进行分类。实验证明,该方法能够有效挖掘短文本内部隐含的语义信息,充分对短文本进行语义扩展,提高了短文本分类的准确度。相似文献

8.

Extracting semantic relations to enrich domain ontologies

Minxin Shen Duen-Ren Liu Yu-Siang Huang 《Journal of Intelligent Information Systems》2012,39(3):749-761

Domain ontologies facilitate the organization, sharing and reuse of domain knowledge, and enable various vertical domain applications to operate successfully. Most methods for automatically constructing ontologies focus on taxonomic relations, such as is-kind-of and is-part-of relations. However, much of the domain-specific semantics is ignored. This work proposes a semi-unsupervised approach for extracting semantic relations from domain-specific text documents. The approach effectively utilizes text mining and existing taxonomic relations in domain ontologies to discover candidate keywords that can represent semantic relations. A preliminary experiment on the natural science domain (Taiwan K9 education) indicates that the proposed method yields valuable recommendations. This work enriches domain ontologies by adding distilled semantics. 相似文献

9.

Semantic integration of enterprise information systems using meta-metadata ontology

Igor Cverdelj-Fogaraši Goran Sladić Stevan Gostojić Milan Segedinac Branko Milosavljević 《Information Systems and E-Business Management》2017,15(2):257-304

This paper proposes a non-domain-specific metadata ontology as a core component in a semantic model-based document management system (DMS), a potential contender towards the enterprise information systems of the next generation. What we developed is the core semantic component of an ontology-driven DMS, providing a robust semantic base for describing documents’ metadata. We also enabled semantic services such as automated semantic translation of metadata from one domain to another. The core semantic base consists of three semantic layers, each one serving a different view of documents’ metadata. The core semantic component’s base layer represents a non-domain-specific metadata ontology founded on ebRIM specification. The main purpose of this ontology is to serve as a meta-metadata ontology for other domain-specific metadata ontologies. The base semantic layer provides a generic metadata view. For the sake of enabling domain-specific views of documents’ metadata, we implemented two domain-specific metadata ontologies, semantically layered on top of ebRIM, serving domain-specific views of the metadata. In order to enable semantic translation of metadata from one domain to another, we established model-to-model mappings between these semantic layers by introducing SWRL rules. Having the semantic translation of metadata automated not only allows for effortless switching between different metadata views, but also opens the door for automating the process of documents long-term archiving. For the case study, we chose judicial domain as a promising ground for improving the efficiency of the judiciary by introducing the semantics in this field. 相似文献

10.

基于带约束语义文法的领域相关自然语言理解方法

王东升王石王卫民符建辉诸峰《中文信息学报》2018,32(2):38-49

开放域问答系统通常可以借助一些数据冗余方法来提高问答准确性,而对于缺乏大规模领域语料的领域相关问答系统来说,准确理解用户的意图成为这类系统的关键。该文首先定义了一种带约束语义文法,与本体等语义资源相结合,可以在词汇级、句法级、语义级对自然语言句子的解析过程进行约束,解决自然语言理解歧义问题;然后给出了一个高效的文法匹配算法,其首先依据定义的各种约束条件预先过滤一些规则,然后依据提出的匹配度计算模型对候选的规则进行排序,找到最佳匹配。为了验证方法的有效性,将方法应用到两个实际的应用领域的信息查询系统。实验结果表明,本系统提出的方法切实有效,系统理解准确率分别达到了82.4%和86.2%,MRR值分别达到了91.6%和93.5%。相似文献

11.

Mohammed Nazim Uddin Trong Hai Duong Ngoc Thanh Nguyen Xin-Min Qi Geun Sik Jo 《Expert systems with applications》2013,40(5):1645-1653

Collaborative tagging systems, also known as folksonomies, enable a user to annotate various web resources with a free set of tags for sharing and searching purposes. Tags in a folksonomy reflect users’ collaborative cognition about information. Tags play an important role in a folksonomy as a means of indexing information to facilitate search and navigation of resources. However, the semantics of the tags, and therefore the semantics of the resources, are neither known nor explicitly stated. It is therefore difficult for users to find related resources due to the absence of a consistent semantic meaning among tags. The shortage of relevant tags increases data sparseness and decreases the rate of information extraction with respect to user queries. Defining semantic relationships between tags, resources, and users is an important research issue for the retrieval of related information from folksonomies. In this research, a method for finding semantic relationships among tags is proposed. The present study considers not only the pairwise relationships between tags, resources, and users, but also the relationships among all three. Experimental results using real datasets from Flickr and Del.icio.us show that the method proposed here is more effective than previous methods such as LCH, JCN, and LIN in finding semantic relationships among tags in a folksonomy. 相似文献

12.

An evaluative baseline for geo-semantic relatedness and similarity

Andrea Ballatore Michela Bertolotto David C. Wilson 《GeoInformatica》2014,18(4):747-767

In geographic information science and semantics, the computation of semantic similarity is widely recognised as key to supporting a vast number of tasks in information integration and retrieval. By contrast, the role of geo-semantic relatedness has been largely ignored. In natural language processing, semantic relatedness is often confused with the more specific semantic similarity. In this article, we discuss a notion of geo-semantic relatedness based on Lehrer’s semantic fields, and we compare it with geo-semantic similarity. We then describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a new open dataset designed to evaluate computational measures of geo-semantic relatedness and similarity. This dataset is larger than existing datasets of this kind, and includes 97 geographic terms combined into 50 term pairs rated by 203 human subjects. GeReSiD is available online and can be used as an evaluation baseline to determine empirically to what degree a given computational model approximates geo-semantic relatedness and similarity. 相似文献

13.

基于知网的个人微博语义相关度的聚类研究

高永兵宋添树李江宇马占飞《计算机工程与科学》2019,41(6):1128-1135

聚类相关度大的个人微博有助于快速了解博主的专业兴趣和经历,目前的短文本聚类方法缺乏对于语义和句子相关度的充分考虑,提出了一种基于知网的个人微博语义相关度的聚类方法。其要点如下:(1)利用Skip-gram训练大量微博文本生成词汇向量;(2)根据词汇义原进行句内词汇消除歧义;(3)分别计算个人微博之间词汇和句子的相似度并将其综合得到博文相关度;(4)根据博文相关度进行个人微博的聚类。实验表明,相较于层次聚类法、密度聚类法,本文算法的准确度有明显提高。相似文献

14.

进程创建的语义及等价性

袁春陈意云《计算机学报》2000,23(8):877-881

针对一个基于共享变量的带有进程创建的命令式语言,用变迁系统描述了它的结构操作语义,并用扩展的状态变迁迹模型定义了它的指称语义,在该模型下,状态变迁被区分为两种不同形式,分别表示发生在原进程和被创建进程中的状态变迁,这样便可以定义适当的语义复合运算,在对命令的指称进行复合时根据变迁类型的不同对变迁迹进行串行或交错连接,恰当地反映了进程的并发运行受创建命令在程序中的相对位置的限制,最后证明了这两个语义相似文献

15.

Explication and semantic querying of enterprise information systems

Milan Zdravković Hervé Panetto Miroslav Trajanović Alexis Aubry 《Knowledge and Information Systems》2014,40(3):697-724

Many researches show that the ability of independent, heterogeneous enterprises’ information systems to interoperate is related to the challenges of making their semantics explicit and formal, so that the messages are not merely exchanged, but interpreted, without ambiguity. In this paper, we present an approach to overcome those challenges by developing a method for explication of the systems’ implicit semantics. We define and implement the method for the generation of local ontologies, based on the databases of their systems. In addition, we describe an associated method for the translation between semantic and SQL queries, a process in which implicit semantics of the EIS’s databases and explicit semantics of the local ontologies become interrelated. Both methods are demonstrated in the case of creating the local ontology and the semantic querying of OpenERP Enterprise Resource Planning system, for the benefit of the collaborative supply chain planning. 相似文献

16.

多层视频语义概念分析与理解

魏维邹书蓉刘凤玉《计算机辅助设计与图形学学报》2008,20(1):85-92

基于统计学理论,提出了一种视频多粒度语义分析的通用方法,使得多层次语义分析与多模式信息融合得到统一.为了对时域内容进行表示,首先提出一种具有时间语义语境约束的关键帧选取策略和注意力选择模型;在基本视觉语义识别后,采用一种多层视觉语义分析框架来抽取视觉语义;然后应用隐马尔可夫模型(HMM)和贝叶斯决策进行音频语义理解;最后用一种具有两层结构的仿生多模式融合方案进行语义信息融合.实验结果表明,该方法能有效融合多模式特征,并提取不同粒度的视频语义. 相似文献

17.

一种基于复杂网络的短文本语义相似度计算

詹志建杨小平《中文信息学报》2016,30(4):71-80

将传统的文本相似度量方法直接移植到短文本时,由于短文本内容简短的特性会导致数据稀疏而造成计算结果出现偏差。该文通过使用复杂网络表征短文本,提出了一种新的短文本相似度量方法。该方法首先对短文本进行预处理,然后对短文本建立复杂网络模型,计算短文本词语的复杂网络特征值,再借助外部工具计算短文本词语之间的语义相似度,然后结合短文本语义相似度定义计算短文本之间的相似度。最后在基准数据集上进行聚类实验,验证本文提出的短文本相似度计算方法在基于F-度量值标准上,优于传统的TF-IDF方法和另一种基于词项语义相似度的计算方法。相似文献

18.

Ontology and rule-based natural language processing approach for interpreting textual regulations on underground utility infrastructure

《Advanced Engineering Informatics》2021

The nation’s massive underground utility infrastructure must comply with a multitude of regulations. The regulatory compliance checking of underground utilities requires an objective and consistent interpretation of the regulations. However, utility regulations contain a variety of domain-specific terms and numerous spatial constraints regarding the location and clearance of underground utilities. It is challenging for the interpreters to understand both the domain and spatial semantics in utility regulations. To address the challenge, this paper adopts an ontology and rule-based Natural Language Processing (NLP) framework to automate the interpretation of utility regulations – the extraction of regulatory information and the subsequent transformation into logic clauses. Two new ontologies have been developed. The urban product ontology (UPO) is domain-specific to model domain concepts and capture domain semantics on top of heterogeneous terminologies in utility regulations. The spatial ontology (SO) consists of two layers of semantics – linguistic spatial expressions and formal spatial relations – for better understanding the spatial language in utility regulations. Pattern-matching rules defined on syntactic features (captured using common NLP techniques) and semantic features (captured using ontologies) were encoded for information extraction. The extracted information elements were then mapped to their semantic correspondences via ontologies and finally transformed into deontic logic (DL) clauses to achieve the semantic and logical formalization. The approach was tested on the spatial configuration-related requirements in utility accommodation policies. Results show it achieves a 98.2% precision and a 94.7% recall in information extraction, a 94.4% precision and a 90.1% recall in semantic formalization, and an 83% accuracy in logical formalization. 相似文献

19.

神经网络与组合语义在文本相似度中的应用

肖和付丽娜姬东鸿《计算机工程与应用》2016,52(7):139-142

为了更好地提高短文本语义相似度分析能力,提出了基于神经网络和组合语义的短文本语义相似度分析算法。利用神经网络构建词义表示模型,结合局部和全局上下文信息学习词语在实际语境下的表示;通过句法分析,得到文本的依存关系,并依此构建组合关系树,使用组合语义模型得到整个文本的语义表示;通过计算两个语义表示结果之间的相似性来计算文本的语义相似度。实验分析结果表明,该方法能够在一定程度上提高文本语义分析能力。相似文献

20.

基于标签语义注意力的多标签文本分类

肖琳陈博理黄鑫刘华锋景丽萍于剑《软件学报》2020,31(4):1079-1089

自大数据蓬勃发展以来,多标签分类一直是令人关注的重要问题,在现实生活中有许多实际应用,如文本分类、图像识别、视频注释、多媒体信息检索等.传统的多标签文本分类算法将标签视为没有语义信息的符号,然而,在许多情况下,文本的标签是具有特定语义的,标签的语义信息和文档的内容信息是有对应关系的,为了建立两者之间的联系并加以利用,提出了一种基于标签语义注意力的多标签文本分类(LAbel Semantic Attention Multi-label Classification,简称LASA)方法,依赖于文档的文本和对应的标签,在文档和标签之间共享单词表示.对于文档嵌入,使用双向长短时记忆(bi-directional long short-term memory,简称Bi-LSTM)获取每个单词的隐表示,通过使用标签语义注意力机制获得文档中每个单词的权重,从而考虑到每个单词对当前标签的重要性.另外,标签在语义空间里往往是相互关联的,使用标签的语义信息同时也考虑了标签的相关性.在标准多标签文本分类的数据集上得到的实验结果表明,所提出的方法能够有效地捕获重要的单词,并且其性能优于当前先进的多标签文本分类... 相似文献