首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 187 毫秒
1.
刘耀  帅远华  龚幸伟  黄毅 《计算机科学》2018,45(1):128-132, 156
文本分割在信息检索、摘要生成、问答系统、信息抽取等领域发挥着重要作用。在总结现有的国内外文本分割方法的基础上,提出了一种基于领域本体对文本进行线性分割的方法。该方法利用初始概念自动获取结构化语义概念集合,并根据获取的概念、属性及属性词在文本中出现的频次、位置和关系等因素为段落赋予语义标签,挖掘文本的子主题信息,将拥有相同语义标注信息的段落划分为相同语义段落,实现了文本不同子主题之间的分割。实验结果表明,该方法对于特定领域的文本分割的准确率、召回率以及F值分别达到了85%,90%和88%,分割效果能够满足实际应用需求,并优于现有的无需训练语料的文本分割方法。  相似文献   

2.
为解决武器装备领域中单实体重叠和实体对重叠的复杂三元组的抽取问题,提出了挂载武器装备领域知识结合多轮对抗攻击的复杂三元组抽取方法(RDA),该方法通过武器装备领域微调后的Bert获取更具领域语义的文本向量;利用在嵌入层发起多轮对抗的方式,实现模型层面的数据增强,减少模型对标注样本规模的依赖;采用单层指针网络获取头实体对头实体的类别进行判定,利用维基百科知识库对武器装备领域的实体类别解释信息的向量,对武器装备类别信息以字为最小粒度进行融合,缓解分层标注的天然缺陷;最后在横纵两个维度基于不同粒度的序列标注实现复杂三元组的抽取.在武器装备领域的数据集上精准率达到88.54%,召回率达到75.88%,F1值达到81.72%,取得了SOTA效果.实验表明提出的RDA方法对武器装备领域的信息利用更加充分,有效地缓解武器装备领域遇到的单实体重叠问题(SEO)和实体对重叠(EPO)问题.  相似文献   

3.
地理信息与数据是客观知识世界的重要组成部分。研究如何从大量非结构化的信息中自动抽取地理实体位置关系具有重要意义。提出一种基于语义文法的地理实体位置关系获取方法,该方法可准确地从网页文本中获取多个地理实体之间的复合位置关系。首先,设计一种反映地理实体位置关系的语义文法GeoRSG。GeoRSG反映了地理实体位置关系的层次分类关系,并采用基于规则的方式刻画地理实体位置关系在文本中的语言表达方式。然后,实现地理实体位置关系解析器GeoRSG Parser。该解析器利用GeoRSG对文本进行解析,获得谓词表达形式的位置关系知识。实验结果显示,该方法从1000条语句中获取了81条三元和816条二元地理实体位置关系,并且取得了88.85%的正确率。  相似文献   

4.
《计算机工程》2017,(3):204-212
针对数学表达式符号种类繁多、结构复杂多变、语法语义丰富等特点,提出一种检索结果相关排序算法,利用犹豫模糊集在处理多特征、多隶属度模式方面的优势,计算数学表达式间的相似度,实现基于相似度的数学表达式检索结果的相关排序。通过归纳数学表达式的符号、结构、语法、语义方面的特征,建立数学表达式的相似度函数,对数学表达式检索系统中用户查询式与检索结果集中数学表达式之间的相似程度进行综合多视角的测量。实验结果表明,该算法能实现数学表达式检索系统结果数据的有序输出,有助于改善数学表达式检索系统的性能。  相似文献   

5.
命名实体关系抽取是信息抽取领域中的重要研究课题。本文采用基于特征向量的机器学习算法支持向量机(SVM)进行实体关系抽取实验。在现有的算法中,特征提取方法以基于关键词集的向量空间模型为主。本文提出一种基于语义的文本特征提取方法,并且在关系抽取实验中取得较好的效果。实验证明将语义特征应用到关系抽取领域中可以明显提高性能。  相似文献   

6.
针对为检索服务的语义知识库存在的内容不全面和不准确的问题,提出一种基于维基百科的软件工程领域概念语义知识库的构建方法;首先,以SWEBOK V3概念为标准,从维基百科提取概念的解释文本,并抽取其关键词表示概念的语义;其次,通过概念在维基百科中的层次关系、概念与其它概念解释文本关键词之间的链接关系、不同概念解释文本关键词之间的链接关系构建概念语义知识库;接着, LDA主题模型分别和TF-IDF算法、TextRank算法相结合的两种方法抽取关键词;最后,对构建好的概念语义知识库用随机游走算法计算概念间的语义相似度;将实验结果与人工标注结果对比发现,本方法构建的语义知识库语义相似度准确率能够达到84%以上;充分验证了所提方法的有效性。  相似文献   

7.
基于最大熵模型的本体概念获取方法   总被引:1,自引:0,他引:1       下载免费PDF全文
本体是语义检索的核心。本体构建主要包括领域概念获取和概念间关系获取,其中领域概念获取是本体构建的基础。采用基于最大熵模型的方法来获取概念,通过对领域文本进行挖掘而得到名词性短语,使用改进的TF-IDF公式从中抽取具有领域性的短语,并经人工修正后得到本体概念。实验表明该方法提高了概念的准确性和完整性。  相似文献   

8.
姜琳  李宇  卢汉  曹存根 《计算机科学》2007,34(12):151-156
文本知识获取(Knowledge acquisition from text,简称KAT)是知识工程中的一个重要研究课题。重点研究如何从大规模Web网页文本中获取地理实体概念及其位置关系知识,本文首先介绍了如何自动和半自动地获取这些地理实体概念及其位置关系的文法模式,建立文法模式库;然后基于文法模式库获取例句来抽取候选概念并进行概念验证;最后利用基于图论的方法构造位置关系图,利用地理领域特定规则进行分析验证。作为统一概念图管理下概念空间的一个重要组成部分,地理实体概念及其位置关系本身不仅是知识库的一个重要部分,还可为知识库中其它领域的知识提供支持。  相似文献   

9.
在研究文本倾向性识别方法的基础上,分别实现基于文本分类、基于语义规则模式和基于情感词的倾向性分析算法.研究情感本体构建和基于HowNet与主题领域语料的情感概念选择方法,两者结合能提高情感本体中概念的全面性和领域针对性.利用情感本体抽取特征词并判断其情感倾向度,结合句法规则及程度副词影响,用特征情感倾向度作为特征权重,采用机器学习的方法对主题网络舆情web文本进行倾向性分析.实验表明,其分析结果有更高的准确率和召回率,实现方案的普遍性和稳定性值得进一步研究.  相似文献   

10.
基于维基百科的领域历史沿革信息抽取   总被引:1,自引:0,他引:1  
赵佳鹏  林民 《计算机应用》2015,35(4):1021-1025
针对在软件工程的教学过程中,由于领域概念种类多、演变快,导致学生理解记忆困难的问题,提出了通过抽取软件工程领域历史沿革主题信息构建知识库的方法。该方法首先结合自然语言处理技术与Web信息抽取技术从维基百科的自由文本中抽取实体与实体关系构建候选集;再利用关键词抽取方法TextRank从候选集中抽取与历史沿革关系最密切的实体关系;最后以关键实体关系为核心,抽取邻近的时间实体与概念实体组成五元组构建了知识库。在抽取信息的过程中,结合文本的语义信息对TextRank算法进行了改进,提高了抽取的准确率。实验结果表明,该知识库能够将软件工程领域的概念按时序特征组织在一起,验证了所提方法的有效性。  相似文献   

11.
Semantic entities carry the most important semantics of text data. Therefore, the identification and the relationship integration of semantic entities are very important for applications requiring semantics of text data. However, current strategies are still facing many problems such as semantic entity identification, new word identification and relationship integration among semantic entities. To address these problems, a two-phase framework for semantic entity identification with relationship integration in large scale text data is proposed in this paper. In the first semantic entities identification phase, we propose a novel strategy to extract unknown text semantic entities by integrating statistical features, Decision Tree (DT), and Support Vector Machine (SVM) algorithms. Compared with traditional approaches, our strategy is more effective in detecting semantic entities and more sensitive to new entities that just appear in the fresh data. After extracting the semantic entities, the second phase of our framework is for the integration of Semantic Entities Relationships (SER) which can help to cluster the semantic entities. A novel classification method using features such as similarity measures and co-occurrence probabilities is applied to tackle the clustering problem and discover the relationships among semantic entities. Comprehensive experimental results have shown that our framework can beat state-of-the-art strategies in semantic entity identification and discover over 80% relationship pairs among related semantic entities in large scale text data.  相似文献   

12.
Data mining algorithms such as data classification or clustering methods exploit features of entities to characterise, group or classify them according to their resemblance. In the past, many feature extraction methods focused on the analysis of numerical or categorical properties. In recent years, motivated by the success of the Information Society and the WWW, which has made available enormous amounts of textual electronic resources, researchers have proposed semantic data classification and clustering methods that exploit textual data at a conceptual level. To do so, these methods rely on pre-annotated inputs in which text has been mapped to their formal semantics according to one or several knowledge structures (e.g. ontologies, taxonomies). Hence, they are hampered by the bottleneck introduced by the manual semantic mapping process. To tackle this problem, this paper presents a domain-independent, automatic and unsupervised method to detect relevant features from heterogeneous textual resources, associating them to concepts modelled in a background ontology. The method has been applied to raw text resources and also to semi-structured ones (Wikipedia articles). It has been tested in the Tourism domain, showing promising results.  相似文献   

13.
知识库问答实体链接任务需要将问句内容精准链接到知识库中实体.当前方法大多难以兼顾链接实体的召回率和精确率,并且仅能根据文本信息对实体进行区分筛选.因此,文中在合并子步骤的基础上,提出融合多维度特征的知识库问答实体链接模型(MDIIEL).通过表示学习方法,将文本符号、实体和问句类型、实体在知识库中语义结构表达等信息整合并引至实体链接任务中,加强对相似实体的区分,在提高准确率的同时降低候选集的大小.实验表明,MDIIEL模型在实体链接任务性能上具有整体性提升,在大部分指标上取得较优的链接结果.  相似文献   

14.
为了从中英文混排的中文文档中定位数学公式,提出了一种基于中文字符识别和公式符号识别的数学公式定位方法。该方法主要由中文字符提取、内嵌公式提取和独立公式定位三个部分组成。在中文字符提取中,首先提取字符块信息中文字符识别结果、公式符号识别结果和字符块的几何特征,然后使用决策树的方法区分中文字符和非中文字符。在内嵌公式提取中,使用公式符号的语义信息、符号间的角标关系和公式的语义信息等从非中文字符中定位内嵌公式。在独立数学公式定位中,对包含较多内嵌公式符号且不包含中文字符的文字行提取版式结构特征,并使用高斯混合模型区分独立公式和普通文字行。在148幅文档图像共包含3 690个公式组成的测试集上取得了91.19%的公式定位正确率。  相似文献   

15.
Domain ontologies facilitate the organization, sharing and reuse of domain knowledge, and enable various vertical domain applications to operate successfully. Most methods for automatically constructing ontologies focus on taxonomic relations, such as is-kind-of and is-part-of relations. However, much of the domain-specific semantics is ignored. This work proposes a semi-unsupervised approach for extracting semantic relations from domain-specific text documents. The approach effectively utilizes text mining and existing taxonomic relations in domain ontologies to discover candidate keywords that can represent semantic relations. A preliminary experiment on the natural science domain (Taiwan K9 education) indicates that the proposed method yields valuable recommendations. This work enriches domain ontologies by adding distilled semantics.  相似文献   

16.
An efficient method is introduced to represent large Arabic texts in comparatively smaller size without losing significant information. The proposed method uses the distributional semantics to build the word-context matrix representing the distribution of words across contexts and to transform the text into a vector space model (VSM) representation based on word semantic similarity. The linguistic features of the Arabic language, in addition to the semantic information extracted from different lexical-semantic resources such as Arabic WordNet and named entities’ gazetteers are used to improve the text representation and to create word clusters of similar and related words. Distributional similarity measures have been used to capture the words’ semantic similarity and to create clusters of similar words. The conducted experiments have shown that the proposed method significantly reduces the size of text representation by about 27 % compared with the stem-based VSM and by about 50 % compared with the traditional bag-of-words model. Their results have shown that the amount of dimension reduction depends on the size and shape of the windows of analysis as well as on the content of the text.  相似文献   

17.
基于知识图谱的网络安全动态预警方法,能够主动感知和应对网络安全攻击,增强感知的实时性和精准性。然而,在构建网络安全知识图谱的实体抽取过程中,传统的命名实体识别工具和方法无法识别网络安全领域中的特定类别实体,文本中的未登录和中英文混合的网络安全实体也难以被准确识别。网络安全文本中的网络安全命名实体存在中英文混合、单词缩写等问题,仅基于字的命名实体识别方法难以充分表征字或词的语义信息。因此,论文考虑中英文更细粒度的部件语义捕捉字或词的语义特征,提出一种基于部件CNN的网络安全命名实体识别方法(C C-NS-NER),利用部件CNN抽取词语部件特征中的关键语义特征,丰富字词级别的语义信息,并引入BiLSTM-CRF确保抽取字向量和部件特征中的抽象信息,同时获取标签之间的关联信息,识别文本中的网络安全命名实体。在人工标注的网络安全数据集上的实验结果表明,该方法相较于传统模型,能有效获取字或词的部件语义信息,显著提高网络安全命名实体识别的效果。  相似文献   

18.
The nation’s massive underground utility infrastructure must comply with a multitude of regulations. The regulatory compliance checking of underground utilities requires an objective and consistent interpretation of the regulations. However, utility regulations contain a variety of domain-specific terms and numerous spatial constraints regarding the location and clearance of underground utilities. It is challenging for the interpreters to understand both the domain and spatial semantics in utility regulations. To address the challenge, this paper adopts an ontology and rule-based Natural Language Processing (NLP) framework to automate the interpretation of utility regulations – the extraction of regulatory information and the subsequent transformation into logic clauses. Two new ontologies have been developed. The urban product ontology (UPO) is domain-specific to model domain concepts and capture domain semantics on top of heterogeneous terminologies in utility regulations. The spatial ontology (SO) consists of two layers of semantics – linguistic spatial expressions and formal spatial relations – for better understanding the spatial language in utility regulations. Pattern-matching rules defined on syntactic features (captured using common NLP techniques) and semantic features (captured using ontologies) were encoded for information extraction. The extracted information elements were then mapped to their semantic correspondences via ontologies and finally transformed into deontic logic (DL) clauses to achieve the semantic and logical formalization. The approach was tested on the spatial configuration-related requirements in utility accommodation policies. Results show it achieves a 98.2% precision and a 94.7% recall in information extraction, a 94.4% precision and a 90.1% recall in semantic formalization, and an 83% accuracy in logical formalization.  相似文献   

19.
A high-level electrical energy ontology with weighted attributes   总被引:1,自引:0,他引:1  
One of the significant application areas of domain ontologies is known to be text analysis applications like information extraction and text classification systems, and semantic portals. In this paper, we present a high-level ontology for the electrical energy domain. This domain ontology has weighted attributes to cover the inherent fuzziness in the textual representations of its concepts. Additionally, we have included in the ontology the necessary attributes to align the ontology concepts to on-line collaborative knowledge bases like Wikipedia and linked open data sources like DBpedia, other attributes to facilitate its use in multilingual applications, and concepts to hold the named entities in the domain. The ultimate ontology is aligned with the previously proposed ontologies for the energy-related subdomains after extending the latter ones with weighted attributes. We make the ultimate form of the electrical energy ontology, as well as the extended versions of the domain ontologies for the subdomains, available for research purposes. Also included in the paper are sample text analysis applications which mainly exploit the weighted attributes within the ontology.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号