Similar literature
20 similar documents found.
1.
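Automatic Extraction of Scientific and Technical Terms by Combining Statistics and Rules   Total citations: 4; self-citations: 0; citations by others: 4
Automatic extraction of scientific and technical terms is an important research topic in Chinese information processing, with wide application in information retrieval, machine translation, and especially patent translation. Working within a patent translation task, this paper studies methods for recognizing scientific and technical terms in patents. Building on an analysis of existing methods, it proposes a term recognition method that uses a conditional random field (CRF) model for tagging-based recognition and then applies rules to post-process erroneous recognition results. Experimental results show that the combined statistical and rule-based method is effective, reaching an F-score of 84.4% in the open test.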
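
The abstract above describes tagging with a statistical model followed by rule-based post-processing, without listing the rules. The sketch below is only an illustration under assumptions: it assumes the tagger emits B/I/O labels, and the single repair rule (`repair_bio`) and the sample tagger output are hypothetical, not taken from the paper.

```python
def repair_bio(tags):
    """Hypothetical rule: an 'I' tag that does not continue a term becomes 'B'."""
    fixed = []
    prev = "O"
    for tag in tags:
        if tag == "I" and prev == "O":
            tag = "B"
        fixed.append(tag)
        prev = tag
    return fixed


def extract_terms(tokens, tags):
    """Join each B I I ... run of tokens into one candidate term."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append("".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


# Segmented patent phrase: "a manufacturing method of a semiconductor storage device".
tokens = ["一种", "半导体", "存储", "装置", "的", "制造", "方法"]
# Assumed (made-up) tagger output with one error: the first term starts with 'I'.
predicted = ["O", "I", "I", "I", "O", "B", "I"]

terms = extract_terms(tokens, repair_bio(predicted))
print(terms)
```

Keeping the rules in a separate pass means the statistical model does not have to be retrained when a rule changes; whether the paper structures its post-processing this way is not stated in the abstract.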

2.
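Automatic Filtering of Domain-Specific Terms Based on Clustering   Total citations: 1; self-citations: 0; citations by others: 1
To build a domain-specific term dictionary from large-scale unannotated text, the usual approach is to obtain candidate terms from a term extractor and then filter them manually to keep the terms of the target domain. This requires considerable labor and resources, and the filtering criteria cannot be kept consistent. This paper proposes using the CBC clustering method to automatically remove out-of-domain terms from the extracted term text. As the training corpus is progressively enriched, the method can also recognize new words and so expand the domain's term set. An evaluation of the experimental results shows that CBC clustering performs well at term filtering.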

3.
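冯艳红, 于红, 孙庚, 赵禹锦. 《计算机应用》, 2016, 36(11): 3146-3151
Domain term recognition methods based on statistical features ignore the semantics and domain characteristics of terms, which degrades recognition results. To address this, a domain term recognition method based on word vectors and conditional random fields (CRF) is proposed. The method exploits two properties: word vectors have strong semantic expressiveness, and the similarity between a word and domain terms has strong domain expressiveness. On top of the statistical features, it adds a similarity feature between each word's vector and the vectors of domain terms, forming a word-vector-based feature vector, and uses a CRF to combine these features for domain term recognition. Experiments on a domain corpus and the SogouCA corpus yielded precision, recall, and F-measure of 0.9855, 0.9439, and 0.9643, respectively, indicating that the proposed method performs well.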
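
The similarity feature described above can be illustrated with cosine similarity between a word's vector and a few seed domain-term vectors. This is a sketch under assumptions: the abstract does not say which similarity measure or aggregation is used, so cosine similarity and the maximum over seeds are assumptions, and the two-dimensional vectors are toy values rather than trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def domain_similarity(word_vec, seed_vecs):
    """Assumed aggregation: highest similarity to any seed domain term."""
    return max(cosine(word_vec, seed) for seed in seed_vecs)


# Toy vectors standing in for trained word embeddings (hypothetical values).
seed_term_vectors = [[1.0, 0.0], [0.6, 0.8]]
in_domain_word = [1.0, 0.0]
other_word = [0.0, 1.0]

sim_in = domain_similarity(in_domain_word, seed_term_vectors)
sim_other = domain_similarity(other_word, seed_term_vectors)
print(sim_in, sim_other)
```

A value such as `sim_in` would then be appended, possibly after discretization, to each token's statistical features before CRF training.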

4.
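A New-Term Expansion Method Based on Domain Ontology   Total citations: 1; self-citations: 0; citations by others: 1
A new-term expansion method based on a domain ontology is proposed. Combining traditional statistical and rule-based methods, it computes how influential each word is within a document, uses the domain ontology to represent domain knowledge, and computes the domain relevance of documents and words by identifying ontology concepts in the documents. This produces a recommended ranking of term candidates and refines the candidate results. Experimental results demonstrate the effectiveness and feasibility of the method.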

5.
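Domain terms are the core vocabulary of every field. Based on a study of a large body of domain literature, a method for recognizing domain terms is proposed. Relying on existing mature tools, the method uses a conditional random field (CRF) model to estimate the part-of-speech combination probabilities of domain terms. After a feature set is selected, an optimal feature template is defined by adjusting combinations of features and windows, and the model training parameters are determined by 10-fold cross-validation. Experimental results show that analyzing the part-of-speech combination probabilities of domain terms with a CRF model recognizes domain terms effectively.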
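
The 10-fold cross-validation mentioned above can be sketched as follows. The splitting logic is standard; the scorer `toy_evaluate` is a hypothetical stand-in for "train a CRF on the training folds and score it on the held-out fold", which the abstract does not detail.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, held_out_indices) for each of k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        held_out = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, held_out
        start += size


def cross_validate(n_samples, evaluate, k=10):
    """Average a caller-supplied score over the k folds."""
    scores = [evaluate(train, held_out)
              for train, held_out in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


def toy_evaluate(train, held_out):
    """Hypothetical scorer; a real one would train and test a model here."""
    return len(train) / (len(train) + len(held_out))


mean_score = cross_validate(25, toy_evaluate, k=10)
print(mean_score)
```

To choose a training parameter, one would call `cross_validate` once per candidate value and keep the value with the best mean score.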

6.
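An Ontology Concept Learning Method Suited to Compound Terms   Total citations: 1; self-citations: 0; citations by others: 1
Term extraction plays an important role in ontology concept learning. Because Chinese text has no explicit boundaries between words, extracting domain terms, and compound terms in particular, is especially difficult. To address the weaknesses of traditional extraction methods (lack of semantic support, heavy computation, and low accuracy), an ontology concept learning method suited to compound term extraction is proposed. First, natural language processing techniques filter out components unrelated to terms and split sentences at natural boundaries, giving a complete candidate data set for domain term extraction and ensuring that candidate compound terms are not wrongly split. Then, based on the domain statistics and distributional characteristics of terms, term frequency and information entropy are used for multi-strategy domain term filtering; after synonymous terms are identified and merged, the set of domain concepts is obtained. Experiments verify that the method extracts both single-word and compound domain terms from domain text with high accuracy.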
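
One common way to combine term frequency with information entropy is to measure how varied the left and right neighbours of a candidate are: a candidate with diverse neighbours on both sides is more likely to be a complete unit. The abstract does not say which entropy it uses, so the boundary-entropy reading below is an assumption, and the tiny corpus is made up for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of observations."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def boundary_stats(candidate, sentences):
    """Return (frequency, left-neighbour entropy, right-neighbour entropy)."""
    n = len(candidate)
    freq = 0
    left, right = Counter(), Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == candidate:
                freq += 1
                left[tokens[i - 1] if i > 0 else "<s>"] += 1
                right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return freq, entropy(left), entropy(right)


# Made-up tokenized corpus.
corpus = [
    ["the", "neural", "network", "learns"],
    ["a", "neural", "network", "model"],
    ["deep", "neural", "network", "learns"],
]

freq, left_h, right_h = boundary_stats(["neural", "network"], corpus)
_, _, partial_right_h = boundary_stats(["neural"], corpus)
print(freq, left_h, right_h, partial_right_h)
```

Here the fragment "neural" is always followed by "network", so its right-neighbour entropy is zero, which flags it as an incomplete piece of the compound term.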

7.
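An Automatic Term Recognition Method Based on Rank Aggregation   Total citations: 1; self-citations: 0; citations by others: 1
Automatic term recognition is a key step in fields such as information extraction and text mining. Basic automatic term recognition algorithms each rely on feature information from only certain aspects and therefore have clear limitations. The locally Kemeny optimal method is introduced to handle the automatic term recognition problem, and a new ensemble method is proposed. Experimental results show that the method significantly improves the precision of automatic term recognition.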
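
A ranking is locally Kemeny optimal when no swap of two adjacent items would bring it closer to the input rankings under majority vote. The sketch below shows the generic local Kemenization procedure from the rank-aggregation literature, not the paper's own ensemble method, which the abstract does not describe. It assumes every input ranking is a full list over the same candidates; the candidate names and rankings are invented.

```python
def local_kemenize(initial, rankings):
    """Refine `initial` so that no adjacent pair is opposed by a strict majority."""
    positions = [{item: i for i, item in enumerate(r)} for r in rankings]

    def majority_prefers(a, b):
        # True when more than half of the input rankings place a before b.
        votes = sum(1 for pos in positions if pos[a] < pos[b])
        return votes > len(positions) / 2

    result = []
    for item in initial:
        # Insert at the bottom, then bubble up while a majority agrees.
        result.append(item)
        i = len(result) - 1
        while i > 0 and majority_prefers(result[i], result[i - 1]):
            result[i], result[i - 1] = result[i - 1], result[i]
            i -= 1
    return result


# Rankings of four term candidates by three hypothetical base recognizers.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
]
initial = ["D", "C", "B", "A"]  # any starting aggregation

aggregated = local_kemenize(initial, rankings)
print(aggregated)
```

The procedure only guarantees local optimality, which is far cheaper than computing a globally optimal Kemeny ranking.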

8.
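Automatic extraction of patent terms is a key step in knowledge extraction and text mining. Candidate patent terms are extracted after building a stop-word list for patent documents and deriving specific rules. By analyzing the association between a patent term and the sentences containing it, the influence between adjacent patent terms, and the interference of common-sense words with patent term extraction, three methods are proposed: the STRank weight calculation method based on the PageRank idea, a patent term discrimination calculation method, and a weight-reduction method using HowNet sememe information. These methods are fused to extract patent terms. In experiments on patent documents from the sensor domain, accuracy at the top-1400 and top-1600 levels was 80.5% and 79.7%, improvements of 11.4% and 9.5%, respectively, over the CS+CC+CD method. The results demonstrate the effectiveness of the multi-strategy fusion method.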
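
The abstract says STRank builds on the PageRank idea but gives no formula, so the sketch below shows plain PageRank only, run on a small graph that links terms to the sentences containing them. The graph, the node names, and the damping value 0.85 are illustrative assumptions and should not be read as the STRank definition.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; `graph` maps each node to its outgoing neighbours."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical term-sentence graph: an edge means "term occurs in sentence".
graph = {
    "term:sensor": ["sent:1", "sent:2", "sent:3"],
    "term:electrode": ["sent:1"],
    "term:housing": ["sent:2"],
    "sent:1": ["term:sensor", "term:electrode"],
    "sent:2": ["term:sensor", "term:housing"],
    "sent:3": ["term:sensor"],
}

ranks = pagerank(graph)
term_ranks = {k: v for k, v in ranks.items() if k.startswith("term:")}
print(sorted(term_ranks, key=term_ranks.get, reverse=True))
```

In this toy graph the term that occurs in the most sentences receives the highest score; a term-weighting scheme built on this idea would replace the uniform edge weights with task-specific ones.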

9.
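Automatic Extraction of Military Intelligence Terms Using CRF   Total citations: 3; self-citations: 0; citations by others: 3
For the military intelligence domain, a term extraction method based on conditional random fields (CRF) is proposed. The method treats domain term extraction as a sequence labeling problem, quantifies the distributional characteristics of domain terms as training features, uses a CRF toolkit to train a domain term feature template, and then uses that template to extract domain terms. The training corpus consists of news data from the Sohu military channel ("搜狐网络军事频道"), and the test corpus comprises all articles from issues 1 to 8 of 2007 of the magazine 《现代军事》. The experiments gave good results, with precision of 73.24%, recall of 69.57%, and F-measure of 71.36%, indicating that the method is simple to apply and generalizes across domains.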
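
The reported F-measure is consistent with the balanced harmonic mean of the reported precision and recall, F = 2PR / (P + R). The short check below assumes the balanced (beta = 1) form, which the abstract does not state explicitly.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta = 1 gives the balanced harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f_measure(73.24, 69.57)
print(round(f1, 2))  # 71.36
```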

10.
《微型机与应用》2017,(21):51-53
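Disease named entity recognition in biomedical literature is the basis of disease-related bioinformatics analysis; recognizing medical terms within disease named entities and determining their boundaries are the main difficulty and the key to the problem. This paper proposes a disease named entity recognition method that combines CRF (Conditional Random Field) with a dictionary. The method uses web resources to build a medical term dictionary containing semantic information, uses the dictionary to recognize medical terms and obtain their semantic information, and then has the CRF use this information to recognize disease named entities. Experimental results show that the method is effective.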

11.
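Because the Shanghai regional medical and health platform integrates electronic medical records from 38 tertiary hospitals, the diversity and ambiguity with which different hospitals express the same clinical laboratory indicator have seriously hampered medical record mining research. Existing terminologies, however, are highly theoretical and hardly cover actual clinical usage, so a clinical laboratory indicator terminology that merges the 38 hospitals is needed. To address this problem, a semi-automatic terminology construction scheme is proposed, built on four steps: schema definition, knowledge extraction, knowledge fusion, and knowledge verification. Taking the medical insurance terms issued by the Shanghai Municipal Health Commission as the standard, a sub-terminology of standard indicators is built first; a BERT-based clinical laboratory indicator alignment model then assigns the 38 hospitals' indicators to the standard terms as synonyms. The resulting indicator terminology contains 23 495 entities and 47 746 fact triples and can be used for applications such as medical record cleaning and querying. Experiments show that the alignment model reaches an F1-score of 95.78%, and that using the terminology in a colorectal cancer mining project increases the number of retrieved records by up to 94%. In addition, a disease-specific terminology of colorectal cancer indicators is publicly available at dcakb.ecustnlplab.com.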

12.
We propose a semi-automatic tool, termight, that supports the construction of bilingual glossaries. Termight consists of two components which address the two subtasks in glossary construction: (a) preparing a monolingual list of all technical terms in a source-language document, and (b) finding the translations for these terms in parallel source–target documents. As a first step (in each component) the tool automatically extracts candidate terms and candidate translations, based on term-extraction and word-alignment algorithms. It then performs several additional preprocessing steps which greatly facilitate human post-editing of the candidate lists. These steps include grouping and sorting of candidates and associating example concordance lines with each candidate. Finally, the data prepared in preprocessing is presented to the user via an interactive interface which supports quick post-editing operations. Termight was deployed by translators at AT&T Business Translation Services (formerly AT&T Language Line Services), leading to very high rates of semi-automatic glossary construction.

13.
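Based on the characteristics of patent terms, this paper builds an automatic term translation system based on term templates. The system combines a translation model with a similarity scoring mechanism. Experimental results show that the method resolves both fixed-term translation and head-word-based reordering in term translation, improving the quality of statistical machine translation. In addition, the term templates bring linguistic knowledge into otherwise purely statistical translation, providing an effective means of translation.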

14.
ABSTRACT

Security is an enduring priority for both individuals and communities. Methods such as locks, fences, identity cards, and passbooks have been used for many years to provide security against physical attack, crime, espionage, and terrorism. As a result, many national governments, standards organizations, think tanks, and commentators have proposed security methods. None of these methods provides enduring and effective responses to the serious security challenges faced today. Shortfalls in effectiveness derive from terminology that is inconsistent, incomplete, confusing, or contains language that is specific to the physical, personnel, or electronic domains.

This article presents harmonized taxonomies for security and resilience that can be applied across the physical, personnel, and electronic domains. These taxonomies provide an ordered set of terms to organize thinking, and facilitate data and information sharing throughout the security discipline.

Functional decomposition is used to derive the new taxonomies of security and resilience. Case studies that span the physical, personnel, and electronic security domains are used to provide the experimental context to test the utility of the new taxonomies, using an established security risk assessment framework. The utility of the new taxonomies is further validated by the results of a survey of senior security experts.

15.
The language used to talk about computers is uniquely colorful and sometimes extraordinarily difficult. This paper examines ‘computer discourse’ and points out its highly metaphorical nature. While the use of metaphor is unavoidable, it often leads, especially in informal settings, to the mannered use of words we call jargon. Metaphor becomes jargon when it is used too literally in a self-conscious manner. Experts often use their metaphors as though they were literally true. Technical details fall away and the metaphor is taken for the reality it represents. The less expert sometimes mimic the language they hear, in a self-conscious manner, without truly understanding it.

16.
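Existing pattern-based term translation systems have two main shortcomings: the learning process depends on manually annotated corpora, and the systems do not score patterns and score candidate terms too simplistically. This paper introduces a self-training learning mechanism into the term translation system: initial learning is completed on a pair of training corpora, and during actual operation terms with high reliability are automatically selected for retraining to improve system performance. The system adds scoring of patterns and uses heuristic rules to extend the scoring of candidate terms. Experimental results show that the improved system outperforms the original.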

17.
Ontologies are recognised as important tools, not only for effective and efficient information sharing, but also for information extraction and text mining. In the biomedical domain, the need for a common ontology for information sharing has long been recognised, and several ontologies are now widely used. However, there is confusion among researchers concerning the type of ontology that is needed for text mining, and how it can be used for effective knowledge management, sharing, and integration in biomedicine. We argue that there are several different ways to define an ontology and that, while the logical view is popular for some applications, it may be neither possible nor necessary for text mining. We propose a text-centered approach for knowledge sharing, as an alternative to formal ontologies. We argue that a thesaurus (i.e. an organised collection of terms enriched with relations) is more useful for text mining applications than formal ontologies.

18.
This study analyzed and organized the content coverage of the clinical care classification (CCC) system to represent nursing record data in a medical center in Taiwan. The nursing care plan was analyzed using the process of knowledge discovery in the data set. The nursing documentation was mapped based on the full list of nursing diagnoses and interventions available using the CCC system. The result showed that 75.45% of the documented diagnosis terms can be mapped using the CCC system. A total of 21 established nursing diagnoses were recommended for inclusion in the CCC system. The results also showed that 30.72% of assessment/monitor tasks and 31.16% of care/perform tasks were provided by nursing professionals, whereas manage/refer actions accounted for 15.36% of the tasks involved in nursing care. The results showed that the CCC system is a suitable clinical information system for the majority of nursing care documentation, and is useful for determining the patterns in nursing practices.

19.
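Structuring unstructured data is a new challenge for management information systems in the big data environment. This paper studies the characteristics of free text from the perspective of genre and proposes a method for extracting emergency event attributes from Web news. The method first analyzes the features of Web text and the news genre, then uses Google Word2Vec to expand a lexicon built by domain experts, and applies a different extraction method to each attribute of an emergency event: the lexicon for event classification; genre features for extracting the time and the event summary; and genre features together with the lexicon for extracting location, casualties, and economic loss. Experiments show that when the genre- and lexicon-based method extracts emergency event attributes from a crawled Web news corpus and a public corpus, average precision is 87.89% and 91.29% and average recall is 81.76% and 87.91%, respectively, which meets the needs of emergency management.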

20.
This article begins by emphasizing the importance of terminology in this modern age of technical innovations and machine-based translation systems, establishing the need for a terminology interchange format, and distinguishing between lexicography and terminology. It then reviews previous attempts to establish terminology interchange formats and concludes with a forceful argument for a new system based on the TEI-based notions of elements and attributes.

Alan K. Melby is Professor of Linguistics at Brigham Young University. He is involved in translation, technical communication, and computational language projects. He is the CEO of Linguatech International, the Chair of the Translation and Computers committee of the American Translators Association, and a member of the editorial board of Machine Translation. He serves as Chair of the Terminology Workgroup of the Text Encoding Initiative, and as a member of the U.S. tag of ISO TC 37. He is an accredited translator (French to English).

This paper is based on work performed in the TEI work group on Terminology, whose members include Alan Melby, Brigham Young U. (chair), Sue Ellen Wright, American Translators Association and other affiliations, Greg Shreve, Kent State U., Gerhard Budin, Infoterm, and Richard Strehlow, ASTM and other affiliations.

Portions of this chapter appear in the following work: Melby, Alan, and Wright, Sue Ellen. 1995. Terminology Interchange. In: The Terminology Handbook, Sue Ellen Wright and Gerhard Budin, Eds. Amsterdam: John Benjamins.
