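Computer Domain Term Automatic Extraction and Hierarchical Structure Building
Cite this article: LIN Yuan, CHEN Zhi-bo, SUN Qiao. Computer Domain Term Automatic Extraction and Hierarchical Structure Building[J]. Computer Engineering (计算机工程), 2011, 37(2): 172-174. DOI: 10.3969/j.issn.1000-3428.2011.02.059
Authors: LIN Yuan, CHEN Zhi-bo, SUN Qiao
Affiliations: 1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083
2. School of Information Science and Technology, Beijing Forestry University, Beijing 100083; School of Computer Science and Engineering, Beihang University, Beijing 100191
Funding: National "863" Program of China (2006AA10Z232)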
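Abstract: This paper designs a scheme for automatically acquiring computer domain terms and proposes an extraction method that combines rules with statistics. Using computer books listed on Amazon as the corpus, it extracts computer domain terms through word segmentation, stop-word removal, and word-frequency statistics, then inserts them into a tree built from ODP to form a hierarchical structure of computer domain terms. Experimental results show that, measured against manual annotation, the terms acquired automatically by this method achieve high precision and recall.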

Keywords: computer domain terms; term acquisition; hierarchical structure; ODP project

Computer Domain Term Automatic Extraction and Hierarchical Structure Building
LIN Yuan, CHEN Zhi-bo, SUN Qiao. Computer Domain Term Automatic Extraction and Hierarchical Structure Building[J]. Computer Engineering, 2011, 37(2): 172-174. DOI: 10.3969/j.issn.1000-3428.2011.02.059
Authors: LIN Yuan, CHEN Zhi-bo, SUN Qiao
Affiliation: LIN Yuan(1), CHEN Zhi-bo(1), SUN Qiao(1,2) (1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; 2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China)
Abstract: This paper presents a method for automatically extracting computer domain terms that combines rules and statistics. It uses computer book titles from the Amazon.com website as the corpus; the data are preprocessed by word splitting and by filtering stop words and special characters. Terms are extracted using a set of rules and frequency statistics, then inserted into a word tree built from ODP to form the hierarchical structure. Experimental results show that the automatically extracted terms achieve high precision and recall when compared with manually tagged terms.
Keywords: computer domain term; term extraction; hierarchical structure; Open Directory Project (ODP)
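
The pipeline outlined in the abstract (word splitting, stop-word and special-character filtering, frequency statistics, insertion into an ODP-derived tree) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the stop-word list, the n-gram rule, the `min_count` threshold, the toy tree, and the rule that attaches a multi-word term under a node named by one of its words are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "with", "on"}

def tokenize(title):
    """Lowercase a title and keep only alphanumeric tokens (drops special characters)."""
    return re.findall(r"[a-z0-9]+", title.lower())

def candidate_phrases(tokens, max_len=2):
    """Yield n-grams (n <= max_len) that do not cross a stop word (assumed rule)."""
    chunk = []
    for tok in tokens + [None]:  # None is a sentinel that flushes the last chunk
        if tok is None or tok in STOP_WORDS:
            for n in range(1, max_len + 1):
                for i in range(len(chunk) - n + 1):
                    yield " ".join(chunk[i:i + n])
            chunk = []
        else:
            chunk.append(tok)

def extract_terms(titles, min_count=2, max_len=2):
    """Count candidate phrases over all titles and keep the frequent ones."""
    counts = Counter()
    for title in titles:
        counts.update(candidate_phrases(tokenize(title), max_len))
    return {term: c for term, c in counts.items() if c >= min_count}

def find_node(tree, name):
    """Depth-first search for a node called `name`; return its child dict or None."""
    for key, children in tree.items():
        if key == name:
            return children
        found = find_node(children, name)
        if found is not None:
            return found
    return None

def insert_term(tree, term):
    """Attach a multi-word term under the first existing node named by one of its words."""
    for word in term.split():
        node = find_node(tree, word)
        if node is not None and word != term:
            node.setdefault(term, {})
            return True
    return False  # single-word term, or no matching parent node

# Toy corpus and toy category tree (both invented for this example).
titles = [
    "Learning Python Programming",
    "Python Programming for the Absolute Beginner",
    "Introduction to Database Systems",
    "Database Systems: Design and Implementation",
]
tree = {"computers": {"programming": {"python": {}}, "database": {}}}

terms = extract_terms(titles)
for term in sorted(terms):
    insert_term(tree, term)
```

With this toy input, the phrases that occur in at least two titles are kept as terms, and "database systems" ends up as a child of the existing "database" node. A real system would need a far larger corpus, a tuned threshold, and a placement rule grounded in the actual ODP category structure.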
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.