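Design of Domain-specific Term Extractor Based on Multi-strategy
Cite this article: Du Bo, Tian Huaifeng, Wang Li, Lu Ruzhan. Design of Domain-specific Term Extractor Based on Multi-strategy[J]. Computer Engineering (计算机工程), 2005, 31(14): 159-160.
Authors: Du Bo  Tian Huaifeng  Wang Li  Lu Ruzhan
Affiliation: Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030
Abstract: This paper designs a term extraction algorithm for specialized domains that combines statistical and rule-based methods. Exploiting the characteristics of domain-specific terms, it uses several statistics that measure how tightly the characters within a string bind together. A threshold classifier first extracts two-character candidates; these candidates are then extended to the left and right to a limited extent, and the multi-character candidates that meet the requirements are selected; finally, all candidates are filtered to give the final result. Based on this algorithm, an extraction program was implemented that takes raw text, neither segmented nor annotated, as input and outputs domain-specific terms. After testing on corpora from several domains, the paper analyzes the experimental results, points out the remaining problems, and outlines future work.

Keywords: natural language processing  term extraction  multi-strategy
Article ID: 1000-3428(2005)14-0159-02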

Design of Domain-specific Term Extractor Based on Multi-strategy
DU Bo, TIAN Huaifeng, WANG Li, LU Ruzhan. Design of Domain-specific Term Extractor Based on Multi-strategy[J]. Computer Engineering, 2005, 31(14): 159-160.
Authors: DU Bo  TIAN Huaifeng  WANG Li  LU Ruzhan
Abstract: This paper designs a multi-strategy term extraction algorithm that combines statistics-based and rule-based methods. Using multiple statistics that measure how tightly the characters in a string bind together, it first applies a threshold classifier to extract two-character candidates from a raw corpus, then extends these candidates to the left and right to obtain multi-character candidate terms, and finally filters the candidates to obtain the domain-specific terms that form the final result. An extractor based on this algorithm is implemented, taking an unprocessed corpus as input and producing domain-specific terms as output. After experiments on corpora from multiple domains, the paper analyzes the results, identifies the remaining problems, and outlines future work.
Keywords:Natural language processing  Term extractor  Multi-strategy
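
The three stages named in the abstract (threshold classification of two-character candidates, left/right extension, final filtering) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistics, thresholds, extension criterion, or filtering rules the paper uses. Here pointwise mutual information plus a minimum frequency stands in for the "multiple statistics", a frequency-ratio test stands in for the extension criterion, and a small stop-character list plus a nested-candidate check stands in for the filter. All function names and parameters (`extract_terms`, `min_pmi`, `keep_ratio`, `STOP_CHARS`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop characters; the paper's actual filtering rules are not given.
STOP_CHARS = set("的了是在和与及")

def segments(text):
    """Split raw text into runs of CJK characters; everything else is a break."""
    return re.findall(r"[\u4e00-\u9fff]+", text)

def ngram_counts(segs, max_len):
    """Count every character n-gram of length 1..max_len inside each run."""
    counts = Counter()
    for seg in segs:
        for n in range(1, max_len + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1
    return counts

def bigram_candidates(counts, min_count, min_pmi):
    """Stage 1: threshold classifier over two-character strings.
    Assumed statistic: PMI, with the bigram total approximated by the
    number of single characters."""
    total = sum(c for g, c in counts.items() if len(g) == 1)
    seeds = set()
    for g, c in counts.items():
        if len(g) != 2 or c < min_count:
            continue
        pmi = math.log2(c * total / (counts[g[0]] * counts[g[1]]))
        if pmi >= min_pmi:
            seeds.add(g)
    return seeds

def extend(seeds, counts, max_len, keep_ratio):
    """Stage 2: grow candidates by one character to the left or right while
    the longer string keeps at least keep_ratio of the shorter one's count
    (an assumed criterion)."""
    result, frontier = set(seeds), set(seeds)
    while frontier:
        grown = set()
        for cand in frontier:
            if len(cand) >= max_len:
                continue
            for g, c in counts.items():
                one_longer = len(g) == len(cand) + 1
                if one_longer and (g.startswith(cand) or g.endswith(cand)):
                    if c >= keep_ratio * counts[cand] and g not in result:
                        grown.add(g)
        result |= grown
        frontier = grown
    return result

def filter_terms(cands, counts, keep_ratio):
    """Stage 3: drop candidates that start or end with a stop character, or
    that are nested in a longer candidate of similar frequency."""
    kept = []
    for cand in cands:
        if cand[0] in STOP_CHARS or cand[-1] in STOP_CHARS:
            continue
        absorbed = any(
            cand != other and cand in other
            and counts[other] >= keep_ratio * counts[cand]
            for other in cands
        )
        if not absorbed:
            kept.append(cand)
    return sorted(kept, key=lambda t: (-counts[t], t))

def extract_terms(text, min_count=3, min_pmi=1.0, max_len=4, keep_ratio=0.8):
    """Run the three stages on raw, unsegmented text."""
    counts = ngram_counts(segments(text), max_len)
    seeds = bigram_candidates(counts, min_count, min_pmi)
    return filter_terms(extend(seeds, counts, max_len, keep_ratio), counts, keep_ratio)
```

On the toy input `"术语抽取是关键。我们研究术语抽取。术语抽取有方法。方法有很多。"`, `extract_terms` with the default settings returns `['术语抽取']`: the three overlapping two-character seeds grow into the four-character string, which then absorbs its own substrings in the filtering stage. Real corpora would need thresholds tuned per domain, and the quadratic scan in `extend` would have to be replaced by an index for anything beyond small inputs.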
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.