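Bridge-Connection Pattern Filtering Algorithm for Dictionary-Free Automatic Extraction of Chinese Feature Words*
Cite this article: XUAN Zhao-guo, DANG Yan-zhong. Bridge-Connection Pattern Filtering Algorithm for Dictionary-Free Automatic Extraction of Chinese Feature Words*[J]. Application Research of Computers (计算机应用研究), 2007, 24(7): 168-170
Authors: XUAN Zhao-guo (宣照国)  DANG Yan-zhong (党延忠)
Affiliation: Institute of Systems Engineering, Dalian University of Technology, Dalian, Liaoning 116023
Abstract: This paper proposes a bridge-connection pattern filtering algorithm (BPFA) that extracts feature words from text without relying on a dictionary. The algorithm counts the character-combination patterns that occur in the text together with their frequencies, removes the bridge-connection frequency from each pattern's frequency to obtain its support frequency, and uses the support frequency to identify and extract correct words. Experimental results show that BPFA effectively improves the precision and recall of the segmentation results. The algorithm is suited to Chinese information processing applications that are sensitive to word frequency, such as text classification and automatic text summarization.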

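Keywords: automatic word segmentation  bridge-connection pattern filtering algorithm  Chinese information processing  dictionary-free  automatic extraction of Chinese feature words  bridge-connection pattern  filtering algorithm  Thesaurus  Extraction  Chinese Words  Filtering Algorithm  automatic text summarization  text classification  processing applications  Chinese information  sensitive  recall  precision  word segmentation  show  results  experiment  words
Article ID: 1001-3695(2007)07-0168-03
Revision dates: 2006-04-05; 2006-06-06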

Bridge-connection Patterns Filtering Algorithm for Chinese Words Extraction Without Thesaurus
XUAN Zhao-guo, DANG Yan-zhong. Bridge-connection Patterns Filtering Algorithm for Chinese Words Extraction Without Thesaurus[J]. Application Research of Computers, 2007, 24(7): 168-170
Authors: XUAN Zhao-guo  DANG Yan-zhong
Affiliation:Institute of Systems Engineering, Dalian University of Technology, Dalian Liaoning 116023, China
Abstract: This paper puts forward a bridge-connection patterns filtering algorithm (BPFA) for extracting high-frequency words without a thesaurus. First, the frequencies of co-occurrence patterns of Chinese characters are counted from the documents; the bridge-connection frequencies are then eliminated to obtain the support frequencies of the patterns. Words are then identified and acquired according to the support frequencies instead of the raw occurrence frequencies. The experimental results show that BPFA can improve both the precision and the recall of the extracted lexical set to some extent. The algorithm can be applied to text categorization and automatic summarization.
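The abstract does not define a bridge-connection occurrence precisely, so the following Python sketch only illustrates the general idea of subtracting a bridging frequency from a raw frequency and is not the authors' BPFA. It rests on three assumptions that are not in the source: patterns are restricted to character bigrams; an occurrence of a bigram counts as bridging when the overlapping bigrams on both its left and its right have a strictly higher raw frequency; and the acceptance threshold `min_support` is a hypothetical parameter. The function names `support_frequencies` and `extract_words` are likewise illustrative.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs; punctuation and other characters break a run.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def support_frequencies(text):
    """Return (raw, support) Counters for the character bigrams in `text`.

    ASSUMPTION (not from the paper): a bigram occurrence is a bridging
    occurrence when both overlapping neighbour bigrams are strictly more
    frequent than the bigram itself.
    """
    runs = HAN_RUN.findall(text)

    # Pass 1: raw frequency of every bigram pattern.
    raw = Counter()
    for run in runs:
        for i in range(len(run) - 1):
            raw[run[i:i + 2]] += 1

    # Pass 2: count bridging occurrences. Only bigrams that have both a
    # left and a right neighbour inside the same run can be bridges.
    bridge = Counter()
    for run in runs:
        for i in range(1, len(run) - 2):
            left = run[i - 1:i + 1]
            mid = run[i:i + 2]
            right = run[i + 1:i + 3]
            if raw[left] > raw[mid] and raw[right] > raw[mid]:
                bridge[mid] += 1

    # Support frequency = raw frequency minus bridging frequency.
    support = Counter({p: raw[p] - bridge[p] for p in raw})
    return raw, support


def extract_words(text, min_support=2):
    """Keep bigrams whose support frequency reaches `min_support` (hypothetical threshold)."""
    _, support = support_frequencies(text)
    return {p: f for p, f in support.items() if f >= min_support}


if __name__ == "__main__":
    sample = "中国人民。中国经济。人民生活。中国人民。"
    raw, support = support_frequencies(sample)
    print(raw["国人"], support["国人"])  # raw count vs. support after filtering
    print(extract_words(sample))
```

In the sample, the bigram 国人 occurs twice, but both times it sits between the more frequent bigrams 中国 and 人民, so under the assumed rule its support frequency drops to zero and it is not extracted, whereas a filter on raw frequency alone would have kept it.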
Keywords:automatic word segmentation  bridge-connection patterns filtering algorithm  Chinese information processing
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.