
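Automatic Domain-Specific Term Extraction and Its Application in Text Classification (领域术语自动抽取及其在文本分类中的应用)
Cite this article: LIU Tao, LIU Bing-quan, XU Zhi-ming, WANG Xiao-long. Automatic Domain-Specific Term Extraction and Its Application in Text Classification[J]. Acta Electronica Sinica (电子学报), 2007, 35(2): 328-332.
Authors: LIU Tao (刘桃)  LIU Bing-quan (刘秉权)  XU Zhi-ming (徐志明)  WANG Xiao-long (王晓龙)
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001
Abstract: This paper proposes an information-entropy-based method for extracting domain-specific terms. Given a domain-classified corpus, the method considers both how unevenly a candidate term is distributed across domain categories and how evenly it is distributed within a particular category, and it normalizes for imbalance in the corpus. Manual evaluation shows that the method extracts domain-specific terms more accurately and effectively. The paper also applies the algorithm to text classification in place of traditional feature selection algorithms; experiments show that it significantly improves classification accuracy.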

Keywords: domain-specific term  information entropy  normalization  text classification  feature selection
Article ID: 0372-2112(2007)02-0328-05
Received: 2005-10-21
Revised: 2006-11-23

Automatic Domain-Specific Term Extraction and Its Application in Text Classification
LIU Tao, LIU Bing-quan, XU Zhi-ming, WANG Xiao-long. Automatic Domain-Specific Term Extraction and Its Application in Text Classification[J]. Acta Electronica Sinica, 2007, 35(2): 328-332.
Authors: LIU Tao  LIU Bing-quan  XU Zhi-ming  WANG Xiao-long
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
Abstract: A statistical method based on information entropy is proposed for extracting domain-specific terms from domain-classified corpora. It takes into account how a candidate word is distributed both across domains and within a given domain. A normalization step is added to the extraction process to cope with unbalanced corpora. The proposed method characterizes the attributes of domain-specific terms more precisely and more effectively than previous term extraction approaches. The extracted domain-specific terms are then used as the feature space for text classification. Experimental results indicate that this achieves better performance than traditional feature selection methods.
Keywords:domain-specific term  information entropy  normalization  text classification  feature selection
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
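
The abstract names three ingredients: uneven distribution of a term across domains, even distribution within a domain, and normalization for unbalanced corpora. The record does not give the paper's formulas, so the following is only a minimal sketch of how such a score might be assembled. The function names `entropy` and `term_scores`, the way the three factors are multiplied together, and the toy corpus are all assumptions made for illustration, not the authors' method.

```python
import math


def entropy(weights):
    """Shannon entropy (natural log) of non-negative weights, rescaled to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def term_scores(corpus, word):
    """Score `word` for each domain (hypothetical scoring, values in [0, 1]).

    corpus: dict mapping a domain name to a list of tokenized documents.
    """
    rates = {}     # per-domain relative frequency of the word
    per_doc = {}   # per-domain list of the word's count in each document
    for domain, docs in corpus.items():
        counts = [doc.count(word) for doc in docs]
        total_tokens = sum(len(doc) for doc in docs)
        # Dividing by domain size is the normalization for unbalanced corpora.
        rates[domain] = sum(counts) / total_tokens if total_tokens else 0.0
        per_doc[domain] = counts

    # Low entropy across domains means the word is concentrated in few domains.
    n_domains = len(corpus)
    h_inter = entropy(list(rates.values()))
    concentration = 1.0 - h_inter / math.log(n_domains) if n_domains > 1 else 0.0

    total_rate = sum(rates.values())
    scores = {}
    for domain, counts in per_doc.items():
        # High entropy across documents means the word is spread evenly in the domain.
        n_docs = len(counts)
        evenness = entropy(counts) / math.log(n_docs) if n_docs > 1 else 0.0
        share = rates[domain] / total_rate if total_rate > 0 else 0.0
        scores[domain] = concentration * evenness * share
    return scores


# Toy, deliberately unbalanced corpus (assumed data): finance has twice as many documents.
corpus = {
    "sports": [["the", "goal", "match"], ["goal", "the", "team"], ["the", "goal", "coach"]],
    "finance": [["the", "stock", "bank"], ["stock", "the", "fund"], ["the", "stock", "rate"],
                ["the", "stock", "bond"], ["stock", "the", "loan"], ["the", "stock", "tax"]],
}
print(term_scores(corpus, "goal"))
print(term_scores(corpus, "the"))
```

In this toy corpus, "the" occurs twice as often in finance by raw count, but its relative frequency is the same in both domains, so the size-normalized score treats it as a general word rather than a finance term.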