Automatic Domain-Specific Term Extraction and Its Application in Text Classification


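Cite this article:	LIU Tao, LIU Bing-quan, XU Zhi-ming, WANG Xiao-long. Automatic Domain-Specific Term Extraction and Its Application in Text Classification [J]. Acta Electronica Sinica, 2007, 35(2): 328-332.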

Authors:	LIU Tao, LIU Bing-quan, XU Zhi-ming, WANG Xiao-long

Affiliation:	School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

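Abstract:	This paper proposes an information-entropy-based method for extracting domain-specific terms. Given a corpus classified by domain, the method considers both the non-uniformity of a term's distribution across domain categories and the uniformity of its distribution within a specific domain category, and it applies normalization to compensate for corpus imbalance. Manual evaluation shows that the method extracts domain-specific terms more accurately and effectively. The algorithm is also applied to text classification in place of traditional feature selection algorithms, and experiments show that it significantly improves classification accuracy.
Keywords:	domain-specific term; information entropy; normalization; text classification; feature selection
Article ID:	0372-2112(2007)02-0328-05
Received:	2005-10-21
Revised:	2005-10-21, 2006-11-23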
Automatic Domain-Specific Term Extraction and Its Application in Text Classification

LIU Tao, LIU Bing-quan, XU Zhi-ming, WANG Xiao-long. Automatic Domain-Specific Term Extraction and Its Application in Text Classification [J]. Acta Electronica Sinica, 2007, 35(2): 328-332.

Authors:	LIU Tao, LIU Bing-quan, XU Zhi-ming, WANG Xiao-long

Affiliation:	School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

Abstract:	A statistical method based on information entropy is proposed for extracting domain-specific terms from domain-comparative corpora. It takes into account the distribution of a candidate word both among domains and within a given domain. A normalization step is added to the extraction process to cope with unbalanced corpora. The proposed method characterizes the attributes of domain-specific terms more precisely and effectively than previous term extraction approaches. The extracted domain-specific terms are then used as the feature space in text classification. Experimental results indicate that this achieves better performance than traditional feature selection methods.

Keywords:	domain-specific term; information entropy; normalization; text classification; feature selection
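
The abstract names three ingredients (uneven distribution across domains, even distribution within a domain, and normalization for unbalanced corpora) but does not give the scoring formula. The Python sketch below is only one plausible way to combine them and should not be read as the authors' published method. The function names `normalized_entropy` and `term_scores`, the use of per-domain relative frequencies as the normalization step, and the product of the two entropy terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def normalized_entropy(weights):
    """Shannon entropy of non-negative weights, scaled by log(n) into [0, 1].

    Returns 0.0 when there are fewer than two weights or all weights are zero.
    """
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0:
        return 0.0
    h = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            h -= p * math.log(p)
    return h / math.log(n)


def term_scores(corpus):
    """Score every word in a domain-labelled corpus (hypothetical scheme).

    corpus: dict mapping domain name -> list of documents (lists of tokens).
    Returns: dict mapping word -> (dominant domain, score in [0, 1]).
    """
    domains = list(corpus)

    # Assumed normalization: relative frequency inside each domain, so a
    # domain with more text does not dominate the cross-domain comparison.
    rel_freq = {}
    for d in domains:
        counts = Counter(tok for doc in corpus[d] for tok in doc)
        total = sum(counts.values())
        rel_freq[d] = {w: c / total for w, c in counts.items()}

    vocab = set()
    for freqs in rel_freq.values():
        vocab.update(freqs)

    scores = {}
    for w in vocab:
        across = [rel_freq[d].get(w, 0.0) for d in domains]
        # Low entropy across domains means the word is concentrated in few domains.
        concentration = 1.0 - normalized_entropy(across)
        best = domains[across.index(max(across))]
        # High entropy over documents of the dominant domain means even usage there.
        per_doc = [doc.count(w) / len(doc) for doc in corpus[best] if doc]
        evenness = normalized_entropy(per_doc)
        scores[w] = (best, concentration * evenness)
    return scores
```

Under these assumptions, a word that occurs in every document of one domain and nowhere else scores close to 1, a word spread evenly over all domains scores close to 0, and a word that occurs in only one document of its domain scores 0. The highest-scoring words per domain could then serve as the feature set for a classifier, which is the role the abstract assigns to the extracted terms.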
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.