An Eigenvalue Extraction Method for Chinese Texts Using Multiple Heuristic Rules (基于多重启发式规则的中文文本特征值提取方法)


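Cite this article:	ZOU Juan, ZHOU Jing-ye, DENG Cheng, LIU Ling. An Eigenvalue Extraction Method for Chinese Texts Using Multiple Heuristic Rules[J]. Computer Engineering & Science, 2006, 28(8): 78-80.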

Authors:	ZOU Juan, ZHOU Jing-ye, DENG Cheng, LIU Ling

Affiliation:	College of Information Engineering, Xiangtan University, Xiangtan, Hunan 411105, China

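Abstract:	Based on the characteristics of Chinese text, this paper replaces the traditional word with a new unit, the synonym concept, and gives a new method for computing the weights between synonym concepts. Text features (特征值, rendered as "eigenvalues" in the English title) are extracted using not only the occurrence probabilities of words in the text but also semantic and other information. On this basis the paper proposes a Chinese text feature extraction method based on multiple heuristic rules and presents the corresponding extraction model and algorithm. Comparative experiments against traditional extraction methods based on word occurrence rates show that the proposed method effectively improves text categorization accuracy and reduces the dimensionality of the feature vectors.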
Keywords:	text categorization; feature extraction; natural language processing
Article ID:	1007-130X(2006)07-0078-03
Revised:	22 December 2004
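The abstract does not spell out the heuristic rules or the concept-weight formula, so what follows is only a minimal Python sketch of the general idea, not the authors' method. It rests on several assumptions. Words are assumed to be already segmented. They are mapped to synonym-concept IDs drawn from a hypothetical toy thesaurus, `SYNONYM_CONCEPTS`. A hypothetical stopword rule (`STOPWORDS`) serves as one example heuristic. Plain smoothed TF-IDF stands in for the paper's own weighting. The sketch illustrates why concept units can shrink the feature space: several synonymous words collapse into a single dimension.

```python
from collections import Counter
from math import log

# HYPOTHETICAL toy thesaurus: surface word -> synonym-concept ID.
# The paper's real synonym resource is not described in the abstract.
SYNONYM_CONCEPTS = {
    "电脑": "C_computer", "计算机": "C_computer",
    "网络": "C_network", "互联网": "C_network", "因特网": "C_network",
    "病毒": "C_virus", "木马": "C_virus",
}

# HYPOTHETICAL heuristic rule: discard function words before weighting.
STOPWORDS = {"的", "了", "在", "和"}


def to_units(words, use_concepts=True):
    """Apply the stopword rule, then optionally map words to synonym concepts.

    Words missing from the thesaurus are kept as their own unit.
    """
    kept = [w for w in words if w not in STOPWORDS]
    if not use_concepts:
        return kept
    return [SYNONYM_CONCEPTS.get(w, w) for w in kept]


def tfidf(docs):
    """Smoothed TF-IDF per document; a stand-in, NOT the paper's weight formula."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency of each unit
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({
            u: (count / len(d)) * (log((1 + n) / (1 + df[u])) + 1)
            for u, count in tf.items()
        })
    return weights


def vocabulary(docs):
    """Distinct units across the corpus, i.e. the feature-vector dimensions."""
    return sorted({u for d in docs for u in d})


# Toy documents, assumed to be already word-segmented.
DOCS = [
    ["电脑", "病毒", "在", "互联网", "传播"],
    ["计算机", "木马", "和", "因特网", "安全"],
    ["网络", "安全", "的", "软件"],
]

word_docs = [to_units(d, use_concepts=False) for d in DOCS]
concept_docs = [to_units(d) for d in DOCS]

print(len(vocabulary(word_docs)), len(vocabulary(concept_docs)))  # prints: 10 6
print(tfidf(concept_docs)[0])
```

On this toy corpus, the word-level vocabulary has 10 dimensions and the concept-level vocabulary has 6. That gap is the kind of reduction the abstract describes. The count says nothing about the paper's experimental results, which depend on its own rules, thesaurus, and data.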
This article is indexed in CNKI, VIP (维普), Wanfang Data and other databases.
