
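Text feature selection algorithm based on vocabulary attribute clustering
Cite this article: Zhang Qun, Wang Hongjun, Wang Lunwen. Text feature selection algorithm based on vocabulary attribute clustering[J]. Application Research of Computers, 2017, 34(2).
Authors: Zhang Qun (张群), Wang Hongjun (王红军), Wang Lunwen (王伦文)
Affiliation: Electronic Engineering Institute, Electronic Engineering Institute, Electronic Engineering Institute
Funding: Supported by the National Natural Science Foundation of China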
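Abstract: Before text mining, effective feature selection must be performed on the text collection. Traditional feature selection algorithms have limited effect on dimensionality reduction and text representation, and because they require the class labels of the texts they are not applicable to unsupervised text clustering tasks. To address this, a feature selection algorithm suited to text clustering is designed and the concept of vocabulary attributes is proposed. First, a term feature model is built from term frequency, document frequency, term position, and inter-term correlation; the work focuses on how the weights of the term position and inter-term correlation attributes are calculated, and improves the Apriori algorithm to calculate the inter-term correlation weights. Then an improved k-means clustering algorithm clusters the term feature model multiple times to complete text feature selection. Experimental results show that, compared with traditional feature selection algorithms, the algorithm achieves a good dimensionality reduction rate while improving the text representation capability of the selected feature terms, and is effectively applicable to text clustering tasks.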

Keywords: text feature selection  vocabulary attribute  term position  inter-term correlation  association rule algorithm  k-means algorithm
Received: 2016/1/25
Revised: 2016/12/23

An algorithm of text feature selection based on vocabulary attribute clustering
Affiliation:Electronic Engineering Institute,Electronic Engineering Institute,Electronic Engineering Institute
Abstract: Effective feature selection is a prerequisite for text mining. Conventional feature selection methods have limited effect on dimensionality reduction and text representation, and because they rely on class labels they are not suitable for unsupervised text clustering. To address this, this paper proposes a feature selection algorithm for text clustering based on the concept of vocabulary attributes. First, the algorithm builds a vocabulary attribute model from term frequency, document frequency, term position, and term correlation. It then details how the attribute values are calculated, in particular for term position and term correlation, and uses an improved Apriori algorithm to calculate the term correlation values. Finally, it clusters the vocabulary attribute model with an improved k-means algorithm to complete feature selection. Experimental results show that, compared with traditional methods, the scheme effectively reduces the dimension of the feature vector while improving the text representation capability of the selected feature terms, and that it is well suited to text clustering.
Keywords:text feature selection  vocabulary attribute  term position  term correlation  Apriori algorithm  k-means clustering algorithm
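
The pipeline outlined in the abstract (build a four-attribute vector per term, then cluster the terms) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' method: the abstract does not give the weighting formulas, the Apriori improvements, or the k-means improvements, so the code substitutes a fixed title weight for term position, a count of frequent co-occurring pairs (2-itemset support, the first level of Apriori) for term correlation, and a single round of plain k-means. The names `term_attributes`, `normalize`, `kmeans`, `select_features`, `min_support`, and `title_weight` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def term_attributes(docs, min_support=2, title_weight=2.0):
    """Build {term: [tf, df, position, correlation]} from (title, body) token lists."""
    tf, df = Counter(), Counter()
    position = defaultdict(float)
    pair_support = Counter()
    for title, body in docs:
        tf.update(title + body)
        vocab = set(title) | set(body)
        df.update(vocab)
        for term in title:
            position[term] += title_weight  # assumption: title hits count more
        for term in body:
            position[term] += 1.0
        # Support of 2-itemsets: number of documents containing both terms.
        pair_support.update(combinations(sorted(vocab), 2))
    correlation = Counter()
    for (a, b), support in pair_support.items():
        if support >= min_support:  # keep only frequent pairs
            correlation[a] += 1
            correlation[b] += 1
    return {t: [tf[t], df[t], position[t], correlation[t]] for t in tf}


def normalize(attrs):
    """Min-max scale every attribute column to [0, 1]."""
    columns = list(zip(*attrs.values()))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return {t: [(x - lo) / s for x, lo, s in zip(v, lows, spans)]
            for t, v in attrs.items()}


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic, evenly spaced seeding."""
    ordered = sorted(points, key=lambda t: (sum(points[t]), t))
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [points[ordered[round(i * step)]] for i in range(k)]
    labels = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for term, vec in points.items():
            dists = [sum((x - c) ** 2 for x, c in zip(vec, cen))
                     for cen in centroids]
            labels[term] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [points[t] for t in points if labels[t] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_features(docs, k=2):
    """Keep the terms in the cluster whose centroid has the largest attribute sum."""
    points = normalize(term_attributes(docs))
    labels, centroids = kmeans(points, k)
    best = max(range(k), key=lambda j: sum(centroids[j]))
    return sorted(t for t in points if labels[t] == best)


# Toy corpus: each document is (title tokens, body tokens).
docs = [
    (["text", "clustering"], ["text", "feature", "selection", "text"]),
    (["feature", "selection"], ["text", "clustering", "feature", "noise"]),
    (["text"], ["feature", "text", "clustering", "rare"]),
]
selected = select_features(docs)
print(selected)
```

Note that no class labels appear anywhere in the sketch, which is the property the abstract emphasizes for unsupervised clustering. Clustering terms rather than thresholding a single score lets all four attributes contribute jointly to the selection.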