Automatic Classification Algorithm for Chinese Bibliography Based on Data Mining (基于数据挖掘中文书目自动分类算法)
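Cite this article: Ji Gang, Wang Haidong, Chen Xiaofei. Automatic classification algorithm for Chinese bibliography based on data mining [J]. Computer Measurement & Control (计算机测量与控制), 2018, 26(5): 237-241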
Authors: Ji Gang (纪纲), Wang Haidong (王海东), Chen Xiaofei (陈小飞)
Affiliation (all three authors): Department of Shipboard Aviation Support and Airfield Station Management, Naval Aviation University (海军航空大学舰面航空保障与场站管理系)
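Abstract: An improved data mining algorithm is proposed. First, the ICTCLAS system is used for text preprocessing, and term vectors are built from term-frequency features. Next, term-frequency features and term frequency-inverse document frequency (TF-IDF) features are fused to construct the feature matrix of the training sample set. Singular value decomposition is then applied to this matrix to obtain a semantic space, which is used to transform text feature vectors into semantic vectors. Finally, a combined support vector machine classifier is built to automatically classify the semantic vectors that correspond to Chinese bibliographic records. Extensive simulation experiments show that the proposed method achieves higher classification accuracy than existing methods.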
Keywords: data mining; Chinese bibliography classification; text mining; support vector machine
Received: 2018-02-23
Revised: 2018-03-14
Automatic Classification Algorithm of Chinese Bibliography Based on Data Mining
Abstract: An improved data mining algorithm is proposed. First, the ICTCLAS system is used to preprocess the text, and a term vector is constructed from term-frequency features. Then the term-frequency features are fused with term frequency-inverse document frequency features to construct the feature matrix of the training sample set. A singular value decomposition of this matrix yields the semantic space, which is used to transform each text feature vector into a semantic vector. Finally, a combined support vector machine classifier is constructed to automatically classify the semantic vectors corresponding to the Chinese bibliography. A large number of simulation experiments were carried out, and the results show that the classification accuracy of this method is higher than that of existing methods.
Keywords: data mining; Chinese bibliography classification; text mining; support vector machine
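
The feature-construction steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two feature types are fused, so concatenation is assumed here, and the function names, the smoothed IDF formula, the toy titles, and the choice of k = 2 are all hypothetical. The titles are assumed to be already segmented into tokens (the paper uses ICTCLAS for that step), and the combined support vector machine stage is left out.

```python
import math
from collections import Counter

import numpy as np


def build_vocab(docs):
    """Sorted vocabulary over pre-segmented documents (lists of tokens)."""
    return sorted({tok for doc in docs for tok in doc})


def tf_vector(doc, vocab):
    """Relative term frequency of each vocabulary term in one document."""
    counts = Counter(doc)
    total = max(len(doc), 1)
    return np.array([counts[t] / total for t in vocab])


def idf_vector(docs, vocab):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return np.array([math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab])


def fused_features(docs, vocab, idf):
    """ASSUMPTION: fuse TF and TF-IDF by concatenating them per document."""
    tf = np.array([tf_vector(d, vocab) for d in docs])
    return np.hstack([tf, tf * idf])


def fit_semantic_space(features, k):
    """Return the top-k right singular vectors (shape: k x n_features)."""
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k]


def to_semantic(features, basis):
    """Project feature vectors onto the k-dimensional semantic space."""
    return features @ basis.T


# Toy training titles, already segmented (hypothetical data).
train_docs = [
    ["数据", "挖掘", "算法"],
    ["支持", "向量", "机", "算法"],
    ["中国", "历史", "研究"],
    ["世界", "历史", "研究"],
]
vocab = build_vocab(train_docs)
idf = idf_vector(train_docs, vocab)
train_features = fused_features(train_docs, vocab, idf)
basis = fit_semantic_space(train_features, k=2)
train_semantic = to_semantic(train_features, basis)

# A new title is mapped with the vocabulary, IDF, and basis fitted on training data.
new_title = [["历史", "算法", "研究"]]
new_semantic = to_semantic(fused_features(new_title, vocab, idf), basis)
```

The vectors in `train_semantic`, together with their class labels, would then be the input to whatever classifier is trained; new records are projected with the same `basis` before prediction, so that training and test vectors live in the same semantic space.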