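Algorithm of Repeats-based Term Extraction and Its Application in Text Clustering

Cite this article:	HU Jixiang, XU Hongbo, LIU Yue, CHENG Xueqi. Algorithm of Repeats-based Term Extraction and Its Application in Text Clustering[J]. Computer Engineering, 2007, 33(2): 65-67.

Authors:	HU Jixiang (胡吉祥), XU Hongbo (许洪波), LIU Yue (刘悦), CHENG Xueqi (程学旗)

Affiliations:	1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Graduate School of the Chinese Academy of Sciences, Beijing 100039 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080

Funding:	National Basic Research Program of China (973 Program)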

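Abstract:	To address the high dimensionality of Web documents and the challenge that new Internet language poses to existing word segmentation systems, this paper proposes a feature extraction method based on repeated strings (repeats). The method extracts meaningful features from text and requires no word segmentation for Chinese. Experiments show that it reduces the dimensionality of the feature space and effectively improves the performance of traditional clustering algorithms that use words as features.
Keywords:	text clustering; feature extraction; repeats
Article ID:	1000-3428(2007)02-0065-03
Revised:	2006-03-21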
Algorithm of Repeats-based Term Extraction and Its Application in Text Clustering

HU Jixiang, XU Hongbo, LIU Yue, CHENG Xueqi. Algorithm of Repeats-based Term Extraction and Its Application in Text Clustering[J]. Computer Engineering, 2007, 33(2): 65-67.

Authors:	HU Jixiang, XU Hongbo, LIU Yue, CHENG Xueqi

Affiliation:	1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate School, Chinese Academy of Sciences, Beijing 100039

Abstract:	This paper proposes a novel term extraction method based on repeats, which can extract meaningful terms from text. For Chinese, it requires no word segmentation. Experimental results show that the proposed approach can remarkably reduce the dimensionality of the feature space and effectively improve the performance of traditional clustering algorithms.

Keywords:	text clustering; term extraction; repeats
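
The abstract does not spell out the extraction algorithm, so the following is only an illustrative sketch of the general idea of repeats-based term extraction, not the authors' method. It assumes that a "repeat" is a character n-gram occurring at least `min_freq` times in the corpus, and that a shorter repeat is dropped when a one-character-longer repeat has the same count. The function name `extract_repeats` and all of its parameters are hypothetical. Because it works on raw characters, no word segmentation is needed.

```python
from collections import Counter


def extract_repeats(docs, min_len=2, max_len=6, min_freq=2):
    """Return {substring: count} for repeated character n-grams.

    Hypothetical sketch: counts every n-gram of length min_len..max_len,
    keeps those seen at least min_freq times, then drops any repeat that
    is contained in a one-character-longer repeat with the same count.
    """
    counts = Counter()
    for doc in docs:
        for n in range(min_len, max_len + 1):
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]] += 1

    # Keep only strings that actually repeat.
    repeats = {s: c for s, c in counts.items() if c >= min_freq}

    # A repeat is redundant if extending it by one character (left or
    # right) gives another repeat with exactly the same count.
    redundant = set()
    for t, c in repeats.items():
        if len(t) > min_len:
            for sub in (t[:-1], t[1:]):
                if repeats.get(sub) == c:
                    redundant.add(sub)

    return {s: c for s, c in repeats.items() if s not in redundant}


docs = ["文本聚类方法", "文本聚类实验"]
print(extract_repeats(docs))  # {'文本聚类': 2}
```

In this toy corpus the only surviving feature is the four-character string shared by both documents; its shorter substrings occur equally often and are discarded, which is one simple way such a method could keep the feature space small.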
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.