
A Method for Discovering Internet Hotspot Information Based on Text Classification Followed by Clustering

Cite this article:	张慷. 一种基于文本先分类再聚类的互联网热点信息发现方法[J]. 兰州工业高等专科学校学报, 2013(3):10-14.

Author:	ZHANG Kang (张慷)

Affiliation:	China Telecom Shanghai Company, Shanghai 200041

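Abstract:	To meet the need for discovering hot topics on the Internet, this paper proposes a method for building an Internet hotspot discovery and analysis system that classifies texts first and clusters them afterwards. Features are extracted from sample Internet texts to build a text vector space model; the texts are classified with a Maxent maximum entropy classification model; the OPTICS clustering algorithm is then applied to the classification results to obtain hotspot text clusters, from which the effective hotspot information is finally obtained. Experiments show that classifying before clustering effectively prevents articles that belong to different semantic categories but are literally confusable from affecting the clustering algorithm, and improves both the accuracy of the clustering results and the computational efficiency.
Keywords:	hotspot discovery; maximum entropy model; clustering; automatic classification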
An Approach to Detecting Internet Hotspot by Information Clustering Based on Text Classification

ZHANG Kang. An Approach to Detecting Internet Hotspot by Information Clustering Based on Text Classification[J]. Journal of Lanzhou Higher Polytechnical College, 2013(3):10-14.

Authors:	ZHANG Kang

Affiliation:	Shanghai Branch of China Telecom Corporation Limited, Shanghai 200041, China

Abstract:	To meet the demand for Internet hotspot detection, this paper presents a method that clusters information after text classification. Features are extracted from sample Internet texts to build a text vector space model, the texts are classified with a maximum entropy classification model, and the OPTICS clustering algorithm is then applied to the classification results to obtain hotspot text clusters and, ultimately, the effective hotspots. Experiments show that this method effectively keeps texts from different semantic categories whose literal wording is easily confused from affecting the clustering, and improves both the precision of the clustering results and the operational efficiency.
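
The classify-then-cluster pipeline summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the paper's implementation: it assumes scikit-learn is available, uses TF-IDF weights for the vector space model, stands in multinomial logistic regression (mathematically equivalent to a maximum entropy classifier) for the Maxent model, and assumes texts are already whitespace-tokenized (Chinese text would need word segmentation first). The function name `classify_then_cluster` and its parameters are hypothetical.

```python
from collections import defaultdict

from sklearn.cluster import OPTICS
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def classify_then_cluster(train_texts, train_labels, new_texts, min_samples=2):
    """Classify new_texts, then run OPTICS separately inside each category.

    Returns {category: {document_index: cluster_label}}, where a cluster
    label of -1 means the document was not assigned to any cluster.
    """
    # Vector space model: TF-IDF weights (an assumption; the abstract does
    # not specify the weighting scheme).
    vectorizer = TfidfVectorizer()
    x_train = vectorizer.fit_transform(train_texts)

    # Logistic regression used as a stand-in for the maximum entropy model.
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(x_train, train_labels)

    x_new = vectorizer.transform(new_texts)
    predicted = classifier.predict(x_new)

    # Group document indices by predicted category.
    by_category = defaultdict(list)
    for index, category in enumerate(predicted):
        by_category[str(category)].append(index)

    result = {}
    for category, indices in by_category.items():
        if len(indices) <= min_samples:
            # Too few documents to form a density-based cluster.
            result[category] = {i: -1 for i in indices}
            continue
        # Cluster only the documents of this category, so texts from other
        # categories with similar wording cannot interfere.
        vectors = x_new[indices].toarray()
        labels = OPTICS(min_samples=min_samples, metric="cosine").fit_predict(vectors)
        result[category] = {i: int(label) for i, label in zip(indices, labels)}
    return result
```

Within each category, documents sharing a non-negative cluster label form a candidate hotspot cluster; how clusters are ranked into final hotspots is not described in the abstract and is left out here.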

Keywords:	hotspot detection; maximum entropy model; clustering; automatic classification
This article is indexed in CNKI, VIP, and other databases.
