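A Method for Discovering Internet Hotspot Information by Classifying Texts First and Then Clustering
Cite this article: ZHANG Kang. A method for discovering Internet hotspot information by classifying texts first and then clustering[J]. Journal of Lanzhou Higher Polytechnical College, 2013(3): 10-14.
Author: ZHANG Kang
Affiliation: China Telecom Shanghai Branch, Shanghai 200041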
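Abstract: To meet the need for discovering hotspot information on the Internet, this paper proposes a way to build an Internet hotspot discovery and analysis system that classifies texts first and clusters them afterwards. Features are extracted from sample Internet texts to build a text vector space model. The Maxent (maximum entropy) classification model then assigns each text to a category, and the OPTICS clustering algorithm is applied to the classification results to obtain hotspot clusters of texts, from which the effective hotspot information is finally derived. Experiments show that classifying before clustering effectively prevents articles that belong to different semantic categories but resemble each other in literal wording from disturbing the clustering algorithm. It also effectively improves both the precision and the computational efficiency of the clustering.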
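The classification stage described in the abstract (vector space model, then a maximum entropy classifier) can be illustrated with a minimal sketch. The abstract does not state the feature weighting, the training algorithm or the toolkit used. The sketch therefore makes several assumptions. It uses TF-IDF weights over texts that have already been word-segmented into tokens. It trains the maximum entropy model in its equivalent multinomial logistic regression form, using stochastic gradient ascent with an L2 (Gaussian prior) penalty in place of whatever optimizer the paper used. The names `tfidf_vectors`, `vectorize` and `MaxEntClassifier`, the hyperparameters and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalised sparse TF-IDF vectors (term -> weight) and the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed): terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf


def vectorize(doc, idf):
    """Map a tokenised document into the vector space; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: (c / len(doc)) * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


class MaxEntClassifier:
    """Conditional maximum entropy model, i.e. multinomial logistic regression.

    p(y|x) is proportional to exp(b_y + sum_t w_{y,t} * x_t). Hypothetical helper, trained by SGD.
    """

    def __init__(self, lr=0.5, l2=0.01, epochs=100):
        self.lr, self.l2, self.epochs = lr, l2, epochs
        self.labels, self.w, self.b = [], {}, {}

    def probs(self, x):
        scores = {y: self.b[y] + sum(self.w[y].get(t, 0.0) * v for t, v in x.items())
                  for y in self.labels}
        m = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, X, Y):
        self.labels = sorted(set(Y))
        self.w = {y: {} for y in self.labels}
        self.b = {y: 0.0 for y in self.labels}
        for _ in range(self.epochs):
            for x, gold in zip(X, Y):
                p = self.probs(x)
                for y in self.labels:
                    # Gradient of the log-likelihood: observed minus expected feature value.
                    g = (1.0 if y == gold else 0.0) - p[y]
                    self.b[y] += self.lr * g
                    wy = self.w[y]
                    for t, v in x.items():
                        old = wy.get(t, 0.0)
                        # L2 shrinkage is applied lazily, only to features active in x.
                        wy[t] = old + self.lr * (g * v - self.l2 * old)
        return self

    def predict(self, x):
        p = self.probs(x)
        return max(p, key=p.get)


if __name__ == "__main__":
    train = [["match", "goal", "team", "league"], ["team", "coach", "goal", "season"],
             ["stock", "market", "shares", "index"], ["market", "bank", "rate", "shares"]]
    y = ["sports", "sports", "finance", "finance"]
    X, idf = tfidf_vectors(train)
    clf = MaxEntClassifier().fit(X, y)
    print(clf.predict(vectorize(["goal", "team", "fans"], idf)))
```

In the proposed pipeline, each text's predicted category decides which category-specific clustering run it joins.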

Keywords: hotspot discovery  maximum entropy model  clustering  automatic classification

An Approach to Detecting Internet Hotspot by Information Clustering Based on Text Classification
ZHANG Kang.An Approach to Detecting Internet Hotspot by Information Clustering Based on Text Classification[J].Journal of Lanzhou Higher Polytechnical College,2013(3):10-14.
Authors:ZHANG Kang
Affiliation: ZHANG Kang (Shanghai Branch of China Telecom Corporation Limited, Shanghai 200041, China)
Abstract: Aiming at the demand for Internet hotspot detection, this paper presents a method that clusters information on the basis of text classification. Features are extracted from sample Internet texts to build a text vector space model. The texts are classified with a maximum entropy classification model, and the OPTICS clustering algorithm is then applied to the classification results to obtain hotspot clusters of texts, from which effective hotspots are ultimately derived. Experiments show that the method effectively prevents texts from different semantic categories, which can be confused by their literal wording, from distorting the clustering. It also improves both the precision of the clustering results and the computational efficiency.
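The clustering stage, which runs OPTICS separately inside each predicted category, can be sketched as follows. This is a plain OPTICS implementation followed by a DBSCAN-style cluster extraction at a threshold `eps_prime`. The abstract gives neither the OPTICS parameters, the distance measure nor the rule for deciding which clusters count as hotspots. Cosine distance as the default, ranking clusters by size, and the helper names `optics`, `extract_clusters` and `hot_clusters_by_category` are therefore assumptions for illustration.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity for dense vectors; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def optics(points, eps, min_pts, dist=cosine_distance):
    """Return (ordering, reachability, core_distance); undefined distances are math.inf."""
    n = len(points)
    reach, core = [math.inf] * n, [math.inf] * n
    processed, order = [False] * n, []

    def neighbors(i):
        # Naive O(n) range query (the neighborhood includes i itself).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def expand(i):
        # Mark i processed, append it to the ordering and record its core distance.
        nbrs = neighbors(i)
        processed[i] = True
        order.append(i)
        if len(nbrs) >= min_pts:
            core[i] = sorted(dist(points[i], points[j]) for j in nbrs)[min_pts - 1]
        return nbrs

    def update(i, nbrs, seeds):
        # Lower the reachability of unprocessed neighbours of core point i.
        for o in nbrs:
            if not processed[o]:
                new_r = max(core[i], dist(points[i], points[o]))
                if new_r < reach[o]:
                    reach[o] = new_r
                    heapq.heappush(seeds, (new_r, o))

    for p in range(n):
        if processed[p]:
            continue
        nbrs = expand(p)
        if core[p] == math.inf:
            continue
        seeds = []
        update(p, nbrs, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:  # stale heap entry superseded by a smaller reachability
                continue
            nbrs_q = expand(q)
            if core[q] != math.inf:
                update(q, nbrs_q, seeds)
    return order, reach, core


def extract_clusters(order, reach, core, eps_prime):
    """DBSCAN-style extraction from the OPTICS ordering; label -1 marks noise."""
    labels, cluster = {}, -1
    for p in order:
        if reach[p] > eps_prime:
            if core[p] <= eps_prime:
                cluster += 1
                labels[p] = cluster
            else:
                labels[p] = -1
        else:
            labels[p] = cluster
    return labels


def hot_clusters_by_category(vectors_by_category, eps, min_pts, eps_prime, dist=cosine_distance):
    """Cluster each category independently; clusters (index lists) are sorted largest first."""
    result = {}
    for category, vectors in vectors_by_category.items():
        order, reach, core = optics(vectors, eps, min_pts, dist)
        groups = {}
        for idx, lab in extract_clusters(order, reach, core, eps_prime).items():
            if lab >= 0:
                groups.setdefault(lab, []).append(idx)
        clusters = sorted((sorted(g) for g in groups.values()), key=len, reverse=True)
        result[category] = clusters
    return result
```

Texts in different categories are never compared, so a sports story and a finance story that share vocabulary cannot end up in the same cluster. Each OPTICS run also sees only one category's texts, which shrinks the quadratic neighbor search in this naive version. Both effects match the advantages the abstract claims for classifying first.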
Keywords: hotspot detection  maximum entropy model  clustering  automatic classification
This article is indexed in CNKI, VIP and other databases.