Structural characteristics and content analysis fusion for blog post classification
Cite this article: ZHANG Yong, WANG Fang, ZHANG Yiyun. Structural characteristics and content analysis fusion for blog post classification[J]. Computer Engineering and Applications, 2013, 49(5).
Authors: ZHANG Yong, WANG Fang, ZHANG Yiyun
Affiliation: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Abstract: Blog posts often cover multiple topics, have no clear category membership, and largely express the author's subjective opinions. Structurally, they also carry tags that ordinary text lacks. As a result, common text classification methods perform poorly when applied to blog posts directly. To address this, a blog post classification method is proposed that fuses structural characteristics with content analysis. On the content side, it iterates two different feature selection methods to make the feature set more representative, then classifies each post by its main body and by its title. On the structural side, it classifies each post by its blog-specific tags. The three results are then fused. Experimental results show that the improved method effectively raises blog post classification performance compared with common text classification methods.

Keywords: text classification; blog post classification; structural characteristics; content analysis
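
The abstract does not say how the body, title, and tag results are combined, so the following is only a minimal sketch of one common option, a weighted sum of per-class scores (late fusion). The function names `normalize` and `fuse`, the weights, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Scores = Dict[str, float]  # class label -> non-negative score


def normalize(scores: Scores) -> Scores:
    """Rescale scores to sum to 1 so the three views are comparable."""
    total = sum(scores.values())
    if total <= 0:
        # No evidence from this view: fall back to a uniform distribution.
        return {label: 1.0 / len(scores) for label in scores}
    return {label: value / total for label, value in scores.items()}


def fuse(
    body: Scores,
    title: Scores,
    tags: Scores,
    weights: Tuple[float, float, float] = (0.5, 0.2, 0.3),  # assumed values
) -> Tuple[str, Scores]:
    """Combine body, title, and tag scores with a weighted sum.

    This is an assumed fusion rule for illustration, not the paper's.
    """
    views = [normalize(body), normalize(title), normalize(tags)]
    labels = set().union(*views)
    fused = {
        label: sum(w * view.get(label, 0.0) for w, view in zip(weights, views))
        for label in labels
    }
    # Sort first so ties are broken alphabetically and deterministically.
    best = max(sorted(fused), key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Made-up scores from three hypothetical per-view classifiers.
    body_scores = {"sports": 0.6, "tech": 0.4}
    title_scores = {"sports": 0.5, "tech": 0.5}
    tag_scores = {"sports": 0.1, "tech": 0.9}
    label, fused_scores = fuse(body_scores, title_scores, tag_scores)
    print(label, fused_scores)
```

In this sketch a strong tag signal can override a weak body signal, which is the kind of effect the fusion of structural and content evidence is meant to allow. The weights would need tuning on held-out data.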
This article is indexed in Wanfang Data and other databases.