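Extracting Blog Post Summaries with a Ranking Support Vector Machine
Cite this article: He Haijiang, Chen Shu. Extracting blog post summaries with a ranking support vector machine [J]. Journal of University of Electronic Science and Technology of China (Natural Science Edition), 2010, 39(4): 593-597.
Authors: He Haijiang (何海江), Chen Shu (陈姝)
Affiliations: 1. Department of Computer Science and Technology, Changsha University, Changsha 410003; 2. School of Information Science and Engineering, Central South University, Changsha 410083
Funding: Scientific Research Project of the Hunan Provincial Department of Education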
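Abstract: This paper proposes a method for extracting summaries of blog posts with a smooth ranking support vector machine (Rank-sSVM). Summaries extracted with this ranking algorithm reflect commenters' opinions and the characteristics of the blog corpus. Automatic summarization proceeds in three steps. First, important sentences are selected by hand from each post and labeled as its summary, to serve as training data. Next, a feature set representing each sentence is generated automatically; it contains 14 features, including information unique to blog posts such as tags and comments. Finally, after Rank-sSVM has learned from the manual summaries, all sentences of a post are ranked and the top-ranked ones are selected to form the summary. The method achieves good results on a Chinese blog dataset.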

Keywords: blog; comment; information retrieval; learning to rank; support vector machine; summarization
Received: 2008-11-28

Extraction of Blog Post Summarization by Using Ranking SVM
Affiliation: 1. Department of Computer Science and Technology, Changsha University, Changsha 410003; 2. School of Information Science and Engineering, Central South University, Changsha 410083
Abstract: A new approach to blog post summarization is presented, based on a ranking smooth support vector machine (Rank-sSVM). Using a ranking algorithm for this task allows summaries to be adapted to commenters' needs and to the characteristics of the blog corpus. To train Rank-sSVM, key sentences are first extracted manually from blog posts as training samples. A feature set representing each post sentence is then generated automatically; it consists of 14 features, including tags, comments, and other information unique to blogs. Once the ranking model has ranked all the sentences of a post, the top-ranked ones are selected to summarize it. Experimental results show that the proposed method performs well on a Chinese blog dataset.
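
The pipeline described in the abstract (label summary sentences, build per-sentence features, learn a ranking model, keep the top-ranked sentences) can be illustrated with a small sketch. This is not the authors' Rank-sSVM: it assumes a plain linear pairwise ranker trained by gradient descent on a squared hinge loss, chosen only because that loss is differentiable. The three features (position score, tag overlap, comment overlap), the toy data, and the names `train_ranker` and `summarize` are hypothetical; the paper uses 14 features that the abstract does not list.

```python
# Illustrative sketch only: a linear pairwise ranker with a squared hinge loss,
# standing in for Rank-sSVM. Features and data below are invented for the demo.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_pairs(posts):
    """Build difference vectors (summary sentence minus non-summary sentence)
    from sentences of the same post."""
    pairs = []
    for feats, labels in posts:
        pos = [f for f, y in zip(feats, labels) if y == 1]
        neg = [f for f, y in zip(feats, labels) if y == 0]
        for p in pos:
            for n in neg:
                pairs.append([a - b for a, b in zip(p, n)])
    return pairs

def train_ranker(posts, epochs=200, lr=0.1, reg=0.01):
    """Minimize (reg/2)*||w||^2 + mean(max(0, 1 - w.d)^2) over all pairs d."""
    pairs = build_pairs(posts)
    dim = len(pairs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * wi for wi in w]
        for d in pairs:
            slack = 1.0 - dot(w, d)
            if slack > 0:  # only pairs violating the margin contribute
                for i in range(dim):
                    grad[i] -= 2.0 * slack * d[i] / len(pairs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def summarize(sentences, feats, w, k=2):
    """Score every sentence, keep the k best, return them in original order."""
    scores = [dot(w, f) for f in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

# Hypothetical features: [position score, tag overlap, comment overlap].
# Label 1 marks a sentence that a human annotator put in the summary.
train_posts = [
    ([[1.0, 0.8, 0.9], [0.5, 0.1, 0.0], [0.2, 0.7, 0.6], [0.1, 0.0, 0.1]],
     [1, 0, 1, 0]),
    ([[1.0, 0.2, 0.1], [0.6, 0.9, 0.8], [0.3, 0.1, 0.2]],
     [0, 1, 0]),
]
w = train_ranker(train_posts)

new_sentences = [
    "Hello again, everyone.",
    "The new phone lasts two days on a single charge.",
    "I bought it last week.",
    "Its low price is what most readers asked about.",
]
new_feats = [[1.0, 0.0, 0.0], [0.5, 0.9, 0.9], [0.3, 0.1, 0.0], [0.1, 0.8, 0.7]]
summary = summarize(new_sentences, new_feats, w, k=2)
print(summary)
```

Pairs are built only within a post, so the model learns which sentences matter relative to their own post, and the selected sentences are returned in their original order so the extract stays readable.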
This article is indexed in Wanfang Data (万方数据) and other databases.