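The Research of Short Texts Classification Based on Association Rules of Lexical Items
Cite this article: ZHANG Fang, YAN Huaju, LIU Mingjun, ZHAO Zhongying. The Research of Short Texts Classification Based on Association Rules of Lexical Items [J]. Journal of Integration Technology (集成技术), 2015, 4(3): 69-78.
Authors: ZHANG Fang, YAN Huaju, LIU Mingjun, ZHAO Zhongying
Affiliations: Tianjin Hylanda Information Technology Co., Ltd.; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; School of Information Science and Technology, Sun Yat-sen University
Funding: Shenzhen Knowledge Innovation Program, Basic Research Project (JCYJ20130401170306838)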
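Abstract (translated from the Chinese): Microblogs and other social media consist mainly of short texts. Because these texts are short and their features sparse, traditional text classification methods cannot classify them with high accuracy. To address this problem, the paper proposes a short text classification method based on term association. Strong association rules are first mined from the training set and then added to the features of the short texts, which raises their feature density and in turn the classification accuracy. Comparative experiments show that the method alleviates, to some extent, the effect of feature sparsity on classification results and improves accuracy, recall, and F1 score.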

Keywords (translated from the Chinese): data mining; short text; classification; association rules

The Research of Short Texts Classification Based on Association Rules of Lexical Items
Authors:ZHANG Fang  YAN Huaju  LIU Mingjun and ZHAO Zhongying
Affiliation: Hylanda Information Technology Co., Ltd; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; School of Information Science and Technology, Sun Yat-sen University
Abstract: Short text is the main content of microblogs and other social media. Because it is short and sparse in features, traditional text classification methods cannot classify it accurately. To solve this problem, this paper proposes a method of short text classification based on association rules of lexical items. First, strong association rules are mined from the training set. These rules are then added to the features of each short text to increase its feature density and thereby the accuracy of short text classification. Comparative experiments show that this method reduces, to some extent, the impact of feature sparseness on the classification results, and that it improves classification accuracy, recall, and F1 values.
Keywords:data mining  short text  classification  association rules
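
The two steps named in the abstract (mine strong association rules from the training set, then add them to each short text's features) can be illustrated with a minimal Python sketch. The abstract does not give the rule form, the thresholds, or the exact way rules are merged into the features, so everything below is an assumption made for illustration: rules are restricted to one term implying one term, "strong" means meeting hypothetical `min_support` and `min_confidence` thresholds, and a rule is applied by appending its consequent term to the text. The function names and the toy corpus are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_strong_rules(docs, min_support=0.3, min_confidence=0.6):
    """Mine one-to-one term rules a -> b from tokenized documents.

    Assumed definitions (not taken from the paper):
      support(a -> b)    = fraction of documents containing both a and b
      confidence(a -> b) = count(a and b) / count(a)
    """
    n_docs = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # count each term once per document
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    rules = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab / n_docs < min_support:
            continue
        # Test both directions of the co-occurring pair.
        for ante, cons in ((a, b), (b, a)):
            if n_ab / term_count[ante] >= min_confidence:
                rules.setdefault(ante, set()).add(cons)
    return rules


def expand_features(tokens, rules):
    """Append the consequents of matching rules to a short text's terms."""
    present = set(tokens)
    expanded = list(tokens)
    for term in tokens:  # single pass: added terms do not trigger more rules
        for cons in sorted(rules.get(term, ())):
            if cons not in present:
                present.add(cons)
                expanded.append(cons)
    return expanded


# Toy training set (hypothetical data, already tokenized).
train = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["stock", "market"],
    ["football", "match", "goal"],
    ["football", "goal"],
]
rules = mine_strong_rules(train, min_support=0.4, min_confidence=0.9)
print(expand_features(["stock", "price"], rules))
# → ['stock', 'price', 'market']
```

The expanded term list would then be passed to whatever vectorizer and classifier are in use; the abstract does not name one, so that stage is left out here.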
This article is indexed in CNKI and other databases.