
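Community question answering-oriented Chinese question classification
Cite this article: DONG Caizheng, LIU Baisong. Community question answering-oriented Chinese question classification[J]. Journal of Computer Applications, 2016, 36(4): 1060-1065.
Authors: DONG Caizheng, LIU Baisong
Affiliation: 1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211; 2. Library and Information Center, Ningbo University, Ningbo, Zhejiang 315211
Funding: Zhejiang Provincial Department of Education (文)/Scientific Research Program project (20071008); open fund project of a Zhejiang provincial/ministerial-level laboratory (B2014)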
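Abstract: Most traditional question classification taxonomies are built on factoid questions, and traditional question classification methods rely heavily on interrogative words as a classification feature. In community question answering (CQA), however, non-factoid questions are the majority and many questions contain no interrogative word. To address this, a coarse-grained classification taxonomy oriented toward CQA is proposed, and on that basis a hierarchical question classification method based on interrogative words. The method first automatically identifies the interrogative word in a question; if one is present, a Support Vector Machine (SVM) model performs the classification, whereas questions without an interrogative word are classified by a purpose-built classifier based on focus words. In experiments on a question dataset crawled from the Chinese Q&A community Zhihu, the method improves classification accuracy by 4.7 percentage points over the traditional SVM-based classification method. The results show that choosing a different classifier depending on whether a question contains an interrogative word reduces the method's dependence on interrogative words and effectively improves question classification accuracy in CQA.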

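Keywords: Chinese question classification; community question answering; hierarchical classification; support vector machine; focus word
Received: 2015-08-18
Revised: 2015-10-29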

Community question answering-oriented Chinese question classification
DONG Caizheng, LIU Baisong. Community question answering-oriented Chinese question classification[J]. Journal of Computer Applications, 2016, 36(4): 1060-1065.
Authors:DONG Caizheng;LIU Baisong
Affiliation: 1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China; 2. Library and Information Center, Ningbo University, Ningbo, Zhejiang 315211, China
Abstract: Traditional question classification taxonomies are mostly based on factoid questions, and traditional classification methods depend heavily on interrogative words. In Community Question Answering (CQA), however, non-factoid questions make up a high proportion and many questions contain no interrogative words. To address this, a coarse-grained classification taxonomy and a novel hierarchical question classification method based on interrogative words were proposed. The Support Vector Machine (SVM) model was used to classify questions containing interrogative words, while a classifier based on focus words was constructed for questions without them. In a comparison experiment on a dataset of Chinese questions crawled from Zhihu, the proposed method improved accuracy by 4.7 percentage points over the SVM-based method. The experimental results show that selecting a different classifier according to whether a question contains interrogative words effectively reduces the dependence on interrogative words and yields more accurate classification of Chinese questions.
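Keywords: Chinese question classification; Community Question Answering (CQA); hierarchical classification; Support Vector Machine (SVM); focus word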
This article is indexed in CNKI and other databases.
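
The two-branch routing described in the abstract (detect an interrogative word, then pick a classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the interrogative-word list, the substring matching, the function names, and the two stub classifiers are all hypothetical, since the abstract does not give the word list, the features, or the category labels. In a real system the first stub would be a trained SVM model and the second a focus-word-based classifier.

```python
from typing import Callable, Optional

# Hypothetical, deliberately tiny interrogative-word list (an assumption;
# the paper's actual list and detection procedure are not in the abstract).
# Longer entries come before their substrings so "怎么样" wins over "怎么".
INTERROGATIVES = ("为什么", "怎么样", "怎么", "如何", "什么", "哪些", "哪个", "多少", "谁", "吗")

Classifier = Callable[[str], str]


def find_interrogative(question: str) -> Optional[str]:
    """Return the first entry of INTERROGATIVES that occurs in the question, else None.

    Plain substring matching is a simplification; a real system would
    likely work on word-segmented text.
    """
    for word in INTERROGATIVES:
        if word in question:
            return word
    return None


def classify(question: str, with_word: Classifier, without_word: Classifier) -> str:
    """Route the question to one of two classifiers based on interrogative-word presence."""
    if find_interrogative(question) is not None:
        return with_word(question)      # branch handled by the SVM model in the paper
    return without_word(question)       # branch handled by the focus-word classifier


# Stand-in classifiers (hypothetical): they only report which branch was taken.
def svm_stub(question: str) -> str:
    return "svm-branch"


def focus_word_stub(question: str) -> str:
    return "focus-word-branch"


if __name__ == "__main__":
    for q in ("如何学习机器学习", "求推荐几本机器学习入门书"):
        print(q, "->", classify(q, svm_stub, focus_word_stub))
```

Passing the classifiers in as callables keeps the routing logic separate from whichever models are actually trained for each branch.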