Supervised Explicit Semantic Representation for Text Categorization
Cite this article: Sun Fei, Guo Jiafeng, Lan Yanyan, Cheng Xueqi. Supervised Explicit Semantic Representation for Text Categorization[J]. Journal of Data Acquisition & Processing, 2017, 32(3): 550-558.
Authors: Sun Fei  Guo Jiafeng  Lan Yanyan  Cheng Xueqi
Affiliation: Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Abstract: Text representation is a fundamental problem in text categorization and has long attracted wide attention. There are currently three main approaches: the bag-of-words model, latent semantic representation, and knowledge-based explicit semantic representation. This paper first analyzes and compares the effectiveness of these three approaches in text categorization. Experiments show that knowledge-based explicit semantic representation does not improve categorization performance as expected. Analysis attributes this to the noise that explicit semantic representation tends to introduce when it expands the document representation. To address this problem, the paper proposes a supervised explicit semantic representation method. The method uses the label information of the dataset to identify the core concepts in a document that are most relevant to classification, and expands those core concepts to form the document's explicit semantic representation. Results on three standard classification datasets confirm the effectiveness of the proposed representation.
Keywords: text categorization  text representation  supervised explicit semantic representation
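The two steps the abstract describes (use label information to pick the core concepts of a document, then expand only those concepts) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's method: the toy knowledge base `TERM_TO_CONCEPTS`, the expansion links `RELATED`, the class-skew relevance score, and the cutoff `k` are all hypothetical stand-ins, since the abstract does not specify how relevance is scored or how concepts are expanded.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge base: term -> {concept: weight}.
TERM_TO_CONCEPTS = {
    "goal": {"Football": 1.0},
    "striker": {"Football": 0.8},
    "election": {"Politics": 1.0},
    "vote": {"Politics": 0.9},
    "bank": {"Finance": 0.7, "River": 0.3},
}
# Hypothetical links used for expansion: concept -> {related concept: weight}.
RELATED = {
    "Football": {"Sport": 0.5},
    "Politics": {"Government": 0.5},
    "Finance": {"Economy": 0.5},
}

def concept_vector(tokens):
    """Unsupervised explicit mapping: sum concept weights over the tokens."""
    vec = defaultdict(float)
    for tok in tokens:
        for concept, weight in TERM_TO_CONCEPTS.get(tok, {}).items():
            vec[concept] += weight
    return dict(vec)

def concept_relevance(labeled_docs):
    """Assumed score: how far a concept's class distribution departs from the class prior."""
    class_count = Counter(label for _, label in labeled_docs)
    total = len(labeled_docs)
    concept_class = defaultdict(Counter)
    for tokens, label in labeled_docs:
        for concept in concept_vector(tokens):
            concept_class[concept][label] += 1
    scores = {}
    for concept, per_class in concept_class.items():
        n = sum(per_class.values())
        scores[concept] = max(per_class[c] / n - class_count[c] / total
                              for c in class_count)
    return scores

def supervised_representation(tokens, scores, k=1):
    """Keep the k most label-relevant concepts, then expand only those."""
    vec = concept_vector(tokens)
    core = sorted(vec, key=lambda c: (-scores.get(c, 0.0), c))[:k]
    rep = {c: vec[c] for c in core}
    for c in core:
        for related, weight in RELATED.get(c, {}).items():
            rep[related] = rep.get(related, 0.0) + weight * vec[c]
    return rep

train = [
    (["goal", "striker"], "sports"),
    (["goal", "bank"], "sports"),
    (["election", "vote"], "politics"),
    (["vote", "bank"], "politics"),
]
scores = concept_relevance(train)
rep = supervised_representation(["goal", "bank"], scores, k=1)
print(rep)  # {'Football': 1.0, 'Sport': 0.5}
```

In this toy run the ambiguous term "bank" maps to "Finance" and "River", which occur equally often in both classes and so score zero; only "Football" is kept and expanded. That is the noise-filtering effect the abstract attributes to supervision, shown here under the stated assumptions only.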