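Research on Chinese Comparative Sentence Identification (汉语比较句识别研究)
Cite this article: 黄小江, 万小军, 杨建武, 肖建国. 汉语比较句识别研究[J]. 中文信息学报, 2008, 22(5): 30-38.
Authors: HUANG Xiao-jiang (黄小江)  WAN Xiao-jun (万小军)  YANG Jian-wu (杨建武)  XIAO Jian-guo (肖建国)
Affiliation: Institute of Computer Science and Technology, Peking University, Beijing 100871
Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China; Ministry of Education Doctoral Program Fund for New Teachers in Higher Education
Abstract: Comparison is a common form of expression, and extracting comparative relations between entities is a novel and practically valuable line of research. Identifying comparative sentences in natural language is an important step in extracting comparative relations. No research has yet addressed the automatic identification of Chinese comparative sentences, and which linguistic features of comparative sentences can be applied to automatic identification remains an open question. This paper discusses the scope, extension, and characteristics of Chinese comparative sentences, defines the task of Chinese comparative sentence identification, and proposes using an SVM classifier to divide Chinese sentences into two classes, "comparative" and "non-comparative". It compares the contribution of linguistic and statistical features of comparative sentences, including feature words and sequential patterns, to classification. Experimental results show that an SVM classifier based on class sequential rules identifies Chinese comparative sentences effectively and outperforms traditional word-based text classification.

Keywords: computer applications  Chinese information processing  Chinese comparative sentence identification  comparative mining  text classification  sequential patterns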

Learning to Identify Chinese Comparative Sentences
HUANG Xiao-jiang, WAN Xiao-jun, YANG Jian-wu, XIAO Jian-guo. Learning to Identify Chinese Comparative Sentences[J]. Journal of Chinese Information Processing, 2008, 22(5): 30-38.
Authors: HUANG Xiao-jiang  WAN Xiao-jun  YANG Jian-wu  XIAO Jian-guo
Affiliation: Institute of Computer Science and Technology, Peking University, Beijing 100871, China
Abstract: Comparison is a common kind of expression, and extracting comparative relations between objects is a novel and practically valuable research problem. Identifying comparative sentences in natural language is an important step in extracting comparative relations. To our knowledge, there is no prior research on automatically identifying Chinese comparative sentences. This paper first defines the problem of Chinese comparative sentence identification, and then proposes using an SVM to classify a Chinese sentence as either "comparative" or "non-comparative". Various linguistic and statistical features are explored, such as keywords and sequential patterns. Experimental results demonstrate the effectiveness of sequential patterns: the classifier with sequential patterns significantly outperforms the traditional term-based classifier. We also empirically investigate the important factors that affect classification performance.
Keywords: computer application  Chinese information processing  Chinese comparative sentence identification  comparative mining  text classification  sequential pattern
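
The feature idea in the abstract (keywords and sequential patterns as input to a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not taken from the paper: the keyword list, the hand-written patterns, and the helper names `is_subsequence` and `featurize` are hypothetical, and the input is assumed to be already word-segmented. The paper mines class sequential rules from data and trains an SVM; this sketch only shows how such patterns might be turned into binary features, and leaves the classifier step out.

```python
from typing import List, Sequence, Tuple

# Hypothetical comparative keywords (the abstract does not list the real ones).
KEYWORDS: List[str] = ["比", "更", "不如", "最"]

# Hypothetical sequential patterns: items must appear in this order,
# but need not be adjacent in the sentence.
PATTERNS: List[Tuple[str, ...]] = [("比", "更"), ("和", "一样"), ("没有", "那么")]


def is_subsequence(pattern: Sequence[str], tokens: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `tokens` in order (gaps allowed)."""
    remaining = iter(tokens)
    # `item in remaining` advances the iterator past the first match,
    # so later items can only match later positions.
    return all(item in remaining for item in pattern)


def featurize(tokens: Sequence[str]) -> List[int]:
    """Map a segmented sentence to binary keyword and pattern features."""
    keyword_feats = [1 if k in tokens else 0 for k in KEYWORDS]
    pattern_feats = [1 if is_subsequence(p, tokens) else 0 for p in PATTERNS]
    return keyword_feats + pattern_feats


if __name__ == "__main__":
    sentence = ["这", "台", "相机", "比", "那", "台", "更", "好"]
    print(featurize(sentence))
```

A vector produced this way could be passed to any SVM implementation alongside, or instead of, bag-of-words features; which combination works best is the empirical question the paper studies.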
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.