SVM-Based Chinese Text Chunking (基于SVM的中文组块分析)

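Cite this article:	LI Heng, ZHU Jing-bo, YAO Tian-shun. SVM Based Chinese Text Chunking [J]. Journal of Chinese Information Processing (中文信息学报), 2004, 18(2): 2-8.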

Authors:	LI Heng (李珩), ZHU Jing-bo (朱靖波), YAO Tian-shun (姚天顺)

Affiliation:	Institute of Computer Software and Theory, Northeastern University (东北大学计算机软件与理论研究所)

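Funding:	National Natural Science Foundation of China; National Key Basic Research Program of China (973 Program); joint funding program of the National Natural Science Foundation of China and Microsoft Research Asia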

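Abstract:	Classification algorithms based on SVM (support vector machine) theory have been attracting growing attention from researchers in China and abroad because of their sound theoretical foundation and good experimental results. Compared with other classification algorithms, the SVM, which is based on the structural risk minimization principle, generalizes well in small-sample pattern recognition. Text chunking, a preprocessing stage for syntactic parsing, reduces the difficulty of parsing by dividing text into a set of non-overlapping segments. This paper treats Chinese chunk identification as a classification problem and solves it with an SVM. Experimental results show that the SVM algorithm is effective for Chinese chunk identification: on the Harbin Institute of Technology (HIT) treebank corpus it achieves F = 88.67%, and it is particularly suitable when only limited labeled Chinese data are available.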
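Keywords:	computer application; Chinese information processing; support vector machine; structural risk minimization; text chunking
Article ID:	1003-0077(2004)02-0001-07
Revision date:	July 25, 2003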
SVM Based Chinese Text Chunking

LI Heng, ZHU Jing-bo, YAO Tian-shun. SVM Based Chinese Text Chunking [J]. Journal of Chinese Information Processing, 2004, 18(2): 2-8.

Authors:	LI Heng, ZHU Jing-bo, YAO Tian-shun

Affiliation:	Institute of Computer Software and Theory, Northeastern University

Abstract:	Classification algorithms based on the SVM (support vector machine) have attracted growing attention from researchers because of their sound theoretical properties and good empirical results. Compared with other classification algorithms, the SVM, which is based on structural risk minimization, achieves high generalization performance with a small number of samples. Text chunking, a preprocessing step for parsing, divides text into syntactically related, non-overlapping groups of words (chunks), reducing the complexity of full parsing. In this paper, we treat Chinese text chunking as a classification problem and apply an SVM to solve it. The chunking experiments were carried out on the HIT Chinese Treebank corpus. Experimental results show that the approach is effective, achieving an F score of 88.67%, and that it is especially suitable when only a small number of labeled Chinese samples is available.

Keywords:	computer application; Chinese information processing; support vector machine; structural risk minimization; text chunking
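
Illustrative note: the abstract says chunking is treated as a classification problem and scored with an F measure, but it gives neither the tag set nor the features. The sketch below therefore rests on assumptions: it uses the common IOB2 encoding (one label per token, which a classifier such as an SVM could then be trained to predict) and a chunk-level F1 that counts a predicted chunk as correct only when its type and boundaries both match. The chunk types, the toy sentence, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A chunk is (type, start token index, end token index exclusive).
Chunk = Tuple[str, int, int]


def chunks_to_iob(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunks as one IOB2 label per token."""
    labels = ["O"] * n_tokens
    for ctype, start, end in chunks:
        labels[start] = "B-" + ctype
        for i in range(start + 1, end):
            labels[i] = "I-" + ctype
    return labels


def iob_to_chunks(labels: List[str]) -> List[Chunk]:
    """Decode per-token IOB2 labels back into chunks."""
    chunks: List[Chunk] = []
    start, ctype = None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes the last chunk
        continues = label.startswith("I-") and ctype == label[2:]
        if ctype is not None and not continues:
            chunks.append((ctype, start, i))
            start, ctype = None, None
        # A stray "I-X" with no open chunk is leniently treated as a start.
        if label.startswith("B-") or (label.startswith("I-") and ctype is None):
            start, ctype = i, label[2:]
    return chunks


def chunk_f1(gold: List[Chunk], predicted: List[Chunk]) -> float:
    """Chunk-level F1: a chunk counts only if type and span both match."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical segmentation and chunk types).
tokens = ["他", "喜欢", "中文", "信息", "处理"]
gold = [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 5)]
gold_labels = chunks_to_iob(len(tokens), gold)

# Pretend a classifier mislabeled the last token as the start of a new chunk.
predicted_labels = ["B-NP", "B-VP", "B-NP", "I-NP", "B-NP"]
predicted = iob_to_chunks(predicted_labels)

print(gold_labels)
print(predicted)
print(round(chunk_f1(gold, predicted), 4))
```

In this toy case a single wrong token label costs a whole gold chunk and adds two spurious ones, which is why chunk-level F scores are stricter than per-token accuracy.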
This article is indexed in databases including CNKI, VIP, and Wanfang Data.