Design and Implementation of an Automatic Word Segmentation System for Daiwen (Dai-Script Text)

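Cite this article:	GAO Tingli, TAO Jianhua, DAI Hongliang, LI Ya. Design and Implementation of an Automatic Word Segmentation System for Daiwen[J]. Journal of Chinese Information Processing, 2013, 27(6): 187-192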

Authors:	GAO Tingli, TAO Jianhua, DAI Hongliang, LI Ya

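Affiliations:	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. Institute of Applied Linguistics, Ministry of Education, Beijing 100010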

Funding:	Supported by the National Natural Science Foundation of China (61273288, 61233009, 61203258, 61305003, 61332017, 61375027) and the China-Singapore Institute of Digital Media (CSIDM).

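Abstract:	Automatic word segmentation of Daiwen (Dai-script text) is foundational work in Daiwen information processing. It underlies later tasks such as developing Daiwen input methods, Daiwen machine translation systems, and Daiwen text information extraction. Constrained by the state of Dai-language corpus technology, natural language processing for Daiwen remains relatively weak. This paper first analyzes the characteristics of Daiwen and builds a Daiwen corpus on that basis. It then applies Chinese word segmentation methods to Daiwen and, taking Daiwen's own characteristics into account, designs a Daiwen word segmentation system based on syllable sequence labeling. In experiments, the system achieves a comprehensive evaluation score of 95.58%.
Keywords:	Daiwen word segmentation; CRF; absolute segmentation words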
Daiwen Word Segmentation System Design and Implementation

GAO Tingli, TAO Jianhua, DAI Hongliang, LI Ya. Daiwen Word Segmentation System Design and Implementation[J]. Journal of Chinese Information Processing, 2013, 27(6): 187-192

Authors:	GAO Tingli, TAO Jianhua, DAI Hongliang, LI Ya

Affiliation:	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; 2. Institute of Applied Linguistics, Ministry of Education, Beijing 100010, China

Abstract:	Daiwen word segmentation is the foundation of Daiwen information processing. It underpins Daiwen input methods, Daiwen machine translation systems, Daiwen text information extraction, and other Daiwen information processing work. Limited by Daiwen corpus technology, Daiwen natural language processing is relatively weak. This paper first analyzes the characteristics of Daiwen and, on this basis, builds a Daiwen corpus. It then applies Chinese word segmentation methods to Daiwen and, combining them with Daiwen's own characteristics, designs a Daiwen word segmentation system based on syllable sequence labeling. In experiments, the segmentation system reaches a comprehensive evaluation score of 95.58%.

Keywords:	Daiwen; segmentation; CRF; absolute segmentation word
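
The abstract describes segmentation as sequence labeling over syllables, with CRF listed as a keyword. The record does not state which tag set the authors used, so the following is only a minimal sketch that assumes the four-tag B/M/E/S scheme common in Chinese word segmentation (B = word-initial syllable, M = word-internal, E = word-final, S = single-syllable word). The function names and the placeholder syllables `s1` ... `s6` are hypothetical and stand in for real Daiwen syllables.

```python
from typing import List


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (each word is a list of syllables)
    into one B/M/E/S tag per syllable. The tag set is an assumption."""
    tags: List[str] = []
    for syllables in words:
        n = len(syllables)
        if n == 0:
            continue  # ignore empty words
        if n == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (n - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from syllables and predicted tags.
    A word is closed whenever an E or S tag is seen."""
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Placeholder syllables, not real Daiwen text.
segmented = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
flat = [s for word in segmented for s in word]
tags = words_to_tags(segmented)
print(tags)
print(tags_to_words(flat, tags) == segmented)
```

In a setup like this, the tags produced by `words_to_tags` would serve as training labels for a sequence model such as a CRF, and `tags_to_words` would convert the model's predicted tags back into a segmentation.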
