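Design and Implementation of an Automatic Word Segmentation System for Dai Script
Citation:GAO Tingli, TAO Jianhua, DAI Hongliang, LI Ya. Design and Implementation of an Automatic Word Segmentation System for Dai Script[J]. Journal of Chinese Information Processing, 2013, 27(6): 187-192
Authors:GAO Tingli  TAO Jianhua  DAI Hongliang  LI Ya
Affiliations:1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190;
2. Institute of Applied Linguistics, Ministry of Education, Beijing 100010
Funding:Supported by the National Natural Science Foundation of China (61273288, 61233009, 61203258, 61305003, 61332017, 61375027) and by the China-Singapore Institute of Digital Media (CSIDM) fund.
Abstract:Automatic word segmentation of Dai text is foundational work in Dai information processing. It underpins later tasks such as Dai input method development, Dai machine translation system development, and Dai text information extraction. Constrained by Dai corpus technology, natural language processing for Dai remains relatively weak. This paper first analyzes the characteristics of the Dai script and builds a Dai corpus on that basis. It then applies Chinese word segmentation methods to Dai and, taking the script's own characteristics into account, designs a Dai word segmentation system based on syllable sequence labeling. In experiments, the system achieves an overall evaluation score of 95.58%.

Keywords:Dai script  word segmentation  CRF  absolute segmentation word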

Daiwen Word Segmentation System Design and Implementation
GAO Tingli,TAO Jianhua,DAI Hongliang,LI Ya. Daiwen Word Segmentation System Design and Implementation[J]. Journal of Chinese Information Processing, 2013, 27(6): 187-192
Authors:GAO Tingli  TAO Jianhua  DAI Hongliang  LI Ya
Affiliation:1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
2. Institute of Applied Linguistics, Ministry of Education, Beijing 100010, China
Abstract:Daiwen word segmentation is foundational to Daiwen information processing. It underpins later work such as Daiwen input method development, Daiwen machine translation system development, and Daiwen text information extraction. Limited by Daiwen corpus technology, natural language processing for Daiwen remains relatively weak. This paper first analyzes the characteristics of Daiwen and, on that basis, builds a Daiwen corpus. It then applies Chinese word segmentation methods to Daiwen and, taking Daiwen's own characteristics into account, designs a Daiwen word segmentation system based on syllable sequence labeling. In experiments, the system achieves an overall evaluation score of 95.58%.
Keywords:Daiwen; segmentation; CRF; absolute segmentation word
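
The abstract describes segmentation as labeling a sequence of syllables, but it does not give the tag set, the features, or the exact definition of the evaluation score. The sketch below is therefore only an illustration under stated assumptions: it borrows the four-tag B/M/E/S scheme common in Chinese word segmentation, uses placeholder syllables (`s1`, `s2`, ...) instead of real Dai text, and treats the evaluation score as a word-level F1. The function names are hypothetical, and the CRF model that would predict the tags is left out.

```python
from typing import List, Set, Tuple

Word = List[str]  # a word is a list of syllables


def words_to_tags(words: List[Word]) -> List[Tuple[str, str]]:
    """Convert segmented words into (syllable, tag) training pairs.

    Assumed tag set: B = word-initial, M = word-internal,
    E = word-final, S = single-syllable word.
    """
    tagged = []
    for syllables in words:
        if len(syllables) == 1:
            tagged.append((syllables[0], "S"))
            continue
        last = len(syllables) - 1
        for i, syl in enumerate(syllables):
            tag = "B" if i == 0 else "E" if i == last else "M"
            tagged.append((syl, tag))
    return tagged


def tags_to_words(tagged: List[Tuple[str, str]]) -> List[Word]:
    """Rebuild words from tagged syllables; a word closes on E or S."""
    words, current = [], []
    for syl, tag in tagged:
        current.append(syl)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


def word_spans(words: List[Word]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) span over syllable positions."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def f1_score(gold: List[Word], predicted: List[Word]) -> float:
    """Word-level F1: a predicted word counts only if both boundaries match."""
    g, p = word_spans(gold), word_spans(predicted)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Placeholder example: three gold words over six syllables.
gold = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
predicted = [["s1", "s2"], ["s3", "s4"], ["s5", "s6"]]
print(words_to_tags(gold))
print(f1_score(gold, predicted))
```

In a full system, a sequence model such as a CRF would be trained on the `(syllable, tag)` pairs and its predicted tags passed to `tags_to_words`. Scoring on spans rather than tags gives no credit to a word with only one correct boundary.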