Research on Tibetan Corpus Processing Methods

Cite this article:	CAI Rangjia. Research on Tibetan corpus processing methods [J]. Computer Engineering and Applications, 2011, 47(6): 138-139.

Author:	CAI Rangjia

Affiliation:	Research Center of Tibetan Information, Qinghai Normal University, Xining 810008

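Funding:	State Language Commission Fund; National Social Science Fund; National Social Science Key Fund; 973 Program special project for preliminary research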

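Abstract:	To make Tibetan corpora standardized, consistent, and practical, and to raise the overall quality of processing, the many heterogeneous Tibetan corpora must first be cleaned up and unified to obtain a high-quality raw corpus. Next, the segmentation unit for processing the raw Tibetan corpus is determined, and a set of Tibetan word categories and part-of-speech tags is proposed on the basis of the grammatical features of Tibetan. A segmentation-and-tagging dictionary is then built by classifying and counting Tibetan words, and software for automatic Tibetan word segmentation and tagging is designed and implemented. The software is used to segment and tag a large-scale Tibetan corpus, ultimately achieving multi-level processing of the Tibetan corpus.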
Keywords:	Tibetan corpus; standardization; part-of-speech tag set; dictionary; word segmentation and tagging
Tibetan corpus processing method

CAI Rangjia. Tibetan corpus processing method [J]. Computer Engineering and Applications, 2011, 47(6): 138-139.

Authors:	CAI Rangjia

Affiliation:	Research Center of Tibetan Information, Qinghai Normal University, Xining 810008, China

Abstract:	To make Tibetan corpora standardized, consistent, and practical, and to improve the overall level of processing, the heterogeneous Tibetan corpora are first cleaned up and unified to obtain a high-quality raw corpus. The segmentation unit for processing the raw Tibetan corpus is then determined, and a set of Tibetan word categories and part-of-speech tags is proposed according to the grammatical features of Tibetan. A word segmentation and tagging dictionary is built on the basis of classifying and counting Tibetan words, and automatic Tibetan word segmentation and tagging software is designed and implemented. A large-scale Tibetan corpus is segmented and tagged with this software, which finally achieves multi-level processing of the Tibetan corpus.

Keywords:	Tibetan corpus; norms; part-of-speech tag set; dictionary; word segmentation and tagging
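
The abstract describes dictionary-driven segmentation and tagging but does not say which matching strategy the software uses. As an illustration only, the following minimal sketch assumes forward maximum matching over syllables delimited by the tsheg mark (U+0F0B). The dictionary entries, the tag labels (`n`, `v`, `x`), and the function names are hypothetical and are not taken from the paper; punctuation such as the shad is not handled.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)

# Hypothetical toy dictionary: tuple of syllables -> part-of-speech tag.
# Neither the entries nor the tag labels come from the paper.
TOY_DICT = {
    ("བོད", "སྐད"): "n",
    ("བོད",): "n",
    ("སྐད",): "n",
    ("ཡིན",): "v",
}


def split_syllables(text):
    """Split a string into syllables on the tsheg mark, dropping empty pieces."""
    return [s for s in text.split(TSHEG) if s]


def segment_and_tag(text, dictionary, unknown_tag="x"):
    """Forward maximum matching: at each position take the longest
    syllable sequence found in the dictionary; unmatched syllables are
    emitted one at a time with `unknown_tag`."""
    syllables = split_syllables(text)
    max_len = max((len(key) for key in dictionary), default=1)
    result = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in dictionary:
                result.append((TSHEG.join(candidate), dictionary[candidate]))
                i += n
                break
        else:
            # No dictionary entry starts here: emit a single unknown syllable.
            result.append((syllables[i], unknown_tag))
            i += 1
    return result


if __name__ == "__main__":
    sentence = TSHEG.join(["བོད", "སྐད", "ཡིན"])
    for word, tag in segment_and_tag(sentence, TOY_DICT):
        print(f"{word}/{tag}")
```

Because longer entries are tried first, the two-syllable entry is preferred over its single-syllable parts. A real system would need a far larger dictionary and some form of ambiguity resolution, neither of which this sketch attempts.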
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.