Method study of deeper processing for Tibetan corpus

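Cite this article:	CAI Zangtai. Method study of deeper processing for Tibetan corpus[J]. Computer Engineering and Applications, 2012, 48(26): 127-130,147

Author:	CAI Zangtai

Affiliation:	School of Computer, Qinghai Normal University, Xining 810008, China

Funding:	National 973 Program of China; National Natural Science Foundation of China; Qinghai Normal University Innovation Fund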

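Abstract (translated from the Chinese):	As natural language information processing continues to develop and mature, the processing of large-scale corpus text has become a popular topic in computational linguistics. One important reason is that the knowledge an application needs can be extracted from a large-scale corpus. Drawing on the development experience of the 973 preliminary project "Research on Word Segmentation and Tagging Specifications for Tibetan Corpora", the paper describes the construction of the large-scale Banzhida (班智达) Tibetan corpus and the design and implementation of the segmentation-and-tagging dictionary and the segmentation-and-tagging software. It concentrates on the index structure and lookup algorithm of the dictionary, and on the case-particle block matching algorithm and the restoration algorithm used by the segmentation-and-tagging software.
Keywords (translated from the Chinese):	Tibetan corpus; word segmentation and tagging; segmentation dictionary; restoration algorithm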
Method study of deeper processing for Tibetan corpus

CAI Zangtai. Method study of deeper processing for Tibetan corpus[J]. Computer Engineering and Applications, 2012, 48(26): 127-130,147

Authors:	CAI Zangtai

Affiliation:	School of Computer, Qinghai Normal University, Xining 810008, China

Abstract:	With the continued development of natural language information processing, large-scale corpus processing has become a hot topic in computational linguistics. One important reason is that the required knowledge can be extracted from a large corpus. Drawing on the development experience of the 973 preliminary project "Research on Word Segmentation and Tagging Specifications for Tibetan Corpora", this article describes the construction of the large-scale Banzhida Tibetan corpus and the design and implementation of the segmentation-and-tagging dictionary and the segmentation-and-tagging software. It focuses on the index structure and lookup algorithm of the dictionary, and on the case-particle block matching algorithm and the restoration algorithm of the segmentation-and-tagging software.

Keywords:	Tibetan corpus; segmentation and tagging; segmentation dictionary; restoration algorithm
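
Illustrative sketch:	The abstract names a dictionary index structure and a lookup algorithm but gives no details, so the following is only a minimal sketch of one common design, not the paper's method. It assumes (hypothetically) that entries are indexed by their first syllable and that lookup uses forward maximum matching over syllables separated by the Tibetan tsheg (U+0F0B). The function names `build_index` and `segment`, the `max_len` limit, and the romanized placeholder syllables are all invented for illustration; the case-particle block matching and restoration steps are not modeled.

```python
from collections import defaultdict

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def build_index(words):
    """Group dictionary entries by first syllable (assumed index layout)."""
    index = defaultdict(set)
    for word in words:
        syllables = tuple(s for s in word.split(TSHEG) if s)
        if syllables:
            index[syllables[0]].add(syllables)
    return index


def segment(text, index, max_len=4):
    """Forward maximum matching over syllables; unknown syllables stand alone."""
    syllables = [s for s in text.split(TSHEG) if s]
    words = []
    i = 0
    while i < len(syllables):
        bucket = index.get(syllables[i], ())
        length = 1  # fall back to a single syllable when nothing matches
        # Try the longest candidate first, down to two syllables.
        for n in range(min(max_len, len(syllables) - i), 1, -1):
            if tuple(syllables[i:i + n]) in bucket:
                length = n
                break
        words.append(TSHEG.join(syllables[i:i + length]))
        i += length
    return words


# Romanized placeholders stand in for Tibetan syllables in this demo.
demo_index = build_index([TSHEG.join(["bod", "yig"]), TSHEG.join(["slob", "grwa"])])
demo_words = segment(TSHEG.join(["bod", "yig", "slob", "grwa", "la"]), demo_index)
print(demo_words)
```

Indexing by first syllable keeps each lookup confined to one small bucket, which is the usual motivation for this kind of layout; whether the paper's dictionary is organized this way is not stated in the abstract.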
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
