
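Research on Deep-Processing Methods for Tibetan Corpora (藏文语料库深加工方法研究)
Cite this article: CAI Zangtai (才藏太). Research on deep-processing methods for Tibetan corpora [J]. Computer Engineering and Applications (计算机工程与应用), 2012, 48(26): 127-130, 147
Author: CAI Zangtai (才藏太)
Affiliation: School of Computer, Qinghai Normal University, Xining 810008
Funding: National 973 Program; National Natural Science Foundation of China; Innovation Fund of Qinghai Normal University
Abstract: As natural language processing continues to develop and mature, the processing of large-scale corpus text has become a hot topic in computational linguistics. One important reason is that the required knowledge can be extracted from large-scale corpora. Drawing on development experience from the earlier 973 project "Research on Word Segmentation and Tagging Specifications for Tibetan Corpora" (《藏文语料库分词标注规范研究》), this paper describes the construction of the large-scale Banzhida (班智达) Tibetan corpus and the design and implementation of the segmentation-and-tagging dictionary and the segmentation-and-tagging software. It focuses on the dictionary's index structure and lookup algorithm, and on the software's case-particle block matching algorithm and restoration algorithm.

Keywords: Tibetan corpus; word segmentation and tagging; segmentation dictionary; restoration algorithm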

Method study of deeper processing for Tibetan corpus
CAI Zangtai. Method study of deeper processing for Tibetan corpus[J]. Computer Engineering and Applications, 2012, 48(26): 127-130,147
Authors:CAI Zangtai
Affiliation:School of Computer, Qinghai Normal University, Xining 810008, China
Abstract:With the continued development and improvement of natural language processing, large-scale corpus text processing has become a hot topic in computational linguistics. One important reason is that the required knowledge can be extracted from large corpora. Drawing on the development experience of the 973 project "Studies on segmentation and tagging norms of the Tibetan corpus", this article describes the construction of the large-scale Banzhida Tibetan corpus and the design and implementation of the segmentation-and-tagging dictionary and the segmentation-and-tagging software. It mainly discusses the index structure and lookup algorithm of the dictionary, and the case-particle block matching algorithm and restoration algorithm of the segmentation-and-tagging software.
Keywords:Tibetan corpus  segmentation and tagging  segmentation dictionary  restoration algorithm
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.