TIP-LAS: An Open Source Toolkit for Tibetan Word Segmentation and POS Tagging (TIP-LAS:一个开源的藏文分词词性标注系统)

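Cite this article:	LI Yachao, JIANG Jing, JIA Yangji, YU Hongzhi. TIP-LAS: An Open Source Toolkit for Tibetan Word Segmentation and POS Tagging[J]. Journal of Chinese Information Processing (中文信息学报), 2015, 29(6): 203-207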

Authors:	LI Yachao (李亚超), JIANG Jing (江静), JIA Yangji (加羊吉), YU Hongzhi (于洪志)

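Affiliation:	Key Laboratory of National Language Intelligent Processing of Gansu Province, Northwest University for Nationalities, Lanzhou, Gansu 730030, China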

Funding:	Fundamental Research Funds for the Central Universities, Northwest University for Nationalities (31920140064, 31920150089)

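Abstract:	TIP-LAS is an open source system for Tibetan word segmentation and part-of-speech (POS) tagging. The segmentation component treats word segmentation as a syllable tagging task and is built on a conditional random field (CRF) model. The POS tagging component uses a maximum entropy model that incorporates syllable features. In experiments and comparative analysis, both the segmentation system and the POS tagging system achieved good results, and the source code is available online. The authors hope this work will advance foundational research on Tibetan word segmentation and POS tagging, and provide a platform on which results can be compared and shared.
Keywords:	Tibetan; word segmentation; part-of-speech tagging; conditional random fields; maximum entropy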
TIP-LAS: An Open Source Toolkit for Tibetan Word Segmentation and POS Tagging

LI Yachao, JIANG Jing, JIA Yangji, YU Hongzhi. TIP-LAS: An Open Source Toolkit for Tibetan Word Segmentation and POS Tagging[J]. Journal of Chinese Information Processing, 2015, 29(6): 203-207

Authors:	LI Yachao, JIANG Jing, JIA Yangji, YU Hongzhi

Affiliation:	Key Laboratory of National Language Intelligent Processing, Northwest University for Nationalities, Lanzhou, Gansu 730030, China

Abstract:	TIP-LAS is an open source toolkit for Tibetan word segmentation and POS tagging. The toolkit implements Tibetan word segmentation as syllable tagging with a CRF model, and applies a maximum entropy model with syllable features to Tibetan POS tagging. In the experiments, the system achieves good results. The source code is shared on the Internet, together with the experimental corpus.

Keywords:	Tibetan; word segmentation; part of speech tagging; conditional random fields; maximum entropy
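
To make "segmentation based on syllable tagging" concrete, the following is a minimal sketch of how a segmented sentence can be encoded as one tag per syllable and decoded back into words. It is not taken from TIP-LAS: the four-tag B/M/E/S scheme (begin, middle, end, single), the function names `words_to_tags` and `tags_to_words`, and the romanized example syllables are all assumptions made for illustration, and the tag set actually used by the toolkit may differ. In a full system a sequence model such as a CRF would predict the tags; here they are derived from a known segmentation.

```python
from typing import List, Tuple

Word = List[str]  # a word is a list of syllables


def words_to_tags(words: List[Word]) -> List[Tuple[str, str]]:
    """Encode segmented words as (syllable, tag) pairs.

    Assumed tag scheme (not confirmed by the paper):
    B = first syllable of a multi-syllable word, M = middle,
    E = last, S = single-syllable word.
    """
    tagged: List[Tuple[str, str]] = []
    for syllables in words:
        if not syllables:
            continue  # ignore empty words
        if len(syllables) == 1:
            tagged.append((syllables[0], "S"))
        else:
            tagged.append((syllables[0], "B"))
            tagged.extend((s, "M") for s in syllables[1:-1])
            tagged.append((syllables[-1], "E"))
    return tagged


def tags_to_words(tagged: List[Tuple[str, str]]) -> List[Word]:
    """Decode (syllable, tag) pairs back into words.

    A word is closed whenever an E or S tag is seen.
    """
    words: List[Word] = []
    current: Word = []
    for syllable, tag in tagged:
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


# Hypothetical romanized syllables, used only to show the data shapes.
sentence = [["bod", "yig"], ["gi"], ["tshig", "gcod"]]
tagged = words_to_tags(sentence)
restored = tags_to_words(tagged)
```

With this encoding, segmentation reduces to sequence labeling: the model only has to assign one tag to each syllable, and word boundaries follow deterministically from the tags.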
