Uyghur Text Classification Based on Semantic String Extraction and Topic Similarity Measurement

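Cite this article:	Turdi Tohti, Winira Musajan, Askar Hamdulla. Uyghur Text Classification Based on Semantic String Extraction and Topic Similarity Measurement[J]. Journal of Chinese Information Processing, 2017, 31(4): 100-107.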

Authors:	Turdi Tohti, Winira Musajan, Askar Hamdulla

Affiliation:	School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

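Funding:	National Natural Science Foundation of China (61562083, 61262062, 61262063); Key Project of the Scientific Research Program of Higher Education Institutions of Xinjiang Uygur Autonomous Region (XJEDU2012I11)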

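Abstract:	This paper studies an improved n-gram incremental algorithm for extracting semantic strings that express key information in Uyghur texts, and uses sets of weighted semantic strings to characterize text topics. It proposes a Jaccard-like measure of topic similarity between a text and a class, and implements a corresponding Uyghur text classification algorithm. Experimental results show that the proposed text model is simple and effective, that the classification algorithm has a low computational cost, and that it matches or exceeds the overall classification performance of classical classifiers.
Keywords:	Uyghur; n-gram incremental algorithm; semantic string extraction; topic similarity; text classification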
Semantic String-Based Topic Similarity Measuring Approach for Uyghur Text Classification

Turdi Tohti, Winira Musajan, Askar Hamdulla. Semantic String-Based Topic Similarity Measuring Approach for Uyghur Text Classification[J]. Journal of Chinese Information Processing, 2017, 31(4): 100-107.

Authors:	Turdi Tohti, Winira Musajan, Askar Hamdulla

Affiliation:	School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

Abstract:	This paper proposes an improved frequent pattern-growth approach to discover and extract the semantic strings that express key information in Uyghur texts. Text topics are then described by these weighted semantic strings. Based on these features, Uyghur text classification is performed using a newly designed Jaccard-like similarity measure. Experimental results show that the proposed method achieves performance comparable to that of two traditional classifiers at a reasonable computational cost.

Keywords:	Uyghur language; frequent pattern-growth algorithm; semantic string extraction; topic similarity; text classification
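
The abstract describes scoring a text against each class by a Jaccard-like similarity over weighted semantic strings, but it does not state the formula. As a rough illustration only, the sketch below substitutes the standard weighted Jaccard similarity (sum of minimum weights divided by sum of maximum weights) and assigns a text to the class with the highest score. The function names, the example strings, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict

# A topic is modelled as a mapping from semantic string to weight.
# This representation is an assumption; the abstract only says "weighted semantic strings".
WeightedStrings = Dict[str, float]


def weighted_jaccard(a: WeightedStrings, b: WeightedStrings) -> float:
    """Standard weighted Jaccard: sum of min weights over sum of max weights.

    Used here as a stand-in for the paper's Jaccard-like measure, which the
    abstract does not specify.
    """
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator > 0 else 0.0


def classify(text: WeightedStrings, class_topics: Dict[str, WeightedStrings]) -> str:
    """Return the label of the class whose topic is most similar to the text."""
    return max(class_topics, key=lambda label: weighted_jaccard(text, class_topics[label]))


# Hypothetical data: English placeholders stand in for extracted Uyghur semantic strings.
class_topics = {
    "sports": {"football": 3.0, "goal": 2.0, "match": 1.0},
    "economy": {"market": 3.0, "price": 2.0},
}
text = {"football": 2.0, "goal": 1.0}

print(classify(text, class_topics))
```

Because each text is compared once against a single weighted set per class, the cost of classifying a text grows with the number of classes and the size of the string sets rather than with the number of training documents, which is consistent with the low computational cost the abstract reports.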
