Semantics-based Feature Extraction and Its Application in Uyghur Text Classification

Citation: Turdi Tohti, Akbar Pattar, Askar Hamdulla. Semantics-based Feature Extraction and Its Application in Uyghur Text Classification[J]. Journal of Chinese Information Processing, 2014, 28(4): 140-144.

Authors: Turdi Tohti, Akbar Pattar, Askar Hamdulla

Affiliation: School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

Funding: National Natural Science Foundation of China (61063022, 61262062, 61163033, 61163032); High-Tech Research and Development Program of the Xinjiang Uygur Autonomous Region (201212124); Key Project of the Higher Education Scientific Research Program of the Xinjiang Uygur Autonomous Region (XJEDU2012I11); Program for New Century Excellent Talents in University, Ministry of Education (NCET-10-0969)

Abstract: In machine-learning-based text classification, traditional Uyghur word segmentation, which splits text at the spaces between words, shows clear deficiencies and limitations. This paper uses an alternative automatic word segmentation method for Uyghur, dme-TS. Instead of extracting word features by treating inter-word spaces as segmentation markers, dme-TS uses a combined statistic (dme) to measure the strength of association between adjacent words in a text, and places segmentation points at inter-word positions where dme indicates weak association. This extracts semantic word features that are genuinely meaningful to the learning algorithm. Experimental results show that extracting text features with dme-TS reduces the dimensionality of the feature space and also effectively improves the performance of traditional classification algorithms that use single words as features.

Keywords: Uyghur word segmentation; word features; dme-TS; semantic word features; text classification
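The abstract describes dme only as a combined statistic, without giving its formula, so the Python sketch below substitutes a different, well-known association measure: pointwise mutual information (PMI) with a minimum-frequency cutoff. It is meant only to illustrate the segmentation idea in the abstract: score each pair of adjacent words, cut at weakly associated positions, and use the resulting multi-word units as features. It is not the authors' dme-TS. The function names `pmi_table`, `segment`, and `features`, the `min_count` and `threshold` parameters, and the toy English tokens standing in for space-delimited Uyghur words are all hypothetical.

```python
import math
from collections import Counter


def pmi_table(corpus, min_count=2):
    """Score adjacent word pairs with pointwise mutual information (PMI).

    ASSUMPTION: PMI is a stand-in for the paper's dme statistic, whose
    definition is not given in the abstract. Pairs seen fewer than
    `min_count` times are dropped, because PMI overrates rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in corpus:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def segment(tokens, assoc, threshold):
    """Cut between adjacent words whose association is below `threshold`.

    Strongly associated neighbours are merged into one multi-word unit,
    a hypothetical stand-in for the paper's "semantic words".
    """
    if not tokens:
        return []
    units = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        score = assoc.get((prev, cur), float("-inf"))  # unseen pair = weak
        if score >= threshold:
            units[-1].append(cur)  # strong association: keep together
        else:
            units.append([cur])  # weak association: segmentation point
    return [" ".join(unit) for unit in units]


def features(tokens, assoc, threshold):
    """Bag-of-units feature counts for one tokenized document."""
    return Counter(segment(tokens, assoc, threshold))


# Toy corpus: English tokens stand in for space-delimited Uyghur words.
corpus = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["we", "left", "new", "york"],
    ["i", "left", "home"],
]
assoc = pmi_table(corpus, min_count=2)
print(segment(["we", "love", "new", "york"], assoc, threshold=1.0))
# prints ['we', 'love', 'new york']
```

Here only the recurring pair ("new", "york") survives the frequency cutoff, so it is the only position where no cut is made. A real system would presumably plug its own association score (such as dme) into `pmi_table` and tune `threshold` on held-out data; the sketch does not attempt to reproduce the paper's reported dimensionality reduction or classification gains.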
This article is indexed in CNKI and other databases.