
A Language-Independent Text Classification Method

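Cite this article:	CHEN Lin, YANG Dan. Language-Independent Text Categorization [J]. Computer Engineering & Science, 2008, 30(6): 128-130.

Authors:	CHEN Lin, YANG Dan

Affiliation:	School of Software Engineering, Chongqing University, Chongqing 400044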

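Abstract:	This paper proposes a language-independent text classification method that requires no word segmentation. In contrast to traditional text classification models, the method applies n-gram language models at the character level, so classification needs no word segmentation and avoids feature selection and extensive preprocessing. We systematically study the key factors of the model and their effect on classification results, and describe the evaluation method in detail. The method has been implemented for two languages, Chinese and English, and achieves good classification performance.
Keywords:	text classification; n-gram model; language
Article ID:	1007-130X(2008)06-0128-03
Revised:	September 7, 2007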
Language-Independent Text Categorization

CHEN Lin, YANG Dan. Language-Independent Text Categorization [J]. Computer Engineering & Science, 2008, 30(6): 128-130.

Authors:	CHEN Lin, YANG Dan

Abstract:	This paper proposes an approach to language-independent text classification that requires no word segmentation. Unlike traditional text classification models, the approach is based on character-level n-gram language modeling and avoids word segmentation, explicit feature selection, and extensive preprocessing. We systematically study the key factors in language modeling and their influence on classification, and describe the evaluation method in detail. Experimental results show that the proposed method achieves good performance on text classification tasks.

Keywords:	text classification; n-gram model; language
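
The idea summarized in the abstract can be illustrated with a minimal sketch: train one character-level n-gram language model per category and assign a new text to the category whose model gives it the highest log-probability. The abstract does not state the n-gram order, the smoothing scheme, or the decision rule the authors used, so the class name `CharNgramClassifier`, the add-k smoothing, the category prior, and the toy training data below are all assumptions made for illustration, not the paper's implementation.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """Hypothetical sketch: one add-k smoothed character n-gram model per category."""

    def __init__(self, n=2, k=0.5):
        self.n = n                            # n-gram order (assumed, not from the paper)
        self.k = k                            # add-k smoothing constant (assumed)
        self.ngrams = defaultdict(Counter)    # category -> n-gram counts
        self.contexts = defaultdict(Counter)  # category -> (n-1)-character context counts
        self.docs = Counter()                 # category -> number of training documents
        self.alphabet = set()                 # every character seen in training

    def _pad(self, text):
        # Prepend start symbols so the first characters also have a full context.
        return "\x02" * (self.n - 1) + text

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            padded = self._pad(text)
            self.docs[label] += 1
            self.alphabet.update(text)
            # Slide over raw characters: no word segmentation is involved.
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.ngrams[label][context + padded[i]] += 1
                self.contexts[label][context] += 1
        return self

    def log_score(self, text, label):
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen characters
        padded = self._pad(text)
        # Start from the log prior of the category, then add per-character terms.
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            num = self.ngrams[label][context + padded[i]] + self.k
            den = self.contexts[label][context] + self.k * vocab
            score += math.log(num / den)
        return score

    def predict(self, text):
        # Pick the category whose language model scores the text highest.
        return max(self.docs, key=lambda label: self.log_score(text, label))


# Toy data: each category mixes English and Chinese documents.
train_texts = [
    "the team won the football match",
    "球队赢得了足球比赛",
    "the bank raised interest rates",
    "银行提高了贷款利率",
]
train_labels = ["sports", "sports", "finance", "finance"]

clf = CharNgramClassifier(n=2).fit(train_texts, train_labels)
for query in ["football team", "银行利率"]:
    print(query, "->", clf.predict(query))
```

Because the model only ever looks at character sequences, the same code handles both English and Chinese input without a tokenizer or a segmenter, which is the property the abstract emphasizes. A practical system would need far more training data and a more careful smoothing method than add-k.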
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.