A Chinese Document Categorization System without Dictionary Support and Segmentation Processing


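Cite this article:	Zhou Shuigeng (周水庚), Guan Jihong (关佶红), Hu Yunfa (胡运发), Zhou Aoying (周傲英). A Chinese document categorization system without dictionary support and segmentation processing [J]. Journal of Computer Research and Development (计算机研究与发展), 2001, 38(7): 839-844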

Authors:	Zhou Shuigeng (周水庚); Guan Jihong (关佶红); Hu Yunfa (胡运发); Zhou Aoying (周傲英)

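Affiliations:	1. Wuhan University; 2. School of Computer Science, Wuhan University; 3. Department of Computer Science and Engineering, Fudan University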

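Funding:	China Postdoctoral Science Foundation; National High Technology Research and Development Program of China ("863" Program) (863-306-ZT04-02-2); National Natural Science Foundation of China (60003016)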

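Abstract:	This paper reports a Chinese document categorization system that requires neither dictionary support nor word segmentation. Its distinguishing feature is the use of N-gram information for categorization, which frees Chinese document categorization from its dependence on dictionaries and segmentation and makes it independent of domain and time. An open architecture makes the system easy to extend functionally and to improve in performance. Test results show that the system achieves satisfactory categorization performance.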
Keywords:	Chinese document categorization system; dictionary support; segmentation processing; Chinese information processing; Internet
A CHINESE DOCUMENT CATEGORIZATION SYSTEM WITHOUT DICTIONARY SUPPORT AND SEGMENTATION PROCESSING

Abstract:	This paper presents a Chinese document categorization system that requires neither dictionary support nor segmentation processing. It uses N-gram information instead of Chinese words, so the classifier no longer depends on dictionaries or segmentation and thereby becomes domain- and time-independent. An open architecture is adopted to facilitate functional expansion and performance improvement. Experimental results show that the system achieves satisfactory categorization performance.

Keywords:	Chinese text categorization; N-gram information; feature selection; Bayesian classification; kNN method
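
The idea of categorizing Chinese text from N-gram information alone can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names (`char_ngrams`, `ngram_vector`, `cosine`, `knn_classify`), the choice of bigrams, the cosine similarity measure, and the toy training data are all hypothetical and are not taken from the paper. The sketch only shows that overlapping character N-grams can serve as features without any dictionary or word segmentation step.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of `text`, ignoring whitespace.

    No dictionary lookup or word segmentation is involved.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_vector(text, n=2):
    """Map a document to a bag of character n-gram counts."""
    return Counter(char_ngrams(text, n))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training, k=1, n=2):
    """Label `text` by majority vote among its k most similar training documents.

    `training` is a list of (document, label) pairs (hypothetical format).
    """
    query = ngram_vector(text, n)
    scored = sorted(
        ((cosine(query, ngram_vector(doc, n)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for this example only.
training = [
    ("股票市场今日上涨", "finance"),
    ("足球比赛今晚开始", "sports"),
]
predicted = knn_classify("股票市场下跌", training, k=1)
```

In this toy run the query shares three bigrams with the first training document and none with the second, so the nearest neighbour carries the "finance" label. A real system would also need feature selection over the N-gram vocabulary, which this sketch omits.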
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
