
A Method of Feature Selection and Weighting Scheme Based on Text Set Density

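Cite as:	Wu Weihua, Yuan Ning, Zhou Jin, Wang Hongjun. A Method of Feature Selection and Weighting Scheme Based on Text Set Density[J]. Computer and Digital Engineering, 2005, 33(3): 11-13, 52.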

Authors:	Wu Weihua, Yuan Ning, Zhou Jin, Wang Hongjun

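Affiliations:	Shandong Zhi'ao Computing Development Center (山东省智奥电算开发中心), Jinan 250013; School of Information Science and Engineering, University of Jinan, Jinan 250022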

Funding:	Supported by the Natural Science Foundation of Shandong Province (grant no. Y2001G03)

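Abstract:	Taking the characteristics of the Chinese language into account and building on existing feature extraction methods, this paper proposes selecting feature words according to text set density. It sets bounds on the number of feature terms and on how they are chosen, and identifies the smallest set of feature words that loses no useful information from the text. An intermediate value produced during this selection is then used as one component of the term weight, giving a more reasonable weighting scheme. Finally, a new criterion for judging the quality of term weights, the meta-scoring method (元打分法), is used to test experimentally the correctness and effectiveness of the proposed method.
Keywords:	text categorization; word segmentation; feature selection; weighting scheme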
A Method of Feature Selection and Weighting Scheme Based on Text Set Density

Wu Weihua, Yuan Ning, Zhou Jin, Wang Hongjun. A Method of Feature Selection and Weighting Scheme Based on Text Set Density[J]. Computer and Digital Engineering, 2005, 33(3): 11-13, 52.

Authors:	Wu Weihua, Yuan Ning, Zhou Jin, Wang Hongjun

Affiliation:	Wu Weihua 1), Yuan Ning 2), Zhou Jin 2), Wang Hongjun 1)

Abstract:	Drawing on work in word segmentation and feature extraction, and taking the characteristics of Chinese into account, this paper proposes feature selection based on text set density. It defines the number of feature items and how they are selected, finds the smallest feature set that loses no useful information from the text, and uses an intermediate value from that process as part of the word weight to build a more reasonable weighting scheme. Finally, it uses meta-scoring to test the validity of the scheme.

Keywords:	text categorization; word segmentation; feature selection; weighting
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.
