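Multi-relational Multi-class Classification Model for Imbalanced Data
Cite this article:YANG He-biao, WANG Jian. Multi-relational Multi-class Model for Imbalanced Data[J]. Computer Engineering, 2010, 36(20): 52-54.
Authors:YANG He-biao  WANG Jian
Affiliation:School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
Funding:Jiangsu Province High-Tech Research Fund; Natural Science Foundation of Jiangsu Higher Education Institutions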
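Abstract:A classification model is proposed for imbalanced multi-relational, multi-class data. In the preprocessing stage, an Error Correcting Output Coding (ECOC) codeword is assigned to each target class, virtual joins are established between the target relation and the background relations, attributes are aggregated, and the data are then split into a training set and a validation set. In the training stage, multiple sub-classifiers are built following the one-vs-all decomposition in combination with the CrossMine algorithm, and each sub-classifier is evaluated by its AUC. In the validation stage, the Hamming distance between each target class's ECOC codeword and the word formed by concatenating the sub-classifiers' outputs is computed, and the target class with the minimum Hamming distance is chosen as the final classification. Experiments on synthetic and real data verify the model's validity and classification performance.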

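Keywords:multi-relational classification  imbalanced data  multi-class classification  Error Correcting Output Coding (ECOC)  one-vs-all decomposition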
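
The decoding step described in the abstract (pick the target class whose ECOC codeword is closest in Hamming distance to the concatenated sub-classifier outputs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `one_vs_all_codewords`, `hamming` and `decode` are hypothetical, the codewords are assumed to be the plain one-vs-all code (one bit per class), and ties are assumed to be broken by class order.

```python
from typing import Dict, List, Sequence


def one_vs_all_codewords(classes: Sequence[str]) -> Dict[str, List[int]]:
    """Assumed coding: class i gets a 1 in bit i and 0 everywhere else."""
    n = len(classes)
    return {c: [1 if j == i else 0 for j in range(n)]
            for i, c in enumerate(classes)}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two equal-length bit words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(output_word: Sequence[int],
           codewords: Dict[str, List[int]]) -> str:
    """Return the class whose codeword is nearest to the output word.

    min() keeps the first minimum, so ties go to the class listed first.
    """
    return min(codewords, key=lambda c: hamming(codewords[c], output_word))


if __name__ == "__main__":
    codewords = one_vs_all_codewords(["A", "B", "C"])
    # Each bit is the 0/1 vote of one hypothetical sub-classifier.
    print(decode([0, 1, 0], codewords))
```

Any two codewords of a plain one-vs-all code differ in exactly two bits, so a single wrong sub-classifier vote can produce a tie; how such ties are resolved in the paper is not stated in the abstract.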

Multi-relational Multi-class Model for Imbalanced Data
YANG He-biao,WANG Jian.Multi-relational Multi-class Model for Imbalanced Data[J].Computer Engineering,2010,36(20):52-54.
Authors:YANG He-biao  WANG Jian
Affiliation:(School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China)
Abstract:This paper proposes a multi-relational classification model for imbalanced multi-class data. In the preprocessing stage, each target class is assigned an Error Correcting Output Coding (ECOC) codeword, virtual joins are set up between the target relation and the background relations, and appropriate aggregation functions are applied to the different features. The data are then divided into a training set and a validation set. In the training stage, sub-classifiers are built on the training set by combining the One-vs-All method with the CrossMine algorithm, and each sub-classifier is evaluated by its AUC value. In the validation stage, the Hamming distance between each target class's ECOC codeword and the word formed by concatenating the sub-classifiers' outputs is computed, and the class with the smallest Hamming distance is chosen as the final result. Experiments on both synthetic and real datasets demonstrate the validity and effectiveness of the classifier.
Keywords:multi-relational classification  imbalanced data  multi-class classification  Error Correcting Output Coding(ECOC)  One-vs-All classification
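
The abstract states that each sub-classifier is evaluated by its AUC. As a rough sketch of what that measure computes, the block below estimates AUC as the fraction of (positive, negative) pairs that a scorer ranks correctly, counting ties as half. The function name `auc` and the sample labels and scores are illustrative assumptions; the abstract does not say how the authors computed AUC.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (rank-based) AUC estimate for binary labels 1/0.

    Counts how often a positive example receives a higher score than a
    negative one; a tie contributes 0.5. Quadratic in the number of
    examples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical one-vs-all sub-classifier: 2 positives, 4 negatives.
    labels = [1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7]
    print(auc(labels, scores))
```

Because this estimate depends only on how positives are ranked relative to negatives, it does not change when the proportion of the two classes changes, which is one common reason for preferring AUC over accuracy on imbalanced data.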
This article is indexed in databases including VIP and Wanfang Data.