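Classification Algorithm Using Linear Regression and Attribute Ensemble
Cite this article: QIANG Bao-hua, TANG Bo, WANG Yu-feng, ZOU Xian-chun, LIU Zheng-li, SUN Zhong-xu, XIE Wu. Classification Algorithm Using Linear Regression and Attribute Ensemble[J]. Computer Science, 2017, 44(6): 212-215, 244.
Authors: QIANG Bao-hua; TANG Bo; WANG Yu-feng; ZOU Xian-chun; LIU Zheng-li; SUN Zhong-xu; XIE Wu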
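Affiliations (in author order): Guangxi Key Laboratory of Trusted Software and Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Trusted Software and Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; The 54th Research Institute, China Electronics Technology Group Corporation, Shijiazhuang 050081; College of Computer and Information Science, Southwest University, Chongqing 400715; Guangxi Key Laboratory of Trusted Software and Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Trusted Software and Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Trusted Software and Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004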
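Funding: This work was supported by the National Public Welfare Project for Marine Technology (201505002), the National Natural Science Foundation of China (61462020), the Open Project of the Guangxi Key Laboratory of Trusted Software (KX201510), the Guangxi Cooperative Innovation Project of Cloud Computing and Big Data (YD16E04), and the Graduate Innovation Project (YJCXS201538).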
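Abstract: For classification of high-dimensional, small-sample data, the complexity of the high-dimensional attributes limits the predictive accuracy of classification models. To further improve accuracy, a classification algorithm based on linear regression and attribute ensemble is proposed. First, linear regression is used to build an Attribute Linear Classifier (ALC) for each attribute. Second, to avoid the loss of accuracy caused by an excessive number of ALCs, the empirical loss value from the empirical risk minimization strategy is used as the evaluation criterion to select the best ALCs. Finally, the majority voting method is applied to combine the selected ALCs. Experiments on high-dimensional, small-sample gene expression datasets show that the algorithm achieves higher accuracy than logistic regression, support vector machine, and random forest algorithms.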

Keywords: Linear regression; Single attribute classification; Empirical loss; Attribute ensemble; Majority voting method
Received: 2017/1/17
Revised: 2017/3/27

Classification Algorithm Using Linear Regression and Attribute Ensemble
QIANG Bao-hua, TANG Bo, WANG Yu-feng, ZOU Xian-chun, LIU Zheng-li, SUN Zhong-xu and XIE Wu. Classification Algorithm Using Linear Regression and Attribute Ensemble[J]. Computer Science, 2017, 44(6): 212-215, 244.
Authors: QIANG Bao-hua; TANG Bo; WANG Yu-feng; ZOU Xian-chun; LIU Zheng-li; SUN Zhong-xu; XIE Wu
Affiliation: Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China; Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China; The 54th Research Institute, China Electronics Technology Group Corporation, Shijiazhuang 050081, China; College of Computer and Information Science, Southwest University, Chongqing 400715, China; Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China; Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China; and Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: For classification problems on high-dimensional, small-sample data, the predictive accuracy of the classification model is limited by the complexity of the high-dimensional attributes. To further improve accuracy, a classification algorithm using linear regression and attribute ensemble (LRAE) is proposed. First, linear regression is used to construct an attribute linear classifier (ALC) for each attribute. Second, to avoid the drop in accuracy caused by too many ALCs, the empirical loss value from the empirical risk minimization strategy is used as the evaluation criterion to select ALCs. Finally, the majority voting method is adopted to integrate the selected ALCs. Experiments on gene expression data show that LRAE achieves higher accuracy than logistic regression, support vector machine, and random forest algorithms.
Keywords: Linear regression; Single attribute classification; Empirical loss; Attribute ensemble; Majority voting method
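
The three steps named in the abstract (one linear-regression classifier per attribute, selection by empirical loss, majority voting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss function, the selection rule, or the label coding, so the sketch assumes squared loss, keeping the k lowest-loss ALCs, labels coded as -1/+1, and ties broken toward +1. The function names `fit_alcs`, `select_alcs`, and `predict`, and the parameter `k`, are hypothetical.

```python
import numpy as np


def fit_alcs(X, y):
    """Fit one univariate least-squares line per attribute (one ALC each).

    X: (n_samples, n_attributes) array; y: labels coded as -1 / +1 (assumed).
    Returns slopes w, intercepts b, and each ALC's mean squared training loss.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    xc = X - x_mean
    var = (xc ** 2).sum(axis=0)
    cov = xc.T @ (y - y_mean)
    # Constant attributes (var == 0) get slope 0 instead of a division error.
    w = np.divide(cov, var, out=np.zeros_like(cov), where=var > 0)
    b = y_mean - w * x_mean
    # Empirical loss of each ALC: squared loss is an assumption of this sketch.
    loss = ((X * w + b - y[:, None]) ** 2).mean(axis=0)
    return w, b, loss


def select_alcs(loss, k):
    """Keep the indices of the k ALCs with the smallest empirical loss."""
    return np.argsort(loss)[:k]


def predict(X, w, b, keep):
    """Majority vote over the selected ALCs; ties go to +1 (assumed)."""
    X = np.asarray(X, dtype=float)
    votes = np.sign(X[:, keep] * w[keep] + b[keep])
    return np.where(votes.sum(axis=1) >= 0, 1, -1)


if __name__ == "__main__":
    # Synthetic data only: 20 samples, 50 attributes, two informative ones.
    rng = np.random.default_rng(0)
    y = np.repeat([-1, 1], 10)
    X = rng.normal(size=(20, 50))
    X[:, 3] = 2.0 * y
    X[:, 7] = -2.0 * y
    w, b, loss = fit_alcs(X, y)
    keep = select_alcs(loss, k=3)
    print("selected attributes:", keep)
    print("training accuracy:", (predict(X, w, b, keep) == y).mean())
```

An odd `k` reduces the chance of tied votes. In practice `k` would be tuned on held-out data rather than on the training set, since training loss alone tends to favor spurious attributes when samples are few.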