Similar Documents
20 similar documents found (search time: 15 ms)
1.
Clustering ensembles integrate multiple base clustering results to obtain a consensus result, thus improving the stability and robustness of single clustering methods. Since it is natural to represent multiple base clustering results with a hypergraph, where instances are nodes and base clusters are hyperedges, several hypergraph based clustering ensemble methods have been proposed. Conventional hypergraph based methods obtain the final consensus result by partitioning a pre-defined static hypergraph. However, because base clusters may be imperfect due to the unreliability of base clustering methods, the pre-defined hypergraph constructed from them is also unreliable, and directly obtaining the final clustering result by partitioning it is inappropriate. To tackle this problem, we propose a clustering ensemble method via structured hypergraph learning: instead of being constructed directly, the hypergraph is dynamically learned from the base results, which makes it more reliable. Moreover, when learning the hypergraph we enforce a clear clustering structure, which is better suited to clustering tasks, so no uncertain post-processing such as hypergraph partitioning is needed. Extensive experiments show that our method not only performs better than conventional hypergraph based ensemble methods but also outperforms state-of-the-art clustering ensemble methods.
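As a concrete illustration of the node/hyperedge representation this abstract starts from, the following sketch builds the binary incidence matrix of the hypergraph induced by a set of base clusterings (the function name and encoding are ours, not from the paper):

```python
def incidence_matrix(base_clusterings, n_instances):
    """Rows are instances (nodes), columns are base clusters (hyperedges).

    base_clusterings: list of label lists, one label per instance.
    """
    columns = []
    for labels in base_clusterings:
        for cluster_id in sorted(set(labels)):
            # One hyperedge per base cluster: 1 if the instance belongs to it.
            columns.append([1 if labels[i] == cluster_id else 0
                            for i in range(n_instances)])
    # Transpose so each row corresponds to an instance.
    return [[col[i] for col in columns] for i in range(n_instances)]

# Two base clusterings of 4 instances yield 4 hyperedges in total.
H = incidence_matrix([[0, 0, 1, 1], [0, 1, 1, 1]], 4)
```

Each instance belongs to exactly one cluster per base clustering, so every row of `H` sums to the number of base clusterings; the "structured hypergraph learning" of the paper would then refine this matrix rather than partition it as-is.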

2.
Ensemble learning has attracted considerable attention owing to its good generalization performance. The main issues in constructing a powerful ensemble are training a set of diverse and accurate base classifiers and combining them effectively. The ensemble margin, computed as the difference between the number of votes received by the correct class and the highest number of votes received by any other class, is widely used to explain the success of ensemble learning. This definition, however, does not consider the classification confidence of the base classifiers. In this work, we explore the influence of the classification confidence of the base classifiers in ensemble learning and reach some interesting conclusions. First, we extend the definition of the ensemble margin to incorporate the classification confidence of the base classifiers. Then, an optimization objective is designed to compute the weights of the base classifiers by minimizing the margin-induced classification loss. Several strategies for utilizing the classification confidences and the weights are tested. We observe that weighted voting based on classification confidence is better than simple voting when all the base classifiers are used, and that ensemble pruning can further improve the performance of a weighted voting ensemble. We also compare the proposed fusion technique with some classical algorithms. The experimental results confirm the effectiveness of weighted voting with classification confidence.
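A minimal sketch of the contrast between simple voting and confidence-weighted voting, plus a confidence-based margin in the spirit of the extended definition above. The aggregation rule and function names are a generic illustration of the idea, not the paper's exact formulation:

```python
from collections import Counter

def simple_vote(predictions):
    """predictions: list of class labels, one per base classifier."""
    return Counter(predictions).most_common(1)[0][0]

def confidence_weighted_vote(predictions, confidences, weights=None):
    """Each classifier adds weight * confidence to its predicted class's score."""
    if weights is None:
        weights = [1.0] * len(predictions)
    score = Counter()
    for label, conf, w in zip(predictions, confidences, weights):
        score[label] += w * conf
    return max(score, key=score.get)

def confidence_margin(true_label, predictions, confidences, weights=None):
    """Normalized gap between the true class's score and the best other score."""
    if weights is None:
        weights = [1.0] * len(predictions)
    score = Counter()
    for label, conf, w in zip(predictions, confidences, weights):
        score[label] += w * conf
    others = [v for k, v in score.items() if k != true_label]
    return (score[true_label] - max(others, default=0.0)) / sum(weights)

# A confident minority vote can overturn an unconfident majority:
preds = ["a", "b", "b"]
confs = [0.99, 0.51, 0.40]
```

Here simple voting picks "b" (two votes), while confidence weighting picks "a" (score 0.99 versus 0.91), which is exactly the situation where the two fusion rules diverge.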

3.
Compressing videos while maintaining an acceptable level of Quality of Experience (QoE) is indispensable. A feasible approach is to further increase the Quantization Parameter (QP) of the video stream to eliminate visual redundancy, while exploiting perceptual characteristics of the Human Visual System (HVS) to impose a threshold constraint on the maximum QP. In this paper, we employ Just Noticeable Distortion (JND) to characterize this threshold constraint, thereby avoiding perceptual loss during the QP refinement process. We propose an effective JND-based algorithm for QP optimization, in which video saliency detection is introduced to extract regions of interest, a refinement model based on a lightweight network is designed to predict QP values, and an ensemble learning method is adopted to improve generalization performance. Theoretical analysis and experimental results demonstrate that the proposed algorithm, applied to Versatile Video Coding (VVC), achieves significant bitrate reduction without sacrificing perceived quality.

4.
Meta-learning is one of the latest research directions in machine learning and is considered one of the most promising paths toward strong artificial intelligence. It focuses on enabling machines to learn as humans do: recognizing things from only a few samples and quickly adapting to new tasks. The key challenge is training an effective model with limited labeled data, since such a model easily overfits. In this paper, we address this important problem and propose a metric-based meta-learning model that combines attention mechanisms with ensemble learning. We first design a dual-path attention module that incorporates both channel attention and spatial attention; these attention modules are stacked to form the meta-learner for few-shot meta-learning. Then we apply an ensemble method called snapshot ensembling to the attention-based meta-learner to generate multiple models within a single episode. Features extracted by these models are fed into the metric-based architecture to compute a prototype for each class. Our method strengthens the feature-extraction ability of the meta-learner's backbone network and reduces over-fitting through ensemble learning and metric learning. Experimental results on several meta-learning datasets show that our approach is effective.
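The metric-based step described above can be sketched in a few lines: average the support features of each class into a prototype, then classify a query by its nearest prototype. This is a generic prototypical-network-style rule under squared Euclidean distance, not the paper's exact architecture:

```python
def prototypes(features_by_class):
    """Mean feature vector per class from the support set."""
    protos = {}
    for label, feats in features_by_class.items():
        dim = len(feats[0])
        protos[label] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return protos

def classify(query, protos):
    """Assign the query to the class with the nearest prototype."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda c: sqdist(query, protos[c]))

# Toy 2-D features for a 2-way few-shot episode (labels are illustrative).
support = {"cat": [[0.0, 1.0], [0.0, 3.0]], "dog": [[4.0, 0.0], [6.0, 0.0]]}
P = prototypes(support)
```

In the paper's setting, the feature vectors would come from the snapshot-ensembled attention backbones rather than being given directly.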

5.
The volunteer computing paradigm, along with the tailored use of peer-to-peer communication, has recently proven capable of solving a wide range of data-intensive problems in a distributed scenario. The Mining@Home framework is based on these paradigms and has been implemented to run a wide range of distributed data mining applications. The efficiency and scalability of the architecture can be fully exploited when the overall task can be partitioned into distinct jobs that may be executed in parallel and the input data can be reused, which naturally leads to the use of data cachers. This paper explores the opportunities offered by Mining@Home for the discovery of classifiers through the bagging approach: multiple learners compute models from the same input data, and a final model with high statistical accuracy is extracted from them. The analysis focuses on experiments performed in a real distributed environment, enriched with simulation (to evaluate very large environments) and with an analytical investigation based on the iso-efficiency methodology. An extensive set of experiments allowed us to analyze a number of heterogeneous scenarios with different problem sizes, showing how performance can be improved by appropriately tuning the number of workers and the number of interconnected domains.
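Bagging itself, the learning scheme Mining@Home distributes, fits in a few lines: bootstrap-resample the training data, fit one learner per replicate, and majority-vote their predictions. The "learner" below is a trivial 1-nearest-neighbour stand-in so the sketch stays self-contained; in the framework each replicate would instead be a job dispatched to a worker:

```python
import random
from collections import Counter

def one_nn(train, query):
    """1-NN on scalar inputs: label of the closest training point."""
    x, y = min(train, key=lambda p: abs(p[0] - query))
    return y

def bagging_predict(data, query, n_models=15, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]   # bootstrap replicate
        votes.append(one_nn(boot, query))
    return Counter(votes).most_common(1)[0][0]

data = [(0.0, "neg"), (0.2, "neg"), (0.9, "pos"), (1.1, "pos")]
```

Each replicate sees a perturbed view of the same input data, which is why caching that data near the workers pays off in the distributed setting the paper studies.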

6.
蔡铁, 伍星, 李烨. Journal of Computer Applications (计算机应用), 2008, 28(8): 2091-2093
To construct diverse base classifiers for ensemble learning, a base classifier construction method based on data discretization is proposed and applied to support vector machine ensembles. The method uses rough sets and the Boolean reasoning discretization algorithm to process the training set, which effectively removes irrelevant and redundant attributes and improves both the accuracy and the diversity of the base classifiers. Experimental results show that the proposed method achieves better performance than the traditional ensemble learning algorithms Bagging and AdaBoost.

7.
Deep Neural Networks (DNNs) are widely used in engineering applications for their ability to handle problems with almost any kind of nonlinearity. However, it is generally difficult to obtain sufficient high-fidelity (HF) sample points for expensive optimization tasks, which may hurt the generalization performance of a DNN and result in inaccurate predictions. To solve this problem and improve DNN prediction accuracy, this paper proposes an on-line transfer learning based multi-fidelity data fusion (OTL-MFDF) method comprising two parts. In the first part, an ensemble of DNNs is established. A large number of low-fidelity sample points and a few HF sample points are generated and used as the source and target datasets, respectively. Bayesian Optimization (BO) is then utilized to obtain several groups of hyperparameters, with which DNNs are pre-trained on the source dataset. These pre-trained DNNs are re-trained by fine-tuning on the target dataset, and the ensemble is formed by assigning a weight to each DNN. In the second part, an on-line learning system is developed to adaptively update the ensemble. To evaluate the uncertainty of the DNN predictions and determine the location of each new HF sample point, a query-by-committee strategy based on the ensemble is developed: the Covariance Matrix Adaptation Evolution Strategy is employed as the optimizer to find the location where the ensemble's disagreement is maximal. The design space is partitioned by the Voronoi diagram method, and the selected point is moved to its nearest Voronoi cell boundary to avoid clustering between the new point and the existing sample points. Three types of test problems and an engineering example illustrate the effectiveness of the OTL-MFDF method, and the results verify its outstanding efficiency, global prediction accuracy, and applicability.

8.
The annoyance of spam email increasingly plagues both individuals and organizations. In response, most prior research treats spam filtering as a classical text categorization task, in which training examples must include both spam (positive examples) and legitimate (negative examples) emails. However, in many spam filtering scenarios, obtaining legitimate emails for training purposes can be more difficult than collecting spam and unclassified emails. Hence, it is more appropriate to construct a classification model for spam filtering that uses only positive training examples (i.e., spam) and unlabeled instances, without requiring legitimate emails as negative training examples. Several single-class learning techniques, such as PNB and PEBL, have been proposed in the literature, but they have inherent limitations with regard to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address these limitations. Specifically, we follow the two-stage framework of PEBL but extend each stage with an ensemble strategy. Empirical evaluation on two spam filtering corpora suggests that the proposed E2 technique generally outperforms the benchmark techniques (PNB and PEBL) and exhibits more stable performance than its counterparts.

9.
Emotion is a state that combines people's feelings, thoughts, and behaviors, and it plays a crucial role in interpersonal communication. Numerous studies suggest that human emotions can also be conveyed through online interactions. Previous studies have addressed the mechanism of emotional contagion, but emotional contagion among users of online social networks has not yet been thoroughly researched. Therefore, this study first introduces the notion of emotion roles, which may play an important part in emotional contagion. On this basis, an emotion role mining approach based on multiview ensemble learning (ERM-ME) is proposed to detect emotion roles in social networks by fusing the information contained in different features. ERM-ME comprises three stages: detection of emotional communities, local fusion, and global fusion. First, ERM-ME divides emotional communities based on users' emotional preferences. Then, emotional features are used to train basic classifiers, which are combined into meta-classifiers. Finally, an accuracy-based weighted voting scheme integrates the results of the meta-classifiers to achieve a more accurate and comprehensive classification. Experiments on Flickr and Microblog datasets verify the practicability and effectiveness of the proposed method. Measured by micro F-score, ERM-ME improves on Graph Convolutional Network, random forest, AdaBoost, bagging, and stacking by approximately 1.09%-14.57% on Flickr and 5.19%-8.95% on Microblog.

10.
This paper presents a method for improved ensemble learning that treats the optimization of an ensemble of classifiers as a compressed sensing problem. Ensemble learning methods improve the performance of a learned predictor by integrating a weighted combination of multiple predictive models. Ideally, the number of models in the ensemble should be minimized while the weights associated with each included model are optimized. We cast this as an instance of the compressed sensing problem, in which a sparse solution must be reconstructed from an under-determined linear system, and employ compressed sensing techniques to find an ensemble that is both small and effective. An additional contribution of this paper is a new performance evaluation method, a pairwise diversity measurement called the roulette-wheel kappa-error. It takes into account the different weightings of the classifiers and reduces the total number of classifier pairs needed in the kappa-error diagram by selecting pairs through roulette-wheel selection according to the classifier weights. This greatly improves the clarity and informativeness of the kappa-error diagram, especially when the ensemble contains many classifiers. We use 25 public datasets to evaluate and compare the performance of compressed sensing ensembles using four sparse reconstruction algorithms, combined with two classifier learning algorithms and two training data manipulation techniques, and we also compare our method against five other state-of-the-art pruning methods. These experiments show that our method produces comparable or better accuracy while being significantly faster than the compared methods.
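The roulette-wheel pair selection described above is easy to sketch: classifier pairs are drawn with probability proportional to the product of the two classifiers' ensemble weights, so zero-weight members never clutter the diagram. The sampling scheme below is our illustration of the idea, not the paper's exact procedure:

```python
import random

def roulette_pick(weights, rng):
    """Draw one index with probability proportional to its weight."""
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def sample_pairs(weights, n_pairs, seed=0):
    """Draw n_pairs distinct-index pairs for the kappa-error diagram."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        i, j = roulette_pick(weights, rng), roulette_pick(weights, rng)
        if i != j:
            pairs.append((min(i, j), max(i, j)))
    return pairs

weights = [0.6, 0.3, 0.1, 0.0]   # a zero-weight classifier is never drawn
pairs = sample_pairs(weights, 50)
```

With four classifiers only the three positively weighted ones ever appear, which is the thinning effect the paper relies on when the ensemble is large.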

11.
Prostate cancer is one of the most common malignant cancers in men. Early detection is necessary for deciding whether a patient should receive a costly and invasive biopsy, which carries a risk of serious complications. However, existing data mining based cancer diagnosis methods focus only on diagnostic accuracy and neglect the interpretability of the diagnosis model, which is necessary for helping doctors make clinical decisions. To take both accuracy and interpretability into consideration, we propose a stacking-based ensemble learning method that simultaneously constructs the diagnostic model and extracts interpretable diagnostic rules. To this end, a multi-objective optimization algorithm is devised to maximize classification accuracy and minimize ensemble complexity during model selection. For model combination, a random forest classifier based stacking technique is explored for integrating the base learners, i.e., decision trees. Empirical results on real-world data from the General Hospital of PLA demonstrate that the proposed method outperforms several state-of-the-art methods in classification accuracy, sensitivity, and specificity. Moreover, several diagnostic rules extracted from the constructed ensemble model prove to be both accurate and interpretable.

12.
To address the weak generalization ability of existing network representation learning methods, this paper applies the stacking ensemble idea to network representation learning in order to improve representation performance. First, three classic shallow network representation learning methods, DeepWalk, Node2Vec, and Line, are used as parallel primary learners, and the node embeddings they produce are concatenated to form a new dataset. Then, a graph convolutional network (GCN) is chosen as the secondary learner, and stacking is performed over the new dataset and the network structure to obtain the final node embeddings. GCNs handle semi-supervised classification well, but since network representation learning is unsupervised, the loss function is designed using the first-order proximity of the network. Finally, evaluation metrics are designed to assess the primary learners and the ensembled node embeddings separately. Experiments show that the GCN-based ensemble performs well, with the evaluation metrics improving by a factor of 1.47 to 2.97 on average.

13.
Context: Several issues plague software defect data, including redundancy, correlation, feature irrelevance, and missing samples. It is also hard to ensure a balanced distribution between data for defective and non-defective software; in most experimental cases, data for the non-defective class dominates the dataset.
Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on defect classification performance. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy.
Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on defect classification performance.
Results: Forward selection showed that only a few features contribute to a high area under the receiver operating characteristic curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson's correlation, which suggests that the features are highly unstable. However, ensemble learners like random forests and the proposed algorithm, average probability ensemble (APE), are less affected by poor features than weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets PC2, PC4, and MC1.
Conclusion: This paper shows that the features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
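Greedy forward selection, the feature selection scheme this abstract favours, has a simple skeleton: start from the empty set, repeatedly add the feature that most improves a score, and stop when no remaining feature helps. The score function below is a toy stand-in for AUC (names and values are ours, purely illustrative):

```python
def greedy_forward_select(features, score):
    """Return (selected feature list, best score) under greedy forward search."""
    selected, best = [], score([])
    while True:
        gains = {f: score(selected + [f]) for f in features if f not in selected}
        if not gains:
            break
        f = max(gains, key=gains.get)
        if gains[f] <= best:          # no remaining feature improves the score
            break
        selected.append(f)
        best = gains[f]
    return selected, best

# Toy score: only features "x" and "y" help; "z" adds nothing.
VALUES = {"x": 0.3, "y": 0.15, "z": 0.0}
def toy_auc(subset):
    return 0.5 + sum(VALUES[f] for f in subset)

chosen, auc = greedy_forward_select(list(VALUES), toy_auc)
```

The early stop is what produces the "only a few features contribute" behaviour the Results paragraph reports: the search halts as soon as the marginal gain vanishes.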

14.
Least squares support vector regression (LSSVR) is an effective and competitive approach for crude oil price prediction, but its performance suffers from parameter sensitivity and long tuning times. This paper treats the user-defined parameters as uncertain (random) factors and constructs an LSSVR ensemble learning paradigm in four major steps. First, probability distributions of the user-defined LSSVR parameters are designed using the grid method for lower upper bound estimation (LUBE). Second, random sets of parameters are generated according to these distributions to formulate diverse individual LSSVR members. Third, each individual member makes its own prediction. Finally, all individual results are combined into the final output via ensemble weighted averaging, with the probabilities serving as the corresponding weights. A computational experiment on the West Texas Intermediate (WTI) crude oil spot price verifies the effectiveness of the proposed LSSVR ensemble learning paradigm with uncertain parameters against existing LSSVR variants (using other popular parameter selection algorithms) in terms of prediction accuracy and time savings.

15.
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that base classifier performance can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensembles of 30 machine learning algorithms and evaluate their classification performance on Parkinson's, diabetes, and heart disease datasets from the literature.

In the experiments, the feature dimension of the three datasets is first reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is measured on the three datasets. Third, 30 classifier ensembles are constructed with the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments use leave-one-out validation, and the 60 algorithms are evaluated with three metrics: classification accuracy (ACC), kappa error (KE), and area under the receiver operating characteristic (ROC) curve (AUC).

The base classifiers achieved average accuracies of 72.15%, 77.52%, and 84.43% for the diabetes, heart, and Parkinson's datasets, respectively, while the RF classifier ensembles produced average accuracies of 74.47%, 80.49%, and 87.13%. RF, a recently proposed classifier ensemble algorithm, can thus be used to improve the accuracy of miscellaneous machine learning algorithms in the design of advanced CADx systems.

16.
This article addresses the problem of identifying the most likely music performer, given a set of performances of the same piece by a number of skilled candidate pianists. We propose a set of very simple features for representing the stylistic characteristics of a music performer, introducing 'norm-based' features that relate to a kind of 'average' performance. A database of piano performances by 22 pianists playing two pieces by Frédéric Chopin is used in the presented experiments. Due to the limited size of the training set and the characteristics of the input features, we propose an ensemble of simple classifiers derived by subsampling both the training set and the input features. Experiments show that the proposed features are able to quantify the differences between music performers, and that the proposed ensemble can efficiently cope with multi-class music performer recognition under inter-piece conditions, a difficult musical task, displaying a level of accuracy unlikely to be matched by human listeners under similar conditions.

17.
Error Correcting Output Coding (ECOC) methods for multiclass classification present several open problems, ranging from the trade-off between their error recovering capabilities and the learnability of the induced dichotomies to the selection of proper base learners and the design of well-separated codes for a given multiclass problem. We experimentally analyse some of the main factors affecting the effectiveness of ECOC methods. We show that the architecture of ECOC learning machines influences the accuracy of the ECOC classifier, highlighting that ensembles of parallel and independent dichotomic Multi-Layer Perceptrons are well suited to implement ECOC methods. We quantitatively evaluate the dependence among codeword bit errors using mutual information based measures, showing experimentally that low dependence enhances the generalisation capabilities of ECOC. Moreover, we show that proper selection of the base learner and of the decoding function in the reconstruction stage significantly affects the performance of the ECOC ensemble. The analysis of the relationships between the error recovering power, the accuracy of the base learners, and the dependence among codeword bits shows that all these factors contribute to the effectiveness of ECOC methods in a way that is not straightforward and very likely depends on the distribution and complexity of the data. An erratum to this article is available.
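The ECOC mechanics referred to above fit in a short sketch: each class is assigned a codeword, each codeword bit defines a dichotomy learned by one base classifier, and decoding picks the class whose codeword is nearest in Hamming distance to the vector of bit predictions. The codebook below is a toy design, not an optimized code from the paper:

```python
CODEBOOK = {           # 3 classes, 5-bit codewords (toy, not well-separated by design)
    "A": (0, 0, 1, 1, 0),
    "B": (1, 0, 0, 1, 1),
    "C": (0, 1, 0, 0, 1),
}

def hamming(a, b):
    """Number of positions where the two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(bit_predictions):
    """Hamming-distance decoding: the nearest codeword wins."""
    return min(CODEBOOK, key=lambda c: hamming(CODEBOOK[c], bit_predictions))

# One base classifier erring (one flipped bit relative to class "B")
# is still recovered, which is the "error recovering power" of the code:
noisy = (1, 0, 0, 0, 1)
```

The decoding function is exactly one of the design choices the abstract identifies as significant; swapping Hamming distance for a margin-based decoder changes the reconstruction stage without touching the code itself.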

18.
Manufacturing quality control (QC) in plastic injection moulding is of the utmost importance, since almost one third of plastic products are manufactured via the injection moulding process. Moreover, smart manufacturing technologies are enabling the generation of huge amounts of data in production lines. This data can be used to predict the quality of manufactured plastic products with machine learning methods, allowing companies to save costs and improve their production efficiency. However, high-performance machine learning models are usually too complicated to be understood intuitively. We therefore introduce a rule-based explanations (RBE) framework that combines several machine learning interpretation methods to help explain the decision mechanisms of accurate but complex predictive models, specifically tree ensemble models. The generated rules make it easy to visualize and understand the main factors that affect quality in the manufacturing process. To demonstrate the applicability of RBE, we present two experiments with real industrial data gathered from a plastic injection moulding machine in a Singapore model factory. The collected datasets contain condition data for several manufacturing processes as well as QC results for sink mark defects in the production of small plastic products. The experiments show that it is possible to extract meaningful explanations in the form of simple decision rules, enhanced with partial dependence plots and feature importance rankings, for a better understanding of the underlying mechanisms and data relationships of accurate tree ensembles.

19.
When handling class-imbalanced data, borderline instances of the minority class are easily misclassified. To reduce the impact of class imbalance on classifier performance, an adaptive borderline sampling algorithm (AB-SMOTE) is proposed. AB-SMOTE adaptively samples the borderline examples of the minority class, improving the balance and validity of the dataset. AB-SMOTE is then combined with data-cleaning techniques to form ABTAdaBoost, an ensemble algorithm based on AdaBoost. ABTAdaBoost has three stages: first, AB-SMOTE is applied to the training set to reduce its class imbalance; second, the Tomek links data-cleaning technique removes noise and the overlapping examples produced by sampling, effectively improving data usability; third, the AdaBoost ensemble algorithm builds an ensemble classifier from N weak classifiers. Experiments using J48 decision trees and naive Bayes as base classifiers on 12 UCI datasets show that ABTAdaBoost outperforms several other algorithms in predictive performance.
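The core SMOTE step that AB-SMOTE builds on is simple: a synthetic minority sample is a random interpolation between a minority point and one of its minority-class neighbours. The sketch below shows plain SMOTE interpolation only; the boundary-adaptive sampling that distinguishes AB-SMOTE is not reproduced here:

```python
import random

def smote_sample(x, neighbour, rng):
    """Synthesize a point on the segment between x and its neighbour."""
    gap = rng.random()                       # random position in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]

rng = random.Random(42)
x, nb = [1.0, 2.0], [3.0, 6.0]
s = smote_sample(x, nb, rng)
```

Every synthetic point lies on the line segment between the two minority examples, which is why a cleaning pass such as Tomek links is useful afterwards: interpolation near the class boundary can create overlapping examples.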

20.
In multi-instance learning, the training set is composed of labeled bags, each of which consists of many unlabeled instances; that is, an object is represented by a set of feature vectors instead of a single feature vector. Most current multi-instance learning algorithms work by adapting single-instance learning algorithms to the multi-instance representation, while this paper proposes a solution that goes the opposite way: adapting the multi-instance representation to single-instance learning algorithms. In detail, the instances of all bags are first collected together and clustered into d groups. Each bag is then re-represented by d binary features, where the value of the ith feature is set to one if the bag has instances falling into the ith group and zero otherwise. Thus, each bag is represented by one feature vector, so that single-instance classifiers can be used to distinguish different classes of bags. By repeating this process with different values of d, many classifiers can be generated and then combined into an ensemble for prediction. Experiments show that the proposed method works well on standard as well as generalized multi-instance problems.

Zhi-Hua Zhou is currently Professor in the Department of Computer Science & Technology and head of the LAMDA group at Nanjing University. His main research interests include machine learning, data mining, information retrieval, and pattern recognition. He is associate editor of Knowledge and Information Systems and serves on the editorial boards of Artificial Intelligence in Medicine, International Journal of Data Warehousing and Mining, Journal of Computer Science & Technology, and Journal of Software. He has also been involved in various conferences.

Min-Ling Zhang received his B.Sc. and M.Sc. degrees in computer science from Nanjing University, China, in 2001 and 2004, respectively. He is currently a Ph.D. candidate in the Department of Computer Science & Technology at Nanjing University and a member of the LAMDA group. His main research interests include machine learning and data mining, especially multi-instance learning and multi-label learning.
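The bag re-representation step described in item 20 can be sketched directly: pool all instances, cluster them into d groups, then encode each bag as d binary features (1 if the bag has an instance in that group). Clustering is replaced here by a fixed nearest-centroid assignment so the sketch stays self-contained; any clustering method fits the same skeleton:

```python
def assign(instance, centroids):
    """Index of the nearest centroid under squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(instance, centroids[k])))

def bag_features(bag, centroids):
    """Re-represent a bag of instances as d binary group-membership features."""
    feats = [0] * len(centroids)
    for inst in bag:
        feats[assign(inst, centroids)] = 1
    return feats

centroids = [[0.0, 0.0], [5.0, 5.0]]          # d = 2 groups (toy values)
bag1 = [[0.1, 0.2], [0.3, 0.1]]               # all instances near group 0
bag2 = [[0.2, 0.0], [4.8, 5.1]]               # spans both groups
```

After this step each bag is a single fixed-length vector, so any single-instance classifier applies; repeating with different values of d yields the ensemble members the abstract describes.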


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号