Similar Documents
A total of 20 similar documents were retrieved (search time: 31 ms).
1.
A set of classification rules can be considered as a disjunction of rules, where each rule is a disjunct. A small disjunct is a rule covering a small number of examples. Small disjuncts are a serious problem for effective classification, because the small number of examples satisfying these rules makes their prediction unreliable and error-prone. This paper offers two main contributions to the research on small disjuncts. First, it investigates six candidate solutions (algorithms) for the problem of small disjuncts. Second, it reports the results of a meta-learning experiment, which produced meta-rules predicting which algorithm will tend to perform best for a given data set. The algorithms investigated in this paper belong to different machine learning paradigms and their hybrid combinations, as follows: two versions of a decision-tree (DT) induction algorithm; two versions of a hybrid DT/genetic algorithm (GA) method; one GA; one hybrid DT/instance-based learning (IBL) algorithm. Experiments with 22 data sets evaluated both the predictive accuracy and the simplicity of the discovered rule sets, with the following conclusions. If one wants to maximize predictive accuracy only, then the hybrid DT/IBL seems to be the best choice. On the other hand, if one wants to maximize both predictive accuracy and rule set simplicity -- which is important in the context of data mining -- then a hybrid DT/GA seems to be the best choice.
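
To make the notion concrete, the sketch below (not from the paper; scikit-learn's breast-cancer data and a hypothetical coverage threshold of 5 are used only for illustration) trains a decision tree and flags the leaves, i.e. the disjuncts, that cover only a few training examples.

```python
# Illustrative sketch (not from the paper): flag decision-tree leaves that
# cover only a few training examples as "small disjuncts".
from collections import Counter

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each training example is routed to exactly one leaf (= one rule / disjunct).
leaf_ids = tree.apply(X)
coverage = Counter(leaf_ids)

SMALL = 5  # hypothetical threshold: a rule covering <= 5 examples is "small"
small_disjuncts = [leaf for leaf, n in coverage.items() if n <= SMALL]
print(f"{len(small_disjuncts)} of {len(coverage)} leaves are small disjuncts")
print("examples covered by small disjuncts:",
      sum(coverage[leaf] for leaf in small_disjuncts))
```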

2.
Link prediction is an important research direction in complex networks, but the accuracy of current link prediction algorithms is limited by the small amount of network information they can exploit. To improve prediction performance, an improved AdaBoost algorithm is adopted for link prediction. First, an adjacency matrix is built from the complex-network samples to put them into matrix form; then the AdaBoost algorithm is used for classification training, and the prediction result is obtained by weighted voting. Finally, to address the imbalance between the positive and negative prediction errors of the weak classifiers on complex networks, a weight-adjustment factor η with adjustment range [η1, η2] is introduced, and the weights of the outputs of AdaBoost's multiple weak classifiers are adjusted dynamically according to the value of η, yielding accurate link prediction results. Experimental results show that, compared with other commonly used link prediction algorithms and the traditional AdaBoost algorithm, the improved AdaBoost algorithm has a clear advantage in prediction accuracy, and when the number of nodes is large its prediction time is close to that of the other algorithms.
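
A minimal sketch of the weighting idea described above, assuming synthetic node-pair features in place of a real network's adjacency-matrix features; the factor eta here simply scales each weak learner's vote according to how balanced its false-positive and false-negative errors are, which is one plausible reading of the abstract, not the authors' exact rule.

```python
# Sketch of an AdaBoost loop whose vote weights are additionally scaled by a
# factor eta in [eta1, eta2] derived from each weak learner's false-positive /
# false-negative balance. Synthetic features stand in for adjacency-matrix data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
y = np.where(y == 1, 1, -1)                      # 1 = link exists, -1 = no link

T, eta1, eta2 = 20, 0.5, 1.5                     # hypothetical eta range
w = np.full(len(y), 1.0 / len(y))                # sample weights
learners, alphas = [], []

for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)        # standard AdaBoost vote weight

    # eta shrinks the vote of learners whose errors pile up on one class only.
    fp = np.sum(w[(pred == 1) & (y == -1)])
    fn = np.sum(w[(pred == -1) & (y == 1)])
    balance = min(fp, fn) / max(fp, fn) if max(fp, fn) > 0 else 1.0
    eta = eta1 + (eta2 - eta1) * balance         # eta lies in [eta1, eta2]

    w *= np.exp(-alpha * y * pred)               # usual AdaBoost re-weighting
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha * eta)

score = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(score) == y))
```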

3.
The prediction accuracy and generalization ability of neural/neurofuzzy models for chaotic time series prediction depend highly on the employed network model as well as on the learning algorithm. In this study, several neural and neurofuzzy models with different learning algorithms are examined for the prediction of several benchmark chaotic systems and time series. The prediction performance of locally linear neurofuzzy models with the recently developed Locally Linear Model Tree (LoLiMoT) learning algorithm is compared with that of a Radial Basis Function (RBF) neural network with the Orthogonal Least Squares (OLS) learning algorithm, a MultiLayer Perceptron neural network with the error back-propagation learning algorithm, and an Adaptive Network based Fuzzy Inference System. In particular, cross-validation techniques based on the evaluation of error indices on multiple validation sets are utilized to optimize the number of neurons and to prevent overfitting in the incremental learning algorithms. To make a fair comparison between neural and neurofuzzy models, they are compared at their best structure based on their prediction accuracy, generalization, and computational complexity. The experiments are primarily designed to analyze the generalization capability and accuracy of the learning techniques when dealing with a limited number of training samples from deterministic chaotic time series, but the effect of noise on the performance of the techniques is also considered. Various chaotic systems and time series, including the Lorenz system, the Mackey-Glass chaotic equation, the Henon map, the AE geomagnetic activity index, and sunspot numbers, are examined as case studies. The obtained results indicate the superior performance of incremental learning algorithms and their respective networks, such as OLS for the RBF network and LoLiMoT for the locally linear neurofuzzy model.
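
The model-selection step described above can be illustrated with a small hold-out experiment; the sketch below generates the Henon map, builds a delay embedding, and compares MLP hidden-layer sizes on a validation set (the paper's neurofuzzy models and error indices are not reproduced here).

```python
# Hold-out model selection for one-step-ahead prediction of the Henon map:
# x_{n+1} = 1 - 1.4 * x_n^2 + 0.3 * x_{n-1}. Only an MLP is shown here.
import numpy as np
from sklearn.neural_network import MLPRegressor

x = np.zeros(1200)
x[0], x[1] = 0.1, 0.1
for n in range(1, 1199):
    x[n + 1] = 1.0 - 1.4 * x[n] ** 2 + 0.3 * x[n - 1]

lag = 2                                           # order-2 delay embedding
X = np.column_stack([x[i:len(x) - lag + i] for i in range(lag)])
y = x[lag:]
X_tr, y_tr, X_va, y_va = X[:800], y[:800], X[800:], y[800:]

for hidden in (2, 5, 10, 20):                     # candidate network sizes
    mlp = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=5000,
                       random_state=0).fit(X_tr, y_tr)
    mse = np.mean((mlp.predict(X_va) - y_va) ** 2)
    print(f"hidden units = {hidden:2d}   validation MSE = {mse:.6f}")
```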

4.
Several studies have demonstrated the superior performance of ensemble classification algorithms, whereby multiple member classifiers are combined into one aggregated and powerful classification model, over single models. In this paper, two rotation-based ensemble classifiers are proposed as modeling techniques for customer churn prediction. In Rotation Forests, feature extraction is applied to feature subsets in order to rotate the input data for training base classifiers, while RotBoost combines Rotation Forest with AdaBoost. In an experimental validation based on data sets from four real-life customer churn prediction projects, Rotation Forest and RotBoost are compared to a set of well-known benchmark classifiers. Moreover, variations of Rotation Forest and RotBoost are compared, implementing three alternative feature extraction algorithms: principal component analysis (PCA), independent component analysis (ICA) and sparse random projections (SRP). The performance of the rotation-based ensemble classifiers is found to depend upon: (i) the performance criterion used to measure classification performance, and (ii) the implemented feature extraction algorithm. In terms of accuracy, RotBoost outperforms Rotation Forest, but none of the considered variations offers a clear advantage over the benchmark algorithms. However, in terms of AUC and top-decile lift, results clearly demonstrate the competitive performance of Rotation Forests compared to the benchmark algorithms. Moreover, ICA-based Rotation Forests outperform all other considered classifiers and are therefore recommended as a well-suited alternative classification technique for the prediction of customer churn that allows for improved marketing decision making.
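
A simplified sketch of the Rotation Forest idea referenced above: features are split into disjoint subsets, PCA is fitted per subset on a bootstrap sample, and each tree is trained on the rotated data. The class-subsampling details of the original algorithm are omitted, and the data set is a stand-in.

```python
# Simplified Rotation Forest: PCA on disjoint feature subsets builds one
# rotation matrix per tree; class/sample sub-sampling refinements are omitted.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_trees, n_subsets = 10, 3
n_feat = X.shape[1]
ensemble = []
for t in range(n_trees):
    subsets = np.array_split(rng.permutation(n_feat), n_subsets)
    R = np.zeros((n_feat, n_feat))                # block "rotation" matrix
    for cols in subsets:
        boot = rng.choice(len(X_tr), size=int(0.75 * len(X_tr)), replace=True)
        pca = PCA().fit(X_tr[np.ix_(boot, cols)])
        R[np.ix_(cols, cols)] = pca.components_.T  # rotate this feature block
    tree = DecisionTreeClassifier(random_state=t).fit(X_tr @ R, y_tr)
    ensemble.append((R, tree))

votes = np.mean([tree.predict(X_te @ R) for R, tree in ensemble], axis=0)
print("test accuracy:", np.mean((votes > 0.5) == y_te))
```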

5.
An active learning algorithm is devised for training Self-Organizing Feature Maps on large data sets. Active learning algorithms recognize that not all exemplars are created equal. Thus, the concepts of exemplar age and difficulty are used to filter the original data set such that training epochs are only conducted over a small subset of the original data set. The ensuing Hierarchical Dynamic Subset Selection algorithm introduces definitions of exemplar difficulty suitable for an unsupervised learning context, and therefore appropriate self-organizing map (SOM) stopping criteria. The algorithm is benchmarked on several real-world data sets with training set exemplar counts in the region of 30-500 thousand. Cluster accuracy is demonstrated to be at least as good as that from the original SOM algorithm while requiring a fraction of the computational overhead.

6.
Because its kernel functions rely on inner-product operations, the support vector machine (SVM) is a strongly "black-box" model. Existing research on opening this black box mainly applies rule extraction to classification problems, while regression problems are rarely addressed. For regression, this paper tentatively proposes an SVM regression rule-extraction method based on a regression tree algorithm. The method exploits the special role of the support vectors and the strengths of regression trees by building a decision-tree model on the support vectors, and it successfully extracts rules with high decision power that involve few variables, require little computation, and are easy to read. The method is validated on the standard Auto MPG data set and on a real coal-to-methanol production data set; comparative analysis against other algorithms shows that the extracted regression rules improve both training accuracy and prediction accuracy to a certain degree.
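
A minimal sketch of the general approach described above, using scikit-learn components as stand-ins: an SVR is fitted, only its support vectors are kept, and a shallow regression tree fitted to the SVR's predictions on those support vectors is printed as IF-THEN style rules. The data set and hyperparameters are illustrative, not those of the paper.

```python
# Fit an SVR, keep its support vectors, and approximate the SVR with a shallow
# regression tree trained on them; the printed tree reads as IF-THEN rules.
from sklearn.datasets import make_friedman1
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_friedman1(n_samples=2000, noise=0.5, random_state=0)

svr = SVR(C=1.0, epsilon=0.2).fit(X, y)
sv = svr.support_                                 # indices of support vectors
print(f"{len(sv)} support vectors out of {len(X)} training points")

# The tree targets are the SVR's own predictions on its support vectors, so
# the extracted rules approximate the SVR model rather than the raw data.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X[sv], svr.predict(X[sv]))
print(export_text(tree))
```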

7.
Many techniques have been proposed for credit risk prediction, from statistical models to artificial intelligence methods. However, very few research efforts have been devoted to dealing with the presence of noise and outliers in the training set, which may strongly affect the performance of the prediction model. Accordingly, the aim of the present paper is to systematically investigate whether the application of filtering algorithms leads to an increase in the accuracy of instance-based classifiers in the context of credit risk assessment. The experimental results with 20 different algorithms and 8 credit databases show that the filtered sets perform significantly better than the non-preprocessed training sets when using the nearest neighbour decision rule. The experiments also make it possible to identify which techniques are most robust and accurate when confronted with noisy credit data.
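
The filtering-then-nearest-neighbour setup discussed above can be sketched as follows, with a synthetic data set standing in for a real credit database and Wilson's Edited Nearest Neighbours (one classic filter, not necessarily one of the paper's 20 algorithms) used to clean injected label noise before 1-NN classification.

```python
# Inject label noise into a synthetic "credit" set, filter the training data
# with Wilson's Edited Nearest Neighbours, and compare 1-NN with and without
# the filtering step.
import numpy as np
from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=3000, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.15               # 15% label noise
y_noisy = np.where(flip, 1 - y_tr, y_tr)

knn = KNeighborsClassifier(n_neighbors=1)
acc_raw = knn.fit(X_tr, y_noisy).score(X_te, y_te)

enn = EditedNearestNeighbours(sampling_strategy="all", n_neighbors=3)
X_f, y_f = enn.fit_resample(X_tr, y_noisy)
acc_filtered = knn.fit(X_f, y_f).score(X_te, y_te)

print(f"1-NN on the noisy training set:    {acc_raw:.3f}")
print(f"1-NN on the filtered training set: {acc_filtered:.3f}")
```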

8.
Ant colony optimization (ACO) algorithms have been successfully applied in data classification, where they aim at discovering a list of classification rules. However, due to the essentially random search in ACO algorithms, the lists of classification rules constructed by ACO-based classification algorithms are not fixed and may be distinctly different even when using the same training set. Those differences are generally ignored, and some beneficial information cannot be extracted from the different data sets, which may lower the predictive accuracy. To overcome this shortcoming, this paper proposes a novel classification rule discovery algorithm based on ACO, named AntMinermbc, in which a new model of multiple rule sets is presented to produce multiple lists of rules. Multiple base classifiers are built in AntMinermbc, and each base classifier is expected to remedy the weaknesses of the other base classifiers, which can improve the predictive accuracy by exploiting the useful information from the various base classifiers. A new heuristic function for ACO is also designed in our algorithm, which considers both correlation and coverage in order to avoid deceptively high accuracy. The performance of our algorithm is studied experimentally on 19 publicly available data sets and further compared to several state-of-the-art classification approaches. The experimental results show that the predictive accuracy obtained by our algorithm is statistically higher than that of the compared approaches.

9.
Learning-to-rank (LtR) has become an integral part of modern ranking systems. In this field, random forest-based rank-learning algorithms are shown to be among the top performers. Traditionally, each tree of a random forest is learnt using a bootstrapped copy of the training set, where approximately 63% of the examples are unique. The goal of using a bootstrapped copy instead of the original training set is to reduce the correlation between individual trees, thereby making the prediction of the ensemble more accurate. In this regard, the following question may be raised: how can we leverage the correlation between the trees in favor of the performance and scalability of a random forest-based LtR algorithm? In this article, we investigate whether we can further decrease the correlation between the trees while maintaining or possibly improving accuracy. Among several potential options to achieve this goal, we investigate the size of the subsamples used for learning individual trees. We examine the performance of a random forest-based LtR algorithm as we control the correlation using this parameter. Experiments on LtR data sets reveal that for small- to moderate-sized data sets, a substantial reduction in training time can be achieved using only a small amount of training data per tree. Moreover, due to the positive correlation between the variability across the trees and the performance of a random forest, we observe an increase in accuracy while maintaining the same level of model stability as the baseline. For big data sets, although our experiments did not observe an increase in accuracy (because, with larger data sets, the individual tree variance is already comparatively smaller), our technique is still applicable as it allows for greater scalability.
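
A small sketch of the subsample-size experiment described above, using scikit-learn's random forest regressor as a pointwise stand-in for the paper's LtR setup: the max_samples parameter controls how much training data each tree sees, trading tree correlation and training time against per-tree accuracy.

```python
# Vary the per-tree subsample size (max_samples) of a random forest and watch
# the accuracy / training-time trade-off; a regression task stands in for LtR.
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=20000, n_features=30, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for frac in (0.05, 0.2, 0.63, None):              # None = full bootstrap sample
    t0 = time.perf_counter()
    rf = RandomForestRegressor(n_estimators=100, max_samples=frac,
                               random_state=0, n_jobs=-1).fit(X_tr, y_tr)
    print(f"max_samples={str(frac):>4}  R^2={rf.score(X_te, y_te):.3f}  "
          f"train time={time.perf_counter() - t0:.1f}s")
```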

10.
To handle the class imbalance and noisy samples present in cancer data sets, a survival prediction algorithm for cancer patients, RENN-SMOTE-SVM, is proposed based on the RENN and SMOTE algorithms. Using the nearest-neighbour rule, RENN reduces the number of noisy samples in the majority class, and SMOTE increases the number of minority-class samples by linear interpolation between them, yielding a balanced data set. Survival prediction was carried out on an imbalanced breast-cancer patient data set from a US cancer database. Experimental results show that, compared with five commonly used algorithms such as SVM and Tomek-links-SVM, the proposed algorithm achieves better classification and prediction performance, with accuracy, F1-score, and G-means values of 0.883, 0.904, and 0.779, respectively.
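
A minimal sketch of the RENN-SMOTE-SVM pipeline described above, built from imbalanced-learn components on a synthetic imbalanced set that stands in for the breast-cancer survival data: RENN removes noisy samples, SMOTE interpolates new minority samples, and an SVM is trained on the result.

```python
# RENN removes noisy samples, SMOTE re-balances the classes, then an SVM is
# trained; a synthetic imbalanced set stands in for the real patient data.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RepeatedEditedNearestNeighbours
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1],
                           n_informative=6, flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("renn", RepeatedEditedNearestNeighbours()),  # drop noisy samples
    ("smote", SMOTE(random_state=0)),             # interpolate minority samples
    ("svm", SVC(kernel="rbf", C=1.0)),
])
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te), digits=3))
```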

11.
In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional predictors into two parts. The first is a small table located at the front-end of the pipeline, which makes the prediction brief enough even for some aggressive processors. Second, operations on complicated algorithms and large data structures for accurate predictions are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small-table update in the front. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A2BP. Experiments with the Standard Performance Evaluation Corporation (SPEC) benchmarks on gem5/SimpleScalar simulators demonstrate that A2BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.

12.
Support vector machines (SVMs) are state-of-the-art tools used to address issues pertinent to classification. However, the poor explanation capability of SVMs is also their main weakness, which is why SVMs are typically regarded as incomprehensible black-box models. In the present study, a rule extraction algorithm to extract comprehensible rules from SVMs and enhance their explanation capability is proposed. The proposed algorithm uses the support vectors from a trained SVM model and combines them with genetic algorithms to construct rule sets. The proposed method can not only generate rule sets from SVMs based on mixed discrete and continuous variables but can also select important variables in the rule set simultaneously. Measurements of accuracy, sensitivity, specificity, and fidelity are utilized to compare the performance of the proposed method with direct learner algorithms and several rule-extraction techniques for SVMs. The results indicate that the proposed method performs at least as well as the most successful direct rule learners. Finally, an actual pressure-ulcer case was studied, and the results indicated the practicality of our proposed method in real applications.

13.
To predict fires on the main conveyor belt of a mine more efficiently and accurately, a coal-mine fire prediction algorithm based on rough sets and support vector machines (RS-SVM) is proposed. Rough set (RS) theory maps eight variables into a rough-set knowledge system, which is then discretized and subjected to attribute reduction to remove redundant information and eliminate interference unnecessary for the experiment, yielding the rule set of the knowledge system. The RS-SVM model is determined through training, and its correctness is then verified by re-classifying the training samples. Finally, the RS-SVM, Bayesian, and RBF-NN prediction algorithms are compared on sample data. The results show that RS-SVM has clear advantages over the other two algorithms: it achieves higher prediction accuracy with few samples, runs faster, resists disturbances better, and has stronger nonlinear capability, along with good practicality in the field and a wide range of applications, which makes it valuable for fire prediction.

14.
An Echo State Network Predictor Based on the AdaBoost Algorithm   (cited 1 time: 0 self-citations, 1 by others)
Improving the prediction model of a single echo state network (ESN) yields only limited gains in the overall ESN prediction accuracy. To address this, this paper considers an ensemble of ESNs. The AdaBoost algorithm is first used to improve the generalization performance and prediction accuracy of individual ESNs, and based on the AdaBoost results an ESN predictor (AdaBoost ESN, ABESN) is built. This predictor repeatedly adjusts the weights of the training samples according to the fitting error: the larger the fitting error, the larger the sample weight, so the next iteration focuses on the samples that are difficult to learn. The prediction models of the individual ESNs are weighted and then combined additively to form the final ESN prediction model. The model is applied to the prediction of sunspot numbers and the Mackey-Glass time series, and simulation results demonstrate its effectiveness for real time-series prediction.
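
A compact AdaBoost.R2-style loop illustrating the sample-reweighting scheme described above; for brevity, ridge regressors on a delay embedding of a toy series stand in for the individual echo state networks, and a weighted mean replaces the weighted-median combination.

```python
# AdaBoost.R2-style reweighting: samples with large fitting error gain weight,
# so later rounds focus on the hard-to-learn samples. Ridge regressors on a
# delay embedding stand in for individual echo state networks.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
t = np.arange(3000)
series = np.sin(0.3 * t) + 0.5 * np.sin(0.11 * t) + 0.1 * rng.normal(size=t.size)
lag = 8
X = np.column_stack([series[i:len(series) - lag + i] for i in range(lag)])
y = series[lag:]

n_rounds = 10
w = np.full(len(y), 1.0 / len(y))
models, betas = [], []
for r in range(n_rounds):
    idx = rng.choice(len(y), size=len(y), p=w)    # resample by current weights
    m = Ridge(alpha=1.0).fit(X[idx], y[idx])      # weak predictor (ESN stand-in)
    loss = np.abs(m.predict(X) - y)
    loss /= loss.max()                            # linear loss scaled to [0, 1]
    eps = np.sum(w * loss)
    if eps >= 0.5:                                # stop when the learner is too weak
        break
    beta = eps / (1 - eps)
    w = w * beta ** (1 - loss)                    # hard samples keep more weight
    w /= w.sum()
    models.append(m)
    betas.append(beta)

# AdaBoost.R2 combines by a weighted median; a weighted mean is used for brevity.
coef = np.log(1.0 / np.array(betas))
pred = np.average([m.predict(X) for m in models], axis=0, weights=coef)
print("ensemble training MSE:", np.mean((pred - y) ** 2))
```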

15.
Existing software defect prediction methods face problems such as class imbalance and high-dimensional data, and solving these problems effectively has become a research focus in the field. To address the class imbalance and low prediction accuracy in software defect prediction, this paper proposes DP_HSRS, a software defect prediction algorithm based on hybrid sampling and Random_Stacking. DP_HSRS first balances the imbalanced data with a hybrid sampling algorithm, and then applies the Random_Stacking algorithm to the balanced data set to predict software defects. Random_Stacking is an effective improvement of the traditional Stacking algorithm: it builds multiple Stacking classifiers by fusing several classic classification algorithms with a Bagging mechanism, obtains an ensemble classifier by voting over these Stacking classifiers, and finally uses the ensemble classifier to predict software defects. Experimental results on the NASA MDP data sets show that DP_HSRS outperforms existing algorithms and offers better defect prediction performance.
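
The two stages described above can be sketched with library components as stand-ins: SMOTE plus random under-sampling approximates the hybrid sampling step, and scikit-learn's StackingClassifier approximates a single Stacking classifier (the bagging of multiple stacked models into Random_Stacking is omitted). The data are synthetic, not the NASA MDP sets.

```python
# Hybrid sampling (SMOTE + random under-sampling) followed by a stacked
# ensemble; one StackingClassifier approximates the Random_Stacking ensemble.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
model = Pipeline([
    ("smote", SMOTE(random_state=0)),               # over-sample defective modules
    ("under", RandomUnderSampler(random_state=0)),  # trim the majority class
    ("stack", stack),
])
model.fit(X_tr, y_tr)
print("F1 on the test set:", f1_score(y_te, model.predict(X_te)))
```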

16.
Yang Yong, Kong Xiangwei, Feng Chaoyu. Multimedia Tools and Applications, 2018, 77(14): 17993-18005

Steganalysis is a technology for detecting the presence of secret messages in digital media. Recently, many algorithms have been proposed and have achieved satisfactory detection accuracy. However, the performance of these algorithms is reduced by double compression, due to the mismatch between training and testing sets. To address this problem, we propose Transferring Feature on Double-compressed JPEG images (TFD) to improve the detection accuracy. Specifically, our algorithm consists of two parts. First, we detect double compression of the testing images by constructing a multi-classifier with Markov features. Then we transfer the steganalysis features into a new feature space in order to reduce the difference between the feature distributions of the training and testing sets. We obtain a transformation matrix by adjusting the expectation and standard deviation of the training set, simultaneously minimizing the feature discrepancy between the two sets and preserving the classification ability of the training set. The experimental results show that the proposed algorithm has better performance in double-compressed, mismatched steganalysis.
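
A minimal sketch of the feature-transfer idea described above: the training features are rescaled so that their per-dimension mean and standard deviation match those of the mismatched testing features, which is one simple way to shrink the distribution gap; the paper's full optimisation of the transformation matrix is not reproduced.

```python
# Rescale the training features so their per-dimension mean and standard
# deviation match the (mismatched) testing features; synthetic Gaussian
# features stand in for real steganalysis features.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(5000, 50))   # training features
test = rng.normal(loc=0.4, scale=1.6, size=(2000, 50))    # shifted by re-compression

mu_tr, sd_tr = train.mean(axis=0), train.std(axis=0) + 1e-12
mu_te, sd_te = test.mean(axis=0), test.std(axis=0)
train_adapted = (train - mu_tr) / sd_tr * sd_te + mu_te   # match mean and std

gap_before = np.abs(mu_tr - mu_te).mean()
gap_after = np.abs(train_adapted.mean(axis=0) - mu_te).mean()
print(f"mean feature gap before: {gap_before:.3f}   after: {gap_after:.3f}")
```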


17.
This paper introduces a hybrid system termed cascade adaptive resonance theory mapping (ARTMAP) that incorporates symbolic knowledge into neural-network learning and recognition. Cascade ARTMAP, a generalization of fuzzy ARTMAP, represents intermediate attributes and rule cascades of rule-based knowledge explicitly and performs multistep inferencing. A rule insertion algorithm translates if-then symbolic rules into the cascade ARTMAP architecture. Besides the fact that initializing networks with prior knowledge can improve predictive accuracy and learning efficiency, the inserted symbolic knowledge can be refined and enhanced by the cascade ARTMAP learning algorithm. By preserving symbolic rule form during learning, the rules extracted from cascade ARTMAP can be compared directly with the originally inserted rules. Simulations on an animal identification problem indicate that a priori symbolic knowledge always improves system performance, especially with a small training set. A benchmark study on a DNA promoter recognition problem shows that, with the added advantage of fast learning, the cascade ARTMAP rule insertion and refinement algorithms produce performance superior to that of other machine learning systems and of an alternative hybrid system known as the knowledge-based artificial neural network (KBANN). Also, the rules extracted from cascade ARTMAP are more accurate and much cleaner than the NofM rules extracted from KBANN.

18.
An Association Pattern Mining Algorithm Based on Multidimensional Sets   (cited 2 times: 0 self-citations, 2 by others)
Most inter-dimensional association rule mining algorithms, such as those based on data cubes, assume that each attribute of an object takes only a single value. This paper extends object attributes to multiple values and, on that basis, introduces the concept of a multidimensional set and the semantics of association rules over multidimensional sets. Under these semantics, an association rule mining algorithm for multidimensional sets is proposed. By exploiting the constraints of multidimensional-set association rules, the algorithm performs triple pruning of the candidate sets while shrinking the data set, and therefore performs better than applying Apriori and similar algorithms directly. The performance, correctness, and completeness of the algorithm are analysed, and its effectiveness is compared experimentally.

19.
Context: Software defect prediction has been widely studied based on various machine-learning algorithms. Previous studies usually focus on within-company defect prediction (WCDP), but the lack of training data in the early stages of software testing limits the efficiency of WCDP in practice. Thus, recent research has largely examined cross-company defect prediction (CCDP) as an alternative solution. Objective: However, the distribution gap between cross-company (CC) data and within-company (WC) data usually makes it difficult to build a high-quality CCDP model. In this paper, a novel algorithm named Double Transfer Boosting (DTB) is introduced to narrow this gap and improve the performance of CCDP by reducing negative samples in CC data. Method: The proposed DTB model integrates two levels of data transfer: first, the data gravitation method reshapes the whole distribution of CC data to fit WC data; second, the transfer boosting method employs a small ratio of labeled WC data to eliminate negative instances in CC data. Results: The empirical evaluation was conducted on 15 publicly available datasets. CCDP experiment results indicated that the proposed model achieved better overall performance than the compared CCDP models. DTB was also compared to WCDP in two different situations. Statistical analysis suggested that DTB performed significantly better than WCDP models trained on limited samples and produced results comparable to WCDP with sufficient training data. Conclusions: DTB reforms the distribution of CC data at different levels to improve the performance of CCDP, and the experimental results and analysis demonstrate that it could be an effective model for early software defect detection.
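
A compact TrAdaBoost-style sketch of the ideas described above, on synthetic data: a rough mean/std alignment stands in for the data-gravitation stage, and a boosting loop that measures error only on the small within-company (WC) set while down-weighting repeatedly misclassified cross-company (CC) instances stands in for the transfer-boosting stage; this illustrates the general mechanism, not the authors' DTB implementation.

```python
# TrAdaBoost-style loop: error is measured on the small WC set only, WC
# mistakes raise sample weights (AdaBoost-style) while repeated CC mistakes
# lower them; a mean/std alignment roughly mimics the data-gravitation stage.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

Xs, ys = make_classification(n_samples=3000, random_state=1)   # CC (source)
Xt, yt = make_classification(n_samples=200, random_state=2)    # small WC (target)

# Stage 1 (rough stand-in for data gravitation): align CC features to WC stats.
Xs = (Xs - Xs.mean(0)) / (Xs.std(0) + 1e-12) * Xt.std(0) + Xt.mean(0)

X, y = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
n_s, T = len(ys), 10
w = np.ones(len(y))
beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_s) / T))        # TrAdaBoost constant

learners, betas = [], []
for _ in range(T):
    p = w / w.sum()
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
        X, y, sample_weight=p)
    miss = (clf.predict(X) != y).astype(float)
    err_t = np.sum(p[n_s:] * miss[n_s:]) / np.sum(p[n_s:])      # WC error only
    err_t = min(max(err_t, 1e-10), 0.499)
    beta_t = err_t / (1.0 - err_t)
    w[:n_s] *= beta_src ** miss[:n_s]        # down-weight misclassified CC samples
    w[n_s:] *= beta_t ** -miss[n_s:]         # up-weight misclassified WC samples
    learners.append(clf)
    betas.append(beta_t)

# Vote with the later half of the learners, as in TrAdaBoost.
score = sum(np.log(1.0 / betas[i]) * (2 * learners[i].predict(Xt) - 1)
            for i in range(T // 2, T))
print("WC training accuracy:", np.mean((score > 0) == (yt == 1)))
```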

20.
There exist numerous state-of-the-art classification algorithms that are designed to handle data with nominal or binary class labels. Unfortunately, less attention is given to the genre of classification problems where the classes are organized as a structured hierarchy, such as protein function prediction (the target area in this work), test scores, gene ontology, web page categorization, text categorization, etc. The structured hierarchy is usually represented as a tree or a directed acyclic graph (DAG) in which IS-A relationships exist among the class labels. Class labels at the upper levels of the hierarchy are more abstract and easier to predict, whereas class labels at the deeper levels are more specific and challenging to predict correctly. It is helpful to consider this class hierarchy when designing a hypothesis that can handle the tradeoff between prediction accuracy and prediction specificity. In this paper, a novel ant colony optimization (ACO) based single-path hierarchical classification algorithm is proposed that incorporates the given class hierarchy during its learning phase. The algorithm produces an ordered list of IF-THEN rules and thus offers a comprehensible classification model. A detailed discussion of the architecture and design of the proposed technique is provided, followed by an empirical evaluation on six ion-channel data sets (related to protein function prediction) and two publicly available data sets. Compared with existing methods, the performance of the algorithm is encouraging under a statistically significant Student's t-test (in terms of prediction accuracy and specificity), which confirms the promise of the proposed technique for hierarchical classification tasks.
