Similar Documents
20 similar documents found (search time: 546 ms)
1.
Abstract: The artificial immune recognition system (AIRS) has been shown to be an efficient approach to tackling a variety of problems such as machine learning benchmark problems and medical classification problems. In this study, the resource allocation mechanism of AIRS was replaced with a new one based on fuzzy logic. The new system, named Fuzzy-AIRS, was used as a classifier in the classification of three well-known medical data sets, the Wisconsin breast cancer data set (WBCD), the Pima Indians diabetes data set and the ECG arrhythmia data set. The performance of the Fuzzy-AIRS algorithm was tested for classification accuracy, sensitivity and specificity values, confusion matrix, computation time and receiver operating characteristic curves. Also, the AIRS and Fuzzy-AIRS algorithms were compared with respect to the amount of resources required in the execution of the algorithm. The highest classification accuracy obtained from applying the AIRS and Fuzzy-AIRS algorithms using 10-fold cross-validation was, respectively, 98.53% and 99.00% for classification of WBCD; 79.22% and 84.42% for classification of the Pima Indians diabetes data set; and 100% and 92.86% for classification of the ECG arrhythmia data set. Hence, these results show that Fuzzy-AIRS can be used as an effective classifier for medical problems.  相似文献   
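The entry above reports 10-fold cross-validated accuracy, sensitivity and specificity derived from a confusion matrix. The sketch below shows only how those measures are computed; the k-NN classifier and scikit-learn's built-in breast-cancer data are stand-ins for illustration (AIRS/Fuzzy-AIRS and the exact WBCD, Pima and ECG data are not reproduced here).

```python
# Illustrative sketch: 10-fold cross-validated accuracy, sensitivity and specificity
# from a confusion matrix. k-NN and sklearn's breast cancer data are stand-ins only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier  # stand-in for the AIRS classifier
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc, sens, spec = [], [], []
for train_idx, test_idx in cv.split(X, y):
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[test_idx], clf.predict(X[test_idx])).ravel()
    acc.append((tp + tn) / (tp + tn + fp + fn))
    sens.append(tp / (tp + fn))   # sensitivity = recall on the positive class
    spec.append(tn / (tn + fp))   # specificity = recall on the negative class
print(f"accuracy={np.mean(acc):.4f} sensitivity={np.mean(sens):.4f} specificity={np.mean(spec):.4f}")
```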

2.

Feature subset selection (FSS) generally plays an essential role in data mining, particularly in high-dimensional medical data analysis, where it supports early detection by supplying a small set of essential features with high accuracy. Modern feature selection models exploit optimization algorithms to extract feature subsets that yield the highest possible accuracy. Many optimization algorithms, such as the genetic algorithm, depend on control parameters that must be tuned to obtain good results, and tuning these values for the feature selection procedure is a difficult challenge; it is therefore desirable to use a method that is free of algorithm-specific control parameters. Binary teaching-learning-based optimization (BTLBO) is such a meta-heuristic: it requires only common process parameters, such as the population size and the number of iterations, to extract a subset of features from a dataset. This paper introduces a new modified binary teaching-learning-based optimization (NMBTLBO) as a wrapper-based feature subset selection technique that uses the binary classification accuracy of a support vector machine (SVM) as the fitness function. NMBTLBO modifies BTLBO in two ways: a new updating procedure and a new method for selecting the primary teacher in the teacher phase. The technique was used to classify rheumatic disease datasets collected from the Baghdad Teaching Hospital Outpatient Rheumatology Clinic during 2016–2018. Compared with the original BTLBO algorithm, the improved NMBTLBO achieved a substantial gain in accuracy. Validation was carried out by testing the accuracy of four classification methods: K-nearest neighbors, decision trees, support vector machines and K-means. The classification accuracy of all four methods increased with features selected by NMBTLBO compared to BTLBO: the SVM classifier achieved 89% accuracy with BTLBO-selected features and 95% with NMBTLBO-selected features, while the decision tree classifier achieved 94% and 95%, respectively. The analysis indicates that the proposed NMBTLBO method enhances classification accuracy.
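A minimal sketch of the wrapper idea described above: a binary mask encodes a candidate feature subset and the cross-validated accuracy of an SVM serves as its fitness. The TLBO/NMBTLBO teacher and learner update rules are not reproduced; a random population stands in for them, and the wine dataset is a placeholder for the rheumatic-disease data.

```python
# Hedged sketch of the wrapper-FSS idea only: binary mask -> SVM accuracy as fitness.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(mask):
    """SVM cross-validated accuracy on the selected feature subset (empty subsets score 0)."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=5).mean()

# Population of random binary feature masks; a TLBO-style loop would iteratively
# move learners toward the best mask (the "teacher") instead of this pure random search.
population = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)
scores = np.array([fitness(m) for m in population])
best = population[scores.argmax()]
print("best subset:", np.flatnonzero(best), "accuracy:", scores.max().round(4))
```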

  相似文献   

3.
This paper introduces a new learning technique for the multicriteria classification method PROAFTN. This new technique, called DEPRO, utilizes a Differential Evolution (DE) algorithm for learning and optimizing the output of the classification method PROAFTN. The limitation of the PROAFTN method is largely due to the set of parameters (e.g., intervals and weights) required to be obtained to perform the classification procedure. Therefore, a learning method is needed to induce and extract these parameters from data. DE is an efficient metaheuristic optimization algorithm based on a simple mathematical structure to mimic a complex process of evolution. Some of the advantages of DE over other global optimization methods are that it often converges faster and with more certainty than many other methods and it uses fewer control parameters. In this work, the DE algorithm is proposed to inductively obtain PROAFTN’s parameters from data to achieve a high classification accuracy. Based on results generated from 12 public datasets, DEPRO provides excellent results, outperforming the most common classification algorithms.  相似文献   
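As a reference point for the optimizer DEPRO builds on, the following is a minimal DE/rand/1/bin sketch. The objective here is a toy sphere function rather than PROAFTN's interval-and-weight fitness, so treat it as an illustration of DE's mutation, crossover and greedy selection only.

```python
# Minimal DE/rand/1/bin sketch; the toy sphere objective is an assumption for illustration.
import numpy as np

def de(objective, bounds, pop_size=20, F=0.8, CR=0.9, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([objective(x) for x in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)          # mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True                    # ensure at least one gene crosses
            trial = np.where(cross, mutant, pop[i])            # binomial crossover
            f = objective(trial)
            if f <= fit[i]:                                    # greedy selection
                pop[i], fit[i] = trial, f
    return pop[fit.argmin()], fit.min()

best_x, best_f = de(lambda x: float(np.sum(x ** 2)), bounds=[(-5, 5)] * 4)
print(best_x.round(3), round(best_f, 6))
```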

4.
张卫  古林燕  刘嘉 《集成技术》2020,9(6):48-58
为加快卷积神经网络的训练,该研究提出一种受区域分解方法启发的新型学习策略。将该方 法应用于残差网络(ResNet)进行图像分类时,使用 ResNet32 可获得最佳结果。进一步地,将 ResNet32 分成 4 个子网络,其中每个子网具有 0.47 M 参数,此为原始 ResNet32 的 1/16,从而简化了学习过程。 此外,由于可以并行训练子网络,因此在使用 CIFAR-10 数据集进行分类任务时,计算时间可以从 8.53 h (通过常规学习策略)减少到 5.65 h,分类准确性从 92.82% 提高到 94.09%。CIFAR-100 和 Food-101 数 据集也实现了类似的改进。实验结果显示,所提出的学习策略可以大大减少计算时间,并提高分类的 准确性。这表明所提出的策略可以潜在地应用于训练带有大量参数的卷积神经网络。  相似文献   

5.
A major task in developing a fuzzy classification system is to generate a set of fuzzy rules from training instances to deal with a specific classification problem. In recent years, many methods have been developed to generate fuzzy rules from training instances. We present a new method to generate fuzzy rules from training instances to deal with the Iris data classification problem. The proposed method can discard some useless input attributes to improve the average classification accuracy rate. It can obtain a higher average classification accuracy rate and it generates fewer fuzzy rules and fewer input fuzzy sets in the generated fuzzy rules than the existing methods.  相似文献   

6.
An Ensemble Learning Method for Imbalanced Data Based on Sample Weight Updating   Cited by 1 (0 self-citations, 1 other citation)
The problem of imbalanced data is common across big-data and machine-learning applications such as medical diagnosis and anomaly detection. Researchers have proposed or adopted many approaches to learning from imbalanced data, such as data sampling (e.g., SMOTE) and ensemble learning (e.g., EasyEnsemble). Over-sampling methods may suffer from overfitting or low classification accuracy on boundary samples, while under-sampling methods may lead to underfitting. This paper fuses the basic ideas of SMOTE, Bagging and Boosting and proposes the Rotation SMOTE algorithm, which indirectly increases the weights of minority-class samples by applying SMOTE to them during the Boosting process according to the predictions of the base classifiers. Drawing on the core idea of Focal Loss, it further proposes the FocalBoost algorithm, which directly optimizes the AdaBoost weight-update strategy based on the predictions of the base classifiers. Experiments on multiple evaluation metrics over 11 imbalanced datasets from different application domains show that, compared with other imbalanced-data algorithms (including SMOTEBoost and EasyEnsemble), Rotation SMOTE achieves the highest recall on all datasets and the best or second-best G-mean and F1-score on most of them, while FocalBoost outperforms the original AdaBoost on 9 of the imbalanced datasets.  相似文献
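The sketch below only illustrates the focal-style reweighting idea mentioned above (hard, poorly predicted samples gain weight fastest inside a boosting loop); it is not the paper's Rotation SMOTE or FocalBoost, and the synthetic imbalanced data and the gamma value are assumptions.

```python
# Illustrative sketch only: AdaBoost-style loop with a focal-loss-like (1 - p_true)**gamma
# modulation of the weight update. Not the paper's algorithm; data and gamma are assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
n, gamma, rounds = len(y), 2.0, 25
w = np.full(n, 1.0 / n)
learners, alphas = [], []
for _ in range(rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    p_true = stump.predict_proba(X)[np.arange(n), y]     # probability of the true class
    err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)
    # focal modulation: the usual exponential update, scaled by how badly each sample is
    # predicted; well-classified samples are down-weighted more strongly
    w *= np.exp(alpha * (pred != y)) * (1 - p_true) ** gamma + 1e-12
    w /= w.sum()
    learners.append(stump); alphas.append(alpha)

scores = sum(a * (2 * m.predict(X) - 1) for a, m in zip(alphas, learners))
print("train accuracy:", ((scores > 0).astype(int) == y).mean().round(3))
```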

7.
Semi-supervised learning and ensemble learning are important methods in machine learning. Semi-supervised learning exploits unlabeled samples, while ensemble learning combines multiple weak learners to improve classification accuracy. For nominal data, this paper proposes SUCE, a semi-supervised classification method that fuses clustering and ensemble learning. A large number of weak learners are generated by running several clustering algorithms under different parameter settings; the available class-label information is used to evaluate and select weak learners; the test set is pre-classified by an ensemble of the selected weak learners, and high-confidence samples are added to the training set; the expanded training set is then used by base algorithms such as ID3, Naive Bayes, kNN, C4.5, OneR and Logistic regression to classify the remaining samples. Experiments on UCI datasets show that, when labeled training samples are scarce, the method consistently improves the accuracy of most base algorithms.  相似文献
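A simplified sketch of the "expand the training set with high-confidence predictions" step described above. SUCE's clustering-based generation and selection of weak learners is not reproduced; a single Naive Bayes model and a 0.95 confidence threshold stand in for the ensemble and its selection rule.

```python
# Simplified sketch: add high-confidence pseudo-labeled samples to a small labeled set
# and retrain. A single Naive Bayes model stands in for SUCE's clustering ensemble.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_lab, X_unlab, y_lab, y_unlab = train_test_split(X, y, train_size=0.05, random_state=0)

base = GaussianNB().fit(X_lab, y_lab)
proba = base.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.95                 # keep only high-confidence pseudo-labels
X_aug = np.vstack([X_lab, X_unlab[confident]])
y_aug = np.concatenate([y_lab, proba.argmax(axis=1)[confident]])

expanded = GaussianNB().fit(X_aug, y_aug)            # retrain on the expanded training set
print("before:", round(base.score(X_unlab, y_unlab), 3),
      "after:", round(expanded.score(X_unlab, y_unlab), 3))
```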

8.
A Computer-Aided Diagnostic (CAD) system is proposed that classifies clinical datasets using an Artificial Neural Network (ANN) trained by drawing on the relative advantages of Differential Evolution (DE), Particle Swarm Optimization (PSO) and gradient descent based backpropagation (BP). The DE algorithm with a modified best mutation operation is used to enhance the search exploration of PSO. The ANN is trained using PSO, and the global best value obtained is used as a seed by the BP. Local search is performed using BP, in which the weights of the Neural Network (NN) are adjusted to obtain an optimal set of NN weights. Three benchmark clinical datasets, namely Pima Indian Diabetes, Wisconsin Breast Cancer and Cleveland Heart Disease, obtained from the University of California Irvine (UCI) machine learning repository, have been used. The performance of the trained neural network classifier proposed in this work is compared with the existing gradient descent backpropagation, differential evolution with backpropagation and particle swarm optimization with gradient descent backpropagation algorithms. The experimental results show that DEGI-BP provides 85.71% accuracy for the diabetes, 98.52% for the breast cancer and 86.66% for the heart disease datasets. This CAD system can be used by junior clinicians as an aid for medical decision support.  相似文献

9.

In the fields of pattern recognition and machine learning, the use of data preprocessing algorithms has been increasing in recent years to achieve high classification performance. In particular, data preprocessing prior to the classification algorithm has become indispensable when classifying medical datasets with nonlinear and imbalanced data distributions. In this study, a new data preprocessing method is proposed for the classification of the Parkinson, hepatitis, Pima Indians, single proton emission computed tomography (SPECT) heart, and thoracic surgery medical datasets, all of which have nonlinear and imbalanced data distributions and were taken from the UCI machine learning repository. The proposed data preprocessing method consists of three steps. In the first step, the cluster centers of each attribute are calculated using k-means, fuzzy c-means, and mean shift clustering algorithms. In the second step, the absolute differences between the data in each attribute and the cluster centers are calculated, and the average of these differences is computed for each attribute. In the final step, the weighting coefficients are calculated by dividing the mean value of the differences to the cluster centers, and weighting is performed by multiplying the obtained coefficients by the attribute values in the dataset. Three attribute weighting methods are thus proposed: (1) similarity-based attribute weighting in k-means clustering, (2) similarity-based attribute weighting in fuzzy c-means clustering, and (3) similarity-based attribute weighting in mean shift clustering. The aim is to gather the data in each class together and to reduce the within-class variance; by reducing the variance in each class, the data in each class are drawn together while the discrimination between the classes is further increased. For comparison with other methods in the literature, random subsampling is used to handle imbalanced dataset classification. After the attribute weighting process, four classification algorithms, namely linear discriminant analysis, the k-nearest neighbor classifier, the support vector machine, and the random forest classifier, are used to classify the imbalanced medical datasets. To evaluate the performance of the proposed models, the classification accuracy, precision, recall, area under the ROC curve, κ value, and F-measure are used. For training and testing the classifier models, three schemes are used: a 50–50% train–test holdout, a 60–40% train–test holdout, and tenfold cross-validation. The experimental results show that the proposed attribute weighting methods obtain higher classification performance than the random subsampling method in classifying the imbalanced medical datasets.
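A sketch of the k-means variant of the three-step weighting described above. The abstract leaves the exact form of the weighting coefficient ambiguous, so the ratio used below (mean cluster-center magnitude over mean absolute difference to the assigned center) is an assumption, as are the two clusters per attribute and the stand-in dataset.

```python
# Sketch of similarity-based attribute weighting with k-means; the exact coefficient
# formula, the number of clusters, and the dataset are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True)
weights = np.empty(X.shape[1])
for j in range(X.shape[1]):
    col = X[:, [j]]
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(col)        # step 1: centers per attribute
    centers = km.cluster_centers_[km.labels_]
    mean_diff = np.abs(col - centers).mean()                              # step 2: mean |x - center|
    weights[j] = np.abs(km.cluster_centers_).mean() / (mean_diff + 1e-12) # step 3 (assumed form)

X_weighted = X * weights                                                  # weighted data for the classifier
print(weights.round(2))
```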

  相似文献   

10.
In this paper, we address the problem of image set classification, where each set contains a different number of images acquired from the same subject. In most of the existing literature, each image set is modeled using all its available samples. As a result, the corresponding time and storage costs are high. To address this problem, we propose a joint prototype and metric learning approach. The prototypes are learned to represent each gallery image set using fewer samples without affecting the recognition performance. A Mahalanobis metric is learned simultaneously to measure the similarity between sets more accurately. In particular, each gallery set is represented as a regularized affine hull spanned by the learned prototypes. The set-to-set distance is optimized via updating the prototypes and the Mahalanobis metric in an alternating manner. To highlight the importance of representing image sets using fewer samples, we analyzed the corresponding test time complexity with respect to the number of images used per set. Experimental results using YouTube Celebrity, YouTube Faces, and ETH-80 datasets illustrate the efficiency on the task of video face recognition, and object categorization.  相似文献   

11.
Healthcare data analysis is currently a challenging and crucial research issue for the development of a robust disease diagnosis and prediction system. Many specific and a few common methods have been discussed in the literature for healthcare data classification. The present study implements 32 classification methods of six categories (Bayes, function‐based, lazy, meta, rule‐based, and tree‐based) with the objective of searching the best and common categories and methods in healthcare data mining. The performance of each classification method has been evaluated based on analysis time, classification accuracy, precision, recall, F‐measure, area under the receiver operating characteristic curve, root mean square error, kappa coefficient, Kulczynski's measure, and Fowlkes–Mallows index and compared with more than 90 classification methods used in past studies. Seventeen healthcare datasets related to thyroid, cancer, skin disease, heart disease, hepatitis, lymphography, audiology, diabetes, surgery, arrhythmia, postsurvival, liver, and tumour have been used in the performance assessment of the classification methods. The tree‐based classification methods have a better performance (with an average classification accuracy of 79.92% and maximum accuracy of 99.50%; an analysis time of 3.91 s for the logistic model tree classifier) than the other methods. Furthermore, the association of datasets and classification methods has been discussed.  相似文献   

12.
Ensembles that combine the decisions of classifiers generated by using perturbed versions of the training set where the classes of the training examples are randomly switched can produce a significant error reduction, provided that large numbers of units and high class switching rates are used. The classifiers generated by this procedure have statistically uncorrelated errors in the training set. Hence, the ensembles they form exhibit a similar dependence of the training error on ensemble size, independently of the classification problem. In particular, for binary classification problems, the classification performance of the ensemble on the training data can be analysed in terms of a Bernoulli process. Experiments on several UCI datasets demonstrate the improvements in classification accuracy that can be obtained using these class-switching ensembles.  相似文献   
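A minimal class-switching ensemble sketch in the spirit of the description above: each member is trained on a copy of the training set in which a fixed fraction of labels has been switched to another class, and predictions are combined by majority vote. The switching rate, ensemble size and dataset are arbitrary choices, not the paper's settings.

```python
# Class-switching ensemble sketch: perturb labels per member, then majority-vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(*load_breast_cancer(return_X_y=True), random_state=0)
rng = np.random.default_rng(0)
classes, switch_rate, n_members = np.unique(y_tr), 0.2, 101

members = []
for _ in range(n_members):
    y_sw = y_tr.copy()
    flip = rng.random(len(y_sw)) < switch_rate               # samples whose label is switched
    y_sw[flip] = [rng.choice(classes[classes != c]) for c in y_sw[flip]]  # switch to another class
    members.append(DecisionTreeClassifier(random_state=0).fit(X_tr, y_sw))

votes = np.array([m.predict(X_te) for m in members])
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)  # majority vote
print("ensemble accuracy:", (y_pred == y_te).mean().round(3))
```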

13.
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers from 30 machine learning algorithms to evaluate their classification performance on Parkinson's, diabetes and heart disease datasets from the literature. In the experiments, first the feature dimension of the three datasets is reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed with the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). The base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively, whereas the RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a relatively new classifier ensemble algorithm, can thus be used to improve the accuracy of miscellaneous machine learning algorithms in the design of advanced CADx systems.  相似文献
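A simplified rotation forest sketch: each member partitions the features into random subsets, fits a PCA rotation per subset, and trains a decision tree on the rotated features, with members voting at prediction time. Details of the original algorithm (class/bootstrap subsampling before PCA, the CFS step, the 30 base algorithms) are omitted.

```python
# Simplified rotation forest sketch; many details of the original algorithm are omitted.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(*load_breast_cancer(return_X_y=True), random_state=0)
rng = np.random.default_rng(0)
n_members, n_subsets = 15, 3

def fit_member(X, y):
    order = rng.permutation(X.shape[1])
    subsets = np.array_split(order, n_subsets)                      # random feature partition
    pcas = [PCA().fit(X[:, s]) for s in subsets]                    # one rotation per subset
    rotate = lambda Z: np.hstack([p.transform(Z[:, s]) for p, s in zip(pcas, subsets)])
    tree = DecisionTreeClassifier(random_state=0).fit(rotate(X), y)
    return rotate, tree

members = [fit_member(X_tr, y_tr) for _ in range(n_members)]
votes = np.array([tree.predict(rotate(X_te)) for rotate, tree in members])
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("rotation-forest accuracy:", (y_pred == y_te).mean().round(3))
```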

14.
Due to technological improvements, the number and volume of datasets are increasing considerably, bringing with them the need for additional memory and computational complexity. To work with massive datasets efficiently, feature selection, data reduction, rule-based and exemplar-based methods have been introduced. This study presents a method, which may be called joint generalized exemplar (JGE), for classification of massive datasets. The method aims to enhance the computational performance of NGE by working against nesting and overlapping of hyper-rectangles: overlapping parts are repeatedly reassessed with the same procedure, and non-overlapping hyper-rectangle sections that fall within the same class are joined. This provides adaptive decision boundaries and allows batch data searching instead of incremental searching. Classification is then performed according to the distance between each query and the generalized exemplars. The accuracy and time requirements for classification of synthetic datasets and a benchmark dataset obtained by JGE, NGE and other popular machine learning methods were compared, and the results achieved by JGE were found acceptable.  相似文献

15.
To extract knowledge from a set of numerical data and build up a rule-based system is an important research topic in knowledge acquisition and expert systems. In recent years, many fuzzy systems that automatically generate fuzzy rules from numerical data have been proposed. In this paper, we propose a new fuzzy learning algorithm based on the alpha-cuts of equivalence relations and the alpha-cuts of fuzzy sets to construct the membership functions of the input variables and the output variables of fuzzy rules and to induce the fuzzy rules from the numerical training data set. Based on the proposed fuzzy learning algorithm, we also implemented a program on a Pentium PC using the MATLAB development tool to deal with the Iris data classification problem. The experimental results show that the proposed fuzzy learning algorithm has a higher average classification ratio and can generate fewer rules than the existing algorithm.  相似文献   

16.
In this paper, we present two learning mechanisms for artificial neural networks (ANN's) that can be applied to solve classification problems with binary outputs. These mechanisms are used to reduce the number of hidden units of an ANN when trained by the cascade-correlation learning algorithm (CAS). Since CAS adds hidden units incrementally as learning proceeds, it is difficult to predict the number of hidden units required when convergence is reached. Further, learning must be restarted when the number of hidden units is larger than expected. Our key idea in this paper is to provide alternatives in the learning process and to select the best alternative dynamically based on run-time information obtained. Mixed-mode learning (MM), our first algorithm, provides alternative output matrices so that learning is extended to find one of the many one-to-many mappings instead of finding a unique one-to-one mapping. Since the objective of learning is relaxed by this transformation, the number of learning epochs can be reduced. This in turn leads to a smaller number of hidden units required for convergence. Population-based learning for ANN's (PLAN), our second algorithm, maintains alternative network configurations to select at run time promising networks to train based on error information obtained and time remaining. This dynamic scheduling avoids training possibly unpromising ANNs to completion before exploring new ones. We show the performance of these two mechanisms by applying them to solve the two-spiral problem, a two-region classification problem, and the Pima Indian diabetes diagnosis problem.  相似文献   

17.
Objective: In the multi-label supervised learning framework, building a classifier with strong generalization requires a large number of labeled training samples, yet in practice labeled samples are scarce and expensive to obtain. To address the shortage of labeled samples and the low efficiency of classifier retraining in multi-label image classification, an online multi-label image classification algorithm combined with active learning is proposed. Methods: Based on min-max theory, the algorithm actively selects samples to be labeled by querying the most representative and most informative ones, and updates the multi-label image classifier online based on the KKT (Karush-Kuhn-Tucker) conditions. Results: The algorithm is evaluated on 4 public datasets using 4 multi-label classification metrics. The experimental results show that the proposed sample selection method clearly outperforms both random sampling and margin-based sampling; to reach the same or similar classification accuracy, the proposed selection strategy requires markedly fewer labeled samples than either of those methods. Conclusion: The algorithm reduces the manual annotation cost of acquiring labeled samples, and also avoids the inefficiency of retraining a traditional classifier on all the data, so the classifier can be updated in real time as new data arrive.  相似文献
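A generic pool-based active learning sketch showing only the query-then-retrain loop; least-confidence querying and a multi-class digits problem stand in for the paper's min-max representative/informative selection, the KKT-based online update, and the multi-label image setting.

```python
# Generic active learning sketch (least-confidence querying); not the paper's strategy.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(y), 20, replace=False))     # small initial labeled set
pool = [i for i in range(len(y)) if i not in labeled]

for _ in range(30):                                        # 30 query rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    conf = clf.predict_proba(X[pool]).max(axis=1)
    query = pool[int(np.argmin(conf))]                     # ask for the least-confident sample
    labeled.append(query)
    pool.remove(query)

print("accuracy on the remaining pool:", round(clf.score(X[pool], y[pool]), 3))
```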

18.
Evolutionary design of a fuzzy classifier from data   Cited by 6 (0 self-citations, 0 other citations)
Genetic algorithms show powerful capabilities for automatically designing fuzzy systems from data, but many proposed methods must be subjected to some minimal structure assumptions, such as rule base size. In this paper, we also address the design of fuzzy systems from data. A new evolutionary approach is proposed for deriving a compact fuzzy classification system directly from data without any a priori knowledge or assumptions on the distribution of the data. At the beginning of the algorithm, the fuzzy classifier is empty with no rules in the rule base and no membership functions assigned to fuzzy variables. Then, rules and membership functions are automatically created and optimized in an evolutionary process. To accomplish this, parameters of the variable input spread inference training (VISIT) algorithm are used to code fuzzy systems on the training data set. Therefore, we can derive each individual fuzzy system via the VISIT algorithm, and then search the best one via genetic operations. To evaluate the fuzzy classifier, a fuzzy expert system acts as the fitness function. This fuzzy expert system can effectively evaluate the accuracy and compactness at the same time. In the application section, we consider four benchmark classification problems: the iris data, wine data, Wisconsin breast cancer data, and Pima Indian diabetes data. Comparisons of our method with others in the literature show the effectiveness of the proposed method.  相似文献   

19.
Classification with imbalanced datasets poses a new challenge for researchers in machine learning. The problem appears when the number of patterns representing one of the classes of the dataset (usually the concept of interest) is much lower than in the remaining classes, so the learning model must be adapted to this situation, which is very common in real applications. In this paper, a dynamic over-sampling procedure is proposed for improving the classification of imbalanced datasets with more than two classes. The procedure is incorporated into a memetic algorithm (MA) that optimizes radial basis function neural networks (RBFNNs). To handle class imbalance, the training data are resampled in two stages. In the first stage, an over-sampling procedure is applied to the minority class to partially balance the size of the classes. Then the MA is run and the data are over-sampled in different generations of the evolution, generating new patterns of the minimum-sensitivity class (the class with the worst accuracy for the best RBFNN of the population). The proposed methodology is tested on 13 imbalanced benchmark classification datasets from well-known machine learning problems and one complex problem of microbial growth. It is compared with other neural network methods specifically designed for handling imbalanced data, including different over-sampling procedures in the preprocessing stage, a threshold-moving method in which the output threshold is moved toward inexpensive classes, and ensemble approaches combining the models obtained with these techniques. The results show that our proposal improves the sensitivity on the generalization set and obtains both a high accuracy level and a good classification level for each class.  相似文献
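A sketch of the resampling idea only: identify the minimum-sensitivity class (the class with the worst per-class recall of the current model) and over-sample it by drawing duplicates. The memetic algorithm and RBFNN training that the paper wraps around this step are not reproduced, and logistic regression on synthetic data is a stand-in.

```python
# Sketch: over-sample the class with the worst per-class recall of the current model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression   # stand-in model
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=900, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
per_class_recall = recall_score(y, model.predict(X), average=None)
worst = int(np.argmin(per_class_recall))                # the minimum-sensitivity class

rng = np.random.default_rng(0)
idx = np.flatnonzero(y == worst)
extra = rng.choice(idx, size=len(idx), replace=True)    # duplicate samples of that class
X_new, y_new = np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])
print("over-sampled class:", worst, "new class counts:", np.bincount(y_new))
```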

20.
Objective: With the development of 3D scanning and virtual reality technologies, 3D recognition of real objects has become a research hotspot. To address the long training time and unsatisfactory recognition performance of existing deep-learning-based methods, a 3D object recognition method combining a perceptron residual network with an extreme learning machine (ELM) is proposed. Methods: Built on the ELM framework, a multilayer-perceptron residual network learns multi-view projection features of 3D objects, and the extracted features together with the known labels are used to train an ELM classification layer, a K-nearest-neighbor (KNN) classification layer, and a support vector machine (SVM) classification layer to recognize 3D objects. The network replaces traditional convolutional layers with convolutional layers augmented by multilayer perceptrons. The convolutional network consists of improved residual units containing several parallel residual channels with a constant number of convolution kernels, which fit residual functions of different mathematical forms. Half of the convolution-kernel and perceptron parameters are drawn randomly from a Gaussian distribution; the rest are obtained by training. Results: The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% on the 2D NORB dataset, the best results reported on both benchmark datasets to date. Moreover, the ELM framework reduces training time by three orders of magnitude compared with deep-learning-based methods. Conclusion: This paper proposes a multi-view method for recognizing 3D objects; experiments show that it achieves higher recognition rates and stronger robustness to interference than existing ELM methods and recent deep-learning methods, with fewer tuning parameters and faster convergence.  相似文献
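A minimal extreme learning machine (ELM) sketch: a random, untrained hidden layer followed by a closed-form least-squares solve for the output weights, which is the part of the pipeline responsible for the short training times reported above. The perceptron residual network that supplies multi-view features is not reproduced; raw digit images stand in for those features.

```python
# Minimal ELM sketch: random hidden layer + least-squares output weights.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(*load_digits(return_X_y=True), random_state=0)
rng = np.random.default_rng(0)
n_hidden = 500

W = rng.normal(size=(X_tr.shape[1], n_hidden))           # random input weights (never trained)
b = rng.normal(size=n_hidden)
hidden = lambda X: np.tanh((X / 16.0) @ W + b)           # random feature mapping (inputs scaled to [0, 1])

T = np.eye(10)[y_tr]                                     # one-hot targets
beta = np.linalg.pinv(hidden(X_tr)) @ T                  # closed-form output weights
y_pred = (hidden(X_te) @ beta).argmax(axis=1)
print("ELM test accuracy:", (y_pred == y_te).mean().round(3))
```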
