Similar Documents
20 similar documents found.
1.
Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem of protein datasets that increases the complexity of classification models is their large number of features. Feature selection (FS) techniques are used to deal with this high-dimensional feature space. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search capability. The hybrid algorithm exploits the advantages of both the ACO and GA methods. The proposed algorithm is easy to implement, and because it uses a simple classifier, its computational complexity is very low. The performance of the proposed algorithm is compared with that of two prominent population-based algorithms, ACO and genetic algorithms. Experiments are carried out on two challenging biological datasets involving the hierarchical functional classification of GPCRs and enzymes. The comparison criteria are maximizing predictive accuracy and finding the smallest subset of features. The experimental results indicate the superiority of the proposed algorithm.
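The two comparison criteria (high accuracy, small subset) and the use of a simple classifier can be illustrated with a wrapper search. This is a minimal sketch of only a plain genetic-algorithm search; the ACO half of the hybrid is not reproduced. The 1-NN leave-one-out fitness, the size-penalty weight `alpha`, and the GA settings are illustrative assumptions, not the authors' choices.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked features."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, alpha=0.05):
    """Reward accuracy, penalise subset size (alpha is an assumed trade-off weight)."""
    return loo_1nn_accuracy(X, y, mask) - alpha * sum(mask) / len(mask)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Plain binary GA over feature masks: elitism, tournament, one-point crossover."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = [m[:] for m in pop[:2]]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            # tournament selection: best of 3 random individuals, twice
            a = max(rng.sample(pop, 3), key=score)
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)
```

In a hybrid such as the one described, an ACO stage would construct masks from pheromone trails instead of (or in addition to) crossover; the fitness function could stay the same.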

2.
Feature selection for very high-resolution (VHR) images is a key prerequisite for supervised classification. However, it is often difficult to identify the features most highly correlated with land-cover type, which are needed to improve classification accuracy. To address this problem, this paper proposes a feature selection methodology that applies a genetic algorithm (GA) with correlation-based feature selection (CFS), integrated with a sparse auto-encoder (SAE), to the results of multi-scale segmentation. First, 61 features, including spectral and spatial features, are extracted from the results of multi-scale segmentation of a WorldView-2 image of Xicheng District, Beijing. Second, 40-dimensional and 30-dimensional feature sets are derived by selection with GA+CFS and by optimization with the SAE, respectively. Third, the final classification is performed by logistic regression (LR) on different feature subsets extracted from the WorldView-2 image. The feature selection was found to increase the separation between classes and reduce the within-class variability. Adding extra lower-ranked features appeared to reduce classification accuracy. The overall classification accuracy with the 30-dimensional features reached 87.56%, an increase of 5.61% over the result with all 61 features. For both optimized feature sets, the Z-test values are all greater than 1.96, which implies that feature dimensionality reduction and feature space optimization significantly improve the accuracy of land-cover classification. Texture features in the wavelet domain are the most important features for WorldView-2 image classification in the study area. Adding wavelet and grey-level co-occurrence matrix (GLCM) information, especially GLCM features computed in the wavelet domain, did not appear to improve classification accuracy.
The SAE-based method produces feature subsets that improve mapping accuracy more efficiently.
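The abstract reports Z values above 1.96 without stating the statistic. In remote-sensing accuracy assessment a common choice is the pairwise test on two kappa coefficients; assuming that is the test meant here, it reads as follows, where the kappa estimates and their variances come from the two confusion matrices being compared:

```latex
Z = \frac{\left|\hat{\kappa}_1 - \hat{\kappa}_2\right|}
         {\sqrt{\widehat{\operatorname{var}}(\hat{\kappa}_1) + \widehat{\operatorname{var}}(\hat{\kappa}_2)}},
\qquad |Z| > 1.96 \;\Rightarrow\; \text{difference significant at the 95\% level (two-sided)}
```

Under the null hypothesis of equal accuracy, Z is approximately standard normal, which is where the 1.96 threshold comes from.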

3.
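To address the problem that the large number of bands in hyperspectral images easily leads to the curse of dimensionality, a band selection method based on an improved binary ant colony algorithm is proposed, combining the initial heuristic information provided by a genetic algorithm with the search ability of the ant colony algorithm. The method first runs a genetic algorithm to obtain several good solutions, which, after further computation, serve as the initial heuristic information for the binary ant colony algorithm; the global search of the binary ant colony algorithm then finds the optimal solution. In addition, to make full use of the spectral and spatial information in the image, the spectral features of the selected band combination are fused with texture features selected by the improved binary ant colony algorithm for classification, which yields higher classification accuracy. Experimental results show that the improved binary ant colony algorithm has stronger global search ability than the genetic algorithm, the ant colony algorithm, and the binary ant colony algorithm, and the classification accuracy of the method reaches 95.63%.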
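The idea of turning a few good GA solutions into the starting pheromone of a binary ant colony can be sketched as below. The seeding rule (pheromone proportional to how often each band appears in the GA solutions), the evaporation/deposit rule, the pheromone floor, and all parameter values are assumptions for illustration; the abstract does not give the paper's formulas, and the fitness here is any function of a band mask.

```python
import random

def init_pheromone_from_ga(ga_solutions, tau_min=0.1):
    """Seed pheromone from how often each band is selected in the GA's good solutions."""
    n = len(ga_solutions[0])
    tau = []
    for j in range(n):
        freq = sum(s[j] for s in ga_solutions) / len(ga_solutions)
        # tau[j][0]: pheromone for dropping band j, tau[j][1]: for keeping it
        tau.append([max(1.0 - freq, tau_min), max(freq, tau_min)])
    return tau

def binary_aco(fitness, tau, n_ants=10, iterations=50, rho=0.2, tau_min=1e-3, seed=0):
    """Each ant keeps band j with probability tau[j][1] / (tau[j][0] + tau[j][1]).

    Note: tau is updated in place.
    """
    rng = random.Random(seed)
    best_mask, best_fit = None, float("-inf")
    for _ in range(iterations):
        for _ in range(n_ants):
            mask = [1 if rng.random() < t[1] / (t[0] + t[1]) else 0 for t in tau]
            f = fitness(mask)
            if f > best_fit:
                best_fit, best_mask = f, mask
        for j, t in enumerate(tau):
            t[0] = max(t[0] * (1 - rho), tau_min)  # evaporation with a floor
            t[1] = max(t[1] * (1 - rho), tau_min)
            t[best_mask[j]] += rho * max(best_fit, 0.0)  # reinforce the global-best choice
    return best_mask, best_fit
```

Seeding from GA solutions means the first ants already favour bands the GA found useful, rather than starting from uniform pheromone.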

4.
Classifying self-care problems is one of the important challenges for occupational therapists. The extent and variety of disorders make the classification process complex and time-consuming. To overcome this challenge, this research proposes a novel expert model based on a Probabilistic Neural Network (PNN) and a Genetic Algorithm (GA) for classifying the self-care problems of children with physical and motor disabilities. In this model, the PNN is employed as the classifier and the GA is applied for feature selection. The PNN is trained using the standard ICF-CY dataset. Under ICF-CY, occupational therapists must evaluate many features to diagnose self-care problems, and in their experience these features affect the classification to different degrees. Hence, the GA is employed to select the relevant and important features for self-care problem classification. Because classification rules are important to occupational therapists, self-care problem classification rules are also extracted using the CART algorithm. The experimental results show that using the feature selection algorithm improves both the accuracy and the time complexity of classification compared with other models. The proposed model classifies children's self-care problems with 94.28% accuracy using only 16.5% of all features.
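A probabilistic neural network is essentially a Parzen-window classifier: each training pattern contributes a Gaussian kernel (pattern layer), kernels are averaged per class (summation layer), and the class with the largest score wins (output layer). A minimal sketch, where the smoothing parameter `sigma`, the optional feature `mask` (standing in for a GA-selected subset), and the toy data are assumptions:

```python
import math
from collections import defaultdict

def pnn_predict(train_X, train_y, x, sigma=0.5, mask=None):
    """Classify x with a probabilistic neural network (Gaussian Parzen windows)."""
    feats = [j for j in range(len(x)) if mask is None or mask[j]]
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, yi in zip(train_X, train_y):
        d2 = sum((x[j] - xi[j]) ** 2 for j in feats)  # pattern layer: one unit per sample
        sums[yi] += math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] += 1
    # summation layer: mean kernel response per class; output layer: argmax
    return max(sums, key=lambda c: sums[c] / counts[c])
```

Because the mask only changes which features enter the distance, a GA can evaluate each candidate subset by running this classifier on held-out data.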

5.
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbor (K-NN) rule. Feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight proportional to its ability to distinguish pattern classes. In this paper, a novel hybrid approach based on the Tabu Search (TS) heuristic is proposed for simultaneous feature selection and feature weighting of the K-NN rule. The proposed TS heuristic combined with the K-NN classifier is compared with several classifiers on various available data sets, and the results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms; the experiments revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
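Simultaneous selection and weighting can be expressed as a search over weight vectors in which a weight of 0 removes a feature. The sketch below is a generic tabu search under stated assumptions: weights restricted to the levels `(0.0, 0.5, 1.0)`, a tabu list of recently changed feature indices with tenure 2, an aspiration rule, and leave-one-out K-NN accuracy as the objective. It is not the paper's exact neighbourhood or move set.

```python
from collections import Counter, deque

def knn_loo_accuracy(X, y, w, k=3):
    """Leave-one-out K-NN accuracy with each feature multiplied by its weight."""
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((wj * (a - b)) ** 2 for wj, a, b in zip(w, xi, xk)), y[m])
            for m, xk in enumerate(X) if m != i
        )
        votes = Counter(label for _, label in dists[:k])
        correct += votes.most_common(1)[0][0] == y[i]
    return correct / len(X)

def tabu_search(X, y, levels=(0.0, 0.5, 1.0), iterations=20, tenure=2, k=3):
    """Tabu search over discrete weights; a weight of 0 removes (deselects) a feature."""
    n = len(X[0])
    current = [1.0] * n
    best, best_acc = current[:], knn_loo_accuracy(X, y, current, k)
    tabu = deque(maxlen=tenure)  # indices of recently modified features
    for _ in range(iterations):
        moves = []
        for j in range(n):
            for level in levels:
                if level == current[j]:
                    continue
                w = current[:]
                w[j] = level
                acc = knn_loo_accuracy(X, y, w, k)
                # aspiration criterion: a tabu move is allowed if it beats the best so far
                if j not in tabu or acc > best_acc:
                    moves.append((acc, -sum(w), j, w))
        if not moves:
            break
        acc, _, j, current = max(moves)  # best admissible neighbour, even if worse
        tabu.append(j)
        if acc > best_acc or (acc == best_acc and sum(current) < sum(best)):
            best, best_acc = current[:], acc
    return best, best_acc
```

Accepting the best admissible neighbour even when it is worse, while forbidding recent moves, is what lets tabu search leave local optima that a plain sequential search would stop at.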

6.
We propose a systematic ECG quality classification method based on a kernel support vector machine (KSVM) and a genetic algorithm (GA) to determine whether ECGs collected via mobile phone are acceptable. The method consists of three main modules: lead-fall detection, feature extraction, and intelligent classification. First, lead-fall detection is executed to make an initial classification. Then the power spectrum, baseline drift, amplitude difference, and other time-domain features of the ECGs are analyzed and quantified to form the feature matrix. Finally, the feature matrix is assessed using the KSVM and GA to determine the ECG quality classification results. A Gaussian radial basis function (GRBF) is employed as the KSVM kernel, and its performance is compared with that of the Mexican hat wavelet function (MHWF). The GA is used to determine the optimal parameters of the KSVM classifier, and its performance is compared with that of the grid search (GS) method. The proposed method was tested on a database from the PhysioNet/Computing in Cardiology Challenge 2011, which includes 1500 12-lead ECG recordings. True positive (TP), false positive (FP), and classification accuracy were used as the assessment indices. For training set A (1000 recordings), the best results were obtained with the combination of lead-fall detection, GA, and GRBF: TP 92.89%, FP 5.68%, and classification accuracy 94.00%. For test set B (500 recordings), the best results were also obtained with the combination of lead-fall detection, GA, and GRBF, with a classification accuracy of 91.80%.

7.
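To address the insufficient feature extraction of electrocardiogram (ECG) signals by the short-time Fourier transform and the wavelet transform, and the resulting difficulty of arrhythmia recognition, an arrhythmia classification algorithm based on S-transform feature selection is proposed. First, the S-transform is applied to the ECG signal, and time-frequency features are extracted from both its amplitude and its phase; together with morphological features and RR intervals, these form the original feature vector. Then a genetic algorithm is combined with a support vector machine (SVM) to form a wrapper feature selection method into which the ReliefF algorithm is integrated: ReliefF computes feature weights, the weights guide the initialization of the genetic algorithm's population, and the genetic algorithm searches for a feature subset using SVM classification performance as its fitness function. Finally, a one-against-all (OAA) SVM classifies 8 types of heartbeats from the MIT-BIH arrhythmia database. Experimental results show that the algorithm achieves good classification performance, with a sensitivity, specificity, and accuracy of 96.14%, 99.75%, and 99.81%, respectively.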
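The step in which feature weights guide GA population initialisation can be sketched as follows. For brevity, the basic two-class Relief update (one nearest hit and one nearest miss per sampled instance) stands in for full ReliefF, which averages over k neighbours and handles multiple classes. The linear mapping from weights to per-feature inclusion probabilities between `p_min` and `p_max` is an assumption.

```python
import random

def relief_weights(X, y, n_iter=None, seed=0):
    """Basic two-class Relief; a simplified stand-in for ReliefF."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # per-feature range, so that differences are scaled to [0, 1]
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def dist(a, b):
        return sum(((X[a][j] - X[b][j]) / span[j]) ** 2 for j in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hit = min((k for k in range(n) if k != i and y[k] == y[i]), key=lambda k: dist(i, k))
        miss = min((k for k in range(n) if y[k] != y[i]), key=lambda k: dist(i, k))
        for j in range(d):
            # reward separation from the nearest miss, penalise spread to the nearest hit
            w[j] += (abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])) / (span[j] * m)
    return w

def init_population(weights, pop_size=20, p_min=0.1, p_max=0.9, seed=0):
    """Bias GA chromosomes so that high-weight features are more likely to start switched on."""
    rng = random.Random(seed)
    lo, hi = min(weights), max(weights)
    probs = [p_min + (p_max - p_min) * ((w - lo) / (hi - lo) if hi > lo else 0.5)
             for w in weights]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
```

Keeping `p_min` above zero leaves low-weight features reachable, so the GA can still recover a feature that Relief underrates.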

8.
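Feature selection methods mainly fall into filter methods and wrapper methods. To combine the computational simplicity of filters with the high accuracy of wrappers, a new feature selection method that combines the two is proposed. The method first uses a filter based on the mutual information criterion to obtain a subset that meets a given accuracy requirement, and then uses a wrapper to find the final optimized feature subset. Because genetic algorithms have been applied successfully to combinatorial optimization problems, a genetic algorithm is used to search for the feature subset. In numerical simulations and in bearing-fault feature selection, the new method saves a great deal of selection time while maintaining diagnostic accuracy. The combined feature selection method is good at finding optimal feature subsets and saves selection time, offering both efficiency and high accuracy.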
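For discrete (or discretised) features, the mutual-information filter stage can be sketched as below: each feature is scored by I(X_j; Y), and the top-ranked ones are handed to the GA wrapper. The plug-in estimator and the fixed `top_k` cut-off are assumptions; the paper instead stops the filter once an accuracy requirement is met.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mi_filter(X, y, top_k):
    """Rank feature columns by MI with the label and keep the indices of the best top_k."""
    columns = list(zip(*X))
    scores = [mutual_information(col, y) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores
```

The filter costs one pass over the data per feature, which is why running it first and leaving only a short list for the (expensive) GA wrapper saves selection time.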

9.
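To improve the accuracy of Android malware detection, a feature selection algorithm based on feature contribution is proposed. Based on the distribution of features in existing Android application datasets, the intra-class and inter-class contributions of each feature are computed, and a threshold is set to keep the high-contribution features, which are then used to detect and classify malicious applications. Experimental results show that the proposed algorithm detects malicious applications effectively and reliably, with accuracy and recall very close to each other, making it suitable for malware detection. Compared with traditional feature selection algorithms, it achieves the desired detection performance with fewer features.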
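The abstract does not give the contribution formulas. As a purely hypothetical stand-in for binary features such as requested permissions, the sketch below treats a feature's frequency within the malware class as its intra-class contribution and the absolute frequency gap between malware and benign apps as its inter-class contribution, keeping features that pass both thresholds (`t_intra` and `t_inter` are assumed values).

```python
def contribution_select(X, y, positive=1, t_intra=0.5, t_inter=0.3):
    """Hypothetical contribution-based selection for binary features (0/1 per app)."""
    pos = [row for row, label in zip(X, y) if label == positive]
    neg = [row for row, label in zip(X, y) if label != positive]
    selected = []
    for j in range(len(X[0])):
        p_pos = sum(r[j] for r in pos) / len(pos)  # intra-class: how common in malware
        p_neg = sum(r[j] for r in neg) / len(neg)
        inter = abs(p_pos - p_neg)                 # inter-class: how well it separates classes
        if p_pos >= t_intra and inter >= t_inter:
            selected.append(j)
    return selected
```

A feature that every app requests (high intra-class, zero inter-class contribution) is dropped, which is the kind of pruning that lets a detector work with fewer features.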

10.
In text classification based on the vector space model, the high dimensionality of the feature space can cause problems, not only for computational reasons but also because of overfitting. Feature selection is an important preprocessing step in text classification applications for reducing the vector space size, controlling computational time, and maintaining or improving performance. In this study, we use an embedded approach to feature selection in which the Chi-square (CHI) feature selector serves as a filter step that discards the less discriminative features. In the wrapper step, a novel algorithm is proposed that combines the fast global search ability of the genetic algorithm (GA) with the positive feedback mechanism of ant colony optimization (ACO). To validate our approach, we carried out a series of experiments on the Reuters-21578 corpus and compared the results with several other well-known techniques. The evaluation results show that our method outperformed the other methods in the majority of cases.
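The CHI filter step scores each term against each class from a 2x2 table of document counts. A minimal sketch using the standard chi-square formula; taking the maximum over classes and keeping a fixed number `keep` of terms are common choices assumed here, and the GA/ACO wrapper stage is not shown.

```python
def chi_square(A, B, C, D):
    """CHI statistic for term t and class c from a 2x2 document-count table.

    A: docs in c containing t     B: docs not in c containing t
    C: docs in c without t        D: docs not in c without t
    """
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def chi_filter(docs, labels, keep):
    """Score each term by its maximum CHI over classes and keep the `keep` best terms."""
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    scores = {}
    for t in vocab:
        best = 0.0
        for c in classes:
            A = sum(1 for d, lab in zip(docs, labels) if lab == c and t in d)
            B = sum(1 for d, lab in zip(docs, labels) if lab != c and t in d)
            C = sum(1 for lab in labels if lab == c) - A
            D = len(docs) - A - B - C
            best = max(best, chi_square(A, B, C, D))
        scores[t] = best
    return sorted(vocab, key=lambda t: (-scores[t], t))[:keep], scores
```

Each document is represented here as a set of terms; only presence matters for the contingency counts.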

11.
Rough set theory is an effective method for feature selection that preserves the meaning of the features. The essence of the rough set approach to feature selection is to find a subset of the original features. Since finding a minimal subset of the features is an NP-hard problem, effective and efficient heuristic algorithms are needed. Ant colony optimization (ACO) has been successfully applied to many difficult combinatorial problems such as quadratic assignment, the traveling salesman problem, and scheduling. It is particularly attractive for feature selection because no heuristic information can guide the search to the optimal minimal subset every time, whereas ants can discover the best feature combinations as they traverse the graph. In this paper, we propose a new rough set approach to feature selection based on ACO, which adopts mutual-information-based feature significance as heuristic information, and we give a novel feature selection algorithm. Jensen and Shen proposed an ACO-based feature selection approach that starts from a random feature; our approach starts from the feature core, which reduces the complete graph to a smaller one. To verify the efficiency of our algorithm, experiments are carried out on standard UCI datasets. The results demonstrate that our algorithm can provide an efficient solution for finding a minimal subset of the features.
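The rough-set quantities involved can be made concrete: the dependency degree gamma of the decision on an attribute subset (the fraction of objects in the positive region), the core (attributes whose removal lowers gamma), and a reduct grown from the core. This sketch uses a greedy dependency-gain search, in the spirit of QuickReduct, in place of the ant-colony search and mutual-information significance described in the paper.

```python
from collections import defaultdict

def dependency(rows, decisions, attrs):
    """gamma_B(D): fraction of objects whose B-equivalence class has a single decision."""
    blocks = defaultdict(set)
    for row, d in zip(rows, decisions):
        blocks[tuple(row[a] for a in attrs)].add(d)
    # objects in consistent blocks form the positive region
    consistent = {key for key, ds in blocks.items() if len(ds) == 1}
    pos = sum(1 for row in rows if tuple(row[a] for a in attrs) in consistent)
    return pos / len(rows)

def core(rows, decisions):
    """Attributes whose removal lowers the dependency of the full attribute set."""
    full = list(range(len(rows[0])))
    g = dependency(rows, decisions, full)
    return [a for a in full if dependency(rows, decisions, [b for b in full if b != a]) < g]

def reduct_from_core(rows, decisions):
    """Greedy reduct starting from the core (dependency gain as the significance measure)."""
    full = list(range(len(rows[0])))
    target = dependency(rows, decisions, full)
    red = core(rows, decisions)
    while dependency(rows, decisions, red) < target:
        best = max((a for a in full if a not in red),
                   key=lambda a: dependency(rows, decisions, red + [a]))
        red.append(best)
    return sorted(red)
```

Starting from the core means every constructed subset already contains the attributes that no reduct can do without, which is the graph-shrinking effect the abstract describes.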

12.
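Intrusion detection is essentially a classification problem, so improving classification accuracy is very important. The support vector machine (SVM) is a powerful tool for solving classification problems. SVM-based intrusion detection already achieves high accuracy, but how to obtain even higher accuracy is an open problem. This paper addresses these problems with intrusion detection based on SVM and a genetic algorithm (GA). We first use the GA for feature selection and optimization, and then use an SVM model...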

13.
Zhang Jin, Ding Sheng, Li Bo. Journal of Computer Applications (计算机应用), 2016, 36(5): 1330-1335
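Because feature selection and parameter optimization in the support vector machine (SVM) strongly affect classification accuracy, an improved joint feature selection and parameter optimization algorithm for SVM based on particle swarm optimization (PSO), called GPSO-SVM, is proposed; it improves classification accuracy while selecting as few features as possible. To counter the tendency of traditional PSO to become trapped in local optima and converge prematurely, the algorithm introduces the crossover and mutation operators of the genetic algorithm (GA) into PSO, applying crossover and mutation to the particles after each iteration's update. Crossover pairs are chosen according to an uncorrelation index between particles, and each particle's mutation probability is determined by its fitness value; the resulting new particles join the swarm. This lets particles escape the local optima found so far, increases swarm diversity, and searches for better values globally. In experiments on different datasets, compared with PSO-based and GA-based joint feature selection and SVM parameter optimization algorithms, GPSO-SVM improved classification accuracy by 2% to 3% on average and reduced the number of selected features by 3% to 15%. The experimental results show that the proposed algorithm achieves better feature selection and parameter optimization.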
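The feature-selection half of such a scheme can be sketched as binary PSO (sigmoid transfer of clamped velocities) followed by a bit-flip mutation whose probability is larger for less fit particles. Assumptions: the linear fitness-to-mutation-rate mapping, all coefficient values, and the use of each particle's fitness before its move to set its mutation rate. The uncorrelation-based crossover pairing and the SVM parameter encoding are not reproduced.

```python
import math
import random

def bpso_step(positions, velocities, pbest, gbest, fitnesses, rng,
              w=0.7, c1=1.5, c2=1.5, v_max=4.0, pm_max=0.2):
    """One binary-PSO update followed by fitness-dependent bit-flip mutation (in place)."""
    f_lo, f_hi = min(fitnesses), max(fitnesses)
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for j in range(len(x)):
            v[j] = (w * v[j]
                    + c1 * rng.random() * (pbest[i][j] - x[j])
                    + c2 * rng.random() * (gbest[j] - x[j]))
            v[j] = max(-v_max, min(v_max, v[j]))  # clamp velocity
            x[j] = 1 if rng.random() < 1 / (1 + math.exp(-v[j])) else 0  # sigmoid transfer
        # mutation rate from the particle's fitness before this move: worse means more mutation
        rel = (fitnesses[i] - f_lo) / (f_hi - f_lo) if f_hi > f_lo else 1.0
        pm = pm_max * (1 - rel)
        for j in range(len(x)):
            if rng.random() < pm:
                x[j] = 1 - x[j]

def gpso(fitness, n_bits, n_particles=15, iterations=60, seed=0):
    """Maximise fitness over bit masks; returns (global best mask, its fitness)."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    fits = [fitness(x) for x in X]
    pbest, pfit = [x[:] for x in X], fits[:]
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iterations):
        bpso_step(X, V, pbest, gbest, fits, rng)
        fits = [fitness(x) for x in X]
        for i, f in enumerate(fits):
            if f > pfit[i]:
                pbest[i], pfit[i] = X[i][:], f
                if f > gfit:
                    gbest, gfit = X[i][:], f
    return gbest, gfit
```

In a joint optimisation, each particle would also carry continuous SVM hyperparameters, and `fitness` would be a cross-validated SVM score on the selected features.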

14.
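Because the recognition rate for colorectal cancer pathology images is hard to improve with a single feature set, a classification method for colorectal cancer pathology images based on HOG-GLRLM features is proposed. Given the rich texture and edge information in the images, features are extracted with an improved gray-level run-length matrix (GLRLM) and with a histogram of oriented gradients (HOG), respectively. The minimum-redundancy maximum-relevance (mRMR) method is then used for feature selection on each feature set and on the combined set. Experimental results demonstrate the effectiveness of the method.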

15.
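High-dimensional datasets contain thousands of features that can be used for data analysis and prediction, but many of these features are irrelevant or redundant, which reduces the accuracy of analysis and prediction. Existing classification techniques have difficulty accurately identifying the optimal feature subset. To address this problem, a wrapper-based feature selection method, AB-CRO, is proposed that combines the strengths of the artificial bee colony algorithm (ABC) and an improved chemical reaction optimization algorithm (CRO). Because good individuals may be consumed by chemical reactions during the iterations, an elite strategy is added to preserve the quality of the population. Experimental results show that AB-CRO improves on the baseline algorithms ABC and CRO, as well as on methods based on GA, PSO, and the shuffled frog leaping algorithm, in both identifying the optimal feature subset and classification accuracy.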
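An elite strategy that protects the best individuals from being consumed by a reaction can be as simple as the helper below, applied after each iteration. Treating fitness as a value to maximise and the number of elites `n_elite` are assumptions; the ABC and CRO operators themselves are not shown.

```python
def apply_elitism(old_pop, new_pop, fitness, n_elite=2):
    """Replace the worst members of new_pop with the n_elite best of old_pop (maximisation)."""
    elites = sorted(old_pop, key=fitness, reverse=True)[:n_elite]
    survivors = sorted(new_pop, key=fitness, reverse=True)[:len(new_pop) - n_elite]
    return survivors + [e[:] for e in elites]
```

The population size stays constant, and the best fitness can never decrease from one iteration to the next.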

16.

Feature subset selection (FSS) generally plays an essential role in data mining, particularly in high-dimensional medical data analysis, and in supplying early detection with essential features and high accuracy. Recent feature selection models use optimization algorithms to extract features with particular properties in order to achieve the highest possible accuracy. Many optimization algorithms, such as the genetic algorithm, have parameters that must be tuned for good results, and tuning these parameter values is a difficult challenge in feature selection. In this paper, a wrapper-based feature selection approach called binary teaching-learning-based optimization (BTLBO) is introduced. BTLBO is a sophisticated metaheuristic that does not involve any algorithm-specific parameters; it requires only common control parameters, such as the population size and the number of iterations, to extract a set of features from the data. Because parameter tuning is demanding, a method that is independent of algorithm-specific control parameters is a good way to obtain the best possible feature set. This paper introduces a new modified binary teaching-learning-based optimization (NMBTLBO) for feature subset selection, using support vector machine (SVM) binary classification accuracy as the fitness function. NMBTLBO introduces two changes: a new updating procedure, and a new method for selecting the primary teacher in the teacher phase of binary TLBO. NMBTLBO was used to classify rheumatic disease datasets collected from the Baghdad Teaching Hospital Outpatient Rheumatology Clinic during 2016-2018.
Compared with the original BTLBO algorithm, the improved NMBTLBO algorithm achieved substantially higher accuracy. Validation was carried out by testing the accuracy of four classification methods: K-nearest neighbors, decision trees, support vector machines, and K-means. The classification accuracy of all four methods increased with NMBTLBO feature selection compared with BTLBO. The SVM classifier achieved 89% accuracy with BTLBO-SVM and 95% with NMBTLBO-SVM. Decision trees achieved 94% with BTLBO-SVM feature selection and 95% with NMBTLBO-SVM feature selection. The analysis indicates that the new method (NMBTLBO) enhances classification accuracy.
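The property emphasised above, that TLBO needs only a population size and an iteration count, can be seen in a sketch of plain binary TLBO. The teacher phase moves each learner toward the best solution and away from the class mean (teaching factor 1 or 2); the learner phase moves each learner toward a better classmate or away from a worse one, with greedy acceptance in both phases. The sigmoid transfer used to binarise the continuous TLBO update is an assumption, and its steepness is a fixed constant of this sketch rather than a tuned parameter. The NMBTLBO modifications (new update rule, new teacher selection) are not reproduced.

```python
import math
import random

def to_bit(value, rng, steep=10.0):
    """Assumed sigmoid transfer centred at 0.5: values well above 0.5 usually become 1."""
    return 1 if rng.random() < 1 / (1 + math.exp(-steep * (value - 0.5))) else 0

def btlbo(fitness, n_bits, pop_size=10, iterations=30, seed=0):
    """Binary TLBO maximising fitness over bit masks; returns (best mask, its fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(iterations):
        # Teacher phase: move toward the best learner, away from the class mean.
        t = max(range(pop_size), key=lambda i: fit[i])
        mean = [sum(x[j] for x in pop) / pop_size for j in range(n_bits)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor
            cand = [to_bit(pop[i][j] + rng.random() * (pop[t][j] - tf * mean[j]), rng)
                    for j in range(n_bits)]
            f = fitness(cand)
            if f > fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, f
        # Learner phase: learn from a randomly chosen classmate.
        for i in range(pop_size):
            k = rng.choice([m for m in range(pop_size) if m != i])
            sign = 1 if fit[i] > fit[k] else -1  # move away from worse, toward better
            cand = [to_bit(pop[i][j] + sign * rng.random() * (pop[i][j] - pop[k][j]), rng)
                    for j in range(n_bits)]
            f = fitness(cand)
            if f > fit[i]:
                pop[i], fit[i] = cand, f
    b = max(range(pop_size), key=lambda i: fit[i])
    return pop[b], fit[b]
```

In the wrapper setting described, `fitness` would be the SVM classification accuracy obtained with the features switched on in the mask.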


17.
Machine-learning-based classification techniques support decision-making in many areas of health care, including diagnosis, prognosis, and screening. Feature selection (FS) is expected to improve classification performance, particularly under the high-dimensionality problem, where there are relatively few training examples compared with a large number of measured features. In this paper, a random forest classifier (RFC) approach is proposed to diagnose lymph diseases. The first stage of the proposed system focuses on feature selection and applies diverse feature selection algorithms, such as the genetic algorithm (GA), Principal Component Analysis (PCA), Relief-F, Fisher, Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS), to reduce the dimension of the lymph diseases dataset. In the second stage, model construction, the resulting feature subsets are fed into the RFC for efficient classification. GA-RFC achieved the highest classification accuracy, 92.2%, and GA reduced the input feature space from eighteen to six features.

18.
The degree of malignancy in brain glioma is assessed before operation based on magnetic resonance imaging (MRI) findings and clinical data. These data contain irrelevant features as well as uncertainties and missing values. Rough set theory can deal with vagueness and uncertainty in data analysis and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. Because feature selection can effectively improve classification accuracy, rough set feature selection algorithms are employed to select features, and the selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that uses a search method based on particle swarm optimization (PSO) is proposed and compared with other rough set reduction algorithms. Experimental results show that the reducts found by the proposed algorithm are more efficient and generate decision rules with better classification performance. The rough set rule-based method achieves higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees, and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by the rough set rule induction algorithm reveal regular and interpretable patterns in the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.

19.
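Classification problems are common in modern industrial production. Using feature selection to filter out useful information before classification can effectively improve classification efficiency and accuracy. The minimum-redundancy maximum-relevance algorithm (mRMR) maximizes the relevance between features and classes while minimizing the redundancy among features, and can select feature subsets effectively; however, its feature-importance estimates deviate considerably in the middle and later stages, and it cannot directly output a feature subset. To address these problems, this paper proposes a feature selection algorithm that combines the neighborhood rough set discernibility matrix with the mRMR principle. Following the maximum-relevance and minimum-redundancy principles, feature importance is defined using neighborhood entropy and neighborhood mutual information, so that mixed data types are handled better. Dynamic discernibility sets are defined on the basis of the discernibility matrix; their dynamic evolution removes redundant attributes, narrows the search space, and optimizes the feature subset, and the discernibility matrix determines when the iteration stops. SVM, J48, KNN, and MLP are used as classifiers to evaluate the performance of the algorithm. Experimental results on public datasets show that, compared with existing algorithms, the proposed algorithm improves average classification accuracy by about 2% and effectively shortens feature selection time on datasets with many features. The proposed algorithm inherits the advantages of the discernibility matrix and mRMR and handles feature selection problems effectively.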
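For reference, the classic mRMR criterion that this method builds on can be sketched for discrete features: start from the most relevant feature, then repeatedly add the feature that maximises its relevance I(f; y) minus its mean mutual information with the features already chosen. The neighbourhood-entropy measures and the discernibility-matrix stopping rule from the abstract are not reproduced; here the subset size `k` is simply fixed in advance.

```python
import math
from collections import Counter

def mi(a, b):
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(a)
    pab, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def mrmr(X, y, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy with chosen features."""
    cols = list(zip(*X))
    relevance = [mi(col, y) for col in cols]
    selected = [max(range(len(cols)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(cols)):
        def score(j):
            redundancy = sum(mi(cols[j], cols[s]) for s in selected) / len(selected)
            return relevance[j] - redundancy
        rest = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected
```

A duplicated feature has the same relevance as its original but full redundancy with it, so mRMR skips it in favour of a feature that carries new information. Needing `k` up front is exactly the "cannot directly output a feature subset" limitation the abstract mentions.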

20.
Li Mengmeng, Qin Wei, Liu Yi, Diao Xingchun. Journal of Computer Applications (计算机应用), 2021, 41(8): 2412-2417
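Feature selection can effectively improve data classification performance. To further improve the ability of ant colony optimization (ACO) to solve feature selection problems, a hybrid ant colony optimization algorithm combined with brain storm optimization (ABO) is proposed. The algorithm uses an information-exchange archive to maintain historically good solutions and updates the archive dynamically with an oldest-first replacement method based on a relaxation factor. When ACO's global best solution has not been updated for several iterations, a path-to-idea conversion operator based on the Fuch chaotic map converts the path solutions in the archive into idea solutions, which serve as the initial population for brain storm optimization (BSO) to search for better solutions in a wider space. The proposed algorithm was tested on 6 typical binary classification datasets, its parameter sensitivity was analyzed, and it was compared with three typical evolutionary algorithms: the hybrid firefly and particle swarm optimization (HFPSO) algorithm, the hybrid of particle swarm optimization and the gravitational search algorithm (PSOGSA), and the genetic algorithm (GA). Experimental results show that, compared with these algorithms, the proposed algorithm improves classification accuracy by at least 2.88% to 5.35% and the F1 score by at least 0.02 to 0.05, verifying its effectiveness and superiority.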


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP record No. 09084417