Similar documents
20 similar documents found (search time: 15 ms)
1.
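Hybrid parallel genetic algorithm based on cloud computing for solving the shortest path problem   Total citations: 2, self-citations: 0, citations by others: 2
To improve the efficiency of solving the shortest path problem, a method based on a cloud-computing, fine-grained hybrid parallel genetic algorithm is proposed. The method adopts the MapReduce parallel programming model of Hadoop to improve encoding efficiency, and combines a fine-grained parallel genetic algorithm with tabu search to improve the computation speed and local search ability of the optimization algorithm, thereby improving the efficiency of shortest path solving. Simulation results show that the method outperforms the classical genetic algorithm and the parallel genetic algorithm in computation speed and performance, and is an effective method for solving the shortest path problem.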

2.
Cancer diagnosis is an important emerging clinical application of microarray data. Accurate prediction of the type or size of tumors relies on adopting powerful and reliable classification models, so that patients can be provided with better treatment or response to therapy. However, the high dimensionality of microarray data may bring some disadvantages, such as over-fitting, poor performance and low efficiency, to traditional classification models. Thus, one of the challenging tasks in cancer diagnosis is how to identify salient expression genes from thousands of genes in microarray data that can directly contribute to the phenotype or symptom of disease. In this paper, we propose a new ensemble gene selection method (EGS) to choose multiple gene subsets for classification purposes, where the significance of a gene is measured by conditional mutual information or its normalized form. After different gene subsets have been obtained by setting different starting points of the search procedure, they are used to train multiple base classifiers, which are then aggregated into a consensus classifier by majority voting. The proposed method is compared with five popular gene selection methods on six public microarray datasets, and the comparison results show that our method works well.
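
The aggregation step described in this abstract (several base classifiers, each trained on a different gene subset, combined by majority voting) can be sketched roughly as follows. This is only an illustration: the function name `majority_vote`, the input layout, and the tie-breaking rule are assumptions, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions into one consensus label per sample.

    predictions: list of label lists, one list per base classifier
    (hypothetical layout; every list must cover the same samples in order).
    Ties go to the label seen first among the classifiers (an assumption).
    """
    consensus = []
    # zip(*predictions) groups the labels that all classifiers gave one sample
    for sample_labels in zip(*predictions):
        label, _count = Counter(sample_labels).most_common(1)[0]
        consensus.append(label)
    return consensus


# Three hypothetical base classifiers voting on three samples
votes = [
    ["A", "B", "A"],
    ["A", "B", "B"],
    ["B", "B", "A"],
]
print(majority_vote(votes))
```

Using an odd number of base classifiers avoids ties in the two-class case, so the tie-breaking assumption only matters for multiclass problems.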

3.
Microarray technologies enable quantitative simultaneous monitoring of expression levels for thousands of genes under various experimental conditions. This new technology has provided a new way of biological classification on a genome-wide scale. However, predictive accuracy is affected by the presence of thousands of genes, many of which are unnecessary from the classification point of view. So, a key issue of microarray data classification is to identify the smallest possible set of genes that can achieve good predictive accuracy. In this study, we propose a novel Markov blanket-embedded genetic algorithm (MBEGA) for the gene selection problem. In particular, the embedded Markov blanket-based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results on synthetic and microarray benchmark datasets suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both the Markov blanket and predictive power in the classifier model. A detailed comparative study with other methods from each of filter, wrapper, and standard GA shows that MBEGA gives the best compromise among all four evaluation criteria, i.e., classification accuracy, number of selected genes, computational cost, and robustness.

4.
Feature selection has always been a critical step in pattern recognition, in which evolutionary algorithms, such as the genetic algorithm (GA), are most commonly used. However, the individual encoding scheme used in various GAs would either pose a bias on the solution or require a pre-specified number of features, and hence may lead to less accurate results. In this paper, a tribe competition-based genetic algorithm (TCbGA) is proposed for feature selection in pattern classification. The population of individuals is divided into multiple tribes, and the initialization and evolutionary operations are modified to ensure that the number of selected features in each tribe follows a Gaussian distribution. Thus each tribe focuses on exploring a specific part of the solution space. Meanwhile, tribe competition is introduced to the evolution process, which allows the winning tribes, which produce better individuals, to enlarge their sizes, i.e. having more individuals to search their parts of the solution space. This algorithm, therefore, avoids the bias on solutions and requirement of a pre-specified number of features. We have evaluated our algorithm against several state-of-the-art feature selection approaches on 20 benchmark datasets. Our results suggest that the proposed TCbGA algorithm can identify the optimal feature subset more effectively and produce more accurate pattern classification.
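
To illustrate the idea of a tribe whose individuals select a Gaussian-distributed number of features, here is a minimal initialization sketch. The function `init_tribe`, its parameters, and the clamping of the feature count to the valid range are assumptions made for this example; the abstract does not give the paper's actual initialization procedure.

```python
import random


def init_tribe(n_features, mean_k, std_k, tribe_size, rng):
    """Create one tribe of binary feature masks.

    The number of selected features per individual is drawn from a Gaussian
    with the tribe's own mean and standard deviation, then clamped to
    [1, n_features] (the clamping is an assumption of this sketch).
    """
    tribe = []
    for _ in range(tribe_size):
        k = round(rng.gauss(mean_k, std_k))
        k = max(1, min(n_features, k))
        # choose which k features are switched on in this individual
        chosen = set(rng.sample(range(n_features), k))
        tribe.append([1 if i in chosen else 0 for i in range(n_features)])
    return tribe


rng = random.Random(0)
# Two hypothetical tribes exploring different subset sizes of 20 features
small_tribe = init_tribe(20, mean_k=4, std_k=1.0, tribe_size=10, rng=rng)
large_tribe = init_tribe(20, mean_k=14, std_k=1.0, tribe_size=10, rng=rng)
print([sum(mask) for mask in small_tribe])
print([sum(mask) for mask in large_tribe])
```

Giving each tribe a different mean is what lets the tribes cover different regions of the search space without fixing the number of features in advance.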

5.
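To address how to select suitable cloud service providers in the current cloud computing market to form a dynamic alliance that meets end customers' needs faster and more effectively and achieves optimal allocation of cloud service resources, a grey relational comprehensive evaluation model is used to determine the optimization indicators of the cloud service market, and a multi-objective optimization model is used to quantitatively analyze and study the partner selection problem of cloud service providers. Cloud service providers offering computing, storage, and software services in the cloud computing market are taken as the research objects, and cost, response time, and quality of service are extracted as the optimization indicators. By assigning corresponding weights, a genetic algorithm is used to solve the multi-objective programming problem and find partners that match the interests of each cloud service provider. Finally, a numerical example demonstrates the rationality of the algorithm in solving the optimal cloud service provider partner selection combination and verifies the effectiveness of the model and algorithm.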

6.
Cancer classification, through gene expression data analysis, has produced remarkable results, and has indicated that gene expression assays could significantly aid in the development of efficient cancer diagnosis and classification platforms. However, cancer classification, based on DNA array data, remains a difficult problem. The main challenge is the overwhelming number of genes relative to the number of training samples, which implies that there are a large number of irrelevant genes to be dealt with. Another challenge comes from the presence of noise inherent in the data set. It makes accurate classification of data more difficult when the sample size is small. We apply genetic algorithms (GAs) with an initial solution provided by t statistics, called t-GA, for selecting a group of relevant genes from cancer microarray data. The decision-tree-based cancer classifier is built on the basis of these selected genes. The performance of this approach is evaluated by comparing it to other gene selection methods using publicly available gene expression data sets. Experimental results indicate that t-GA has the best performance among the different gene selection methods. The Z-score figure also shows that some genes are consistently preferentially chosen by t-GA in each data set.
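
As a rough illustration of seeding a GA with t statistics, the sketch below ranks genes by the absolute value of a two-sample t statistic and switches on the top-ranked genes in an initial chromosome. The choice of Welch's form of the statistic, the function names, and the top-k seeding rule are assumptions; the abstract does not say which variant t-GA uses.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's two-sample t statistic (assumed variant).

    Needs at least two values per group and non-zero pooled variance.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def initial_chromosome(expr, labels, k):
    """Build a binary chromosome with the k highest-|t| genes set to 1.

    expr: list of samples, each a list of gene expression values
    labels: 0/1 class label per sample (hypothetical two-class layout)
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        class0 = [row[g] for row, y in zip(expr, labels) if y == 0]
        class1 = [row[g] for row, y in zip(expr, labels) if y == 1]
        scores.append(abs(t_statistic(class0, class1)))
    top = set(sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k])
    return [1 if g in top else 0 for g in range(n_genes)]


# Toy data: gene 0 separates the classes, genes 1 and 2 do not
expr = [
    [1.0, 2.0, 0.1],
    [1.2, 3.0, 0.9],
    [5.0, 2.5, 0.4],
    [5.3, 2.6, 0.5],
]
labels = [0, 0, 1, 1]
print(initial_chromosome(expr, labels, k=1))
```

The GA would then evolve the population starting from such a chromosome rather than from purely random masks.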

7.
Microarray technology allows for the monitoring of thousands of gene expressions in various biological conditions, but most of these genes are irrelevant for classifying these conditions. Feature selection is consequently needed to help reduce the dimension of the variable space. Starting from the application of the stochastic meta-algorithm "Optimal Feature Weighting" (OFW) for selecting features in various classification problems, the focus is on the multiclass problem that wrapper methods rarely handle. From a computational point of view, one of the main difficulties comes from the unbalanced classes situation that is commonly encountered in microarray data. From a theoretical point of view, very few methods have been developed so far to minimize the classification error made on the minority classes. The OFW approach is developed to handle multiclass problems using CART and one-vs-one SVM classifiers. Comparisons are made with other multiclass selection algorithms such as Random Forests and the filter method F-test on five public microarray data sets with various complexities. Statistical relevancy of the gene selections is assessed by computing the performances and the stability of these different approaches, and the results obtained show that the two proposed approaches are competitive and relevant to selecting genes classifying the minority classes. Application to a pig folliculogenesis study follows, and a detailed interpretation of the genes that were selected shows that the OFW approach answers the biological question.

8.
This paper presents a parallel genetic algorithm (GA) called the cellular compact genetic algorithm (c-cGA) and its implementation for adaptive hardware. An adaptive hardware based on the c-cGA is proposed to automate real-time classification of ECG signals. The c-cGA not only provides a strong search capability while maintaining genetic diversity using multiple GAs but also has a cellular-like structure and is a straightforward algorithm suitable for hardware implementation. The c-cGA hardware and an adaptive digital filter structure also perform an adaptive feature selection in real time. The c-cGA is applied to a block-based neural network (BbNN) for online learning in the hardware. Using an adaptive hardware approach based on the c-cGA, an adaptive hardware system for classifying ECG signals is feasible. The proposed adaptive hardware can be implemented in a field programmable gate array (FPGA) for an adaptive embedded system applied to personalised ECG signal classifications for long-term patient monitoring.

9.
Gene expression microarray is a rapidly maturing technology that provides the opportunity to assay the expression levels of thousands or tens of thousands of genes in a single experiment. We present a new heuristic to select relevant gene subsets in order to further use them for the classification task. Our method is based on the statistical significance of adding a gene from a ranked list to the final subset. The efficiency and effectiveness of our technique are demonstrated through extensive comparisons with other representative heuristics. Our approach shows excellent performance, not only at identifying relevant genes, but also with respect to the computational cost.

10.
Image annotation can be formulated as a classification problem. Recently, Adaboost learning with feature selection has been used for creating an accurate ensemble classifier. We propose dynamic Adaboost learning with feature selection based on a parallel genetic algorithm for image annotation in the MPEG-7 standard. In each iteration of Adaboost learning, a genetic algorithm (GA) is used to dynamically generate and optimize a set of feature subsets on which the weak classifiers are constructed, so that an ensemble member is selected. We investigate two methods of GA feature selection: a binary-coded chromosome GA feature selection method used to perform optimal feature subset selection, and a bi-coded chromosome GA feature selection method used to perform optimal-weighted feature subset selection, i.e. simultaneously performing optimal feature subset selection and corresponding optimal weight subset selection. To improve the computational efficiency of our approach, a master-slave GA, a parallel implementation of the GA, is used. A k-nearest neighbor classifier is used as the base classifier. The experiments are performed over 2000 classified Corel images to validate the performance of the approaches.
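
A binary-coded chromosome for feature selection is simply a 0/1 mask over the features. The sketch below shows how such a mask might be recombined and mutated. One-point crossover and bit-flip mutation are standard GA operators chosen here as assumptions; the abstract does not state which operators the authors use, and all function names are hypothetical.

```python
import random


def crossover(parent1, parent2, rng):
    """One-point crossover: swap the tails of two equal-length binary masks."""
    point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def selected_features(chromosome):
    """Decode a mask into the indices of the features it selects."""
    return [i for i, bit in enumerate(chromosome) if bit]


rng = random.Random(1)
a, b = crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], rng)
print(a, b)
print(selected_features(mutate(a, rate=0.1, rng=rng)))
```

In a master-slave arrangement, only the fitness evaluation of each mask (training and scoring the weak classifier) would be farmed out to workers, while these operators run on the master.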

11.
A reliable and precise classification of tumors is essential for successful treatment of cancer. Gene selection is an important step for improved diagnostics. In this study, a modified SFFS (sequential forward floating selection) algorithm based on weighted Mahalanobis distance, called MSWM, is proposed to identify optimal informative gene subsets, taking into account joint discriminatory power for accurate discrimination. Firstly, we use the one-dimensional weighted Mahalanobis distance to perform a preliminary selection of genes, and then use the modified SFFS method and the multidimensional weighted Mahalanobis distance to obtain the optimal informative gene subset for tumor classification. Finally, we use the k nearest neighbor and naive Bayes methods to classify tumors based on the optimal gene subset selected using the MSWM method. To validate its efficiency, the proposed MSWM method is applied to classify two different DNA microarray datasets. Our empirical study shows that the MSWM method for tumor classification achieves better classification performance than the BWR (the ratio of between-groups to within-groups sum of squares) and IVGA_I (independent variable group analysis I) methods. It suggests that the MSWM gene selection method is able to obtain correct informative gene subsets, taking into account genes' joint discriminatory power for tumor classification.

12.
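Partner selection for agricultural business outsourcing using a parallel genetic algorithm   Total citations: 1, self-citations: 0, citations by others: 1
The business outsourcing model in a supply chain management environment offers enterprises a new way to build and strengthen their competitive advantage. The selection of partners is the key factor in whether business outsourcing succeeds. This paper analyzes the partner selection problem for business outsourcing in the agricultural product processing industry, gives a multi-objective model for partner selection, and proposes an adaptive parallel genetic algorithm to solve it, which shows good search performance while enhancing and maintaining population diversity.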

13.
In this paper, a hybrid genetic approach is proposed to solve the problem of designing a subdatabase of the original one with the highest classification performances, the lowest number of features and the highest number of patterns. The method can simultaneously treat the double problem of editing instance patterns and selecting features as a single optimization problem, and therefore aims at providing a better level of information. The search is optimized by dividing the algorithm into self-controlled phases managed by a combination of pure genetic process and dedicated local approaches. Different heuristics such as an adapted chromosome structure and evolutionary memory are introduced to promote diversity and elitism in the genetic population. They particularly facilitate the resolution of real applications in the chemometric field presenting databases with large feature sizes and medium cardinalities. The study focuses on the double objective of enhancing the reliability of results while reducing the time consumed by combining genetic exploration and a local approach in such a way that excessive computational CPU costs are avoided. The usefulness of the method is demonstrated with artificial and real data and its performance is compared to other approaches.
Frederic Ros

14.
DNA microarray technology has emerged as a prospective tool for diagnosis of cancer and its classification. It provides better insights into many genetic mutations occurring within a cell associated with cancer. However, the thousands of gene expressions measured for each biological sample using microarrays pose a great challenge. Many statistical and machine learning methods have been applied to get the most relevant genes prior to cancer classification. A two-phase hybrid model for cancer classification is proposed, integrating Correlation-based Feature Selection (CFS) with improved-Binary Particle Swarm Optimization (iBPSO). This model selects a low-dimensional set of prognostic genes to classify biological samples of binary and multi-class cancers using a Naive Bayes classifier with stratified 10-fold cross-validation. The proposed iBPSO also controls the problem of early convergence to the local optimum of traditional BPSO. The proposed model has been evaluated on 11 benchmark microarray datasets of different cancer types. Experimental results are compared with seven other well-known methods, and our model exhibited better results in terms of classification accuracy and the number of selected genes in most cases. In particular, it achieved up to 100% classification accuracy for seven out of eleven datasets with a very small sized prognostic gene subset (up to <1.5%) for all eleven datasets.

15.
In this letter, neural networks (NNs) classify alcoholics and nonalcoholics using features extracted from visual evoked potential (VEP). A genetic algorithm (GA) is used to select the minimum number of channels that maximize classification performance. GA population fitness is evaluated using a fuzzy ARTMAP (FA) NN, instead of the widely used multilayer perceptron (MLP). MLP, despite its effective classification, requires a long training time (on the order of $10^3$ times that of FA). This makes it unsuitable for use with a GA, especially for on-line training. It is shown empirically that the optimal channel configuration selected by the proposed method is unbiased, i.e., it is optimal not only for FA but also for MLP classification. Therefore, it is proposed that for future experiments, these optimal channels could be considered for applications that involve classification of alcoholics.

16.

This work describes a method that combines a Bayesian feature selection approach with a clustering genetic algorithm to get classification rules in data-mining applications. A Bayesian network is generated from a data set and the Markov blanket of the class variable is applied to the feature subset selection task. The general rule extraction method is simple and consists of employing the clustering process in the examples of each class separately. In this way, clusters of similar examples are found for each class. These clusters can be viewed as subclasses and can, consequently, be modeled into logical rules. In this context, the problem of finding the optimal number of classification rules can be viewed as the problem of finding the best number of clusters. The Clustering Genetic Algorithm can find the best clustering in a data set, according to the Average Silhouette Width criterion, and it was applied to extract classification rules. The proposed methodology is illustrated by means of simulations in three data sets that are benchmarks for data-mining methods: Wisconsin Breast Cancer, Mushroom, and Congressional Voting Records. The rules extracted with all the attributes are compared to those extracted with the features belonging to the Markov blanket, and the obtained results show that the proposed method is very promising.

17.
Li Zhao, Lu Wei, Sun Zhanquan, Xing Weiwei. Neural Computing & Applications, 2016, 28(1): 513-524

Text classification is a popular research topic in data mining. Many classification methods have been proposed. Feature selection is an important technique for text classification since it is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. In recent years, data have become increasingly larger in both the number of instances and the number of features in many applications. As a result, classical feature selection methods do not work well in processing large-scale datasets due to the expensive computational cost. To address this issue, in this paper, a parallel feature selection method based on MapReduce is proposed. Specifically, mutual information based on Renyi entropy is used to measure the relationship between feature variables and class variables. Maximum mutual information theory is then employed to choose the most informative combination of feature variables. We implemented the selection process based on MapReduce, which is efficient and scalable for large-scale problems. Finally, a practical example demonstrates the efficiency of the proposed method.
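
The counting work behind mutual-information feature scoring splits naturally into a map step and a reduce step, which is what makes it a good fit for MapReduce. The sketch below imitates that split in plain Python on a single machine. Note two deliberate simplifications: it uses ordinary Shannon mutual information instead of the Renyi-entropy-based measure named in the abstract, and the function names and record layout are hypothetical rather than taken from the paper or from the Hadoop API.

```python
from collections import Counter
from math import log2


def map_record(record):
    """Map step: emit ((feature_index, value, label), 1) for each feature."""
    features, label = record
    return [((j, value, label), 1) for j, value in enumerate(features)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def mutual_information(counts, j, n_records):
    """Shannon mutual information (in bits) between feature j and the label."""
    joint = {(v, y): c for (jj, v, y), c in counts.items() if jj == j}
    p_value, p_label = Counter(), Counter()
    for (v, y), c in joint.items():
        p_value[v] += c
        p_label[y] += c
    mi = 0.0
    for (v, y), c in joint.items():
        mi += (c / n_records) * log2(c * n_records / (p_value[v] * p_label[y]))
    return mi


# Toy data: feature 0 determines the label, feature 1 is independent of it
records = [((1, 0), "a"), ((1, 1), "a"), ((0, 0), "b"), ((0, 1), "b")]
pairs = [pair for record in records for pair in map_record(record)]
counts = reduce_counts(pairs)
for j in range(2):
    print(j, mutual_information(counts, j, len(records)))
```

Because each record is mapped independently and the reduce step only adds integers, the counts can be computed on separate partitions and merged, leaving only the small final scoring step to run centrally.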


18.
Attribute subset selection based on rough sets is a crucial preprocessing step in data mining and pattern recognition to reduce the modeling complexity. To cope with the new era of big data, new approaches need to be explored to address this problem effectively. In this paper, we review recent work related to attribute subset selection in decision-theoretic rough set models. We also introduce a scalable implementation of a parallel genetic algorithm in Hadoop MapReduce to approximate the minimum reduct which has the same discernibility power as the original attribute set in the decision table. Then, we focus on intrusion detection in computer networks and apply the proposed approach on four datasets with varying characteristics. The results show that the proposed model can be a powerful tool to boost the performance of identifying attributes in the minimum reduct in large-scale decision systems.

19.
An important approach for image classification is the clustering of pixels in the spectral domain. Fast detection of different land cover regions or clusters of arbitrarily varying shapes and sizes in satellite images presents a challenging task. In this article, an efficient, scalable parallel clustering technique for multi-spectral remote sensing imagery using a recently developed point symmetry-based distance norm is proposed. The proposed distributed, time-efficient point symmetry-based K-Means technique is able to correctly identify the presence of overlapping clusters of any arbitrary shape and size, whether they are intra-symmetrical or inter-symmetrical in nature. A Kd-tree based approximate nearest neighbor searching technique is used as a speedup strategy for computing the point symmetry-based distance. The superiority of this new parallel implementation with the novel two-phase speedup strategy over the existing parallel K-Means clustering algorithm is demonstrated both quantitatively and in computing time on two SPOT and Indian Remote Sensing satellite images, as the K-Means algorithm fails to detect the symmetry in clusters. Different land cover regions, classified by the algorithms for both images, are also compared with the available ground truth information. Statistical analysis is also performed to establish its significance in classifying both satellite images and numeric remote sensing data sets, described in terms of feature vectors.

20.
A fuzzy self-tuning parallel genetic algorithm for optimization   Total citations: 1, self-citations: 0, citations by others: 1
The genetic algorithm (GA) is now a very popular tool for solving optimization problems. Each operator has its special approach route to a solution. For example, a GA using crossover as its major operator arrives at solutions depending on its initial conditions. In other words, a GA with multiple operators should be more robust in global search. However, a multiple-operator GA needs a large population size, thus taking a huge time for evaluation. We therefore apply fuzzy reasoning to give effective operators more opportunity to search while keeping the overall population size constant. We propose a fuzzy self-tuning parallel genetic algorithm (FPGA) for optimization problems. In our test case FPGA there are four operators: crossover, mutation, sub-exchange, and sub-copy. These operators are modified using the eugenic concept under the assumption that the individuals with higher fitness values have a higher probability of breeding new better individuals. All operators are executed in each generation through parallel processing, but the populations of these operators are decided by fuzzy reasoning. The fuzzy reasoning senses the contributions of these operators, and then decides their population sizes. The contribution of each operator is defined as an accumulative increment of fitness value due to each operator's success in searching. We make the assumption that the operators that give higher contribution are more suitable for the typical optimization problem. The fuzzy reasoning is built under this concept and adjusts the population sizes in each generation. As a test case, an FPGA is applied to the optimization of the fuzzy rule set for a model reference adaptive control system. The simulation results show that the FPGA is better at finding optimal solutions than a traditional GA.
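
The core bookkeeping described here is to split a fixed total population among the operators according to how much each has recently contributed. The sketch below does this with a plain proportional rule, which is a simplified stand-in for the paper's fuzzy reasoning rather than a reproduction of it. The function name, the guaranteed minimum per operator, and the example contribution values are all assumptions; contributions are taken to be non-negative, and the total must be at least `minimum` times the number of operators.

```python
def allocate_populations(contributions, total, minimum=1):
    """Split `total` individuals among operators in proportion to contribution.

    contributions: dict mapping operator name to its accumulated fitness gain
    minimum: individuals every operator keeps regardless of contribution,
             so that no operator is switched off entirely (an assumption)
    """
    ops = list(contributions)
    remaining = total - minimum * len(ops)
    total_contribution = sum(contributions.values())
    if total_contribution <= 0:
        # no operator has helped yet: share the remainder equally
        shares = {op: remaining / len(ops) for op in ops}
    else:
        shares = {op: remaining * contributions[op] / total_contribution for op in ops}
    sizes = {op: minimum + int(shares[op]) for op in ops}
    # rounding down may leave a few slots; give them to the largest fractions
    leftover = total - sum(sizes.values())
    by_fraction = sorted(ops, key=lambda op: shares[op] - int(shares[op]), reverse=True)
    for op in by_fraction[:leftover]:
        sizes[op] += 1
    return sizes


# Hypothetical accumulated fitness gains for the four operators in one generation
gains = {"crossover": 6.0, "mutation": 3.0, "sub_exchange": 1.0, "sub_copy": 0.0}
print(allocate_populations(gains, total=100, minimum=5))
```

A fuzzy controller would replace the proportional rule with membership functions and a rule base, but the constraint it has to respect is the same: the sizes must always sum to the constant overall population.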
