Similar Literature
1.
Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly being produced in many fields of research. Although current algorithms are useful for fairly large datasets, scaling problems are found when the number of instances is in the hundreds of thousands or millions. When we face huge problems, scalability becomes an issue, and most algorithms are not applicable. Thus, paradoxically, instance selection algorithms are for the most part impracticable for the same problems that would benefit most from their use. This paper presents a way of avoiding this difficulty using several rounds of instance selection on subsets of the original dataset. These rounds are combined using a voting scheme to allow good performance in terms of testing error and storage reduction, while the execution time of the process is significantly reduced. The method is particularly efficient when we use instance selection algorithms that are high in computational cost. The proposed approach shares the philosophy underlying the construction of ensembles of classifiers. In an ensemble, several weak learners are combined to form a strong classifier; in our method several weak (in the sense that they are applied to subsets of the data) instance selection algorithms are combined to produce a strong and fast instance selection method. An extensive comparison on 30 medium and large datasets from the UCI Machine Learning Repository using 3 different classifiers shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets (from three hundred thousand to more than a million instances) with good results and fast execution time.
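
A minimal sketch of the round-and-vote idea follows, assuming Wilson's edited nearest neighbor (ENN, k = 3) as the base selector and a fixed removal-vote threshold. The abstract does not say how the vote threshold is chosen, so `enn_select`, `vote_select` and their parameters (`rounds`, `subset_size`, `threshold`) are hypothetical illustrations, not the paper's implementation.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def enn_select(X, y, idx, k=3):
    """Wilson's ENN restricted to the indices in `idx`: keep an instance only if
    the majority label of its k nearest neighbours (inside the subset) agrees."""
    kept = []
    for i in idx:
        others = sorted((j for j in idx if j != i), key=lambda j: sq_dist(X[i], X[j]))[:k]
        if not others:                       # a singleton subset keeps its only instance
            kept.append(i)
            continue
        majority = Counter(y[j] for j in others).most_common(1)[0][0]
        if majority == y[i]:
            kept.append(i)
    return kept


def vote_select(X, y, rounds=10, subset_size=50, threshold=0.5, seed=0):
    """Each round: shuffle, split into disjoint subsets, run the base selector on
    every subset and give a removal vote to each discarded instance. Instances whose
    fraction of removal votes exceeds `threshold` (assumed fixed) are dropped."""
    rng = random.Random(seed)
    n = len(X)
    removal_votes = [0] * n
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, subset_size):
            part = order[start:start + subset_size]
            kept = set(enn_select(X, y, part))
            for i in part:
                if i not in kept:
                    removal_votes[i] += 1
    return [i for i in range(n) if removal_votes[i] / rounds <= threshold]
```

Each round runs the base selector only on subsets of `subset_size` instances, which is where the saving comes from when the base selector's cost grows quadratically with its input size.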

2.
Instance selection is becoming more and more relevant due to the huge amount of data that is being constantly produced. However, although current algorithms are useful for fairly large datasets, scaling problems are found when the number of instances is in the hundreds of thousands or millions. In the best case, these algorithms have O(n^2) complexity, n being the number of instances. When we face huge problems, scalability is an issue, and most algorithms are not applicable. This paper presents a recursive divide-and-conquer approach to instance selection for instance-based learning on very large problems. Our method divides the original training set into small subsets, to each of which the instance selection algorithm is applied. The selected instances are then rejoined into a new training set, and the same procedure, partitioning followed by application of an instance selection algorithm, is repeated. In this way, our approach applies the philosophy of divide-and-conquer in a recursive manner. The proposed method is able to match the results of well-known standard algorithms, and even improve on them in storage reduction, with a very significant reduction of execution time. An extensive comparison on 30 datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets with 300,000 to more than a million instances, with very good results and fast execution time.
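
The partition, select, rejoin loop can be sketched as below. Hart's condensed nearest neighbor (CNN) rule stands in for the base selector, and the stopping rule (stop once a pass removes less than `min_gain` of the current set, or after `max_levels` passes) is an assumption; `cnn_select`, `recursive_select` and their parameters are hypothetical names, not the paper's.

```python
import random


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nn_label(x, store, X, y):
    """Label of the instance in `store` (a list of indices) nearest to x."""
    return y[min(store, key=lambda j: sq_dist(x, X[j]))]


def cnn_select(X, y, idx):
    """Hart's CNN on the subset `idx`: grow a store until 1-NN on the store
    classifies every instance of the subset correctly."""
    if not idx:
        return []
    store = [idx[0]]
    changed = True
    while changed:
        changed = False
        for i in idx:
            if i not in store and nn_label(X[i], store, X, y) != y[i]:
                store.append(i)
                changed = True
    return store


def recursive_select(X, y, subset_size=100, max_levels=10, min_gain=0.01, seed=0):
    """Split the current training set into subsets, run the base selector on each,
    rejoin the survivors and repeat on the reduced set (stopping rule assumed)."""
    rng = random.Random(seed)
    current = list(range(len(X)))
    for _ in range(max_levels):
        if not current:
            break
        rng.shuffle(current)
        survivors = []
        for start in range(0, len(current), subset_size):
            survivors.extend(cnn_select(X, y, current[start:start + subset_size]))
        gain = 1 - len(survivors) / len(current)
        current = survivors
        if gain < min_gain:
            break
    return sorted(current)
```

For a fixed `subset_size`, one pass costs roughly time linear in the size of the current set, because the quadratic base selector only ever sees small subsets.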

3.
Feature and instance selection are two effective data reduction processes that can be applied to classification tasks with promising results. Although both processes are defined separately, it is possible to apply them simultaneously. This paper proposes an evolutionary model to perform feature and instance selection in nearest neighbor classification. It is based on cooperative coevolution, which has been applied to many computational problems with great success. The proposed approach is compared with a wide range of evolutionary feature and instance selection methods for classification. The results, contrasted through non-parametric statistical tests, show that our model outperforms previously proposed evolutionary approaches for performing data reduction in combination with the nearest neighbor rule.

4.
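To address the instance selection problem for big data, an instance selection algorithm based on random forests (RF) and a voting mechanism is proposed. First, the big dataset is divided into two subsets: the first must be large, and the second small or medium-sized. Next, the first (large) subset is split into q smaller subsets, which are deployed to q cloud computing nodes, and the second (small or medium-sized) subset is broadcast to all q nodes. Then, at each node, a random forest is trained on the local data subset and used to select instances from the second subset; the instances selected at all nodes are merged to obtain the instance subset for this round. This process is repeated p times, yielding p instance subsets. Finally, these p subsets vote to produce the final selected instance subset. The proposed algorithm was implemented on both the Hadoop and Spark big data platforms, and the implementation mechanisms of the two platforms are compared. In addition, the proposed algorithm was compared with the condensed nearest neighbor (CNN) and reduced nearest neighbor (RNN) algorithms on 6 big datasets. The experimental results show that the larger the dataset, the higher the test accuracy and the shorter the running time of the proposed algorithm relative to these two algorithms. This demonstrates that the proposed algorithm has good generalization ability and high efficiency in big data processing and can effectively solve the instance selection problem for big data.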

5.
With the advance of technology in various scientific fields, high-dimensional data are becoming abundant. A general approach to the resulting challenges is to reduce data dimensionality through feature selection. Traditional feature selection approaches concentrate on selecting relevant features and discarding irrelevant or redundant ones. However, most of these approaches neglect feature interactions. In addition, some datasets have imbalanced classes, which may bias classifiers toward the majority class. The main goal of this paper is to propose a novel feature selection method based on interaction information (II) that provides higher-level interaction analysis and improves the search procedure in the feature space. To this end, an evolutionary feature subset selection algorithm based on interaction information is proposed, consisting of three stages. In the first stage, candidate features and candidate feature pairs are identified using traditional feature weighting approaches such as symmetric uncertainty (SU) and bivariate interaction information. In the second stage, candidate feature subsets are formed and evaluated using multivariate interaction information. Finally, the best candidate feature subsets are selected using dominant/dominated relationships. The proposed algorithm is compared with other feature selection algorithms, including mRMR, WJMI, IWFS, IGFS, DCSF, K_OFSD, WFLNS, Information Gain and ReliefF, in terms of the number of selected features, classification accuracy, F-measure and algorithm stability, using three different classifiers, namely KNN, NB, and CART. The results confirm the improved classification accuracy and robustness of the proposed method compared with the other approaches.
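
For discrete features, the two filter quantities named above can be estimated from empirical frequencies. A minimal sketch, assuming base-2 logarithms and the sign convention II(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C), under which positive values indicate synergy; the paper may use a different convention, and this is not its three-stage algorithm. Function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(*cols):
    """Joint Shannon entropy (bits) of one or more equally long discrete columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2 * mutual_info(x, y) / denom


def interaction_info(x, y, c):
    """II(X; Y; C) = I(X, Y; C) - I(X; C) - I(Y; C) (sign convention assumed)."""
    pair = list(zip(x, y))
    return mutual_info(pair, c) - mutual_info(x, c) - mutual_info(y, c)
```

On XOR-labelled data each feature alone carries no information about the class, yet the pair determines it, which is the kind of interaction that pairwise relevance scores such as SU miss.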

6.
Kong Lifang (孔莉芳), Zhang Hong (张虹). Control and Decision (《控制与决策》), 2012, 27(7): 967-974
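Large numbers of irrelevant or redundant features usually degrade classifier performance in pattern classification. To address this problem, a feature subset selection method based on asynchronous parallel particle swarm optimization (AP-PSO) is proposed. The method uses binary particle swarm optimization to search for feature subsets and an asynchronous parallel scheme to improve computational efficiency. To balance the swarm's global exploration and local exploitation effectively, a uniform chaotic mutation operator is proposed that exploits the ergodicity and randomness of chaotic motion. A comparison with four existing feature subset selection methods verifies the effectiveness of the algorithm.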
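
A minimal serial sketch of binary PSO for feature subset selection follows, with a logistic-map mutation standing in for the chaotic mutation operator. The abstract gives neither the operator's form nor the asynchronous parallel scheme, so the logistic map z ← 4z(1 - z), the velocity clamp, `mutation_rate` and the `fitness` callback are all assumptions, and `binary_pso` is a hypothetical name.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5,
               mutation_rate=0.1, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with sigmoid-based binary PSO
    plus an assumed logistic-map bit-flip mutation."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    z = 0.37  # logistic-map state; any start in (0, 1) other than its fixed point 0.75
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp so the sigmoid never saturates
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            # assumed chaotic mutation: iterate the logistic map; when it falls below
            # mutation_rate, iterate again and use the value to pick one bit to flip
            z = 4.0 * z * (1.0 - z)
            if z < mutation_rate:
                z = 4.0 * z * (1.0 - z)
                d = min(int(z * n_features), n_features - 1)
                pos[i][d] ^= 1
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

In a real wrapper, `fitness` would score a mask by classifier accuracy on the selected features, possibly with a penalty on subset size.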

7.
Hyperspectral images are captured in hundreds of narrow, contiguous bands spanning the visible to infrared regions of the electromagnetic spectrum. Each pixel of an image is represented by a vector whose components are the reflectance values of the surface in each band, so the length of the vector equals the number of bands. Because of the large number of bands, classification of hyperspectral images is computationally intensive. Moreover, high correlation among neighboring bands increases their redundancy. As a result, feature selection is essential for reducing dimensionality. In the proposed work, a supervised feature selection technique guided by evolutionary algorithms is developed. Self-adaptive differential evolution (SADE) is used for feature subset generation. The generated subsets are evaluated with a wrapper model based on a fuzzy k-nearest neighbor classifier. The proposed method also uses a feature ranking technique, the ReliefF algorithm, to remove duplicate features. To demonstrate its effectiveness, the method is evaluated on three datasets, and the results are compared with four other state-of-the-art evolutionary feature selection techniques. The proposed method shows promising results compared with the others in terms of overall classification accuracy and Kappa coefficient.
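
The wrapper's evaluation step can be sketched with a Keller-style fuzzy k-NN classifier. The fuzzifier m = 2, k = 3, crisp (0/1) training memberships and leave-one-out scoring are assumptions; the SADE search and the ReliefF step are not shown, and `fuzzy_knn`, `predict` and `wrapper_accuracy` are hypothetical names.

```python
from collections import defaultdict


def fuzzy_knn(train_X, train_y, x, k=3, m=2.0, eps=1e-12):
    """Class memberships of x from its k nearest training samples, each weighted
    by 1 / d^(2/(m-1)); training samples are assumed to have crisp memberships."""
    nearest = sorted(
        ((sum((a - b) ** 2 for a, b in zip(xi, x)) ** 0.5, yi)
         for xi, yi in zip(train_X, train_y)),
        key=lambda t: t[0],
    )[:k]
    weights = defaultdict(float)
    for d, label in nearest:
        weights[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def predict(train_X, train_y, x, k=3, m=2.0):
    memberships = fuzzy_knn(train_X, train_y, x, k, m)
    return max(memberships, key=memberships.get)


def wrapper_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of fuzzy k-NN using only the bands where mask is 1;
    a search such as SADE would maximise a score like this over masks."""
    Xs = [[v for v, keep in zip(row, mask) if keep] for row in X]
    hits = 0
    for i in range(len(Xs)):
        hits += predict(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], Xs[i], k) == y[i]
    return hits / len(Xs)
```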

8.
Swarm intelligence is a research field that models the collective behavior of swarms of insects or animals. Several algorithms arising from such models have been proposed to solve a wide range of complex optimization problems. In this paper, a novel swarm algorithm called Social Spider Optimization (SSO) is proposed for solving optimization tasks. The SSO algorithm is based on simulating the cooperative behavior of social spiders. In the proposed algorithm, individuals emulate a group of spiders that interact with each other according to the biological laws of the cooperative colony. The algorithm considers two different kinds of search agents (spiders): males and females. Depending on gender, each individual is guided by a different set of evolutionary operators, which mimic the different cooperative behaviors typically found in the colony. To illustrate the proficiency and robustness of the proposed approach, it is compared with other well-known evolutionary methods on several standard benchmark functions commonly used in the evolutionary algorithms literature. The results show that the proposed method performs well at locating the global optimum on several benchmark functions.

9.
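With the massive growth of data, how to store and make use of data has become a hot topic in both academic research and industrial applications. Instance selection is one way to address this problem: it selects representative instances from the original data according to predefined rules, effectively reducing the difficulty of subsequent work. On this basis, a voting instance selection algorithm based on learning to hash is proposed. First, high-dimensional data are mapped to a low-dimensional space by principal component analysis (PCA); then the k-means algorithm is run iteratively together with vector quantization, and each data point is represented by the hash code of its cluster center; next, the clustered data are randomly sampled in proportion, and after several independent runs of the algorithm the final instances are selected by voting. Compared with the condensed nearest neighbor (CNN) algorithm and LSH-IS-F, a linear-complexity instance selection algorithm for big data, the proposed algorithm improves the compression ratio by 19% on average. The proposed algorithm is conceptually simple and easy to implement, and its compression ratio can be controlled by adjusting its parameters. Experimental results on 7 datasets show that, at similar test accuracy, the proposed algorithm has a clear advantage over random hashing in compression ratio and running time.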

10.
Traditional Chinese medicine (TCM) relies on the combined effects of herbs within prescribed formulae. However, given the combinatorial explosion caused by the vast number of herbs available for treatment, the study of these combined effects can become computationally intractable. Feature selection has therefore become increasingly crucial as a pre-processing step prior to the study of combined effects in TCM informatics. To this end, a new feature selection algorithm, the co-evolving memetic wrapper (COW), is proposed in this paper. COW takes advantage of recent research in genetic algorithms (GAs) and memetic algorithms (MAs) by evolving appropriate feature subsets for a given domain. Our experiments demonstrate that COW is capable of selecting subsets of herbs from a TCM insomnia dataset that show signs of combined effects on the prediction of patient outcomes, measured in terms of classification accuracy. We compare the proposed algorithm with the results of a statistical analysis including main effects and up to three-way interaction terms, and show that COW correctly identifies the herbs and herb-by-herb effects that are significantly associated with patient outcome prediction.

11.
A new adaptive orthogonal search (AOS) algorithm is proposed for model subset selection and non-linear system identification. Model structure detection is a key step in any system identification problem. It consists of selecting significant model terms from a redundant dictionary of candidate model terms and determining the model complexity (model length or model size). The final objective is to produce a parsimonious model that captures the inherent dynamics of the underlying system well. In the new AOS algorithm, a modified generalized cross-validation criterion, called the adjustable prediction error sum of squares (APRESS), is introduced and incorporated into a forward orthogonal search procedure. The main advantages of the new AOS algorithm are that its mechanism is simple and its implementation direct and easy, and, more importantly, that it can produce efficient model subsets for most non-linear identification problems.
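
A forward orthogonal search with an APRESS-style stopping rule can be sketched as follows. At each step, the candidate with the largest error reduction ratio (ERR) after Gram-Schmidt orthogonalization against the already chosen terms is added. The criterion used here, APRESS(n) = MSE(n) / (1 - αn/N)^2 with an adjustable α, follows the form commonly quoted for this method but should be treated as an assumption, as should the default α = 2; `forward_orthogonal_search` is a hypothetical name.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def forward_orthogonal_search(candidates, y, alpha=2.0):
    """Greedy forward orthogonal least squares with an (assumed) APRESS stopping rule.

    candidates: list of regressor columns, each a list of N values.
    Returns candidate indices, in selection order, for the model size n that
    minimises APRESS(n) = MSE(n) / (1 - alpha * n / N) ** 2."""
    N, yy = len(y), dot(y, y)
    if yy == 0:
        return []
    selected, basis = [], []              # chosen indices and their orthogonalised columns
    residual = yy                         # ||y||^2 minus the energy explained so far
    best_n, best_apress = 0, yy / N       # the empty model: MSE = ||y||^2 / N
    for n in range(1, len(candidates) + 1):
        if alpha * n >= N:                # penalty factor is undefined beyond this size
            break
        best = None
        for j, p in enumerate(candidates):
            if j in selected:
                continue
            w = list(p)
            for q in basis:               # orthogonalise against chosen terms (Gram-Schmidt)
                c = dot(w, q) / dot(q, q)
                w = [a - c * b for a, b in zip(w, q)]
            ww = dot(w, w)
            if ww < 1e-12:                # already (numerically) in the span of chosen terms
                continue
            err = dot(w, y) ** 2 / (ww * yy)   # error reduction ratio of this term
            if best is None or err > best[0]:
                best = (err, j, w)
        if best is None:
            break
        err, j, w = best
        selected.append(j)
        basis.append(w)
        residual -= err * yy              # orthogonal terms explain disjoint energy
        apress = (residual / N) / (1 - alpha * n / N) ** 2
        if apress < best_apress:
            best_n, best_apress = n, apress
    return selected[:best_n]
```

The penalty factor grows with model size, so a term that barely lowers the MSE is not kept; larger α favours smaller models.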

12.
Evolutionary approaches are among the most accurate prototype selection algorithms, that is, preprocessing techniques that select a subset of instances from the data before nearest neighbor classification is applied to it. These algorithms achieve very high accuracy and reduction rates, but unfortunately at a substantial computational cost. In this paper, we introduce a framework that makes efficient use of the intermediate results of prototype selection algorithms to further increase their accuracy. First, instead of using only the fittest prototype subset generated by the evolutionary algorithm, we use multiple prototype subsets in an ensemble setting. Second, to classify a test instance, we only use prototype subsets that accurately classify the training instances in the neighborhood of that test instance. In an experimental evaluation, we apply our framework to four state-of-the-art prototype selection algorithms and show that it obtains more accurate results after fewer evaluations of the prototype selection method. We also present a case study with a prototype generation algorithm, showing that our framework is easily extended to other preprocessing paradigms as well.
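
The classification step can be sketched as below. Given several prototype subsets (for example, intermediate results of an evolutionary prototype selection run), a test point is classified only by the subsets whose 1-NN rule correctly labels its k nearest training instances. The leave-one-out check (a training instance may not vote for itself), majority voting among the qualifying subsets, and the fallback to all subsets when none qualifies are assumptions; `one_nn` and `local_ensemble_predict` are hypothetical names.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def one_nn(X, y, subset, x, exclude=None):
    """1-NN label of x using the training indices in `subset`, optionally leaving one out."""
    pool = [j for j in subset if j != exclude]
    if not pool:
        return None
    return y[min(pool, key=lambda j: sq_dist(X[j], x))]


def local_ensemble_predict(X, y, subsets, x, k=3):
    """Majority vote of the prototype subsets that are correct on x's k nearest training instances."""
    neighbours = sorted(range(len(X)), key=lambda j: sq_dist(X[j], x))[:k]
    competent = [s for s in subsets
                 if all(one_nn(X, y, s, X[j], exclude=j) == y[j] for j in neighbours)]
    voters = competent or subsets          # assumed fallback when no subset qualifies
    votes = Counter(one_nn(X, y, s, x) for s in voters)
    return votes.most_common(1)[0][0]
```

In the test data below, a plain vote over all three subsets would be outvoted by the two subsets that contain only class 0, while the local competence filter keeps only the subset that is correct around the query point.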

13.
An algorithm is proposed for calculating correlation measures based on entropy. The proposed algorithm allows exhaustive exploration of variable subsets on real data. Its time efficiency is demonstrated by comparison against three other entropy-based variable selection methods, using 8 datasets from various domains as well as simulated data. The method is applicable to discrete data with a limited number of values, making it suitable for medical diagnostic support, DNA sequence analysis, psychometrics and other domains.
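
Exhaustive exploration of small variable subsets reduces to counting joint value combinations. A minimal sketch, assuming the correlation measure is the information gain I(S; C) = H(S) + H(C) - H(S, C) between a variable subset S and a class column C; the abstract does not say which entropy-based measures the algorithm computes or how it achieves its speed, and `subset_information_gain` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows):
    """Shannon entropy (bits) of a list of hashable values or tuples."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def subset_information_gain(data, target, max_size=2):
    """Score every subset of up to `max_size` columns of `data` (a list of rows of
    discrete values) by I(S; C) = H(S) + H(C) - H(S, C)."""
    h_c = entropy(target)
    n_vars = len(data[0])
    scores = {}
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_vars), size):
            s_rows = [tuple(row[v] for v in subset) for row in data]
            joint = [s + (c,) for s, c in zip(s_rows, target)]
            scores[subset] = entropy(s_rows) + h_c - entropy(joint)
    return scores
```

The number of subsets grows combinatorially with `max_size`, and with few samples H(S, C) is underestimated for large subsets, which is one reason such methods suit discrete data with a limited number of values.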

14.
This paper presents a cooperative coevolutionary approach for designing neural network ensembles. Cooperative coevolution is a recent paradigm in evolutionary computation that allows the effective modeling of cooperative environments. Although, in theory, a single neural network with a sufficient number of neurons in the hidden layer would suffice to solve any problem, in practice, for many real-world problems it is too hard to construct an appropriate network that solves them. For such problems, neural network ensembles are a successful alternative. Nevertheless, the design of neural network ensembles is a complex task. In this paper, we propose a general framework for designing neural network ensembles by means of cooperative coevolution. The proposed model has two main objectives: first, to improve the combination of the trained individual networks; second, to evolve the networks cooperatively, encouraging collaboration among them instead of training each network separately. To favor cooperation, each network is evaluated throughout the evolutionary process using a multiobjective method. For each network, different objectives are defined, considering not only its performance on the given problem but also its cooperation with the rest of the networks. In addition, a population of ensembles is evolved, improving the combination of networks and obtaining subsets of networks that form ensembles performing better than the combination of all the evolved networks. The proposed model is applied to ten real-world classification problems of very different natures from the UCI machine learning repository and the proben1 benchmark set. In all of them, the model outperforms standard ensembles in terms of generalization error. Moreover, the obtained ensembles are smaller.

15.
As a new business model, mass customization (MC) aims to enable enterprises to meet customer requirements with mass-production efficiency. A widely advocated approach to implementing MC is platform product customization (PPC), in which a product variant is derived from a given product platform to satisfy customer requirements. Adaptive PPC is a PPC mode in which the product platform has a modular architecture, and customization is achieved by swapping standard modules and/or scaling modular components to form multiple product variants according to market segments and customer requirements. Adaptive PPC optimization includes structural configuration and parametric optimization. This paper presents a new method, a cooperative coevolutionary algorithm (CCEA), to solve the two interrelated problems of structural configuration and parametric optimization in adaptive PPC. The performance of the proposed algorithm is compared with other methods through a set of computational experiments. The results show that CCEA outperforms the existing hierarchical evolutionary approaches, especially on the large-scale problems tested. The experiments also show that CCEA converges slowly at the beginning of the evolutionary process; this initial slow convergence improves its search capability and ensures a high-quality solution.

16.
In this paper, a new evolutionary algorithm, the immune clonal coevolutionary algorithm (ICCoA), is proposed for dynamic multiobjective optimization (DMO). Based on the basic principles of artificial immune systems, the proposed algorithm adopts immune clonal selection to solve DMO problems. In addition, coevolution theory is incorporated into the global operation of ICCoA to preserve the diversity of the Pareto fronts. Moreover, competitive and cooperative coevolutionary operations are designed to enhance the uniformity and diversity of the solutions. In comparison with NSGA-II, an immune clonal algorithm for DMO, and a direction-based method, the simulation results on 5 difficult test problems and related performance metrics suggest that ICCoA achieves better-distributed solutions and is very effective in maintaining the uniformity of the Pareto fronts.

17.
Instance reduction techniques can improve generalization and reduce the storage requirements and execution time of instance-based learning algorithms. This paper presents an instance reduction algorithm called the Adaptive Threshold-based Instance Selection Algorithm (ATISA). ATISA aims to preserve important instances based on a selection criterion that uses the distance from each instance to its nearest enemy (the nearest instance of a different class) as a threshold. This threshold defines the coverage area of each instance, given by a hypersphere centered on it. The experimental results show the effectiveness of ATISA, in terms of accuracy, reduction rate, and computational time, compared with state-of-the-art reduction algorithms.
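
The nearest-enemy threshold can be sketched as follows. Each instance's radius is its distance to the closest instance of another class, and that radius defines its coverage hypersphere. The greedy rule used here (visit instances by decreasing radius and keep one only if no already-kept instance of the same class covers it) is an assumption for illustration, not ATISA's exact criterion, which the abstract does not spell out; the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_enemy_radius(X, y):
    """Distance from each instance to its nearest instance of a different class."""
    radii = []
    for i, xi in enumerate(X):
        enemies = [sq_dist(xi, xj) for xj, yj in zip(X, y) if yj != y[i]]
        radii.append(min(enemies) ** 0.5 if enemies else float("inf"))
    return radii


def coverage_select(X, y):
    """Assumed greedy rule: large-radius instances first; an instance is dropped if it
    lies strictly inside the hypersphere of an already-kept same-class instance."""
    radii = nearest_enemy_radius(X, y)
    kept = []
    for i in sorted(range(len(X)), key=lambda j: -radii[j]):
        covered = any(y[j] == y[i] and sq_dist(X[i], X[j]) ** 0.5 < radii[j] for j in kept)
        if not covered:
            kept.append(i)
    return sorted(kept)
```

Because a kept instance's hypersphere contains no enemy, every instance it covers lies closer to it than to any instance of another class.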

18.
An improved quantum genetic algorithm for solving the robot coalition problem
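Coalition formation is an important method of cooperation among multiple robots, and generating the optimal coalition for a given task is a complex combinatorial optimization problem. A quantum genetic algorithm is introduced to solve this problem. The algorithm is improved with an "island model based on positive information feedback", and an evolution equation is used to update the quantum gates so that the algorithm is less prone to becoming trapped in local optima. Simulation results show that the algorithm outperforms comparable existing algorithms in solution quality and convergence speed.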

19.
In a recent project, the authors developed an approach to assist in identifying the optimal topology of a technical system, capable of overcoming geometrical contradictions that arise from conflicting design requirements. The method is based on the hybridization of partial solutions obtained from mono-objective topology optimization tasks. To investigate the efficiency, effectiveness and potential of the developed hybridization algorithm, the proposed approach is compared here with traditional topology optimization techniques such as genetic algorithms (GAs) and gradient-based methods. The benchmark was performed by applying the hybridization algorithm to several multi-objective optimization case studies available in the literature. The results demonstrate that the proposed approach is clearly less computationally expensive than the conventional application of GAs to topology optimization tasks, while remaining equally effective at finding the global optimum. Moreover, the comparison among the hybridized solutions and those obtained through GAs and gradient-based optimization methods shows that the proposed algorithm often leads to very different topologies with better performance.

20.
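Combining basic particle swarm optimization (PSO) with simulated annealing (SA), a new cooperative evolution method (SAPSO) is proposed. Through the cooperative search of PSO and SA, the premature convergence of particle swarm optimization can be effectively overcome. SAPSO is used to train neural networks, which are applied to soft-sensor modeling of the dry point of crude gasoline from a delayed coking unit and of the melt index of high-pressure polyethylene. Compared with several common modeling methods, the results show that the soft-sensor model achieves higher measurement accuracy and better generalization performance, and can meet on-site measurement requirements.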

