Similar Literature
1.
In this research, we propose two variants of the Firefly Algorithm (FA), namely the inward intensified exploration FA (IIEFA) and the compound intensified exploration FA (CIEFA), to tackle the persistent problems of initialization sensitivity and local optima traps in the K-means clustering model. To enhance both exploitation and exploration capabilities, matrix-based search parameters and dispersing mechanisms are incorporated into the two proposed FA models. We first replace the attractiveness coefficient with a randomized control matrix in the IIEFA model to release the FA from the constraints of biological law, elevating the exploitation capability in the neighbourhood from a one-dimensional to a multi-dimensional search mechanism with enhanced diversity in search scopes, scales, and directions. In addition, we employ a dispersing mechanism in the second, CIEFA, model to dispatch fireflies with high similarities to new positions outside the close neighbourhood to perform global exploration. This dispersing mechanism ensures sufficient variance between fireflies in close proximity, increasing search efficiency. The ALL-IDB2 database, a skin lesion data set, and a total of 15 UCI data sets are employed to evaluate the efficiency of the proposed FA models on clustering tasks. The minimum Redundancy Maximum Relevance (mRMR)-based feature selection method is also adopted to reduce feature dimensionality. The empirical results indicate that the proposed FA models demonstrate statistically significant superiority in both distance and performance measures for clustering tasks in comparison with conventional K-means clustering, five classical search methods, and five advanced FA variants.
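A minimal numpy sketch of the IIEFA-style position update described above, assuming a diagonal randomized control matrix and an illustrative noise scale; it is not the authors' exact formulation:

```python
import numpy as np

def iiefa_move(x_i, x_j, rng, alpha=0.2):
    """One IIEFA-style update: firefly i moves toward a brighter firefly j.
    The scalar attractiveness coefficient of standard FA is replaced by a
    random control matrix (here in diagonal form: one draw per dimension),
    so the local search varies in scope, scale, and direction."""
    m = rng.uniform(0.0, 1.0, size=x_i.shape)            # randomized control matrix
    noise = alpha * (rng.uniform(size=x_i.shape) - 0.5)  # usual FA random-walk term
    return x_i + m * (x_j - x_i) + noise

rng = np.random.default_rng(0)
x_i, x_j = rng.uniform(-1, 1, 4), rng.uniform(-1, 1, 4)
x_new = iiefa_move(x_i, x_j, rng)   # candidate position for firefly i
```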

2.
Gender recognition plays a very important role in applications such as human–computer interaction, surveillance, and security. Nonlinear support vector machines (SVMs) were investigated for gender identification using the Face Recognition Technology (FERET) face image database, where SVM classifiers were shown to outperform traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, and nearest neighbour). In this context, this paper aims to improve SVM classification accuracy in the gender classification system and proposes new models for better performance. We evaluated different SVM learning algorithms; the SVM-radial basis function with a 5% outlier fraction outperformed the other SVM classifiers. We also examined the effectiveness of different feature selection methods; AdaBoost performed best at selecting the most discriminating features. We propose two classification methods that train on subsets of the training images: method 1 combines the outcomes of different classifiers built on different image subsets, whereas method 2 clusters the training data and builds a classifier for each cluster. Experimental results show that both methods increase classification accuracy.
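The second classification method above (cluster the training data, then one classifier per cluster) can be sketched as follows; the cluster count, RBF kernel, and degenerate-cluster handling are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

class ClusterwiseSVM:
    """Cluster the training data, fit one SVM per cluster, and route each
    test sample to the SVM of its nearest cluster."""
    def __init__(self, n_clusters=3):
        self.km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        self.svms = {}

    def fit(self, X, y):
        labels = self.km.fit_predict(X)
        for c in np.unique(labels):
            Xc, yc = X[labels == c], y[labels == c]
            if len(np.unique(yc)) < 2:
                self.svms[c] = yc[0]          # degenerate cluster: remember its only label
            else:
                self.svms[c] = SVC(kernel="rbf").fit(Xc, yc)
        return self

    def predict(self, X):
        out = []
        for c, x in zip(self.km.predict(X), X):
            m = self.svms[c]
            out.append(m.predict(x.reshape(1, -1))[0] if hasattr(m, "predict") else m)
        return np.array(out)
```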

3.
In this paper we seek to improve object transparent vision. Our goal is to examine how object transparent vision can be improved by: 1. automated correctness checking of fused images; 2. finding the correct images for image fusion. We also conducted several experiments related to object transparent vision in order to gain a better understanding of the subject. The improvements introduced in the paper provide reasonable image-fusion results for most of the analyzed cases.

4.
Volatility is a key parameter when measuring the size of errors made in modelling returns and other financial variables such as exchange rates. The autoregressive moving-average (ARMA) model is a linear time-series process, whereas for nonlinear systems the generalised autoregressive conditional heteroskedasticity (GARCH) and Markov-switching GARCH (MS-GARCH) models have been widely applied. In statistical learning theory, support vector regression (SVR) plays an important role in predicting nonlinear and nonstationary time-series variables. In this paper, we propose a new algorithm, differential Empirical Mode Decomposition (EMD), for improving the prediction of exchange rates under support vector regression (SVR). The differential EMD algorithm smooths the data and reduces noise, and the SVR model trained on the filtered data set improves exchange-rate prediction. Simulation results for the combined differential EMD and SVR model show that it outperforms state-of-the-art MS-GARCH and Markov-switching regression (MSR) models.
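A hedged sketch of an EMD-plus-SVR pipeline: this uses plain EMD from the PyEMD package rather than the authors' differential EMD, and the lag features, IMF-dropping rule, and SVR settings are illustrative assumptions:

```python
import numpy as np
from PyEMD import EMD                 # pip install EMD-signal (assumed available)
from sklearn.svm import SVR

def emd_denoise(series, drop=1):
    """Decompose a 1-D series into IMFs and rebuild it without the first
    `drop` IMFs, which carry most of the high-frequency noise."""
    imfs = EMD()(series)
    return imfs[drop:].sum(axis=0)

def lagged(series, p=5):
    """Build (X, y) pairs of p lagged values predicting the next value."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    return X, series[p:]

rates = np.cumsum(np.random.default_rng(0).normal(size=500))  # stand-in for an exchange-rate series
X, y = lagged(emd_denoise(rates))
model = SVR(kernel="rbf", C=10.0).fit(X[:-50], y[:-50])
pred = model.predict(X[-50:])         # one-step-ahead predictions on held-out data
```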

5.
Similarity-based clustering is a simple but powerful technique which usually results in a clustering graph for a partitioning of threshold values in the unit interval. The guiding principle of similarity-based clustering is that "similar objects are grouped in the same cluster." To judge whether two objects are similar, a similarity measure must be given in advance. The similarity measure presented in this paper is determined in terms of the weighted distance between the features of the objects. Thus, the clustering graph and its performance (described by several evaluation indices defined in the paper) depend on the feature weights. The paper shows that, by using the gradient descent technique to learn the feature weights, the clustering performance can be significantly improved. It is also shown that our method helps to reduce the uncertainty (fuzziness and nonspecificity) of the similarity matrix, which enhances the quality of similarity-based decision making.
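A small numpy sketch of a weighted-distance similarity measure and the thresholded clustering graph it induces; the exponential form and threshold value are assumptions, and the gradient-descent weight learning is omitted:

```python
import numpy as np

def similarity_matrix(X, w):
    """S[i, j] = exp(-sum_k w_k * (x_ik - x_jk)^2): similarity derived
    from a weighted distance between feature vectors."""
    diff = X[:, None, :] - X[None, :, :]   # pairwise feature differences
    d2 = (w * diff ** 2).sum(axis=-1)      # weighted squared distance
    return np.exp(-d2)

def clustering_graph(S, threshold):
    """Thresholding the similarity matrix gives the clustering graph:
    an edge wherever similarity exceeds the chosen level."""
    return S >= threshold

X = np.random.default_rng(1).normal(size=(10, 4))
w = np.ones(4) / 4                         # uniform weights before any learning
graph = clustering_graph(similarity_matrix(X, w), threshold=0.5)
```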

6.
Accurate project-profit prediction is a crucial issue because it provides an early feasibility estimate for a project. To achieve accurate project-profit prediction, this study developed a novel two-stage forecasting system. In stage one, the proposed forecasting system adopts fuzzy clustering technology, fuzzy c-means (FCM) and kernel fuzzy c-means (KFCM), for the correct grouping of different projects. In stage two, least-squares support vector regression (LSSVR) is employed to forecast project profit within each project group, with genetic algorithms (GA) used simultaneously to select the LSSVR parameters. The project data come from a real enterprise in Taiwan. Several other forecasting methodologies are also compared on this real case, including the Generalized Regression Neural Network (GRNN), Radial Basis Function Neural Network (RBFNN), and Back Propagation Neural Network (BPNN). Empirical results indicate that the two-stage forecasting system (FCM+LSSVR and KFCM+LSSVR) has superior forecasting accuracy compared to the other methods; within the two-stage system, FCM+LSSVR achieves superior performance while KFCM+LSSVR performs consistently well. Based on the empirical results, the two-stage forecasting system was therefore verified to efficiently provide credible project-profit forecasts.
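A minimal sketch of the stage-one FCM step, using the standard membership and centroid updates; the fuzzifier m and stopping rule are assumptions, and the KFCM kernelization and GA-tuned LSSVR stage are not reproduced:

```python
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means: alternate the standard centroid and membership
    updates until the membership matrix stops changing."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # memberships sum to 1 per sample
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # fuzzily weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))                 # standard membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

X = np.random.default_rng(1).normal(size=(60, 5))     # stand-in for project features
centers, memberships = fcm(X)
```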

7.
Surrogate-assisted evolutionary optimization has proved effective in reducing optimization time, as surrogates, or meta-models, can approximate expensive fitness functions during the optimization run. While this is a successful strategy for improving optimization efficiency, challenges arise when constructing surrogate models in higher-dimensional objective function space, where the trade space between multiple conflicting objectives is increasingly complex. This complexity makes it difficult to ensure the accuracy of the surrogates. In this article, a new surrogate management strategy is presented to address this problem. A k-means clustering algorithm is employed to partition model data into local surrogate models, and the variable-fidelity optimization scheme proposed in the author's previous work is revised to incorporate this clustering algorithm for surrogate model construction. The applicability of the proposed algorithm is illustrated on six standard test problems, and the algorithm is also examined on a three-objective stiffened-panel design problem to show its advantage in surrogate-assisted multi-objective optimization in higher-dimensional objective function space. Performance metrics show that the proposed surrogate handling strategy clearly outperforms the single-surrogate strategy as the surrogate size increases.
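A short sketch of the clustering-based surrogate construction: k-means partitions the sampled points and one local surrogate is fitted per partition. The Gaussian-process surrogate and cluster count are assumptions, not the article's exact models:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor

class LocalSurrogates:
    """Partition evaluated design points with k-means and fit one cheap
    surrogate per partition; queries are answered by the surrogate that
    owns the nearest cluster centre."""
    def __init__(self, k=4):
        self.km = KMeans(n_clusters=k, n_init=10, random_state=0)
        self.models = []

    def fit(self, X, f):
        labels = self.km.fit_predict(X)
        self.models = [GaussianProcessRegressor().fit(X[labels == c], f[labels == c])
                       for c in range(self.km.n_clusters)]
        return self

    def predict(self, X):
        return np.array([self.models[c].predict(x[None, :])[0]
                         for c, x in zip(self.km.predict(X), X)])

X = np.random.default_rng(2).uniform(-1, 1, size=(80, 3))
f = (X ** 2).sum(axis=1)                 # stand-in for an expensive fitness function
approx = LocalSurrogates().fit(X, f).predict(X[:5])
```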

8.
In this article, an expert system based on support vector machines is developed to predict the sales performance of insurance company candidates. The system predicts candidate performance from a set of scores measuring cognitive characteristics, personality, selling skills, and biodata. An experiment is conducted to compare the accuracy of the proposed system with previously reported systems that use discriminant functions or decision trees. Results show that the proposed system improves the accuracy of a baseline linear-discriminant system by more than 10% and exceeds state-of-the-art systems by almost 5%. The proposed approach can considerably reduce the direct and indirect expenses of these companies.

9.
Applied Soft Computing, 2007, 7(3): 711-721
Multiscale wavelet-based data representation has been shown to be a powerful data analysis tool in various applications. In this paper, the advantages of multiscale representation are exploited to improve the prediction accuracy and parsimony of the auto-regressive with exogenous variable (ARX) model by developing a multiscale ARX (MSARX) modeling algorithm. The idea is to decompose the input–output data at multiple scales, construct an ARX model at each scale using the scaled signal approximations of the data, and then, using cross validation, select among all MSARX models the one that best predicts the process response. The MSARX algorithm is shown to improve the parsimony of the estimated models, as ARX models with fewer coefficients are needed at coarser scales; this advantage is attributed to the down-sampling used in multiscale representation. Another important advantage of the MSARX algorithm is that it inherently accounts for measurement noise through the low-pass filters applied in the multiscale decomposition of the data, which improves the model's robustness to measurement noise and enhances its predictions. These prediction and parsimony advantages of MSARX modeling are demonstrated on a simulated second-order example, in which the MSARX algorithm outperformed its time-domain counterpart at different noise contents, with the relative improvement of MSARX increasing at higher noise levels.
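A rough sketch of the MSARX idea using PyWavelets: fit an ARX model on the scaled signal approximations at each decomposition level. The wavelet choice, ARX orders, and the omission of the cross-validation selection step are all simplifying assumptions:

```python
import numpy as np
import pywt   # PyWavelets (assumed available)

def arx_fit(u, y, na=2, nb=2):
    """Least-squares ARX fit: y[t] ≈ sum_i a_i*y[t-i] + sum_j b_j*u[t-j]."""
    n = max(na, nb)
    rows = [np.r_[y[t - na:t][::-1], u[t - nb:t][::-1]] for t in range(n, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[n:], rcond=None)
    return theta

def msarx(u, y, wavelet="db4", max_level=3):
    """One ARX model per scale, fitted on the (down-sampled) approximation
    coefficients of the input and output signals."""
    models = {}
    for level in range(1, max_level + 1):
        ua = pywt.downcoef("a", u, wavelet, level=level)  # input approximation at this scale
        ya = pywt.downcoef("a", y, wavelet, level=level)  # output approximation
        models[level] = arx_fit(ua, ya)
    return models

rng = np.random.default_rng(0)
u = rng.normal(size=512)
y = np.convolve(u, [0.5, 0.3], mode="same") + 0.1 * rng.normal(size=512)  # toy process
models = msarx(u, y)   # coefficient vectors, one per scale
```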

10.
The complexity of web information environments and multiple-topic web pages are negative factors that significantly affect the performance of focused crawling: a highly relevant region in a web page may be obscured by the low overall relevance of that page. Segmenting web pages into smaller units significantly improves performance, and traversing irrelevant pages to reach relevant ones (tunneling) can improve the effectiveness of focused crawling by expanding its reach. This paper presents a heuristic-based method to enhance focused crawling performance. The method uses a Document Object Model (DOM)-based page partition algorithm to segment a web page into content blocks with a hierarchical structure, and investigates how to exploit block-level evidence to enhance focused crawling by tunneling. Page segmentation can transform an uninteresting multi-topic web page into several single-topic content blocks, some of which may be interesting; accordingly, the focused crawler can pursue the interesting content blocks to retrieve relevant pages. Experimental results indicate that this approach outperforms the Breadth-First, Best-First, and Link-context algorithms in harvest rate, target recall, and target length. Copyright © 2007 John Wiley & Sons, Ltd.

11.

Today, social networks have created a wide variety of relationships between users; friendships on Facebook and trust on the Epinions network are examples. Most social media research has focused on positive interpersonal relationships such as friendships. In many real-world applications, however, there are also networks of negative relationships, in which interactions between users are distrustful or hostile in nature. Such networks are called signed networks. In this work, the sign of a link is predicted from the existing links between nodes. In real signed networks, however, links between nodes are usually sparse and sometimes absent, so existing methods cannot adequately address the challenge of accurate sign prediction. To address this sparsity problem, this work proposes a method that predicts the sign of positive and negative links using clustering and collaborative filtering. The network is clustered so that the number of negative links between clusters and the number of positive links within clusters are as large as possible; as a result, the clusters are as close as possible to social balance. The main contribution of this work is the use of clustering and collaborative filtering, together with a newly proposed similarity criterion, to overcome data sparseness and predict the unknown signs of links. Evaluations on the Epinions network show that the prediction accuracy of the proposed method improves on previous studies by 8%.
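A minimal collaborative-filtering sketch for sign prediction over a sparse sign matrix (+1 trust, -1 distrust, 0 unknown), using plain cosine similarity; the paper's balance-aware clustering and its new similarity criterion are not reproduced here:

```python
import numpy as np

def predict_sign(S, u, v):
    """Predict the missing sign of link (u, v): average the opinions that
    users similar to u hold about v, weighted by cosine similarity."""
    norms = np.linalg.norm(S, axis=1)
    sims = S @ S[u] / (norms * norms[u] + 1e-12)  # cosine similarity of every user to u
    sims[u] = 0.0                                 # exclude u itself
    known = S[:, v] != 0                          # neighbours with an opinion about v
    score = sims[known] @ S[known, v]
    return 1 if score >= 0 else -1

S = np.array([[ 0,  1, -1,  0],
              [ 1,  0, -1,  1],
              [-1, -1,  0,  1],
              [ 0,  1,  1,  0]])
print(predict_sign(S, u=0, v=3))   # predicted opinion of user 0 about user 3
```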

12.
Mining page access order based on path clustering
To discover user behaviour patterns and optimize the structure of a Web site, a K-PathSearch algorithm based on user access paths is proposed. After preprocessing the web pages, a user-access transaction model is built by combining page-link parameters, yielding a useful data set. Samples are extracted to analyse user interest, whose main influencing factors are access order, access frequency, and dwell time; a redefined similarity measure is then used to group users with similar interests into the same class. On this basis, the longest fitted access path of a user is defined, from which the path-cluster centres are computed. Computations show significant growth in the number of clusters and the average length of the cluster centres, indicating that the model and algorithm are feasible and effective.

13.
There exist several methods for binary classification of gene expression data sets. However, in the majority of published methods, little effort has been made to minimize classifier complexity. In view of the small number of samples available in most gene expression data sets, there is a strong motivation for minimizing the number of free parameters that must be fitted to the data. In this paper, a method is introduced for evolving (using an evolutionary algorithm) simple classifiers involving a minimal subset of the available genes. The classifiers obtained by this method perform well, reaching 97% correct classification of clinical outcome on training samples from the breast cancer data set published by van't Veer, and up to 89% correct classification on validation samples from the same data set, easily outperforming previously published results.
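A hedged sketch of evolving minimal feature subsets with a simple genetic algorithm; the logistic-regression fitness, penalty weight, and genetic operators are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fitness(mask, X, y, penalty=0.01):
    """Cross-validated accuracy of a simple classifier on the selected
    genes, minus a per-gene penalty that favours minimal subsets."""
    if not mask.any():
        return -1.0
    acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=3).mean()
    return acc - penalty * mask.sum()

def evolve(X, y, pop_size=30, gens=40, p_gene=0.05, p_mut=0.02):
    pop = rng.random((pop_size, X.shape[1])) < p_gene    # sparse initial gene masks
    for _ in range(gens):
        scores = np.array([fitness(m, X, y) for m in pop])
        parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]  # truncation selection
        cross = rng.random(parents.shape) < 0.5                  # uniform crossover
        children = np.where(cross, parents, np.roll(parents, 1, axis=0))
        children ^= rng.random(children.shape) < p_mut           # bit-flip mutation
        pop = np.vstack([parents, children])
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[np.argmax(scores)]    # boolean mask of the selected genes
```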

14.
A modern data center consists of thousands of servers, racks, and switches. This complicated structure requires well-designed algorithms to utilize data center resources efficiently. Current virtual machine scheduling algorithms mainly focus on the initial allocation of virtual machines based on CPU, memory, and network bandwidth requirements. However, when tasks finish or leases expire, the related virtual machines are deleted from the system, generating resource fragments; such fragments lead to unbalanced resource utilization and a decline in communication performance. This paper investigates the network's influence on typical applications in data centers and proposes a self-adaptive network-aware virtual machine clustering and consolidation algorithm to maintain an optimal system-wide state. The consolidation algorithm periodically checks whether consolidation is necessary and then clusters and consolidates virtual machines with an online heuristic to lower communication cost. We used two benchmarks in a real environment to examine the network's influence on different tasks, and built a cloud computing testbed to evaluate the advantages of the proposed algorithm. Real workload trace-driven simulations and testbed experiments showed that the algorithm greatly shortens the average finish time of map-reduce tasks and reduces the time delay of web applications. Simulation results also showed that it considerably reduces the number of high-delay jobs, lowers the average traffic passing through aggregate switches, and improves communication among virtual machines.

15.
Clustering network sites is a vital issue in parallel and distributed database systems (DDBS). Grouping distributed-database network sites into clusters is an efficient way to minimize the communication time required for query processing; however, site clustering remains an open research problem, since finding its optimal solution is NP-complete. The main contribution in this field is to find a near-optimal solution that groups distributed-database network sites into disjoint clusters so as to minimize the communication time required for data allocation. Grouping a large number of network sites into a small number of clusters improves transaction response time, results in better data distribution, and improves overall distributed database system performance. We present a novel algorithm for clustering distributed-database network sites based on communication time, as database query processing is time dependent. Extensive experimental tests and simulations were conducted on this clustering algorithm. The results show that a better network distribution is achieved, with significant load balancing across network servers, reduced network delay, lower communication time between network sites, and higher overall distributed database system performance.

16.
Improving clustering by learning a bi-stochastic data similarity matrix
An idealized clustering algorithm seeks to learn a cluster-adjacency matrix such that, if two data points belong to the same cluster, the corresponding entry would be 1; otherwise, the entry would be 0. This integer (1/0) constraint makes it difficult to find the optimal solution. We propose a relaxation on the cluster-adjacency matrix, by deriving a bi-stochastic matrix from a data similarity (e.g., kernel) matrix according to the Bregman divergence. Our general method is named the Bregmanian Bi-Stochastication (BBS) algorithm. We focus on two popular choices of the Bregman divergence: the Euclidean distance and the Kullback–Leibler (KL) divergence. Interestingly, the BBS algorithm using the KL divergence is equivalent to the Sinkhorn–Knopp (SK) algorithm for deriving bi-stochastic matrices. We show that the BBS algorithm using the Euclidean distance is closely related to the relaxed k-means clustering and can often produce noticeably superior clustering results to the SK algorithm (and other algorithms such as Normalized Cut), through extensive experiments on public data sets.
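The Sinkhorn–Knopp iteration mentioned above reduces to alternating row and column normalization of a nonnegative similarity matrix; a minimal sketch, assuming a strictly positive kernel so the iteration converges:

```python
import numpy as np

def sinkhorn_knopp(K, iters=1000, tol=1e-9):
    """Derive a bi-stochastic matrix from a nonnegative similarity matrix
    by alternately normalizing rows and columns; for the KL divergence this
    coincides with the BBS solution noted in the abstract."""
    P = K.astype(float).copy()
    for _ in range(iters):
        P /= P.sum(axis=1, keepdims=True)            # rows sum to 1
        P /= P.sum(axis=0, keepdims=True)            # columns sum to 1
        if np.abs(P.sum(axis=1) - 1).max() < tol:    # rows still ~1 => converged
            break
    return P

X = np.random.default_rng(0).normal(size=(6, 3))
K = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)  # Gaussian kernel
P = sinkhorn_knopp(K)    # bi-stochastic similarity matrix
```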

17.
The scanning n-tuple technique (as introduced by Lucas and Amiri, 1996) is studied in pattern recognition tasks, with emphasis placed on methods that improve its recognition performance. We remove potential edge effect problems and optimize the parameters of the scanning n-tuple method with respect to memory requirements, processing speed, and recognition accuracy for a case study task. Next, we report an investigation of self-supervised algorithms designed to improve the performance of the scanning n-tuple method by focusing on the characteristics of the pattern space. The most promising algorithm is studied in detail to determine its performance improvement and the consequent increase in memory requirements. Experimental results using both small-scale and real-world tasks indicate that this algorithm improves scanning n-tuple classification performance.
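A compact sketch of a scanning n-tuple classifier for symbol sequences: per-class frequency tables of n-tuples sampled at a fixed gap, with classification by summed log-likelihood. The tuple length, gap, and crude Laplace smoothing are assumptions:

```python
import numpy as np
from collections import defaultdict

class ScanningNTuple:
    """Slide an n-tuple window (with a fixed gap between sampled symbols)
    along each training sequence, count tuple frequencies per class, and
    classify new sequences by summed log-likelihood."""
    def __init__(self, n=3, gap=1, alpha=1.0):
        self.n, self.gap, self.alpha = n, gap, alpha
        self.tables = {}                     # class -> tuple -> count

    def _tuples(self, seq):
        span = (self.n - 1) * self.gap
        return [tuple(seq[i + j * self.gap] for j in range(self.n))
                for i in range(len(seq) - span)]

    def fit(self, sequences, labels):
        for seq, c in zip(sequences, labels):
            table = self.tables.setdefault(c, defaultdict(int))
            for t in self._tuples(seq):
                table[t] += 1
        return self

    def predict(self, seq):
        def score(c):
            table = self.tables[c]
            total = sum(table.values()) + self.alpha * max(len(table), 1)
            return sum(np.log((table.get(t, 0) + self.alpha) / total)
                       for t in self._tuples(seq))
        return max(self.tables, key=score)

clf = ScanningNTuple().fit(["0123012301", "3210321032"], ["up", "down"])
print(clf.predict("0123"))   # -> "up"
```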

18.
Clustering is one of the most important issues in data mining, image segmentation, VLSI design, parallel computing, and many other areas. We consider the general problem of partitioning n points into k clusters by maximizing the affinity measure of the points within the clusters. This objective function, referred to as Ratio Association, generalizes the classical (Minimum) Sum-of-Squares clustering problem, where affinity is measured as closeness in Euclidean space. This generalized version has emerged in the context of approximating chemical conformations for molecules and in explaining transportation phenomena in dynamical systems, especially in dynamical astronomy; we refer in particular to the dynamical systems application in the paper. Although successful heuristics have been developed to approximately solve the problem, the conventional spectral bounds proposed in the literature are not tight enough for "large" instances to assert the quality of those heuristics or to allow solving the problem exactly. In this paper, we investigate how to tighten the spectral bounds by using Lagrangian relaxation and subgradient optimization methods.
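Under standard graph-clustering notation (an assumption here, since the paper's exact notation is not shown), the Ratio Association objective and its conventional spectral bound can be written as:

```latex
% Partition n points into clusters C_1, ..., C_k; W is the pairwise affinity matrix.
\max_{C_1,\dots,C_k} \; \sum_{c=1}^{k} \frac{\sum_{i \in C_c} \sum_{j \in C_c} W_{ij}}{|C_c|}
% Relaxing the integrality of the cluster indicators yields the conventional
% spectral upper bound: the sum of the k largest eigenvalues of W,
%   \lambda_1(W) + \cdots + \lambda_k(W),
% which the paper tightens via Lagrangian relaxation and subgradient optimization.
```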

19.
Efficient scheduling of page access in index-based join processing
The paper examines the issue of scheduling page accesses in join processing and proposes new heuristics for the following scheduling problems: 1) finding an optimal page access sequence for a join such that there are no page re-accesses, using the minimum number of buffer pages; and 2) finding an optimal page access sequence for a join such that the number of page re-accesses for a given number of buffer pages is minimal. Experimental performance results show that the new heuristics outperform existing heuristics on the first problem and also perform better on the second, provided that the number of available buffer pages is not much less than the optimal buffer size.

20.
The research presented in this article focuses on the development of a multi-objective optimization algorithm based on the differential evolution (DE) concept combined with Mamdani-type fuzzy logic controllers (FLCs) and $K$-medoids clustering. The FLCs are used for adaptive control of the DE parameters; $K$-medoids clustering enables the algorithm to perform a more guided search by evolving neighboring vectors, i.e., vectors that belong to the same cluster. A modified version of the $DE/best/1/bin$ algorithm is adopted as the core search component of the multi-objective optimizer. The FLCs utilize Pareto dominance and cluster-related information as input in order to adapt the algorithmic parameters dynamically. The proposed optimization algorithm is tested on a number of problems from the multi-objective optimization literature in order to investigate the effect of clustering and parameter adaptation on algorithmic performance under various conditions, e.g., problems of high dimensionality, problems with non-convex Pareto fronts, and problems with discontinuous Pareto fronts. A detailed performance comparison between the proposed algorithm and state-of-the-art multi-objective optimizers is also presented.
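For reference, a plain single-objective DE/best/1/bin sketch with fixed F and CR (the article adapts these parameters with fuzzy logic controllers and extends the search to multiple objectives):

```python
import numpy as np

def de_best_1_bin(f, bounds, pop_size=20, F=0.5, CR=0.9, gens=100, seed=0):
    """Classic DE/best/1/bin: mutate around the current best vector, apply
    binomial crossover, and keep the trial vector if it improves."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    cost = np.array([f(x) for x in pop])
    for _ in range(gens):
        best = pop[np.argmin(cost)]
        for i in range(pop_size):
            r1, r2 = rng.choice([j for j in range(pop_size) if j != i], 2, replace=False)
            v = np.clip(best + F * (pop[r1] - pop[r2]), lo, hi)  # DE/best/1 mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True                      # at least one gene crosses
            trial = np.where(cross, v, pop[i])                   # binomial crossover
            c = f(trial)
            if c < cost[i]:                                      # greedy selection
                pop[i], cost[i] = trial, c
    return pop[np.argmin(cost)], cost.min()

# Example: minimize the sphere function in 5-D.
x_best, f_best = de_best_1_bin(lambda x: (x ** 2).sum(),
                               (np.full(5, -5.0), np.full(5, 5.0)))
```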
