首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
This work presents a self-constructing fuzzy cerebellar model articulation controller (SC-FCMAC) model for various applications. A self-constructing learning algorithm, which consists of the self-clustering method (SCM) and the back-propagation algorithm, is presented. The proposed SCM scheme is a rapid, one-pass algorithm which dynamically estimates the number of hypercube cells in input data space. The clustering method does not require prior knowledge, such as the number of clusters in a data set. The back-propagation algorithm is applied to tune the adjustable parameters. Simulation results are obtained to show the performance and applicability of the proposed model.  相似文献   

2.
Graph is a powerful representation formalism that has been widely employed in machine learning and data mining. In this paper, we present a graph-based classification method, consisting of the construction of a special graph referred to as K-associated graph, which is capable of representing similarity relationships among data cases and proportion of classes overlapping. The main properties of the K-associated graphs as well as the classification algorithm are described. Experimental evaluation indicates that the proposed technique captures topological structure of the training data and leads to good results on classification task particularly for noisy data. In comparison to other well-known classification techniques, the proposed approach shows the following interesting features: (1) A new measure, called purity, is introduced not only to characterize the degree of overlap among classes in the input data set, but also to construct the K-associated optimal graph for classification; (2) nonlinear classification with automatic local adaptation according to the input data. Contrasting to K-nearest neighbor classifier, which uses a fixed K, the proposed algorithm is able to automatically consider different values of K, in order to best fit the corresponding overlap of classes in different data subspaces, revealing both the local and global structure of input data. (3) The proposed classification algorithm is nonparametric, implicating high efficiency and no need for model selection in practical applications.  相似文献   

3.
The problem of k-nearest neighbors (kNN) is to find the nearest k neighbors for a query point from a given data set. Among available methods, the principal axis search tree (PAT) algorithm always has good performance on finding nearest k neighbors using the PAT structure and a node elimination criterion. In this paper, a novel kNN search algorithm is proposed. The proposed algorithm stores projection values for all data points in leaf nodes. If a leaf node in the PAT cannot be rejected by the node elimination criterion, data points in the leaf node are further checked using their pre-stored projection values to reject more impossible data points. Experimental results show that the proposed method can effectively reduce the number of distance calculations and computation time for the PAT algorithm, especially for the data set with a large dimension or for a search tree with large number of data points in a leaf node.  相似文献   

4.
In this paper, we propose a novel supervised dimension reduction algorithm based on K-nearest neighbor (KNN) classifier. The proposed algorithm reduces the dimension of data in order to improve the accuracy of the KNN classification. This heuristic algorithm proposes independent dimensions which decrease Euclidean distance of a sample data and its K-nearest within-class neighbors and increase Euclidean distance of that sample and its M-nearest between-class neighbors. This algorithm is a linear dimension reduction algorithm which produces a mapping matrix for projecting data into low dimension. The dimension reduction step is followed by a KNN classifier. Therefore, it is applicable for high-dimensional multiclass classification. Experiments with artificial data such as Helix and Twin-peaks show ability of the algorithm for data visualization. This algorithm is compared with state-of-the-art algorithms in classification of eight different multiclass data sets from UCI collection. Simulation results have shown that the proposed algorithm outperforms the existing algorithms. Visual place classification is an important problem for intelligent mobile robots which not only deals with high-dimensional data but also has to solve a multiclass classification problem. A proper dimension reduction method is usually needed to decrease computation and memory complexity of algorithms in large environments. Therefore, our method is very well suited for this problem. We extract color histogram of omnidirectional camera images as primary features, reduce the features into a low-dimensional space and apply a KNN classifier. Results of experiments on five real data sets showed superiority of the proposed algorithm against others.  相似文献   

5.
This paper presents a novel framework for tracking thousands of vehicles in high resolution, low frame rate, multiple camera aerial videos. The proposed algorithm avoids the pitfalls of global minimization of data association costs and instead maintains multiple object-centric associations for each track. Representation of object state in terms of many to many data associations per track is proposed and multiple novel constraints are introduced to make the association problem tractable while allowing sharing of detections among tracks. Weighted hypothetical measurements are introduced to better handle occlusions, mis-detections and split or merged detections. A two-frame differencing method is presented which performs simultaneous moving object detection in both. Two novel contextual constraints of vehicle following model, and discouragement of track intersection and merging are also proposed. Extensive experiments on challenging, ground truthed data sets are performed to show the feasibility and superiority of the proposed approach. Results of quantitative comparison with existing approaches are presented, and the efficacy of newly introduced constraints is experimentally established. The proposed algorithm performs better and faster than global, 1–1 data association methods.  相似文献   

6.
一种基于人工鱼群的混合聚类算法   总被引:2,自引:0,他引:2  
聚类分析是数据挖掘的核心技术之一,它是一种无导师监督的模式识别方式。聚类分析就是按照数据间的相似程度,依据特定的准则将数据划分成不同子类。文中通过分析K-平均算法的优缺点,提出了一种基于人工鱼群算法的聚类分析算法,并把它与传统的K-平均算法结合得到一种新的混合聚类算法。仿真实验表明,该算法是有效的,具有聚类速度快、精度高特点。  相似文献   

7.
In this research, a data clustering algorithm named as non-dominated sorting genetic algorithm-fuzzy membership chromosome (NSGA-FMC) based on K-modes method which combines fuzzy genetic algorithm and multi-objective optimization was proposed to improve the clustering quality on categorical data. The proposed method uses fuzzy membership value as chromosome. In addition, due to this innovative chromosome setting, a more efficient solution selection technique which selects a solution from non-dominated Pareto front based on the largest fuzzy membership is integrated in the proposed algorithm. The multiple objective functions: fuzzy compactness within a cluster (π) and separation among clusters (sep) are used to optimize the clustering quality. A series of experiments by using three UCI categorical datasets were conducted to compare the clustering results of the proposed NSGA-FMC with two existing methods: genetic algorithm fuzzy K-modes (GA-FKM) and multi-objective genetic algorithm-based fuzzy clustering of categorical attributes (MOGA (π, sep)). Adjusted Rand index (ARI), π, sep, and computation time were used as performance indexes for comparison. The experimental result showed that the proposed method can obtain better clustering quality in terms of ARI, π, and sep simultaneously with shorter computation time.  相似文献   

8.
This paper proposes a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups, based on their natural characteristics. Two types of weights are introduced to the clustering process to simultaneously identify the importance of feature groups and individual features in each cluster. A new optimization model is given to define the optimization process and a new clustering algorithm FG-k-means is proposed to optimize the optimization model. The new algorithm is an extension to k-means by adding two additional steps to automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data have shown that the FG-k-means algorithm significantly outperformed four k-means type algorithms, i.e., k-means, W-k-means, LAC and EWKM in almost all experiments. The new algorithm is robust to noise and missing values which commonly exist in high-dimensional data.  相似文献   

9.
A cluster validity index for fuzzy clustering   总被引:1,自引:0,他引:1  
A new cluster validity index is proposed for the validation of partitions of object data produced by the fuzzy c-means algorithm. The proposed validity index uses a variation measure and a separation measure between two fuzzy clusters. A good fuzzy partition is expected to have a low degree of variation and a large separation distance. Testing of the proposed index and nine previously formulated indices on well-known data sets shows the superior effectiveness and reliability of the proposed index in comparison to other indices and the robustness of the proposed index in noisy environments.  相似文献   

10.
The concept of convergence clubs is analyzed and compared with classical methods for the study of economic β-convergence, which often consider the entire data set as one sample. A technique for the identification of convergence clubs is proposed. The algorithm is based on a modified version of the usual regression trees procedure. The objective function of the method is represented by the difference among the parameters of the model under investigation. Different strategies are adopted in the definition of the model used in the objective function of the algorithm. The first is the classical non-spatial β-convergence model. The others are modified β-convergence models which take into account the dependence showed by spatially distributed data. The proposed procedure identifies situation of local stationarity in the economic growth of the different regions: a group of regions is divided into two sub-groups if the parameter estimates are significantly different among them. The algorithm is applied to 191 European regions for the period 1980-2002. Given the adaptability of the algorithm, its implementation provides a flexible tool for the use of any regression model in the analysis of non-stationary spatial data.  相似文献   

11.
As one of the most popular algorithms for cluster analysis, fuzzy c-means (FCM) and its variants have been widely studied. In this paper, a novel generalized version called double indices-induced FCM (DI-FCM) is developed from another perspective. DI-FCM introduces a power exponent r into the constraints of the objective function such that the fuzziness index m is generalized and a new criterion of selecting an appropriate fuzziness index m is defined. Furthermore, it can be explained from the viewpoint of entropy concept that the power exponent r facilitates the introduction of entropy-based constraints into fuzzy clustering algorithms. As an attractive and judicious application, DI-FCM is integrated with a fuzzy subspace clustering (FSC) algorithm so that a new fuzzy subspace clustering algorithm called double indices-induced fuzzy subspace clustering (DI-FSC) algorithm is proposed for high-dimensional data. DI-FSC replaces the commonly used Euclidean distance with the feature-weighted distance, resulting in having two fuzzy matrices in the objective function. A convergence proof of DI-FSC is also established by applying Zangwill’s convergence theorem. Several experiments on both artificial data and real data were conducted and the experimental results show the effectiveness of the proposed algorithm.  相似文献   

12.
This paper shows fundamentals and applications of the novel parametric fuzzy cerebellar model articulation controller (P-FCMAC) network. It resembles a neural structure that derived from the Albus CMAC algorithm and Takagi–Sugeno–Kang parametric fuzzy inference systems. The Gaussian basis function is used to model the hypercube structure and the linear parametric equation of the network input variance is used to model the TSK-type output. A self-constructing learning algorithm, which consists of the self-clustering method (SCM) and the backpropagation algorithm, is proposed. The proposed the SCM scheme is a fast, one-pass algorithm for a dynamic estimation of the number of hypercube cells in an input data space. The clustering technique does not require prior knowledge of things such as the number of clusters present in a data set. The backpropagation algorithm is used to tune the adjustable parameters. Illustrative examples were conducted to show the performance and applicability of the proposed model.  相似文献   

13.
This paper presents a dynamically scheduled parallel DSP architecture for general purpose DSP computations. The architecture consists of multiple DSP processors and of one or more scheduling units. DSP applications are first captured by stream flow graphs, and then stream flow graphs are statically mapped onto a parallel architecture. The ordering and starting time of DSP tasks are determined by the scheduling unit(s) using a dynamic scheduling algorithm.The main contributions of this paper are summarized as follows:• A scalable parallel DSP architecture: The parallel DSP architecture proposed in this paper is scalable to meet signal processing requirements. For parallel DSP architectures with large configurations, the scheduling unit may become a performance bottleneck. A distributed scheduling mechanism is proposed to address this problem.• A mapping algorithm: An algorithm is proposed to systematically map a stream flow graph onto a parallel DSP architecture.• A dynamic scheduling algorithm: We propose a dynamic scheduling algorithm that will only schedule a node for execution when both input data and output storage space are available. Such scheduling algorithm will allow buffer sizes to be determined at compile time.• A simulation study: Our simulation study reveals the relationships among the grain-size, the processor utilization, and the scheduling capability. We believe these relationships have significant impact on parallel computer architecture design involving dynamic scheduling.  相似文献   

14.
Noise sensitivity is known as a key related issue of AdaBoost algorithm. Previous works exhibit that AdaBoost is prone to be overfitting in dealing with the noisy data sets due to its consistent high weights assignment on hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is exploited to combine classifiers by emphasizing on training misclassified noisy instances and correctly classified non-noisy instances. Specifically, the algorithm is designed by integrating a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. A k-nearest-neighbor (k-NN) and an expectation maximization (EM) based evaluation criteria are both constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm which provides theoretical support. Finally, we conduct some experiments on selected binary UCI benchmark data sets and demonstrate that the proposed algorithm is more robust than standard and other types of AdaBoost for noisy data sets.  相似文献   

15.
Partitional clustering of categorical data is normally performed by using K-modes clustering algorithm, which works well for large datasets. Even though the design and implementation of K-modes algorithm is simple and efficient, it has the pitfall of randomly choosing the initial cluster centers for invoking every new execution that may lead to non-repeatable clustering results. This paper addresses the randomized center initialization problem of K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clustering of the data based on attribute values in different attributes and yields deterministic modes that are to be used as initial cluster centers. In the paper, we propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with another existing method to find Significant attributes for unsupervised learning, and perform multiple clustering of data to find initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. The worst-case time complexity of the proposed algorithm is log-linear to the number of data objects. We evaluate the proposed algorithm on several categorical datasets and compared it against random initialization and two other initialization methods, and show that the proposed method performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the different data we tested, which leads to faster convergence of K-modes clustering algorithm in conjunction to better clustering results.  相似文献   

16.
17.
In a random fuzzy information system, by introducing a fuzzy t-similarity relation on the objects set for a subset of attributes set, the approximate representations of knowledge are established. By discussing fuzzy belief measures and fuzzy plausibility measures defined by the lower approximation and the upper approximation in a random fuzzy approximation space, some equivalent conditions of knowledge reduction in a random fuzzy information system are proved. Similarly as in an information system, the fuzzy-set-valued attribute discernibility matrixes in a random fuzzy information system are constructed. Knowledge reduction is defined from the view of fuzzy belief measures and fuzzy plausibility measures and a heuristic knowledge reduction algorithm is proposed, and the time complexity of this algorithm is O(|U|2|A|). A running example illustrates the potential application of algorithm, and the experimental results on the data sets with numerical attributes show that the proposed method is effective.  相似文献   

18.
Aiming at the problem of top-k spatial join query processing in cloud computing systems, a Spark-based top-k spatial join (STKSJ) query processing algorithm is proposed. In this algorithm, the whole data space is divided into grid cells of the same size by a grid partitioning method, and each spatial object in one data set is projected into a grid cell. The Minimum Bounding Rectangle (MBR) of all spatial objects in each grid cell is computed. The spatial objects overlapping with these MBRs in another spatial data set are replicated to the corresponding grid cells, thereby filtering out spatial objects for which there are no join results, thus reducing the cost of subsequent spatial join processing. An improved plane sweeping algorithm is also proposed that speeds up the scanning mode and applies threshold filtering, thus greatly reducing the communication and computation costs of intermediate join results in subsequent top-k aggregation operations. Experimental results on synthetic and real data sets show that the proposed algorithm has clear advantages, and better performance than existing top-k spatial join query processing algorithms.  相似文献   

19.
Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and solution space is large. Genetic algorithm deals effectively with larger solution space and provides better solution. In this paper, we proposed a novel clustering algorithm for distributed datasets, using combination of genetic algorithm (GA) with Mahalanobis distance and k-means clustering algorithm. The proposed algorithm is two phased; in phase 1, GA is applied in parallel on data chunks located across different machines. Mahalanobis distance is used as fitness value in GA, which considers covariance between the data points and thus provides a better representation of initial data. K-means with K-means\( ++ \) initialization is applied in phase 2 on intermediate output to get final result. The proposed algorithm is implemented on Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted for multiple real-life and synthetic datasets to measure performance of our proposed algorithm. Results were compared with MapReduce-based algorithms, mrk-means, parallel k-means and scaling GA.  相似文献   

20.
A new data clustering algorithm Density oriented Kernelized version of Fuzzy c-means with new distance metric (DKFCM-new) is proposed. It creates noiseless clusters by identifying and assigning noise points into separate cluster. In an earlier work, Density Based Fuzzy C-Means (DOFCM) algorithm with Euclidean distance metric was proposed which only considered the distance between cluster centroid and data points. In this paper, we tried to improve the performance of DOFCM by incorporating a new distance measure that has also considered the distance variation within a cluster to regularize the distance between a data point and the cluster centroid. This paper presents the kernel version of the method. Experiments are done using two-dimensional synthetic data-sets, standard data-sets referred from previous papers like DUNN data-set, Bensaid data-set and real life high dimensional data-sets like Wisconsin Breast cancer data, Iris data. Proposed method is compared with other kernel methods, various noise resistant methods like PCM, PFCM, CFCM, NC and credal partition based clustering methods like ECM, RECM, CECM. Results shown that proposed algorithm significantly outperforms its earlier version and other competitive algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号