Similar literature
1.
In the design of a financial bankruptcy prediction model, financial ratio selection and classifier design play major roles. Methodologies based on expert opinion, statistical theory and computational intelligence techniques have been widely applied. In this study, a hybrid structure integrating statistical theory and computational intelligence techniques was developed, using a genetic algorithm (GA) with fitness functions based on statistical measurements and fuzzy logic for key ratio selection. A fuzzy clustering algorithm was used for the classifier design. In the experiments, two financial ratio sets, one extracted from the suggestions of other studies and the other obtained by using the GA toolbox in the SAS statistical software package, were applied to examine the proposed ratio selection schemes. For classifier design, the developed fuzzy classifier was compared with the well-known BPNN classifier frequently used in other studies. A comparison between the developed hybrid structure and other widely applied structures is also given. Experimental results based on one to four years of financial data prior to the occurrence of bankruptcy were used to evaluate the performance of the proposed prediction model.

2.
Approaches for scaling DBSCAN algorithm to large spatial databases   Cited by: 7 (self-citations: 0, citations by others: 7)
The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have been working on clustering algorithms for decades, and many clustering algorithms have been developed, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, DBSCAN shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires a large volume of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale DBSCAN to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Following that, based on the above algorithms, a synthetic algorithm is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.
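The abstract above does not spell out its scaling techniques, so the following is only a minimal sketch of the baseline it starts from: textbook DBSCAN in plain Python. The names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative choices, not taken from the paper. Each region query scans the whole data set, which hints at why the unmodified algorithm becomes costly on large databases.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels
```

Because `region_query` is a linear scan, the sketch runs in quadratic time overall; sampling, partitioning or parallel variants such as those the abstract mentions would replace or restrict that scan.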

3.
4.
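Because of the limits of human cognition and the uncertainty of information, traditional fuzzy clustering cannot effectively solve decision problems in real-world scenarios when cluster analysis is applied to them, so researchers have proposed clustering algorithms based on hesitant fuzzy sets. The existing hierarchical hesitant fuzzy K-means clustering algorithm does not use information from the data set itself to determine the weights of the distance function, and both the computational complexity and the space complexity of computing cluster centers are exponential, which makes it unsuitable for big-data environments. To address these problems, ...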

5.
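Non-uniform granularity cluster analysis based on quotient space   Cited by: 4 (self-citations: 0, citations by others: 4)
徐峰 (Xu Feng), 张铃 (Zhang Ling). 《计算机工程》 (Computer Engineering), 2005, 31(3): 26-28, 53
Fuzzy granular clustering in quotient space is discussed by means of a distance metric space, and clustering results are synthesized at different granularities using information fusion techniques; the paper argues that clustering can describe a sample set at non-uniform granularity. On this basis, a fuzzy clustering algorithm (the FCluster algorithm) is proposed that defines the distance function of the quotient space with a Gaussian-type function. The algorithm represents information granularity by distance, needs neither a membership function nor a similarity matrix, and requires no discussion of parameter selection. Simulation experiments show that the algorithm allows clustering results to be observed intuitively at different granularities (distances), greatly reduces computational and space complexity, and is suitable for large samples; the distance defined by the Gaussian-type function also achieves good results on the test samples.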

6.
Effective fuzzy c-means clustering algorithms for data clustering problems   Cited by: 3 (self-citations: 0, citations by others: 3)
Clustering is a well-known technique for identifying intrinsic structures and extracting useful information from large amounts of data. One of the most extensively used clustering techniques is the fuzzy c-means (FCM) algorithm. However, with the standard FCM objective function, computation becomes a problem because of the large amount of data and the measurement uncertainty in data objects. Further, it is difficult to set optimal parameters for fuzzy c-means. Hence the goal of this paper is to produce an alternative generalization of FCM clustering techniques, called quadratic entropy based fuzzy c-means, in order to deal with more complicated data. The paper develops effective quadratic entropy fuzzy c-means using a combination of a regularization function, quadratic terms, mean distance functions, and kernel distance functions. It gives a complete framework of the quadratic entropy approach for constructing effective quadratic entropy based fuzzy clustering algorithms. This paper establishes an effective way of estimating memberships and updating centers by minimizing the proposed objective functions. In order to reduce the number of iterations of the proposed techniques, this article proposes a new algorithm to initialize the cluster centers. The silhouette method is used to assess cluster validity and to choose the number of clusters for the proposed techniques. For the first time, this paper segments the synthetic control chart time series directly using the proposed methods to examine their performance, and it shows that the proposed clustering techniques have advantages over the existing standard FCM and the very recent ClusterM-k-NN in segmenting synthetic control chart time series.
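For orientation, the alternating membership and center updates of standard FCM, the baseline this abstract generalizes, can be sketched as follows. This is not the quadratic entropy variant, whose objective function the abstract does not give; the function names, the fuzzifier default `m=2.0` and the `tiny` guard against zero distances are assumptions made for this illustration.

```python
import math


def memberships(points, centers, m, tiny=1e-12):
    """Standard FCM membership: u[j][i] = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        d = [max(math.dist(x, v), tiny) for v in centers]
        u.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return u


def update_centers(points, u, m):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wj * x[t] for wj, x in zip(w, points)) / total
                             for t in range(dim)))
    return centers


def fcm(points, centers, m=2.0, iters=50):
    """Run a fixed number of alternating updates from the given initial centers."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return memberships(points, centers, m), centers
```

A fixed iteration count keeps the sketch short; a practical version would stop once the centers or the objective change by less than a tolerance, and the initial centers matter, which is the issue the abstract's initialization algorithm targets.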

7.
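Objective: When applied to image segmentation, traditional fuzzy C-means clustering considers only the clustering of the pixels themselves and cannot overcome the effect of noise on the segmentation result, which hinders target extraction, recognition and interpretation in noise-corrupted industrial images, medical images and high-resolution remote sensing images. Robust fuzzy C-means segmentation algorithms that embed the spatial neighborhood or local information of pixels have been a hot topic in image segmentation research in recent years. Existing robust kernel-space fuzzy clustering algorithms, however, are very time-consuming, suppress noise poorly, and are unsuitable for fast segmentation of large-format images under strong noise; a fast robust kernel-space fuzzy clustering segmentation algorithm is therefore proposed. Method: A linearly weighted filtered image is built from the gray-level and spatial position information of each pixel's neighborhood in the image to be segmented, and robust kernel-space fuzzy clustering is applied to it. To further improve real-time performance, the 2D histogram formed by each clustered pixel and the mean of its neighboring pixels is introduced, an optimization model for fast robust kernel-space fuzzy clustering segmentation based on the 2D histogram is constructed, and the iterative expressions for pixel clustering are obtained with the Lagrange multiplier method. Result: Segmentation tests on large-format images with added Gaussian, salt-and-pepper and mixed noise of a certain intensity, and on noise-free standard images, show that compared with algorithms such as kernel fuzzy C-means clustering with neighborhood spatial constraints, the proposed algorithm raises the peak signal-to-noise ratio by at least 1.5 dB, lowers the misclassification rate by about 5%, and raises the partition coefficient used to evaluate clustering performance by about 10%. It runs at least 30% faster than kernel fuzzy C-means clustering and robust kernel fuzzy C-means clustering with neighborhood spatial constraints, has a time cost comparable to that of 1D-histogram kernel-space fuzzy C-means clustering, and yields segmentation results of good subjective visual quality. Conclusion: Theoretical analysis and experimental verification show that, compared with existing robust kernel-space fuzzy clustering algorithms constrained by spatial neighborhood information, the proposed algorithm has stronger robustness to noise, better segmentation performance and better real-time behavior. It helps the fast interpretation of large-format remote sensing and medical images and better meets the needs of image segmentation where real-time requirements are high.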

8.
Clustering algorithms are increasingly employed for the categorization of image databases, in order to provide users with database overviews and make their access more effective. By including information provided by the user, the categorization process can produce results that come closer to the user's expectations. To make such a semi-supervised categorization approach acceptable for the user, this information must be of a very simple nature and the amount of information the user is required to provide must be minimized. We propose here an effective semi-supervised clustering algorithm, active fuzzy constrained clustering (AFCC), that minimizes a competitive agglomeration cost function with fuzzy terms corresponding to pairwise constraints provided by the user. In order to minimize the number of constraints required, we define an active mechanism for the selection of candidate constraints. The comparisons performed on a simple benchmark and on a ground truth image database show that with AFCC the results of clustering can be significantly improved with few constraints, making this semi-supervised approach an attractive alternative in the categorization of image databases.

9.
Structured fuzzy K-prototypes clustering algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
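Although the K-prototypes algorithm, which combines K-means and K-modes, can handle symbolic data effectively, representing the mean of the data in a cluster by the cluster's symbolic modes causes considerable information loss. This paper therefore proposes a structured fuzzy K-prototypes algorithm (SFKP) suited to mixed-type data, which improves clustering ability without increasing time or space overhead. Experimental results on real data sets show that SFKP clusters more effectively.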
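As background for the mode-based representation the abstract criticizes, the mixed dissimilarity commonly used in classical K-prototypes can be sketched as below: squared Euclidean distance on the numeric attributes plus a weighted count of mismatched categorical attributes. This is the classical measure, not SFKP's, which the abstract does not define; the function name and the weight `gamma` are illustrative assumptions.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Classical K-prototypes style dissimilarity between an object and a prototype."""
    # Numeric part: squared Euclidean distance to the prototype's means.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching against the prototype's modes.
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


# One numeric attribute differs by 1.0 and one categorical attribute mismatches.
d = mixed_dissimilarity([1.0, 2.0], ["red", "small"], [0.0, 2.0], ["red", "large"], gamma=0.5)
print(d)  # 1.5
```

The categorical term sees only whether a value equals the mode, so two clusters with very different category distributions but the same mode look identical, which is the kind of information loss the abstract refers to.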

10.
Financial markets play an important role in the economic and social organization of modern society. In these kinds of markets, information is an invaluable asset. However, with the modernization of financial transactions and information systems, the large amount of information available to a trader can make the analysis of a financial asset prohibitive. In the last decades, many researchers have attempted to develop computational intelligence methods and algorithms to support decision-making in different financial market segments. In the literature, there is a huge number of scientific papers that investigate the use of computational intelligence techniques to solve financial market problems. However, only a few studies have focused on reviewing the literature on this topic. Most of the existing review articles have a limited scope, either by focusing on a specific financial market application or by focusing on a family of machine learning algorithms. This paper presents a review of the application of several computational intelligence methods in several financial applications. This paper gives an overview of the most important primary studies published from 2009 to 2015, which cover techniques for preprocessing and clustering of financial data, for forecasting future market movements, and for mining financial text information, among others. The main contributions of this paper are: (i) a comprehensive review of the literature of this field, (ii) the definition of a systematic procedure for guiding the task of building an intelligent trading system and (iii) a discussion about the main challenges and open problems in this scientific field.

11.
Rapid technological advances mean that the amount of data stored in databases is rising very fast. Data mining can discover helpful implicit information in large databases. How to detect implicit and useful information with low time cost, high correctness and a high noise filtering rate, in a way that suits large databases, is a priority concern in data mining, which is why a considerable number of clustering schemes have been proposed in recent decades. This investigation presents a new data clustering approach called PHD, which is an enhanced version of KIDBSCAN. PHD is a hybrid density-based algorithm, which partitions the data set by K-means, and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters of the data set is reached. Experimental results reveal that the proposed algorithm can perform the entire clustering and efficiently reduce the run-time cost. They also indicate that the proposed clustering algorithm performs better than several existing well-known schemes such as the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed PHD algorithm is efficient and effective for data clustering in large databases.

12.
There are two popular types of forecasting algorithms for fuzzy time series (FTS). One is based on intervals of universal sets of independent variables and the other is based on fuzzy clustering algorithms. Clustering-based FTS algorithms are preferred since the role and optimal length of intervals are not clearly understood. In these algorithms, the data of each variable are clustered individually, which requires higher computational time. Fuzzy Logical Relationships (FLRs) are used in existing FTS algorithms to relate input and output data. A high number of clusters and FLRs is required to establish precise input/output relations, which incurs high computational time. This article presents a forecasting algorithm based on fuzzy clustering (CFTS) which clusters vectors of input data instead of clustering the data of each variable separately and uses linear combinations of the input variables instead of the FLRs. The cluster centers handle the fuzziness and ambiguity of the data and the linear parts allow the algorithm to learn more from the available information. It is shown that CFTS outperforms existing FTS algorithms with considerably lower testing error and running time.

13.
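Computing the shortest distance between two nodes is one of the basic operations on labeled graphs. For large graphs, query efficiency is improved by estimating the shortest distance between two nodes from landmark nodes. Existing landmark selection strategies cannot achieve both high centrality and low computational cost, and because landmarks store distance information to the other nodes, storage requirements remain large. For large-scale directed graphs, the proposed landmark selection strategy preserves centrality while reducing computation: it uses the idea of DBSCAN clustering to partition the nodes into classes and selects connected forward and backward core nodes as forward and backward landmarks. To reduce storage, it stores only the distances between landmarks and ordinary nodes within a class and the distances between landmarks of different classes. The source node reaches the target node through a backward landmark and a forward landmark, and the minimum mean of the upper and lower bounds is used as the estimate. Theoretical analysis shows that the strategy lowers time and space complexity compared with traditional methods, and experiments show that on large graphs its average relative error is of the same order of magnitude as that of traditional methods.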

14.
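Objective: To improve the noise robustness and generality of the 2D-histogram fuzzy C-means clustering segmentation algorithm, a new attribute-weighted 2D-histogram fuzzy C-means clustering segmentation method is proposed. Method: The 2D-histogram fuzzy C-means segmentation algorithm suffers from poor noise robustness when its threshold parameter is chosen badly; to address this, attribute weighting is introduced into 2D-histogram fuzzy C-means clustering, which effectively resolves the question of how much each attribute dimension contributes to clustering. Result: Compared with 2D-histogram fuzzy C-means segmentation, the proposed algorithm improves robustness to salt-and-pepper and Gaussian noise by 2 to 3 dB on average. Compared with fuzzy local C-means segmentation, it improves robustness to salt-and-pepper noise by 2 to 3 dB on average and is about 1 dB worse against Gaussian noise, but it runs about 40 times faster on average. Conclusion: The experimental results show that the proposed algorithm is better suited to noisy image segmentation than the existing 2D-histogram fuzzy C-means clustering algorithm, and better suited than fuzzy local C-means clustering to target tracking and recognition where real-time requirements are high. Tests on a large number of images also indicate that the algorithm is generally applicable to ordinary synthetic images, intelligent transportation images and remote sensing images.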

15.
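李钊 (Li Zhao), 李晓 (Li Xiao), 王春梅 (Wang Chunmei), 李诚 (Li Cheng), 杨春 (Yang Chun). 《计算机科学》 (Computer Science), 2016, 43(1): 246-250, 269
In text clustering, the similarity measure is an important factor in clustering quality. Commonly used similarity measures, such as Euclidean distance and correlation coefficients, can describe only low-order correlations between texts, whereas the relationships between texts are very complex, so clustering based on low-order correlation measures is not very satisfactory. Some text clustering methods based on more complex measures have been proposed, but as data sets grow, the computational cost of text clustering keeps increasing, and traditional clustering methods are no longer suitable for large-scale text clustering. To address these problems, a distributed clustering method based on MapReduce is proposed; it improves the traditional K-means algorithm by adopting a similarity measure based on information loss. To further improve clustering efficiency, the method is combined with a MapReduce-based principal component analysis method to reduce the dimensionality of the text feature vectors. A case study shows that the proposed large-scale text clustering method achieves better clustering performance than existing clustering methods.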

16.
Large graphs are scale-free and ubiquitous, and have irregular relationships. Clustering is used to find similar patterns in graphs and thus helps in gaining useful insights. In the real world, nodes may belong to more than one cluster, so it is essential to analyze the fuzzy cluster membership of nodes. Traditional centralized fuzzy clustering algorithms incur high communication cost and produce clusters of poor quality when used for large graphs. Thus, scalable solutions are needed to handle huge amounts of data in less computational time with minimum disk access. In this paper, we propose a parallel fuzzy clustering algorithm named 'PGFC' for handling scalable graph data. From the viewpoint of expert systems, it is advantageous to develop a clustering algorithm that can assure scalability along with better cluster quality for handling large graphs. The algorithm is parallelized using the bulk synchronous parallel (BSP) based Pregel model. The cluster centers are initialized using a degree centrality measure, resulting in fewer iterations. The performance of PGFC is compared with other state-of-the-art clustering algorithms using synthetic graphs and real-world networks. The experimental results reveal that the proposed PGFC scales up linearly to handle large graphs and produces clusters of better quality than other graph clustering counterparts.

17.
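The traditional fuzzy joint points (FJP) clustering algorithm generates the transitive closure with max-min composition based on Euclidean distance. The transitive closure generated in this way is distorted, that is, it contains a good deal of incorrect association information between data points, which ultimately leads to low clustering accuracy and long computation time. To address these problems, an improved fuzzy joint points clustering algorithm is proposed. It first computes the fuzzy similarity matrix of the data set with a combined kernel function, which improves the algorithm's ability to recognize nonlinear features of the data, and stores the matrix in a max-heap. It then traverses the empty elements of the transitive closure matrix and fills them with the bridge element at the top of the heap until the transitive closure is complete. Experimental results on test data sets show that the average clustering accuracy of the proposed algorithm is more than 20% higher than that of the traditional FJP algorithm, which markedly reduces the distortion of the transitive closure. In addition, its computational efficiency on large data sets is better than that of the traditional FJP algorithm, indicating that the proposed improvement to FJP is effective and feasible.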
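The max-min composition mentioned above can be illustrated with the classical squaring method for the transitive closure of a fuzzy similarity matrix. This sketch covers only that textbook step, assuming a reflexive, symmetric matrix with entries in [0, 1]; it does not reproduce the kernel-based similarity or the heap-based filling described in the abstract, and the function names are illustrative.

```python
def max_min_compose(a, b):
    """Max-min composition: c[i][j] = max over k of min(a[i][k], b[k][j])."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square r under max-min composition until it stops changing."""
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:  # exact comparison is safe: min/max only copy existing entries
            return r
        r = r2


similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
print(transitive_closure(similarity))
# [[1.0, 0.8, 0.5], [0.8, 1.0, 0.5], [0.5, 0.5, 1.0]]
```

In the example the direct similarity 0.2 between the first and third points is raised to 0.5 through the chain via the second point. Each composition costs cubic time in the number of points, which is consistent with the long computation time the abstract attributes to the traditional approach.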

18.
Researchers have realized the importance of integrating fuzziness into association rules mining in databases with binary and quantitative attributes. However, most of the earlier algorithms proposed for fuzzy association rules mining either assume that fuzzy sets are given or employ a clustering algorithm, like CURE, to decide on fuzzy sets; in both cases the number of fuzzy sets is pre-specified. In this paper, we propose an automated method to decide on the number of fuzzy sets and for the autonomous mining of both fuzzy sets and fuzzy association rules. We achieve this by developing an automated clustering method based on multi-objective Genetic Algorithms (GA); the aim of the proposed approach is to automatically cluster values of a quantitative attribute in order to obtain a large number of large itemsets in less time. We compare the proposed multi-objective GA based approach with two other approaches, namely: 1) a CURE-based approach, CURE being known as one of the most efficient clustering algorithms; 2) the clustering approach of Chien et al., which is an automatic interval partition method based on variation of density. Experimental results on 100 K transactions extracted from the adult data of the USA census in the year 2000 showed that the proposed automated clustering method performs well compared with both the CURE-based approach and Chien et al.'s work in terms of runtime, number of large itemsets and number of association rules.

19.
Parallel distributed genetic fuzzy rule selection   Cited by: 1 (self-citations: 1, citations by others: 0)
Genetic fuzzy rule selection has been successfully used to design accurate and compact fuzzy rule-based classifiers. It is, however, very difficult to apply to large data sets because of the increase in computational costs. This paper proposes a simple but effective idea to improve the scalability of genetic fuzzy rule selection to large data sets. Our idea is based on its parallel distributed implementation. Both a training data set and a population are divided into subgroups (i.e., into training data subsets and sub-populations, respectively) for the use of multiple processors. We compare seven variants of the parallel distributed implementation with the original non-parallel algorithm through computational experiments on some benchmark data sets.

20.
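In recent years spectral clustering algorithms have been widely applied to image segmentation, and constructing the similarity matrix is the key step of spectral clustering. Because the high computational complexity of traditional spectral clustering makes it hard to apply to large-scale image segmentation, a semi-supervised superpixel spectral clustering algorithm for color image segmentation is proposed. The algorithm pre-segments the color image into superpixels, uses a small amount of label information provided by the user to construct a semi-supervised fuzzy similarity measure for the pre-segmented regions, builds the similarity matrix of those regions from this measure, and partitions the regions with the normalized cut spectral partitioning criterion to obtain the final segmentation. The introduction of a small amount of label information and of fuzzy theory improves the segmentation performance of traditional spectral clustering, and comparative experiments also show that the algorithm considerably improves both segmentation quality and computational complexity.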
