Similar documents
Found 20 similar documents (search time: 31 ms)
1.
Mining fuzzy association rules in a bank-account database   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper describes how we applied a fuzzy technique to a data-mining task involving a large database provided by an international bank with offices in Hong Kong. The database contains the demographic data of over 320,000 customers and their banking transactions, collected over a six-month period. By mining the database, the bank hoped to discover interesting patterns in the data. The bank expected that the hidden patterns would reveal different characteristics of different customers so that it could better serve and retain them. To help the bank achieve its goal, we developed a fuzzy technique called fuzzy association rule mining II (FARM II). FARM II is able to handle both relational and transactional data. It can also handle fuzzy data. Handling relational and transactional data allows FARM II to discover multidimensional association rules, whereas handling fuzzy data allows some of the patterns to be more easily revealed and expressed. To effectively uncover the hidden associations in the bank-account database, FARM II performs several steps, which are described in detail in this paper. With FARM II, the bank identified some interesting characteristics of customers who had once used the bank's loan services but later decided to stop using them. The bank translated what it discovered into actionable items by offering incentives to retain its existing customers.

2.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. In real-world applications, transactions may contain quantitative values, and each item in a temporal database may have a lifespan. In this paper, we thus propose a data mining algorithm for deriving fuzzy temporal association rules. It first transforms each quantitative value into a fuzzy set using the given membership functions. Meanwhile, item lifespans are collected and recorded in a temporal information table through a transformation process. The algorithm then calculates the scalar cardinality of each linguistic term of each item. A mining process based on fuzzy counts and item lifespans is then performed to find fuzzy temporal association rules. Finally, experiments on two simulation datasets and the foodmart dataset show the effectiveness and efficiency of the proposed approach.
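The fuzzification and counting steps described above can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the triangular membership functions, the three linguistic terms in `TERMS`, the toy transactions, and the definition of an item's lifespan as "from its first appearance to the end of the database" are all assumptions made for the example.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at b (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for a purchased quantity.
TERMS = {
    "low": (0.0, 3.0, 6.0),
    "middle": (3.0, 6.0, 9.0),
    "high": (6.0, 9.0, 12.0),
}

def fuzzify(quantity):
    """Map one quantitative value to its membership degree in every term."""
    return {term: triangular(quantity, *abc) for term, abc in TERMS.items()}

def lifespan_start(transactions, item):
    """Index of the first transaction containing the item (assumed lifespan start)."""
    for i, t in enumerate(transactions):
        if item in t:
            return i
    return None

def fuzzy_temporal_support(transactions, item, term):
    """Scalar cardinality of (item, term) divided by the length of the item's lifespan."""
    start = lifespan_start(transactions, item)
    if start is None:
        return 0.0
    lifespan = transactions[start:]
    count = sum(fuzzify(t[item])[term] for t in lifespan if item in t)
    return count / len(lifespan)

# Toy transactions in time order: item -> purchased quantity.
TRANSACTIONS = [
    {"bread": 3},
    {"bread": 6, "milk": 4.5},
    {"milk": 6},
    {"milk": 3, "bread": 4.5},
]

print(fuzzy_temporal_support(TRANSACTIONS, "milk", "middle"))
```

Because "milk" first appears in the second transaction, its fuzzy count is divided by three transactions rather than four, which is the effect a lifespan-aware support is meant to capture.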

3.
This paper presents a novel classification approach that integrates fuzzy class association rules and support vector machines. A fuzzy discretization technique based on the fuzzy c-means clustering algorithm is employed to transform the training set, particularly quantitative attributes, into a format appropriate for association rule mining. A hill-climbing procedure is adapted for automatic threshold adjustment, and fuzzy class association rules are mined accordingly. The compatibility between the generated rules and fuzzy patterns is considered to construct a set of feature vectors, which are used to generate a classifier. The reported test results show that compatibility rule-based feature vectors are a high-quality source of discrimination knowledge that can substantially improve the predictive power of the final classifier. To evaluate the applicability of the proposed method to a variety of domains, it is also applied to the popular task of gene expression classification. Further, we show how this method provides biologists with an accurate and more understandable classifier model compared to other machine learning techniques.

4.
Chaos game representation of proteins   Total citations: 1 (self-citations: 0, citations by others: 1)
The present report proposes a new method for the chaos game representation (CGR) of different families of proteins. Using concatenated amino acid sequences of proteins belonging to a particular family and a 12-sided regular polygon, each vertex of which represents a group of amino acid residues leading to conservative substitutions, the method can generate the CGR of the family and allows pictorial representation of the pattern characterizing the family. An estimation of the percentages of points plotted in different segments of the CGR (grid points) allows quantification of the nonrandomness of the CGR patterns generated. The CGRs of different protein families exhibited distinct visually identifiable patterns. This implies that different functional classes of proteins follow specific statistical biases in the distribution of different mono-, di-, tri-, or higher order peptides along their primary sequences. The potential of grid counts as the discriminative and diagnostic signature of a family of proteins is discussed.
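A rough sketch of a chaos game on a 12-sided polygon, with grid percentages computed afterwards, might look like the following. The grouping of the 20 amino acids in `GROUPS`, the step `ratio` of 0.5, the starting point at the polygon centre, and the 4 x 4 grid are all hypothetical choices for illustration; the abstract does not state the ones the report uses.

```python
import math

# Hypothetical grouping of the 20 amino acids into 12 groups (one per vertex).
GROUPS = ["A", "G", "P", "C", "ST", "NQ", "DE", "KR", "H", "ILV", "M", "FYW"]

def polygon_vertices(n):
    """Vertices of a regular n-gon inscribed in the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def cgr_points(sequence, ratio=0.5):
    """Chaos game: move from the current point toward the residue's vertex."""
    vertices = polygon_vertices(len(GROUPS))
    vertex_of = {aa: k for k, group in enumerate(GROUPS) for aa in group}
    x, y = 0.0, 0.0  # assumed starting point: centre of the polygon
    points = []
    for aa in sequence:
        if aa not in vertex_of:
            continue  # skip symbols outside the 20 standard residues
        vx, vy = vertices[vertex_of[aa]]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points

def grid_percentages(points, bins=4):
    """Percentage of points falling in each cell of a bins x bins grid over [-1, 1]^2."""
    counts = {}
    for x, y in points:
        col = min(int((x + 1) / 2 * bins), bins - 1)
        row = min(int((y + 1) / 2 * bins), bins - 1)
        counts[(row, col)] = counts.get((row, col), 0) + 1
    return {cell: 100.0 * n / len(points) for cell, n in counts.items()}

points = cgr_points("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
print(grid_percentages(points))
```

A strongly uneven distribution of percentages across cells would indicate a nonrandom pattern; comparing these grid counts between families is the kind of signature the abstract discusses.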

5.
This book is entirely focused on explaining how fuzzy concepts and approaches can be useful in bioinformatics. It is organized into seven chapters and two very useful appendices. Some of the topics covered include: the fundamental concepts of fuzzy set theory and fuzzy logic; specific examples of using fuzzy set theory and fuzzy logic to address bioinformatics analysis and modeling of data; analysis of microarray data; and future applications and directions for fuzzy computational approaches in molecular biology. Appendix I explains the fundamental biological concepts that are relevant to the biological subjects covered in the book, and Appendix II lists and describes a wide variety of free online resources related to molecular biology, bioinformatics, and fuzzy set theory. Researchers in bioinformatics, fuzzy set theory, and fuzzy logic will find this book to be an important resource for developing interdisciplinary research between the two fields. Educators can also use this book for a graduate course to both introduce the two fields and stimulate research ideas based on the currently active research topics presented.

6.
In many knowledge discovery and data mining tasks, fuzzy clustering is one of the most common tools for data partitioning. In this paper, dynamic fuzzy clustering models for classifying a set of multivariate time trajectories (time series, sequences) are developed. In particular, by adopting an exploratory approach based on a geometric-algebraic formulation of the data time array, different kinds of dynamic fuzzy clustering models, based on cross-sectional and longitudinal aspects, are suggested. Furthermore, a modified version of these clustering models, which can be seen as a generalization of them, is proposed. These models benefit the clustering process when anomalous trajectories (trajectories with anomalous positions and slopes) are present in the dataset; in fact, the models are suitable for detecting structures of time trajectories with anomalous patterns that are not uniformly distributed over the structure's domains and are characterized by strange slopes. In these models, the disruptive effect of the anomalous trajectories is neutralized and smoothed, and information on the influence of individual time trajectories on the detected groups is given. Furthermore, some remarks on dynamic three-way extensions of a few robust fuzzy clustering models for two-way data are offered. Demonstrative examples are shown, and a comparative assessment based on artificial multivariate time-varying data is carried out.

7.
李广璞  黄妙华 《计算机科学》2018,45(Z11):1-11, 26
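As one of the main research areas of data mining, association analysis is mainly used to discover strong associations hidden in large datasets. Most association rule mining tasks can be divided into the generation of frequent patterns (frequent itemsets, frequent sequences, frequent subgraphs) and the generation of rules. The former finds the itemsets, sequences, and subgraphs in the dataset that satisfy a minimum support threshold; the latter extracts high-confidence rules from the frequent patterns found in the previous step. Frequent itemset mining is a key problem in many data mining tasks and the core of association rule mining algorithms. For more than a decade, researchers have worked to improve the efficiency of frequent itemset generation from different angles, and a large number of efficient and scalable algorithms have been proposed. This paper analyzes frequent itemset mining in depth, introduces and reviews typical algorithms for complete frequent itemsets, closed frequent itemsets, and maximal frequent itemsets, and finally briefly analyzes research directions for frequent itemset mining algorithms.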
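The two phases of association rule mining (frequent pattern generation under a minimum support threshold, then rule generation under a minimum confidence threshold) can be sketched with a small level-wise, Apriori-style search. This is a generic textbook-style illustration, not any specific algorithm reviewed in the survey; the function names and the toy transactions are assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
itemsets = frequent_itemsets(TRANSACTIONS, min_support=0.5)
for antecedent, consequent, confidence in association_rules(itemsets, 0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

The prune step relies on the downward-closure property: any subset of a frequent itemset is also frequent, so a candidate with an infrequent subset can be discarded without counting it.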

8.
We develop a neurofuzzy network technique to extract TSK-type fuzzy rules from a given set of input-output data for system modeling problems. Fuzzy clusters are generated incrementally from the training dataset, and similar clusters are merged dynamically through input-similarity, output-similarity, and output-variance tests. The associated membership functions are defined with statistical means and deviations. Each cluster corresponds to a fuzzy IF-THEN rule, and the obtained rules can be further refined by a fuzzy neural network with a hybrid learning algorithm that combines a recursive singular value decomposition-based least squares estimator and the gradient descent method. The proposed technique has several advantages. The information about input and output data subspaces is considered simultaneously for cluster generation and merging. Membership functions closely match and properly describe the real distribution of the training data points. Redundant clusters are combined, and the sensitivity to the input order of training data is reduced. Moreover, generating the whole set of clusters from scratch can be avoided when new training data are considered.
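To make the role of the TSK-type IF-THEN rules concrete, here is a minimal sketch of first-order TSK inference with Gaussian membership functions defined by a mean and a deviation per input. The exact Gaussian form, the rule dictionary layout, and the example rule parameters are assumptions for illustration; the cluster generation, merging, and hybrid learning steps of the paper are not reproduced here.

```python
import math

def gaussian(x, mean, deviation):
    """Gaussian membership degree (one common form; assumed here)."""
    return math.exp(-((x - mean) / deviation) ** 2)

def tsk_output(rules, x):
    """Weighted average of linear rule consequents, weighted by firing strength."""
    numerator = 0.0
    denominator = 0.0
    for rule in rules:
        # Firing strength: product of the membership degrees of all inputs.
        strength = math.prod(
            gaussian(xi, mean, dev) for xi, (mean, dev) in zip(x, rule["antecedent"])
        )
        # First-order consequent: y = b0 + b1*x1 + ... + bn*xn.
        b0, *b = rule["consequent"]
        y = b0 + sum(bi * xi for bi, xi in zip(b, x))
        numerator += strength * y
        denominator += strength
    return numerator / denominator

# Two hypothetical single-input rules, e.g. derived from two clusters.
RULES = [
    {"antecedent": [(0.0, 1.0)], "consequent": [0.0, 0.0]},   # IF x is near 0 THEN y = 0
    {"antecedent": [(1.0, 1.0)], "consequent": [10.0, 0.0]},  # IF x is near 1 THEN y = 10
]

print(tsk_output(RULES, [0.5]))  # both rules fire equally at the midpoint
```

In a scheme like the one described, each cluster would supply the `(mean, deviation)` pairs of one rule, and the consequent coefficients would be the parameters tuned by least squares.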

9.
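This paper proposes a new method that uses fuzzy c-means to preprocess the original dataset. This step converts quantitative attribute values into binary values, yielding a fuzzy version of the original dataset (composed of fuzzy records and fuzzy attributes). In addition, the paper proposes a fast rule-extraction algorithm based on the fuzzy Apriori algorithm, which uses fuzzy clustering to extract fuzzy frequent itemsets from the previously obtained fuzzy version of the original dataset and thereby obtain fuzzy association rules. Finally, experimental results show that the proposed algorithm outperforms the traditional Apriori algorithm in mining time when processing large datasets. For large databases, the algorithm has good prospects in terms of practicality and usability.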
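As an illustration of the preprocessing idea, the sketch below runs standard fuzzy c-means on a single quantitative attribute and returns, for each value, its membership degree in every cluster. It is a generic one-dimensional version written for this example, not the paper's implementation; the evenly spread initial centres, the fuzzifier `m = 2`, the fixed iteration count, and the sample values are all assumptions.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, iterations=50):
    """One-dimensional fuzzy c-means; returns (centres, memberships).

    memberships[k][i] is the degree to which values[k] belongs to cluster i.
    Requires n_clusters >= 2 and at least two distinct values.
    """
    lo, hi = min(values), max(values)
    # Deterministic start (assumed): spread initial centres evenly over the range.
    centres = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        memberships = []
        for x in values:
            dist = [abs(x - c) for c in centres]
            if 0.0 in dist:
                # The value coincides with a centre: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dist]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dist) for d_i in dist]
            memberships.append(row)
        # Update each centre as the membership-weighted mean of the values.
        centres = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(n_clusters)
        ]
    return centres, memberships

VALUES = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]  # hypothetical attribute values
centres, memberships = fuzzy_c_means(VALUES)
for value, row in zip(VALUES, memberships):
    print(value, [round(u, 3) for u in row])
```

Each row of memberships can then stand in for the original quantitative value, for example as one fuzzy attribute per cluster, before a fuzzy Apriori-style miner counts fuzzy supports.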

10.
When using data-mining tools to analyze big data, users often need tools that help them understand individual data attributes and control the progress of the analysis. This requires integrating data-mining algorithms with interactive tools for manipulating the data and the analytical process. This is where visual analytics can help. More than simple visualization of a dataset or some computation results, visual analytics provides users with an environment to iteratively explore different inputs or parameters and see the corresponding results. In this research, we explore a design of progressive visual analytics to support the analysis of categorical data with a data-mining algorithm, Apriori. Our study focuses on executing data-mining techniques step by step and showing intermediate results at every stage to facilitate sense-making. Our design, called Pattern Discovery Tool, targets a medical dataset. Starting with visualization of data properties and immediate feedback on users’ inputs or adjustments, Pattern Discovery Tool could help users detect interesting patterns and factors effectively and efficiently. Afterward, further analyses such as statistical methods could be conducted to test those possible theories.

11.
We present an application of type-2 neuro-fuzzy modeling to stock price prediction based on a given set of training data. Type-2 fuzzy rules can be generated automatically by a self-constructing clustering method, and the obtained type-2 fuzzy rules can be refined by a hybrid learning algorithm. The given training dataset is partitioned into clusters through input-similarity and output-similarity tests, and a type-2 TSK rule is derived from each cluster to form a fuzzy rule base. Then the antecedent and consequent parameters associated with the rules are refined by particle swarm optimization and least squares estimation. Experimental results, obtained on several datasets taken from TAIEX and NASDAQ, demonstrate the effectiveness of the type-2 neuro-fuzzy modeling approach in stock price prediction.

12.
A large volume of research in temporal data mining focuses on discovering temporal rules from time-stamped data. Most of the methods proposed so far have been devoted to mining temporal rules that describe relationships between data sequences or instantaneous events, and they do not consider the presence of complex temporal patterns in the dataset. Such complex patterns, such as trends or up-and-down behaviors, are often very interesting to users. In this paper we propose a new kind of temporal association rule and the related extraction algorithm; the learned rules involve complex temporal patterns in both their antecedent and consequent. Within our proposed approach, the user defines a set of complex patterns of interest that constitute the basis for the construction of the temporal rule; such complex patterns are represented and retrieved in the data through the formalism of knowledge-based Temporal Abstractions. An Apriori-like algorithm then looks for meaningful temporal relationships (in particular, precedence temporal relationships) among the complex patterns of interest. The paper presents the results obtained by the rule extraction algorithm on a simulated dataset and on two different datasets related to biomedical applications: the first concerns the analysis of time series coming from the monitoring of different clinical variables during hemodialysis sessions, while the other deals with the biological problem of inferring relationships between genes from DNA microarray data.

13.
Plaque morphology in a diseased coronary artery plays a significant role in the modification of the fluid flow characteristics. The plaque morphology of 42 patients who underwent an IVUS (intravascular ultrasound) procedure was quantified by degree of membership in four fuzzy logic sets, which we refer to as type I: protruding, type II: ascending, type III: descending, and type IV: diffuse. Of the 42 cases, 28% were of type I, 18% type II, 20% type III, and 23% type IV; 6% belonged to hybrid types (partial members of multiple types), and the remaining 5% did not fit in any category. The degree of membership is of significance as the inter-class blood flow patterns (those that are strong members of the same set) are similar to each other compared to the intra-class behavior, indicating that plaque morphology (shape of blockage) is an important metric in addition to the degree of stenosis to represent the flow characteristics in a diseased stenotic coronary artery.

14.
In this research, a hybrid model is developed by integrating a case-based data clustering method and a fuzzy decision tree for medical data classification. Two datasets from the UCI Machine Learning Repository, i.e., the liver disorders dataset and Breast Cancer Wisconsin (Diagnosis), are employed for benchmark tests. Initially, a case-based clustering method is applied to preprocess the dataset so that more homogeneous data within each cluster are obtained. A fuzzy decision tree is then applied to the data in each cluster, and genetic algorithms (GAs) are further applied to construct a decision-making system based on the selected features and diseases identified. Finally, a set of fuzzy decision rules is generated for each cluster. As a result, the FDT model can accurately react to the test data by the inductions derived from the case-based fuzzy decision tree. The average forecasting accuracy of the CBFDT model is 98.4% for breast cancer and 81.6% for liver disorders. The accuracy of the hybrid model is the highest among the models compared. The hybrid model can produce accurate and comprehensible decision rules that could potentially help medical doctors draw effective conclusions in medical diagnosis.

15.
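ML-type transfer learning fuzzy system   Total citations: 12 (self-citations: 6, citations by others: 6)
Classical methods for constructing fuzzy systems usually consider only a single scene during training. An important drawback follows: if important information is missing from the current scene, the trained system generalizes poorly. To address this problem, this paper takes the Mamdani-Larsen (ML) type fuzzy system as its object and investigates fuzzy systems with transfer learning ability, namely ML-type transfer learning fuzzy systems. An ML-type transfer learning fuzzy system not only makes full use of the data in the current scene but also learns effectively from historical knowledge, so it can compensate for missing information in the current scene by transferring knowledge from historical scenes. Specifically, building on the classical reduced set density estimator (RSDE) method for constructing ML-type fuzzy systems, a density-estimation-based method for constructing ML-type transfer fuzzy systems is proposed by introducing a transfer learning mechanism. Experiments on synthetic and real data also confirm that the transfer fuzzy system adapts better than traditional fuzzy system modeling methods in scenes with missing information.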

16.
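To address the difficulty of eliminating redundant information when a dataset takes fuzzy values, a method is proposed that combines generalized fuzzy rough sets based on a fuzzy similarity relation with the QuickReduct algorithm. The degree of data similarity in generalized fuzzy rough sets is used to reduce datasets whose attribute values are real numbers, without discretizing the original dataset in advance, so the reduction result reflects the classification ability of the original information system more completely. The algorithm also uses heuristic information, so that the attributes that increase the fuzzy dependency fastest form the minimal reduct. A worked example verifies the effectiveness of the method.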

17.
This paper presents an approach for event detection and annotation of broadcast soccer video. It benefits from the fact that the occurrence of some audiovisual features shows remarkable patterns that help detect semantic events. However, the goal of this paper is to propose a flexible system that can be used with minimum reliance on predefined sequences of features and structures derived from domain knowledge. To achieve this goal, we design a fuzzy rule-based reasoning system as a classifier that takes statistical information from a set of audiovisual features as its crisp input values and produces semantic concepts corresponding to the events that occurred. A set of tuples is created by discretization and fuzzification of continuous feature vectors derived from the training data. We extract the hidden knowledge among the tuples and the correlation between the features and related events by constructing a decision tree (DT). A set of fuzzy rules is generated by traversing each path from the root to the leaf nodes of the constructed DT. These rules are inserted into the fuzzy rule base of the designed fuzzy system and employed by the fuzzy inference engine to perform the decision-making process and predict the events that occurred in the input video. Experimental results on a large set of broadcast soccer videos demonstrate the effectiveness of the proposed approach.

18.
张诤  王惠文 《计算机工程》2010,36(23):13-15,18
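This paper defines large-scale complex datasets, which are characterized by a huge number of sample points, numerous indicators describing object features, spatio-temporal dynamics, and a large amount of noise. To meet the mining requirements of such datasets, and drawing on data reduction ideas and methods from statistical analysis, rough set theory, and fuzzy set theory, it proposes a reduction method for large-scale complex datasets based on fuzzy clustering of samples and rough-set attribute reduction.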

19.
In this paper, we present a data analytics and visualization framework for health-shock prediction based on a large-scale health informatics dataset. The framework is developed using cloud computing services based on Amazon Web Services (AWS) integrated with geographical information systems (GIS) to facilitate big data capture, storage, indexing, and visualization through smart devices for different stakeholders. To develop a predictive model for health-shocks, we collected a unique dataset from 1000 households in rural and remotely accessible regions of Pakistan, focusing on factors such as health, social, economic, and environmental conditions and accessibility to healthcare facilities. We used the collected data to generate a predictive model of health-shock using a fuzzy rule summarization technique, which can provide stakeholders with interpretable linguistic rules to explain the causal factors affecting health-shocks. The evaluation of the proposed system in terms of the interpretability and accuracy of the generated data models for classifying health-shock shows promising results. The prediction accuracy of the fuzzy model, based on k-fold cross-validation of the data samples, is above 89% in predicting health-shocks from the given factors.

20.
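In association rule data mining, using a binary system tends to produce redundant patterns. This paper proposes a two-level data mining method based on hierarchical partitioning of binary transaction attributes, namely the MLADM algorithm. The algorithm obtains the set of maximal possible frequent patterns from high-level patterns and verifies them among low-level patterns, obtaining long frequent patterns first. Experimental results show that the algorithm can effectively mine long patterns in dense datasets while avoiding redundant patterns.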
