Similar literature
 Found 20 similar documents (search time: 62 ms)
1.
In an electronic commerce (EC) environment, effectively mining useful transaction information is an important issue in designing the marketing strategy of most enterprises. In particular, the relationships between different databases (e.g., the transaction database and the online browsing database) may hold unknown and potentially valuable business intelligence. This study addresses two issues in mining association rules for EC applications. The first is the discovery of generalized fuzzy association rules in the transaction database. The second is the discovery of association rules from the web usage data and the large itemsets identified in the transaction database. A cluster-based fuzzy association rules (CBFAR) mining architecture is proposed to address both issues simultaneously. The study makes three contributions: (a) an efficient fuzzy association rule miner based on cluster-based fuzzy-sets tables is presented to identify all the large fuzzy itemsets; (b) the approach requires fewer comparisons to generate large itemsets; (c) a fuzzy rule mining approach is used to compute the confidence values for discovering the relationships between the transaction database and the browsing information database. Finally, a simulated example in an EC environment is provided to demonstrate the rationality and feasibility of the proposed approach.

2.
Cluster discovery is an essential part of many data mining applications. While the cluster discovery process is mainly unsupervised in nature, it can often be aided by a small amount of labeled data. A probabilistic model of the clustering structure is adopted, and a novel unified energy equation for clustering that incorporates both labeled and unlabeled data is introduced. This formulation is inspired by a force-field model that integrates the labeling constraint on labeled data and the similarity information on unlabeled data for joint estimation. Experimental results show that good clusters can be identified using a small amount of labeled data.

3.
In the present scenario of a global economy and the World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining tasks such as the mining of association rules and correlations. The FP-tree is a versatile data structure for mining frequent patterns: it is a compact representation of a transaction database that contains the frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of the database at a time and adding it to the FP-tree of the initial (original) database. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database for incremental mining. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees of consecutive durations to obtain an FP-tree for the FP-Growth algorithm. BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which to obtain transactions (i.e., itemsets), unlike other sequential incremental algorithms, which read transactions from the database. The BIT_FPGrowth algorithm takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the increase in the runtime of BIT_FPGrowth is much smaller than, and the smallest among, those of the other algorithms.
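
The idea of an FP-tree as a compact store from which transactions can be read back can be illustrated with a minimal Python sketch. This is not the BIT_FPGrowth algorithm, whose restructuring and merging steps the abstract does not spell out; it only shows standard two-pass FP-tree construction and the recovery of the stored (filtered and reordered) transactions. The names `FPNode`, `build_fp_tree`, and `tree_to_transactions` are hypothetical.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from transactions, given an absolute support count."""
    transactions = [set(t) for t in transactions]
    # First pass: count items and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode()
    # Second pass: insert each transaction with its frequent items sorted by
    # descending frequency (ties broken by item name) so prefixes are shared.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root


def tree_to_transactions(node, prefix=()):
    """Read back (itemset path, multiplicity) pairs stored in the tree."""
    result = []
    for child in node.children.values():
        path = prefix + (child.item,)
        # Transactions that end here: own count minus counts passed to children.
        ending = child.count - sum(g.count for g in child.children.values())
        if ending > 0:
            result.append((path, ending))
        result.extend(tree_to_transactions(child, path))
    return result
```

Reading transactions back from the tree, as in `tree_to_transactions`, avoids a rescan of the raw database, which is the property the abstract attributes to using the FP-tree as a data repository.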

4.
Techniques for mining information from distributed data sources accessible over the Internet are a growing area of research. The mobile agent paradigm opens a new door for distributed data mining and knowledge discovery applications. In this paper we present the design of a mobile agent system that couples service discovery, using an application programming interface based on a logical language, with database access. Combining mobility with database access provides a means to create more efficient data mining applications. The processing of data is moved to the data locations across the network, instead of the traditional approach of bringing huge amounts of data to the processing location. Our proposal aims at implementing system tools that will enable intelligent mobile agents to roam the Internet searching for distributed data services. Agents access the data, discover patterns, extract useful information from facts recorded in the databases, and then communicate local results back to the user. The user then generates a global data model by aggregating the results provided by all agents. This overcomes barriers posed by network congestion, poor security, and unreliability.

5.
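Research on Sequential Pattern Mining Algorithms   Total citations: 5, self-citations: 0, citations by others: 5
An active research branch of data mining is the discovery of sequential patterns, that is, finding all frequent subsequences in a sequence database. Current sequential pattern mining methods fall into two main classes: candidate generation-and-test methods and pattern-extension methods. This paper first introduces the basic concepts of sequential pattern mining, then describes several important algorithms, and finally gives a performance analysis.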

6.
Data mining in the form of rule discovery is a growing field of investigation. A recent addition to this field is the use of evolutionary algorithms in the mining process. While evolutionary algorithms have been used extensively in the traditional mining of relational databases, they have hardly, if at all, been used in mining sequences and time series. In this paper we describe our method for evolutionary sequence mining, using a specialized piece of hardware for rule evaluation, and show how the method can be applied to several different mining tasks, such as supervised sequence prediction, unsupervised mining of interesting rules, discovering connections between separate time series, and investigating tradeoffs between contradictory objectives by using multiobjective evolution.

7.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant because of the poor correlations among the items inside them. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework that introduces a very useful measure, called frequency affinity, among the items in a HUP, together with the concept of an interesting HUP with a strong frequency affinity, for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, the utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms, and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.

8.
It is frequently the case that data mining is carried out in an environment which contains noisy and missing data. This is particularly likely to be true when the data were originally collected for different purposes, as is commonly the case in data warehousing. In this paper we discuss the use of domain knowledge, e.g., integrity constraints or a concept hierarchy, to re-engineer the database and allocate sets to which missing or unacceptable outlying data may belong. Attribute-oriented knowledge discovery has proved to be a powerful approach for mining multi-level data in large databases. Such methods are set-oriented in that attribute values are considered to belong to subsets of the domain. These subsets may be provided directly by the database or derived from a knowledge base using inductive logic programming to re-engineer the database. In this paper we develop an algorithm which allows us to aggregate imprecise data and use it for multi-level rule induction and knowledge discovery. ©2000 John Wiley & Sons, Inc.

9.
Mathematical Programming in Data Mining   Total citations: 14, self-citations: 0, citations by others: 14
Mathematical programming approaches to three fundamental problems will be described: feature selection, clustering and robust representation. The feature selection problem considered is that of discriminating between two sets while recognizing irrelevant and redundant features and suppressing them. This creates a lean model that often generalizes better to new unseen data. Computational results on real data confirm improved generalization of leaner models. Clustering is exemplified by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for knowledge discovery in databases (KDD). A mathematical programming formulation of this problem is proposed that is theoretically justifiable and computationally implementable in a finite number of steps. A resulting k-Median Algorithm is utilized to discover very useful survival curves for breast cancer patients from a medical database. Robust representation is concerned with minimizing trained model degradation when applied to new problems. A novel approach is proposed that purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors. Examples of applications of these concepts are given.
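
A k-median style clustering loop can be sketched in a few lines of Python. The abstract does not give the paper's formulation, so the following is only the commonly used alternating scheme, assumed here for illustration: assign each point to the nearest center in the 1-norm, then move each center to the coordinate-wise median of its cluster, stopping when the centers no longer change. The function names `assign` and `k_median` are hypothetical.

```python
from statistics import median


def assign(points, centers):
    """Group points by their nearest center under the 1-norm distance."""
    clusters = [[] for _ in centers]
    for p in points:
        dists = [sum(abs(a - b) for a, b in zip(p, c)) for c in centers]
        clusters[dists.index(min(dists))].append(p)
    return clusters


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median updates until the centers are stable."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # The coordinate-wise median minimizes the summed 1-norm distance
        # within a cluster; an empty cluster keeps its previous center.
        new_centers = [
            tuple(median(col) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Using medians rather than means makes the centers less sensitive to outlying points, which is one reason the 1-norm variant is attractive for noisy data.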

10.
Many applications of knowledge discovery and data mining, such as rule discovery for semantic query optimization, database integration, and decision support, require the knowledge to be consistent with the data. However, databases usually change over time, which makes machine-discovered knowledge inconsistent. Useful knowledge should be robust against database changes so that it is unlikely to become inconsistent after database updates. This paper defines this notion of robustness in the context of relational databases and describes how the robustness of first-order Horn-clause rules can be estimated. Experimental results show that our estimation approach can accurately identify robust rules. To demonstrate the usefulness of robustness estimation, we also present a rule antecedent pruning algorithm that improves the robustness and applicability of machine-discovered rules.

11.
Database mining: a performance perspective   Total citations: 12, self-citations: 0, citations by others: 12
The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented. Three classes of database mining problems involving classification, associations, and sequences are described. It is argued that these problems can be uniformly viewed as requiring discovery of rules embedded in massive amounts of data. A model and some basic operations for the process of rule discovery are described. It is shown how the database mining problems considered map to this model, and how they can be solved by using the basic operations proposed. An example is given of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm is efficient in discovering classification rules and has accuracy comparable to ID3, one of the best current classifiers.

12.
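This paper explores a new method, and its algorithms, for mining Web access information. It analyzes general knowledge discovery and Web mining methods and points out the limitations of common data mining methods based on Weblog access information. On that basis, it proposes a Web access information mining method built on a self-constructed repository for collecting access information, and describes in detail the knowledge-warehouse navigation page set, the method for describing user access behavior, and the algorithm for mining the access transaction database. The new method is simpler, more flexible, and more efficient.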

13.
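This paper discusses the discovery of genre patterns in Chinese web pages based on association rule mining. Using a linked-list structure, the document collection is converted into a transaction database suitable for association rule mining, which ensures that the term items in the transaction database are ordered as they appear in the text, and the Apriori association rule algorithm is implemented. Experimental results show that the approach works fairly well for discovering genre patterns of certain categories.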
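
The frequent-itemset stage of Apriori, which the abstract names, can be sketched as follows. This is a textbook-style illustration in Python, not the authors' implementation: it treats each document as an unordered set of terms, so it does not reproduce the paper's ordered, linked-list transaction format. The function name `apriori` and the use of an absolute support count are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent
```

The prune step relies on the fact that any subset of a frequent itemset is also frequent, which keeps the number of candidates counted at each level small.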

14.
Association rule mining is one of the most commonly used techniques for knowledge discovery from databases (KDD). It is used to discover relationships among items or itemsets. Temporal data mining, in turn, is concerned with the analysis of temporal data and the discovery of temporal patterns and regularities. In this paper, a new concept of up-to-date patterns is proposed, which is a hybrid of association rules and temporal mining. An itemset may not be frequent (large) for an entire database but may be large up to date, since items that seldom occurred early on may occur often recently. An up-to-date pattern is thus composed of an itemset and its up-to-date lifetime, within which the user-defined minimum-support threshold must be satisfied. The proposed approach can mine more useful large itemsets than conventional approaches, which discover only large itemsets that are valid for the entire database. Experimental results show that the proposed algorithm is more effective than traditional ones in discovering such up-to-date temporal patterns, especially when the minimum-support threshold is high.
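
The notion of an up-to-date lifetime can be made concrete with a small Python sketch. The abstract does not define the lifetime precisely, so the sketch assumes it is the longest window of transactions ending at the most recent one in which the itemset's relative support meets the threshold. The function name `up_to_date_lifetime` and the return format are hypothetical, and this is not the authors' algorithm.

```python
def up_to_date_lifetime(transactions, itemset, min_support):
    """Return (start index, support) of the longest qualifying suffix, or None.

    transactions are ordered from oldest to newest; min_support is a ratio.
    """
    itemset = frozenset(itemset)
    n = len(transactions)
    hits = 0
    best = None
    # Scan from newest to oldest, so each step extends the window by one
    # older transaction; the last qualifying start is the earliest one.
    for start in range(n - 1, -1, -1):
        if itemset <= frozenset(transactions[start]):
            hits += 1
        support = hits / (n - start)
        if support >= min_support:
            best = (start, support)
    return best
```

For example, an itemset that appears in 3 of 7 transactions overall is not large at a threshold of 0.7, yet it is large up to date if those 3 occurrences fall within the last 4 transactions.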

15.
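The basic idea of knowledge discovery in databases is to extract valuable information from data. Its purpose is to help decision makers find potential associations among data and discover overlooked factors, information that may be very helpful for predicting trends and making decisions. Time-series data mining is one of the important branches of research on knowledge discovery in databases. Trend analysis and similarity search are its main techniques and methods. Trend analysis makes it possible to produce reasonable long-term or short-term forecasts and thus provides an effective basis for scientific decision making. Similarity search employs fuzzy matching, which fits the way the human mind thinks and is therefore more reasonable and effective.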

16.
In the field of data mining, an important issue in generating association rules is frequent itemset discovery, which is the key factor in implementing association rule mining. This study therefore considers user-assigned constraints in the mining process. Constraint-based mining enables users to concentrate on mining the itemsets that are of interest to them, which improves the efficiency of mining tasks. In addition, in the real world, users may prefer to record more than one attribute and to set multi-dimensional constraints. Thus, this study intends to solve the multi-dimensional constraints problem for association rule generation. The ant colony system (ACS) is one of the newest meta-heuristics for combinatorial optimization problems, and this study uses it to mine a large database for association rules effectively. If this system can take multi-dimensional constraints into account, the association rules will be generated more effectively. Therefore, this study proposes a novel approach that applies the ant colony system to extract association rules from the database while taking the multi-dimensional constraints into account. Results on a real case, the National Health Insurance Research Database, show that the proposed method provides more condensed rules than the Apriori method and also reduces the computational time.

17.
The automatic discovery of classes of errors that represent misconceptions and other knowledge errors underlying discrepancies in novice behavior is not a trivial task. A novel approach to this problem is described, in which relationships among behavioral discrepancies are analyzed and inductively generalized via an unsupervised, incremental, relational multistrategy conceptual clustering method that takes into account similarities as well as causalities in the data. Performance results on the classification of discrepancy sets and discovery of error classes from discrepancies of buggy PROLOG programs demonstrate the potential of the approach.

18.
19.
F.  P.  M.  R.  A.  G.  P.  S.  B.  D.  G.  D.   《Data & Knowledge Engineering》2008,67(3):463-484
Discovering frequent patterns in large databases is one of the most studied problems in data mining, since it can yield substantial commercial benefits. However, some sensitive patterns may compromise privacy. In this paper, we aim to determine an appropriate balance between the need for privacy and information discovery in frequent patterns. A novel method of modifying databases to hide sensitive patterns is proposed. Multiplying the original database by a sanitization matrix yields a sanitized database in which private content is protected. In addition, two probabilities are introduced to guard against the recovery of sensitive patterns and to reduce the degree to which non-sensitive patterns are hidden in the sanitized database. A complexity analysis and a security discussion of the proposed sanitization process are provided. The results of a series of experiments performed to show the efficiency and effectiveness of the approach are described.

20.
Mobile context modeling is a process of recognizing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there is prior work on mobile context modeling, the use of unsupervised learning techniques for it is still under-explored. Unsupervised techniques can learn personalized contexts, which are difficult to predefine. To that end, in this paper, we propose an unsupervised approach to modeling the personalized contexts of mobile users. We first segment the raw context data sequences of mobile users into context sessions, where a context session contains a group of adjacent context records that are mutually similar and usually reflect similar contexts. Then, we exploit two methods for mining personalized contexts from context sessions. The first method clusters context sessions and then extracts the frequent contextual feature-value pairs from the context session clusters as contexts. The second method leverages topic models to learn personalized contexts in the form of probabilistic distributions of raw context data from the context sessions. Finally, experimental results on real-world data show that the proposed approach is efficient and effective for mining the personalized contexts of mobile users.
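
The segmentation step described above can be illustrated with a minimal Python sketch. The abstract does not state the similarity measure or the segmentation rule, so both are assumptions here: a record is a dictionary of contextual feature-value pairs, similarity is the share of features on which two records agree, and a new session starts whenever a record is not similar enough to its predecessor. The names `similarity` and `segment_sessions` and the default threshold are hypothetical.

```python
def similarity(a, b):
    """Share of features (over the union of keys) on which two records agree."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    agree = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return agree / len(keys)


def segment_sessions(records, threshold=0.5):
    """Split a time-ordered list of context records into context sessions."""
    sessions = []
    for rec in records:
        # Extend the current session while the new record stays similar to
        # the previous one; otherwise open a new session.
        if sessions and similarity(sessions[-1][-1], rec) >= threshold:
            sessions[-1].append(rec)
        else:
            sessions.append([rec])
    return sessions
```

Comparing only adjacent records is a simplification of the "mutually similar" requirement; a stricter variant would compare each new record against every record already in the session.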
