Similar literature
1.
Data mining techniques, which extract patterns from large databases, are processes that focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules. When applying these methods, most business managers encounter a multitude of rules produced by the data mining technique. Because such rules are multi-faceted, they are generally characterized by multiple conflicting criteria that relate directly to business value, such as expected monetary value or incremental monetary value.

In this paper, we present a method for rule prioritization that takes into account business values composed of objective metrics or managers' subjective judgments. The proposed methodology attempts to combine decision analysis techniques with data mining to solve problems in that domain. We believe this approach would be particularly useful for business managers who struggle with rule quality or quantity problems, conflicts between extracted rules, and difficulty building consensus when several managers are involved in rule selection.


3.
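To address the big-data characteristics and embedded semantic information of linked data collections, a method is proposed that first establishes schema-level links between linked datasets and then mines association rules. RDF data item patterns are defined over RDF datasets in the same domain, along with rules for generating these patterns; RDF query techniques are used to obtain RDF item sets from the data item patterns, from which association rules within a specific domain are then derived. The proposed method, based on RDF data item patterns of linked data, extends association rule mining to a collection of datasets within the same domain rather than a single dataset. A Hadoop-based implementation scheme for association rule mining over large-scale RDF datasets is also given. Experimental results confirm the value of schema-level linking for association rule mining and the effectiveness of the proposed method.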

4.
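Research on a multi-strategy method for mining rules of interest   Cited by: 20 (self-citations: 1; citations by others: 19)
Data mining discovers a large number of rules in large databases, and how to select the rules of interest is an important research topic in knowledge discovery. This paper studies methods that use domain knowledge to measure the subjective interestingness of rules, gives a function for computing objective interestingness that measures the simplicity and novelty of a rule, and proposes a multi-strategy method for selecting the rules that interest users.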

5.
Market basket analysis is one of the typical applications of association rule mining. The valuable information discovered through data mining can be used to support decision making. Generally, the objective measures support and confidence are used to evaluate the interestingness of association rules. However, in some cases the rules discovered with these two measures may be neither profitable nor actionable (that is, not interesting) for enterprises. Therefore, how to discover patterns by considering both objective measures (e.g. probability) and subjective measures (e.g. profit) is a challenge in data mining, particularly in marketing applications. This paper focuses on pattern evaluation in the process of knowledge discovery by using the concept of profit mining. Data Envelopment Analysis is used to calculate the efficiency of discovered association rules with multiple objective and subjective measures. After their efficiency has been evaluated, the association rules are categorized into two classes: relatively efficient (interesting) and relatively inefficient (uninteresting). To distinguish these two classes, a Decision Tree (DT)-based classifier is built from the attributes of the association rules. The DT classifier can be used to find the characteristics of interesting association rules and to classify unknown (new) association rules.
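The two objective measures named in this abstract, support and confidence, can be sketched as follows. This is a minimal illustration using their standard definitions, not the paper's own code; the function names and the toy baskets are hypothetical and invented for this example.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


# Hypothetical toy baskets, invented for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, baskets)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, baskets)  # 2 of the 3 bread baskets
print(rule_support, rule_confidence)
```

A rule can score well on both measures and still be unprofitable, which is the gap the abstract's subjective measures are meant to cover.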

6.
To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, it is often interesting to focus on those that do not occur frequently. Existing algorithms for mining this kind of infrequent pattern are mainly based on exhaustive search methods and can be applied only over categorical domains. In a previous work, the use of grammar-guided genetic programming for the discovery of frequent association rules was introduced, showing that this proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that this proposal is also appropriate for the discovery of rare association rules. This approach allows one to obtain solutions within specified time limits and does not require large amounts of memory, as current algorithms do. It also provides mechanisms to discard noise from the rare association rule set by applying four different and specific fitness functions, which are compared and studied in depth. Finally, this approach is compared with other existing algorithms for mining rare association rules, and an analysis of the mined rules is performed. The results show that this approach mines rare rules with a consistently low execution time. The experimental study shows that this proposal obtains a small and accurate set of rules close to the size specified by the data miner.

7.
Mining association rules is among the most common techniques for knowledge discovery from databases (KDD). It is used to discover relationships among items or itemsets. Temporal data mining, in turn, is concerned with the analysis of temporal data and the discovery of temporal patterns and regularities. In this paper, a new concept of up-to-date patterns is proposed, which is a hybrid of association rule mining and temporal mining. An itemset may not be frequent (large) over an entire database but may be large up to date, since items that seldom occurred early on may occur often recently. An up-to-date pattern is thus composed of an itemset and its up-to-date lifetime, within which the user-defined minimum-support threshold must be satisfied. The proposed approach can mine more useful large itemsets than conventional approaches, which discover only itemsets that are large over the entire database. Experimental results show that the proposed algorithm is more effective than traditional ones in discovering such up-to-date temporal patterns, especially when the minimum-support threshold is high.
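The idea of an up-to-date lifetime can be illustrated with a brute-force sketch. It assumes that transactions are listed in time order and that the lifetime is the longest window ending at the most recent transaction in which the itemset meets the minimum support. The abstract does not give the paper's algorithm, so the function name `up_to_date_lifetime` and the toy log are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple


def up_to_date_lifetime(itemset: Iterable[str],
                        transactions: List[FrozenSet[str]],
                        min_support: float) -> Optional[Tuple[int, float]]:
    """Return (start_index, support) for the longest suffix transactions[start:]
    in which `itemset` reaches `min_support`, or None if no suffix qualifies."""
    items = frozenset(itemset)
    n = len(transactions)
    hits = 0
    best: Optional[Tuple[int, float]] = None
    # Grow the window backwards from the most recent transaction.
    for start in range(n - 1, -1, -1):
        if items <= transactions[start]:
            hits += 1
        ratio = hits / (n - start)
        if ratio >= min_support:
            best = (start, ratio)  # an earlier start means a longer lifetime
    return best


# Hypothetical time-ordered transactions, oldest first.
log = [
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset({"a"}),
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b"}),
]

# {a, b} has support 2/6 over the whole log, below a 0.6 threshold,
# but it reaches 2/3 over the window that starts at index 3.
lifetime = up_to_date_lifetime({"a", "b"}, log, 0.6)
print(lifetime)
```

This scan is linear in the number of transactions for a single itemset; a real miner would need to share that work across candidate itemsets.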

8.
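A study of interestingness factors affecting association rule mining   Cited by: 7 (self-citations: 2; citations by others: 7)
Association rule mining is an important area of data mining research, and one of its key problems is evaluating how interesting the mined rules are. In practice, a large number of rules can be mined from a data source, but most of them are not necessarily of interest to the user. The interestingness of association rules can be evaluated from both objective and subjective perspectives. Templates are used to separate the rules a user finds interesting from those the user does not, which provides the subjective evaluation; for the objective evaluation, constraints are added on top of the confidence and support of the rules.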

9.
Mining frequent arrangements of temporal intervals   Cited by: 3 (self-citations: 3; citations by others: 0)
The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation for this work is the observation that in practice most events are not instantaneous but occur over a period of time, and different events may occur concurrently. Thus, many practical applications require mining such temporal correlations between intervals, including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Three efficient methods to find frequent arrangements of temporal intervals are described; the first two are tree-based and use breadth-first and depth-first search to mine the set of frequent arrangements, whereas the third is prefix-based. These methods apply efficient pruning techniques that include a set of constraints adding user-controlled focus to the mining process. Moreover, based on the extracted patterns, a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real datasets (American Sign Language annotations and network data) and on large synthetic datasets.

10.
Image mining deals with the extraction of implicit knowledge, image-data relationships, or other patterns not explicitly stored in images. It extends data mining to the image domain. The main objective of this paper is to apply image mining to breast mammograms in order to detect and classify cancerous tissue. Mammogram images can be classified as normal, benign, or malignant. A total of 26 features, including histogram intensity features and gray-level co-occurrence matrix features, are extracted from the mammogram images. A hybrid feature selection approach is proposed that removes approximately 75% of the features, and a new decision tree is used for classification. Most notably, the branch and bound algorithm used for feature selection provides the best features, and it has not been applied elsewhere to gray-level co-occurrence matrix feature selection from mammograms. Experiments were conducted on a data set of 300 images of different types taken from MIAS, with the aim of improving accuracy by generating a minimum number of rules that cover more patterns. The accuracy obtained by this method is approximately 97.7%, which is highly encouraging.

11.
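Learning domain relations between events in a log ontology based on hierarchical association rules*   Cited by: 3 (self-citations: 1; citations by others: 2)
孙明, 陈波, 周明天. Application Research of Computers (《计算机应用研究》), 2009, 26(10): 3683-3686
To discover latent user access behavior in Web usage records, a method is proposed for learning domain relations between events in a log ontology based on hierarchical association rules. The method uses the whole-part relations between composite events and atomic events in the log ontology to determine transaction granularity, and extends an association rule mining algorithm to the event hierarchy to discover candidate frequent usage rules. After pruning redundant and invalid rules, it extracts latent domain relations between events, thereby enriching the log ontology. Simulation experiments show that the method is feasible and effective.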

12.
A recommender system is an approach used in e-commerce to give users a smoother experience. Sequential pattern mining is a data mining technique used to identify co-occurrence relationships while taking into account the order of transactions. This work presents an implementation of sequential pattern mining for recommender systems in the e-commerce domain. It uses the systolic tree algorithm to mine frequent patterns and yield feasible rules for the recommender system. The objective of feature selection is to pick a feature subset with the lowest similarity among features and the highest relevance to the target class. This reduces the dimensionality of the feature vector by eliminating redundant, irrelevant, or noisy data. This work presents a new hybrid recommender system based on optimized feature selection and the systolic tree. Features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), and feature selection used River Formation Dynamics (RFD) and the Particle Swarm Optimization (PSO) algorithm. The systolic tree is used for pattern mining, and the recommendations are based on the mined patterns. The proposed methods were evaluated on the MovieLens dataset, and the experimental outcomes confirmed the efficiency of the techniques. RFD feature selection combined with systolic tree frequent pattern mining and collaborative filtering achieved a precision of 0.89.

13.
Mining association rules and mining sequential patterns both aim to discover customer purchasing behavior from a transaction database so that the quality of business decisions can be improved. However, the transaction database can be very large. Finding all association rules and sequential patterns in a large database is very time consuming, and users may be interested in only some of the information.

Moreover, the criteria that discovered association rules and sequential patterns must meet may differ from one user requirement to another. Traditional mining methods can generate a great deal of information that is irrelevant to the user's requirements. Hence, a data mining language is needed so that users can query only the knowledge that interests them from a large database of customer transactions. In this paper, such a data mining language is presented. With it, users can specify the items of interest and the criteria for the association rules or sequential patterns to be discovered. Efficient data mining techniques are also proposed to extract the association rules and sequential patterns according to the user requirements.


14.
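Association rule mining is an important area of data mining research, and one of its key problems is evaluating the interestingness of the mined rules. Previous research has found that in practical applications it is often easy to mine a large number of rules from a data source, yet most of these rules are of no interest to the user. This paper discusses two aspects of rule interestingness measurement, subjective measures and objective measures, and then describes how templates can be used to mine interesting rules.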

15.
A large volume of research in temporal data mining focuses on discovering temporal rules from time-stamped data. Most of the methods proposed so far have been devoted to mining temporal rules that describe relationships between data sequences or instantaneous events, and they do not consider the presence of complex temporal patterns in the dataset. Such complex patterns, for example trends or up-and-down behaviors, are often very interesting to users. In this paper we propose a new kind of temporal association rule and the related extraction algorithm; the learned rules involve complex temporal patterns in both their antecedent and consequent. In our approach, the user defines a set of complex patterns of interest that form the basis for constructing the temporal rule; these patterns are represented and retrieved in the data through the formalism of knowledge-based Temporal Abstractions. An Apriori-like algorithm then looks for meaningful temporal relationships (in particular, precedence relationships) among the complex patterns of interest. The paper presents the results obtained by the rule extraction algorithm on a simulated dataset and on two datasets related to biomedical applications: the first concerns the analysis of time series from the monitoring of different clinical variables during hemodialysis sessions, while the second deals with the biological problem of inferring relationships between genes from DNA microarray data.

16.
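Association rule mining often produces a large number of rules, which makes it very hard for users to analyze and use them. To support exploratory analysis, a distance-based method for optimizing correlated association rules is proposed. Starting from a mathematical analysis of the values taken by the correlation formula for association rules, the method uses differences in correlation across rule structures to mine more potentially correlated rules, including both positive and negative association rules. Experimental results show that the method is effective and reliable.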

17.
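董林, 舒红, 李莎. Application Research of Computers (《计算机应用研究》), 2013, 30(8): 2330-2333
To simplify the preprocessing steps of spatial frequent pattern mining and improve mining efficiency, a mining algorithm named FISA (fast intersect spatial Apriori) is proposed that takes spatial vector and raster layers directly as input. The algorithm uses layer intersection and area computation to count the support of predicate sets, and from that mines frequent predicate sets and association rules. Compared with transaction-based spatial association rule mining algorithms, FISA does not require spatial data to be converted into transactions beforehand, and every result has a corresponding layer, which makes the results easy to visualize. Compared with other mining algorithms based on spatial analysis, FISA supports both vector and raster formats and introduces a fast intersection method to ensure scalability. Experimental results show that the algorithm mines frequent patterns directly from spatial data efficiently and correctly.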
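Support counting by layer intersection and area computation can be sketched for the raster case as follows. This is not the FISA implementation: it assumes each predicate is a boolean grid over the same study area with unit-area cells, and the names `intersect`, `area`, `area_support` and the two toy layers are hypothetical.

```python
from typing import List

Grid = List[List[bool]]


def intersect(a: Grid, b: Grid) -> Grid:
    """Cell-wise AND of two raster layers of the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("layers must have the same shape")
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def area(layer: Grid) -> int:
    """Number of cells where the predicate holds (unit cell area assumed)."""
    return sum(sum(1 for cell in row if cell) for row in layer)


def area_support(layers: List[Grid]) -> float:
    """Share of the study area in which all predicates hold at once."""
    if not layers:
        raise ValueError("need at least one layer")
    total_cells = sum(len(row) for row in layers[0])
    if total_cells == 0:
        raise ValueError("layers must contain at least one cell")
    combined = layers[0]
    for layer in layers[1:]:
        combined = intersect(combined, layer)
    return area(combined) / total_cells


# Hypothetical 2x3 layers: True marks cells where the predicate holds.
near_road = [[True, True, False],
             [True, False, False]]
high_density = [[True, False, False],
                [True, True, False]]

pair_support = area_support([near_road, high_density])  # 2 of 6 cells overlap
rule_conf = pair_support / area_support([near_road])    # overlap / near_road area
print(pair_support, rule_conf)
```

Because each intersection is itself a layer, every frequent predicate set has a map that can be displayed, which matches the visualization benefit the abstract describes.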

18.
In this paper, we propose a method based on association rule mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images with high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates diagnostic suggestions by mining association rules. These suggestions are used to accelerate the image analysis performed by specialists and to give them an alternative to work from. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines feature selection and discretization in a single step and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable for feature selection and discretization in medical images. HiCARe is a new associative classifier. It has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means of assisting the diagnostic task.

19.
Discovery of unapparent association rules based on extracted probability   Cited by: 1 (self-citations: 0; citations by others: 1)
Association rule mining is an important task in data mining. However, not all of the generated rules are interesting, and some unapparent rules may be ignored. This article introduces an "extracted probability" measure. Using this measure, three models are presented to modify the confidence of rules. An efficient method based on the support-confidence framework is then developed to generate rules of interest. The adult dataset from the UCI machine learning repository and a database of occupational accidents are analyzed. The analysis reveals that the proposed methods can effectively generate interesting rules from a variety of association rules.

20.
Online mining of fuzzy multidimensional weighted association rules   Cited by: 1 (self-citations: 1; citations by others: 0)
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rule mining. It contributes to the ongoing research on multidimensional online association rule mining by proposing a general architecture that uses a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to give users the flexibility to view data from different perspectives, as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described in this paper introduces the fuzzy data cube as a remedy for the problem of handling quantitative values of dimensional attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed fuzzy data cube. We then investigate combining the concepts of weight and multiple levels to mine fuzzy weighted multi-cross-level association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single-dimension, multidimensional and hybrid (integrating the other two) fuzzy weighted association rule mining. Each of the three methods uses a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this is the first effort in this direction. We compared the proposed approach with an existing approach that does not use fuzziness. Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States census in the year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach. OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis. In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).
