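Association rule mining is a widely used data mining method, particularly in market basket analysis. However, the classic Apriori algorithm and its improved variants have high time and space complexity, which makes dynamic data mining tasks, such as handling database updates or user-defined changes to the minimum support, too costly. To address this, an algorithm that finds frequent itemsets with a PC-tree is proposed, enabling efficient dynamic data mining.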

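Traditional sequential pattern mining algorithms often ignore the temporal characteristics of sequential patterns, and the sequence items they examine are single events without attribute constraints. A method for mining association rules over event sequences with multi-attribute constraints is proposed. The method builds on the traditional Apriori and AprioriAU algorithms and takes into account the transition time between events in event sequence patterns in the application environment. It adopts a layered mining approach: frequent sequential patterns are mined first, and association rules over multi-attribute constrained items are then mined from the frequent event sequences. A case analysis offers an implementation approach for mining time-limited sequential patterns with multi-attribute constraints.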

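Traditional association rule mining algorithms treat updated datasets in an equal, uniform manner. A new incremental association rule mining algorithm is proposed: it introduces a multi-level weighted pattern for updating association rules, assigning higher weights to the most recently updated data so that they have more influence on the mining results. In this way, the higher interest factor that recent data hold for current decisions is fully reflected. Experiments show that the algorithm is highly sensitive and responds promptly to trends in the transaction set, providing decision makers with timely and accurate information.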

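Data mining is an emerging information processing technology. This paper applies association rules to the processing of chemical data on traditional Chinese medicine, mining inter-dimensional association rules over data such as traditional Chinese medicine efficacy, plant family and genus, the activity of chemical constituents, and the modern pharmacology of herbal extracts. A series of strong rules was found and analyzed, yielding interesting association rules; the results also illustrate the differences between Chinese and Western medicine in the concept of efficacy. The results offer a new line of thinking for research on the modernization of traditional Chinese medicine, phytochemistry, and related fields.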

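Association rule mining is mainly used to discover relationships between items in transaction datasets. Most existing algorithms mine static association rules, although rules may in fact change considerably over time. Building meta-rules for rules, so as to analyze and predict trends in their support and confidence, helps to further guide mining and decision making. Through an example, a concrete method for Markov-model-based meta-rule prediction and analysis is presented, and a comparison with other methods shows that it is a reasonable model.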

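Mining classification association rules from Web logs
User classification is an important task in research on Web access pattern mining. A method that applies associative classification to classify Web users is proposed: the Web log files are first preprocessed to obtain a training transaction dataset, classification association rules are then mined from this transaction set, and a classifier is built from the mined rule set, so that users are classified according to their access history.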

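To improve the performance of semantic image classifiers, a hierarchical association rule classifier for semantic images based on axiomatic fuzzy sets is proposed. First, to improve accuracy, features are extracted from the image dataset and axiomatic fuzzy set (AFS) theory is used to construct AFS attribute representations of the fuzzy concepts of the image set, improving the discriminability of image attributes. Second, to improve computational efficiency, hierarchically structured association rules are used to build the semantic image classifier, exploiting ontology information among concepts to improve parallel classification capability. Finally, experiments on the algorithm's parameters and comparisons with other methods show that the proposed algorithm achieves high accuracy and computational efficiency.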

The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence), which give the user indicators for understanding and using the newly discovered knowledge. Low-quality datasets severely degrade the quality of the discovered association rules, and one might legitimately wonder whether a so-called “interesting” rule denoted LHS → RHS is meaningful when 30% of the LHS data are no longer up to date, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well known for its poor credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed for improving the quality awareness of the knowledge discovery and data mining processes. We propose to integrate data quality indicators for quality-aware association rule mining. We propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality have a great impact on the cost and quality of discovered association rules, and confirm our approach for the integrated management of data quality indicators in the KDD process, which ensures the quality of data mining results.

This paper is concerned with a pairs trading rule. The idea is to monitor two historically correlated securities. When divergence is underway, i.e., one stock moves up while the other moves down, a pairs trade is entered, which consists of shorting the outperforming stock and going long on the underperforming one. Such a strategy bets that the “spread” between the two will eventually converge. In this paper, the difference of the pair is governed by a mean-reverting model. The objective is to trade the pair so as to maximize an overall return. A fixed commission cost is charged with each transaction. In addition, a stop-loss limit is imposed as a state constraint. The associated HJB equations (quasi-variational inequalities) are used to characterize the value functions. It is shown that the solution to the optimal stopping problem can be obtained by solving a number of quasi-algebraic equations. We provide a set of sufficient conditions in terms of a verification theorem. Numerical examples are reported to demonstrate the results.

On optimal rule discovery
In machine learning and data mining, heuristic and association rules are two dominant schemes for rule discovery. Heuristic rule discovery usually produces a small set of accurate rules, but fails to find many globally optimal rules. Association rule discovery generates all rules satisfying some constraints, but yields too many rules and is infeasible when the minimum support is small. Here, we present a unified framework for the discovery of a family of optimal rule sets and characterize the relationships with other rule-discovery schemes such as nonredundant association rule discovery. We theoretically and empirically show that optimal rule discovery is significantly more efficient than association rule discovery independent of data structure and implementation. Optimal rule discovery is an efficient alternative to association rule discovery, especially when the minimum support is low.

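Support-based association rule mining algorithms cannot find itemsets that are infrequent but have high utility, while utility-based association rule mining misses itemsets whose utility is modest but which occur fairly frequently, so that the product of their support and utility (called the motivation, 激励) is large. The motivation-based association rule mining problem is formulated, and a bottom-up mining algorithm, HM-miner, is proposed. Motivation combines the strengths of support and utility and measures both the statistical and the semantic importance of an itemset. HM-miner uses an upper-bound property of motivation for pruning and can effectively mine high-motivation itemsets.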
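The abstract above defines the new measure only as the product of support and utility. The following is a minimal sketch of that product, not HM-miner itself: the transaction layout (item mapped to purchased quantity), the `unit_profit` table, and the way utility is summed over containing transactions are all illustrative assumptions, and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def utility(itemset, transactions, unit_profit):
    """Assumed utility: quantity * unit profit of the itemset's items,
    summed over the transactions that contain the whole itemset."""
    items = set(itemset)
    total = 0
    for t in transactions:
        if items <= set(t):
            total += sum(t[i] * unit_profit[i] for i in items)
    return total


def motivation(itemset, transactions, unit_profit):
    """Support multiplied by utility, as described in the abstract."""
    return support(itemset, transactions) * utility(itemset, transactions, unit_profit)


# Hypothetical data: each transaction maps an item to the quantity bought.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2},
    {"b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
unit_profit = {"a": 5, "b": 1, "c": 10}

ab_motivation = motivation({"a", "b"}, transactions, unit_profit)
```

The pruning step based on an upper bound is not reproduced here, because the abstract does not state the bound.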

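To address the "combinatorial explosion" in the number of rules in belief rule bases, current solutions are mainly rule reduction methods based on feature extraction, whose effectiveness depends on expert knowledge. In view of this, an objective method based on rough set theory is proposed that requires no knowledge beyond the rule base: it analyzes the belief rules one by one following the idea of equivalence class partitioning and thereby eliminates redundant candidate values. Finally, armored equipment capability assessment is analyzed as a case study, and the method is compared with representative subjective methods in terms of the number of rules reduced and decision accuracy. The results show that the proposed method is effective and feasible, and that it outperforms existing subjective rule reduction methods.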

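Existing rule mining algorithms based on the frequent pattern tree (FP-tree) and concept lattices traverse the FP-tree repeatedly when constructing the concept lattice, and the lattice they construct contains redundant nodes when mining rules with consequent constraints. To address these two problems, a strategy is proposed that generates candidate concept lattice nodes by traversing the FP-tree, and then builds from those candidates a non-redundant concept lattice under the rule constraints. The algorithm was applied to atmospheric corrosion data from a real project; the results show that it mines more efficiently than existing algorithms and that the resulting corrosion rules provide important guidance for research on the current state of material corrosion.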

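To better acquire the uncertain rule knowledge generated by the boundary region, an attribute reduction method based on optimal approximation rough sets is proposed. To this end, the determination and computation of the optimal approximation set of a rough set in an approximation space are given, and the concepts of optimal approximation distribution consistent set and optimal approximation distribution reduct are introduced. The relationships among Pawlak attribute reduction, distribution reduction, and optimal approximation distribution reduction are discussed: they are equivalent in consistent decision tables, while in inconsistent decision tables an optimal approximation distribution reduct is a subset of a distribution reduct. Finally, an example is used to verify and illustrate the results.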

The primary goal of the research reported in this paper is to identify what criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. In order to avoid biasing our study by known functional families, we also investigate the potential of using metalearning for obtaining alternative rule learning heuristics. The key results of this experimental study are not only practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics, but we also gain theoretical insights into factors that are responsible for a good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.

The lift of an association rule is frequently used, both in itself and as a component in formulae, to gauge the interestingness of a rule. The range of values that lift may take is used to standardise lift so that it is more effective as a measure of interestingness. This standardisation is extended to account for minimum support and confidence thresholds. A method of visualising standardised lift, through the relationship between lift and its upper and lower bounds, is proposed. The application of standardised lift as a measure of interestingness is demonstrated on college application data and social questionnaire data. In the latter case, negations are introduced into the mining paradigm and an argument for this inclusion is put forward. This argument includes a quantification of the number of extra rules that arise when negations are considered.

This paper introduces a new approach to a problem of data sharing among multiple parties, without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or any other parties. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.

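Zeng Anping. 《计算机应用》 (Journal of Computer Applications), 2012, 32(8): 2198-2201
To address the weak correlation and limited variety of rules produced by traditional association rule algorithms, a multi-class association algorithm incorporating the Spearman rank correlation coefficient is proposed. Starting from the strong rules produced by a traditional algorithm, it uses Spearman rank correlation to compute correlations, such as synchronous and asynchronous ones, between the products in a rule. Using this as the interestingness threshold, the algorithm can simultaneously produce four classes of association rules: synchronous positive, asynchronous positive, synchronous negative, and asynchronous negative rules, with closely connected rules. Experimental results demonstrate the effectiveness and superiority of the algorithm.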
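The Spearman rank correlation coefficient used by this algorithm can be computed as the Pearson correlation of the ranks of the two series. The sketch below shows only that computation, with tied values given their average rank; how the paper maps the coefficient to synchronous or asynchronous rule classes is not stated in the abstract, so no thresholds are applied. The sales figures are invented for illustration.

```python
def average_ranks(values):
    """1-based ranks of `values`, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly sales of two products that appear in one rule.
sales_p = [12, 15, 11, 20, 18]
sales_q = [30, 34, 29, 50, 41]
rho = spearman(sales_p, sales_q)
```

A coefficient near +1 indicates that the two series rise and fall together, and one near -1 that they move in opposite directions.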

The optimal representative set selection problem is defined thus: given a set of test requirements and a test suite that satisfies all test requirements, find a subset of the test suite containing a minimum number of test cases that still satisfies all test requirements. Existing methods for solving the representative set selection problem do not guarantee that obtained representative sets are optimal (i.e. minimal). The enhanced zero–one optimal path set selection method [C.G. Chung, J.G. Lee, An enhanced zero–one optimal path set selection method, Journal of Systems and Software, 39(2) (1997) 145–164] solves the so-called optimal path set selection problem, and can be adapted to solve the optimal representative set selection problem by considering paths as test cases and components to be covered (e.g. branches) as test requirements.
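To make the problem statement concrete, here is a small sketch that finds a minimum representative set by exhaustive search over subsets of increasing size. This is not the enhanced zero–one method cited above; it is a brute-force baseline that is only practical for small test suites, and the `coverage` mapping and its test and requirement names are hypothetical.

```python
from itertools import combinations


def minimal_representative_set(coverage):
    """Return a smallest list of test cases covering every requirement.

    `coverage` maps each test case name to the set of requirements it
    satisfies. Subsets are tried in order of increasing size, so the
    first subset that covers everything has the minimum cardinality.
    """
    requirements = set().union(*coverage.values())
    names = sorted(coverage)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(coverage[t] for t in subset))
            if covered >= requirements:
                return list(subset)
    return names  # Not reached: the full suite always covers itself.


# Hypothetical suite: four test cases, four requirements.
coverage = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3"},
    "t3": {"r1", "r3", "r4"},
    "t4": {"r4"},
}
chosen = minimal_representative_set(coverage)
```

The search is exponential in the number of test cases, which is why exact formulations such as zero–one programming or heuristic reductions are used in practice.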

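The set of F-S optimal discriminant vectors consists of the vectors that maximize the Fisher criterion function under orthogonality constraints. Based on the Fisher criterion function, an unconstrained set of optimal discriminant vectors is proposed, together with an algorithm for computing it. In addition, when the number of training sample vectors is smaller than their dimensionality (the small-sample-size problem), the within-class scatter matrix is singular. To make it nonsingular, the dimensionality of the samples is reduced, and a formula is given for the dimension to which the samples must at least be reduced to guarantee nonsingularity. Experimental results show that the discriminant vector set has good classification ability.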

