Similar Literature
1.
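Real-world applications that mine association rules from transaction databases often have to deal with three issues at once: judging the importance of different items by different criteria, managing the taxonomic relationships among items, and handling quantitative data. This paper therefore proposes a multi-level fuzzy association rule mining algorithm that applies multiple minimum supports to quantitative transaction databases in order to extract implicit knowledge from itemsets. The algorithm uses two kinds of support constraints and a top-down, progressively refining approach to derive frequent itemsets, and it can also discover cross-level fuzzy association rules. An example shows that the multi-level fuzzy association rules derived under multiple minimum support constraints are easy to understand and meaningful, and that the algorithm has good efficiency and scalability.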

2.
Concurrency control (CC) algorithms guarantee the correctness and consistency criteria for the concurrent execution of a set of transactions in a database. Many CC algorithms assume that the writeset (WS) and readset (RS) of each transaction are known before it executes. In real operational environments, however, the WS and RS are known in advance for only a fraction of the transactions. The CC algorithm proposed in this paper treats such knowledge as optional: when the WS and RS are known before execution, it uses them to improve concurrency and performance. In addition, CC algorithms usually rely on a specific static or dynamic equation to decide whether to grant a lock or which transaction wins a conflict; the proposed algorithm instead makes this decision with a neural network based on adaptive resonance theory (ART). To this end, it defines a parameter called the health factor (HF) for each transaction, computed with an ART2 neural network, which is used to compare transactions and determine the winner in accessing database objects. Experimental results show that the proposed neural-based CC (NCC) algorithm increases concurrency by reducing the number of aborts. The algorithm is compared with strict two-phase locking (S2PL), which is used in most commercial database systems; simulation results show that NCC incurs fewer aborts than S2PL at different transaction rates.

3.
Incrementally fast updated frequent pattern trees   Cited: 3 (self: 0, others: 3)
The frequent-pattern tree (FP-tree) is an efficient data structure for association-rule mining that avoids generating candidate itemsets. It compresses a database into a tree structure that stores only large (frequent) items. However, it must process all transactions in batch mode, whereas in real-world applications new transactions are routinely inserted into databases. In this paper, we therefore modify the FP-tree construction algorithm to handle new transactions efficiently. We propose a fast updated FP-tree (FUFP-tree) structure that makes updating the tree easier, together with an incremental FUFP-tree maintenance algorithm that reduces the time needed to reconstruct the tree when new transactions are inserted. Experimental results show that the proposed maintenance algorithm runs faster than the batch FP-tree construction algorithm when handling new transactions and produces nearly the same tree structure as the FP-tree algorithm. The proposed approach thus achieves a good trade-off between execution time and tree complexity.
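As background for the batch baseline that the FUFP-tree improves on, here is a minimal Python sketch of plain FP-tree construction, assuming transactions are lists of hashable items and a minimum support given as an absolute count. It counts item supports, keeps only large items, sorts each transaction by descending support, and inserts it into a prefix tree whose nodes carry counts. The header table and node links of a full FP-tree are omitted, and all names are illustrative.

```python
from collections import Counter


class FPNode:
    """Prefix-tree node: an item plus the number of transactions sharing this path."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # First pass: count the support of every single item.
    support = Counter(item for t in transactions for item in set(t))
    large = {item for item, c in support.items() if c >= min_count}

    root = FPNode(None)
    # Second pass: insert each transaction's large items, ordered by
    # descending support (ties broken by item), so common prefixes share nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in large),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, support


def count_nodes(node):
    """Number of non-root nodes below `node`."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

For the four transactions `[a, b, c]`, `[a, b]`, `[a, c]`, `[b, d]` with a minimum count of 2, item `d` is dropped and the tree has five nodes. Rebuilding this whole tree whenever transactions arrive is the cost that the incremental FUFP-tree maintenance is designed to avoid.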

4.
We examine the problem of mining association rules among items in a large database of sales transactions. Given such a database, mining association rules means discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. This task can be mapped to the problem of discovering large itemsets, where a large itemset is a group of items that appears in a sufficient number of transactions. Large itemsets can be found by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large-itemset requirement. This is generally done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining the large itemsets from the huge number of candidates in the early iterations is usually the dominant factor in overall mining performance. To address this issue, we develop an effective hash-based algorithm for candidate set generation that is especially effective at generating the candidate set for large 2-itemsets. Specifically, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, which resolves this performance bottleneck. The smaller candidate sets also allow the transaction database to be trimmed much earlier in the iterations, significantly reducing the computational cost of later iterations, and they offer an opportunity to reduce the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
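The hash-based pruning of candidate 2-itemsets can be sketched as follows. This is a minimal Python illustration of the idea, not the authors' exact algorithm; the bucket count of 7 and the position-based hash function are arbitrary assumptions. While counting single items in the first pass, every 2-subset of each transaction is hashed into a bucket whose counter is incremented. A pair becomes a candidate only if both items are frequent and its bucket count reaches the minimum support. Because a bucket count is an upper bound on the support of every pair hashed into it, no frequent pair is lost.

```python
from collections import Counter
from itertools import combinations


def candidate_2_itemsets(transactions, min_count, num_buckets=7):
    """Candidate 2-itemsets, pruned with a hash table of pair counts."""
    items = sorted({i for t in transactions for i in t})
    order = {item: k for k, item in enumerate(items)}

    def bucket(x, y):
        # Illustrative hash over item positions (an assumption, not prescribed).
        return (order[x] * 10 + order[y]) % num_buckets

    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    # One pass: count 1-itemsets and hash every 2-subset of each transaction.
    for t in transactions:
        t = sorted(set(t))
        item_counts.update(t)
        for x, y in combinations(t, 2):
            bucket_counts[bucket(x, y)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)
    # Keep a pair only if both items are frequent AND its bucket is heavy enough.
    return [
        (x, y)
        for x, y in combinations(frequent_items, 2)
        if bucket_counts[bucket(x, y)] >= min_count
    ]
```

On the transactions {A, C, D}, {B, C, E}, {A, B, C, E}, {B, E} with a minimum count of 2, the four frequent items would give six candidate pairs by plain pairing, while the bucket filter keeps only (A, C), (B, C), (B, E), and (C, E).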

5.
A new mining approach for uncertain databases using CUFP trees   Cited: 1 (self: 0, others: 1)
In the past, many algorithms have been proposed to mine frequent itemsets from transactional databases in which the presence or absence of each item in a transaction is known with certainty. In some applications, however, items in transactions are uncertain, each with an existential probability between 0 and 1. Processing uncertain datasets therefore differs considerably from processing certain ones. The UF-tree algorithm was proposed to build a UF-tree structure from an uncertain dataset and mine frequent itemsets from that tree. During UF-tree construction, however, items are merged into the same node only when they have the same existential probabilities, which produces many redundant nodes. In this paper, a new tree structure called the compressed uncertain frequent-pattern tree (CUFP tree) is designed to keep the information needed for mining efficiently. In the CUFP tree, the same items are merged within a branch even when their existential probabilities differ across transactions. A mining algorithm called CUFP-mine is then proposed, based on this tree structure, to find uncertain frequent patterns. Experimental results show that the proposed approach outperforms the UF-tree algorithm in both execution time and number of tree nodes.
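In uncertain databases, frequent itemsets are commonly defined by expected support. Assuming that items occur independently, the expected support of an itemset is the sum, over all transactions, of the product of its items' existential probabilities. The minimal Python sketch below computes this quantity. The data layout (one dict per transaction mapping item to probability) is an assumption, and the tree structures themselves are not modeled here.

```python
from math import prod


def expected_support(itemset, transactions):
    """Sum over transactions of the product of existential probabilities.

    Each transaction maps item -> existential probability in (0, 1];
    an item absent from a transaction contributes probability 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in transactions)
```

For example, with transactions `{a: 0.9, b: 0.5}`, `{a: 0.6}`, and `{a: 1.0, b: 0.8}`, item `a` has expected support 0.9 + 0.6 + 1.0 = 2.5, and `{a, b}` has 0.45 + 0 + 0.8 = 1.25. The three occurrences of `a` carry three different probabilities, which is why a tree that merges nodes only on equal (item, probability) pairs grows redundant branches.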

6.
An effective clustering algorithm for large-scale transaction databases   Cited: 13 (self: 0, others: 13)
Chen Ning, Chen An, Zhou Longxiang. Journal of Software (软件学报), 2001, 12(4): 475-484
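This paper studies clustering of large-scale transaction databases and proposes a two-phase clustering algorithm, CATD. The algorithm first divides the database into several partitions and applies a hierarchical clustering algorithm within each partition to cluster locally, grouping the transactions into preliminary sub-clusters; the number of sub-clusters is controlled by an inter-cluster distance parameter. It then clusters all sub-clusters globally while identifying noise. Because it uses partitioning and a support-vector representation of clusters, the algorithm needs to scan the database only once and performs clustering in memory, so it can handle large databases.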

7.
Classical data mining algorithms require expensive passes over the entire database to find frequent itemsets and, from them, association rules. As databases grow, handling such large amounts of data becomes increasingly difficult. One solution is to draw a sample that represents the whole database for association-rule mining, chosen so that the distance between the sample and the complete database is minimal. Choosing a sample that correctly represents the data is not easy. Many algorithms have been proposed: some are computationally fast, while others are more accurate. In this paper, we present an algorithm for generating a sample that can replace the entire database when generating association rules, aiming to balance accuracy and speed. The proposed algorithm uses the average number of small, medium, and large 1-itemsets in the database and the average weight of the transactions to define a threshold condition for transactions; the set of transactions that satisfies this condition is chosen to represent the entire database. The effectiveness of the algorithm has been tested over several runs on databases generated by the IBM synthetic data generator, and its accuracy and speed have been compared with those of existing sampling techniques.

8.
Mining associations with the collective strength approach   Cited: 1 (self: 0, others: 1)
The large-itemset model has been proposed in the literature for finding associations in large databases of sales transactions. We propose a different method for evaluating and finding itemsets, referred to as strongly collective itemsets, based on a criterion that stresses the actual correlation of the items with one another rather than their absolute level of presence. Previous techniques for finding correlated itemsets are not necessarily applicable to very large databases. We provide an algorithm that is computationally very efficient while maintaining statistical robustness. Because the algorithm relies on relative rather than absolute measures such as support, the method can also be applied to find association rules in data sets in which items appear in a sizeable percentage of the transactions (dense data sets), in data sets in which items have varying density, and even to find negative association rules.
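Collective strength, as usually defined in this line of work, is C(I) = [(1 - v(I)) / (1 - E[v(I)])] * [E[v(I)] / v(I)]. Here v(I), the violation rate, is the fraction of transactions that contain some but not all items of I, and E[v(I)] = 1 - prod(p_i) - prod(1 - p_i) is its expected value if the items occurred independently with frequencies p_i. A value of 1 indicates independence, and values above 1 indicate that the items appear together (or are absent together) more often than chance. The minimal Python sketch below assumes set-valued transactions and only evaluates the measure; it does not implement the paper's search algorithm.

```python
from math import prod


def collective_strength(itemset, transactions):
    """Collective strength of `itemset` over a list of set-valued transactions."""
    n = len(transactions)
    items = set(itemset)
    # Observed violation rate: some, but not all, items of the itemset are present.
    violations = sum(1 for t in transactions if 0 < len(items & t) < len(items))
    v = violations / n
    # Expected violation rate if the items occurred independently.
    p = [sum(1 for t in transactions if i in t) / n for i in items]
    expected_v = 1 - prod(p) - prod(1 - q for q in p)
    if v == 0 or expected_v == 1:
        raise ValueError("collective strength is undefined for this itemset")
    return ((1 - v) / (1 - expected_v)) * (expected_v / v)
```

With transactions {a, b}, {a, b}, {a}, {c}, the pair {a, b} has v = 0.25 and E[v] = 0.5, giving a collective strength of 3.0. With {a, b}, {a}, {b}, and an empty transaction, a and b behave independently and the measure is exactly 1.0.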

9.
A concept lattice is an ordered structure over concepts and is particularly effective for mining association rules. However, concept lattices do not scale well to large databases because the lattice grows with the number of transactions. Finding an efficient strategy for dynamically updating the lattice is therefore important for real-world applications, in which new transactions are constantly inserted into databases. To build an efficient storage structure for mining association rules, this study proposes a method that builds an initial frequent closed itemset lattice from the original database and updates it when new transactions are inserted, reducing the number of rescans of the entire database during maintenance. The proposed algorithm is compared with building the lattice in batch mode to demonstrate its effectiveness.

10.
A genetic-fuzzy mining approach for items with multiple minimum supports   Cited: 2 (self: 2, others: 0)
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Mining association rules from transaction data is the most common of these techniques. Most previous approaches set a single minimum support threshold for all items and represent the relationships among transactions with binary values. We previously proposed a genetic-fuzzy data-mining algorithm that extracts both association rules and membership functions from quantitative transactions under a single minimum support. In real applications, however, different items may be judged important by different criteria. In this paper, we therefore propose an algorithm that combines clustering, fuzzy, and genetic concepts to extract reasonable multiple minimum support values, membership functions, and fuzzy association rules from quantitative transactions. It first uses k-means clustering to group similar items. Items in the same cluster are considered to have similar characteristics and are assigned similar values, which yields a better initial population. Each chromosome is then evaluated by requirement satisfaction and the suitability of its membership functions to estimate its fitness value. Experimental results show that the proposed approach is effective and efficient.

11.
The frequent-pattern tree (FP-tree) is an efficient data structure for association-rule mining that avoids generating candidate itemsets. It compresses a database into a tree structure that stores only large items. However, it must process all transactions in batch mode, whereas in real-world applications new transactions are usually inserted into databases incrementally. We previously proposed the fast updated FP-tree (FUFP-tree) structure to handle new transactions efficiently and to make updating the tree easier. In this paper, we modify FUFP-tree construction using the concept of pre-large itemsets, which are defined by a lower and an upper support threshold. With pre-large itemsets, the original database need not be rescanned until a certain number of new transactions have been inserted. The proposed approach therefore achieves good execution time for tree construction, especially when only a small number of transactions are inserted each time. Experimental results show that the proposed Pre-FUFP maintenance algorithm performs well when incrementally handling new transactions.
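The number of insertions that can be absorbed without a rescan follows from a short worst-case argument. Let d be the size of the original database and S_l < S_u the lower and upper support thresholds. An itemset that is small in the original database (neither large nor pre-large) has a count below S_l * d. Even if all t new transactions contain it, its count stays below S_u * (d + t), so it cannot become large, as long as S_l * d + t <= S_u * (d + t), that is, t <= (S_u - S_l) * d / (1 - S_u). The Python sketch below evaluates this safety bound. It follows the pre-large literature but is our own derivation, and the function name is illustrative.

```python
import math


def safety_bound(d, s_lower, s_upper):
    """Largest number of inserted transactions that needs no rescan of the original DB.

    d: number of transactions in the original database.
    s_lower, s_upper: lower and upper support thresholds (fractions).
    """
    if not 0 <= s_lower < s_upper < 1:
        raise ValueError("require 0 <= s_lower < s_upper < 1")
    return math.floor((s_upper - s_lower) * d / (1 - s_upper))
```

For d = 1000, S_l = 0.25, and S_u = 0.5, the bound is 500 new transactions. A wider gap between the two thresholds buys more insertions before a rescan, at the cost of tracking more pre-large itemsets.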

12.
Research on the Apriori algorithm for mining association rules   Cited: 55 (self: 0, others: 55)
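Using large sales databases as its setting, this paper studies the problem of mining association rules. It analyzes and discusses the Apriori algorithm, presents the idea behind its implementation, and illustrates the algorithm's execution with an example.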
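For readers who want to see the level-wise idea concretely, here is a minimal Python sketch of Apriori, assuming set-valued transactions and an absolute minimum support count. Each level joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate that has an infrequent (k-1)-subset, and counts the survivors in one pass over the data. For simplicity, the join unions any two frequent (k-1)-itemsets whose union has k items instead of the classic sorted-prefix join. After the prune step, both joins yield the same candidates.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: one pass over the database for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result
```

On {A, C, D}, {B, C, E}, {A, B, C, E}, {B, E} with a minimum count of 2, this finds the four frequent items A, B, C, E, the pairs AC, BC, BE, CE, and the single triple BCE with support 2.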

13.
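The consensus mechanism is the core of blockchain technology: it enables all nodes to periodically validate and record transactions while keeping the blockchain data stored at every node consistent. To address two problems of current public-chain consensus mechanisms, namely a limited degree of decentralization and susceptibility to temporary forks, this paper proposes Proof of Minimum (PoM), a consensus mechanism based on hash-based random leader election. PoM exploits the strong confusion property of hash algorithms to increase decentralization and their collision resistance to reduce the probability of temporary forks. Theoretical analysis and experimental results show that PoM both increases decentralization and lowers the probability of temporary forks.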
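The abstract does not specify what is hashed, so the following Python sketch only illustrates the general idea of electing a leader by minimum hash value. Each node hashes the previous block hash together with its own identifier using SHA-256, and the node with the smallest digest leads the round. The hash input and the function names are hypothetical and are not the paper's protocol.

```python
import hashlib


def leader_score(prev_block_hash: bytes, node_id: str) -> int:
    # Hypothetical input: previous block hash concatenated with the node id.
    digest = hashlib.sha256(prev_block_hash + node_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def elect_leader(prev_block_hash: bytes, node_ids):
    """Return the node whose hash value is minimal for this round."""
    return min(node_ids, key=lambda n: leader_score(prev_block_hash, n))
```

Because every node computes the same minimum from public inputs, all honest nodes agree on a single leader per round, which is the intuition behind fewer temporary forks. This sketch does not address manipulation, such as a participant choosing identifiers to grind for small hashes.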

14.
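Large high-dimensional datasets pose challenges for existing data mining algorithms. This paper decomposes the mining task into two subproblems, mining frequent long patterns and mining frequent short patterns, and proposes an algorithm for mining long itemsets in large high-dimensional datasets, called inter-transaction. The algorithm exploits the property that intersections of long transactions in high-dimensional data quickly become short, obtains long closed patterns directly by intersecting transactions, and adopts a new pruning strategy that optimizes the transaction-intersection computation. Experiments show that the method is very effective on large high-dimensional datasets.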
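The key observation, that closed patterns can be obtained directly as intersections of transactions, can be sketched in Python as follows. Every non-empty intersection of a group of transactions is a closed itemset, and every closed itemset with non-zero support is the intersection of the transactions that contain it. This sketch enumerates all such intersections incrementally and omits the paper's pruning strategy, so it is exponential in the worst case and serves only to illustrate the principle.

```python
def closed_itemsets(transactions, min_count):
    """Frequent closed itemsets, found as non-empty intersections of transactions."""
    transactions = [frozenset(t) for t in transactions]
    closed = set()
    # After processing t1..tk, `closed` holds every non-empty intersection
    # of a non-empty subset of {t1..tk}.
    for t in transactions:
        new = {t}
        for c in closed:
            inter = c & t
            if inter:
                new.add(inter)
        closed |= new
    # Keep only the closed itemsets whose support meets the threshold.
    result = {}
    for c in closed:
        support = sum(1 for t in transactions if c <= t)
        if support >= min_count:
            result[c] = support
    return result
```

For {a, b, c, d}, {a, b, c}, {a, b, e} with a minimum count of 2, the intersections are abcd, abc, abe, and ab, of which only abc (support 2) and ab (support 3) are frequent. Intersecting long transactions shrinks them quickly toward these short shared cores.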

15.
Improvement of the AprioriTid algorithm for mining association rules   Cited: 2 (self: 0, others: 2)
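To address the itemset-generation bottleneck in the Apriori and AprioriTid algorithms, this paper proposes an improved AprioriTid algorithm based on transaction-set compression, candidate-itemset compression, and a Boolean support matrix. The algorithm shrinks the dataset by deleting transactions that no longer need to be compared, optimizes the self-join of frequent itemsets to reduce the number of candidate itemsets generated, and uses the Boolean support matrix to speed up candidate verification. Experimental results show that the improved algorithm effectively reduces the amount of computation and runs noticeably faster than existing algorithms; its effectiveness is also verified in fault diagnosis of rotating machinery.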
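A Boolean support matrix makes candidate verification a matter of bitwise operations. Each row corresponds to an item, each column to a transaction, and the support of an itemset is the number of columns in which all of its rows hold a 1. In the minimal Python sketch below, each row is stored as a Python integer bitmask. That storage choice is our assumption, and the paper's exact matrix layout and compression steps may differ.

```python
def item_bitmaps(transactions):
    """Boolean support matrix stored row-wise: item -> int bitmask over transaction ids."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def bitmap_support(itemset, bitmaps, n_transactions):
    """Support count = number of 1-bits in the AND of the itemset's rows."""
    mask = (1 << n_transactions) - 1  # start from "all transactions"
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items contribute an all-zero row
    return bin(mask).count("1")
```

On {A, C, D}, {B, C, E}, {A, B, C, E}, {B, E}, the rows for B, C, and E AND together to the bits of the second and third transactions, so the support of {B, C, E} is 2. No database scan is needed once the matrix is built.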

16.
17.
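To keep pace with electricity-market reform and to build a power trading system that supports complex forms of trading, this paper proposes a distributed electricity bidding algorithm based on blockchain technology. Bidding trades are divided into two kinds, offer transactions and response transactions. Multiple response transactions may exist for the same offer, and node servers determine the winning transaction by sorting and comparing all responses. Ordered aggregate signatures are used to verify the order and content of transactions, ensuring their authenticity, while order-preserving encryption protects transaction contents, ensuring the confidentiality of private transaction data. On this basis, all transactions are stored on the blockchain, which makes them tamper-proof. Experimental results show that the algorithm effectively improves the efficiency of transaction generation and verification and quickly completes secure electricity bidding trades.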

18.
An approach to vertical partitioning in relational databases is proposed in which the attributes of a relation are partitioned according to a set of transactions. The objective of vertical partitioning is to minimize the number of disk accesses in the system. Because transactions carry more semantic meaning than attributes, this approach allows the partitioning to be optimized for a selected set of important transactions. An optimal binary partitioning (OBP) algorithm based on branch and bound is presented, with worst-case complexity O(2^n), where n is the number of transactions. To handle systems with a large number of transactions, an algorithm BPi whose complexity varies from O(n) to O(2^n) is also developed. The experimental results reveal that the performance of vertical partitioning is sensitive to the skewness of transaction accesses. Furthermore, BPi converges rather rapidly to OBP, and both OBP and BPi yield results comparable to the global optimum obtained by exhaustive search.

19.
Concurrency control is the activity of synchronizing operations issued by concurrently executing transactions on a shared database, with the aim of producing an execution that has the same effect as a serial (non-interleaved) one. The optimistic technique lets transactions execute without synchronization and relies on commit-time validation to ensure serializability, so its effectiveness depends on the transactions' conflict rate. Because different systems have different conflict patterns, and these patterns may change over time, applying the optimistic scheme to an entire system degrades performance. In this paper, a novel algorithm is proposed that dynamically selects the optimistic or the pessimistic approach according to the conflict rate. The algorithm uses a neural network based on adaptive resonance theory to decide whether to grant a lock or which transaction wins a conflict, and the network's parameters are optimized by a modified gravitational search algorithm. Moreover, since in real operational environments the writeset (WS) and readset (RS) are known before execution for only a fraction of transactions, the algorithm is designed to treat knowledge of the WS and RS as optional. Experimental results show that at high transaction rates the proposed hybrid concurrency control algorithm reduces the number of aborts by more than 35% compared with strict two-phase locking, which is used in many commercial database systems; the reduction is 13% compared with a purely pessimistic approach and more than 31% compared with a purely optimistic approach.

20.
Methods and algorithms for discovering constrained association rules   Cited: 47 (self: 0, others: 47)
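This paper studies the problem of discovering association rules subject to constraints in large transaction databases and proposes two methods for discovering constrained association rules efficiently: a database-filtering algorithm, Filtering, and a frequent-itemset generation algorithm, Separate. Both methods are significantly more computationally efficient than existing algorithms.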


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP No. 09084417