Similar literature
1.
Classical data mining algorithms require expensive passes over the entire database to generate frequent items and, from them, association rules. As databases grow, handling such large amounts of data becomes very difficult. One solution is to draw a sample that acts as a representative of the entire database for finding association rules, chosen so that the distance of the sample from the complete database is minimal. Choosing a sample that represents the data well is not an easy task. Many algorithms have been proposed in the past; some are computationally fast, while others give better accuracy. In this paper, we present an algorithm for generating a sample that can replace the entire database for generating association rules and that aims to balance accuracy and speed. The proposed algorithm takes into account the average number of small, medium, and large 1-itemsets in the database and the average weight of the transactions to define a threshold condition for the transactions. The set of transactions that satisfy the threshold condition is chosen as the representative of the entire database. The effectiveness of the proposed algorithm has been tested over several runs on databases generated by the IBM synthetic data generator. A comparative performance evaluation of the proposed technique against existing sampling techniques, covering accuracy and speed, has also been carried out.

2.
Mining sequential patterns means discovering the sequential purchasing behaviours of most customers from a large number of customer transactions. The strategy focuses on discovering frequent sequences. A frequent sequence is an ordered list of itemsets purchased by a sufficient number of customers. Previous approaches need to scan the database repeatedly, so they take a large amount of computation time to find frequent sequences. Customer transactions grow rapidly in a short time, and some of them may become outdated. Consequently, the frequent sequences may change when new customer transactions are inserted or old ones are deleted from the database, which may require rediscovering all the patterns by scanning the entire updated database. In this paper, we propose an incremental updating technique to maintain the discovered sequential patterns when transactions are inserted into or deleted from the database. Our approach partitions the database into segments and scans it segment by segment. After each segment scan, it prunes the sequences that can no longer be frequent, which accelerates the search for frequent sequences. The number of database scans can therefore be significantly reduced. The experimental results show that our algorithms are more efficient than other algorithms for maintaining sequential patterns.
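
The definition of a frequent sequence given in this abstract can be illustrated with a small support-counting sketch. This is not the incremental algorithm of the paper; the function names and the representation of a customer's history as a list of sets are assumptions made for illustration only.

```python
def contains_sequence(customer_sequence, pattern):
    """Return True if `pattern` (a list of itemsets) occurs, in order,
    within `customer_sequence` (a list of itemsets, one per purchase)."""
    pos = 0  # index of the next pattern itemset still to be matched
    for itemset in customer_sequence:
        # Greedy matching: take the earliest purchase that covers the itemset.
        if pos < len(pattern) and set(pattern[pos]) <= set(itemset):
            pos += 1
    return pos == len(pattern)


def sequence_support(customer_sequences, pattern):
    """Count the customers whose purchase history contains `pattern`."""
    return sum(contains_sequence(s, pattern) for s in customer_sequences)


# Hypothetical purchase histories for three customers.
customers = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"d"}],
    [{"b"}, {"a"}],
]
print(sequence_support(customers, [{"a"}, {"d"}]))
```

A sequence is then frequent when its support reaches a chosen minimum number of customers; that threshold is a parameter the abstract does not specify.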

3.
On-the-fly reading of entire databases (cited by: 2; self-citations: 0; citations by others: 2)
A common database need is a global read, that is, a consistent read of an entire database. To avoid terminating normal system activity, and thus improve availability, we propose an on-the-fly algorithm that reads database entities incrementally and allows normal transactions to proceed concurrently. The algorithm assigns each entity a color based on whether the entity has been globally read, and a shade based on how normal transactions have accessed the entity. Serializability of execution histories is ensured by requiring normal transactions to pass both a color test and a shade test before being allowed to commit. Our algorithm improves on a color-only scheme from the literature, which does not guarantee serializability.

4.
Mining association rules is an important task for knowledge discovery. We can analyze past transaction data to discover customer behaviors so that the quality of business decisions can be improved. Various types of association rules may exist in a large database of customer transactions. The strategy of mining association rules focuses on discovering large itemsets, which are groups of items that appear together in a sufficient number of transactions. We propose a graph-based approach to generate various types of association rules from a large database of customer transactions. This approach scans the database once to construct an association graph and then traverses the graph to generate all large itemsets. Empirical evaluations show that our algorithms outperform other algorithms that need to make multiple passes over the database.
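
One way to find all large itemsets with a single pass over the data is sketched below. It is an assumption-laden illustration, not the association graph construction of the paper: it records, for each item, the set of transaction ids that contain it, and then extends itemsets by intersecting those sets. The function name and the `min_support` count threshold are hypothetical.

```python
from itertools import combinations


def large_itemsets_one_scan(transactions, min_support):
    """Return {itemset (frozenset): support count} for all large itemsets."""
    # Single scan: for every item, collect the ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    # Large 1-itemsets are the starting level.
    current = {frozenset([i]): t for i, t in tidsets.items()
               if len(t) >= min_support}
    result = {s: len(t) for s, t in current.items()}

    # Grow itemsets one item at a time by intersecting transaction-id sets;
    # the database itself is never read again.
    while current:
        nxt = {}
        for (a, ta), (b, tb) in combinations(current.items(), 2):
            union = a | b
            if len(union) == len(a) + 1 and union not in nxt:
                tids = ta & tb
                if len(tids) >= min_support:
                    nxt[union] = tids
        result.update({s: len(t) for s, t in nxt.items()})
        current = nxt
    return result


# Hypothetical transactions.
db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
found = large_itemsets_one_scan(db, min_support=3)
```

Keeping transaction-id sets in memory trades space for the repeated database passes that level-wise counting would otherwise need.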

5.
Recovery from malicious transactions (cited by: 7; self-citations: 0; citations by others: 7)
Preventive measures sometimes fail to deflect malicious attacks. We adopt an information warfare perspective, which assumes success by the attacker in achieving partial, but not complete, damage. In particular, we work in the database context and consider recovery from malicious but committed transactions. Traditional recovery mechanisms do not address this problem, except for complete rollbacks, which undo the work of benign transactions as well as malicious ones, and compensating transactions, whose utility depends on application semantics. Recovery is complicated by the presence of benign transactions that depend, directly or indirectly, on the malicious transactions. We present algorithms to restore only the damaged part of the database. We identify the information that needs to be maintained for such algorithms. The initial algorithms repair damage to quiescent databases; subsequent algorithms increase availability by allowing new transactions to execute concurrently with the repair process. Also, via a study of benchmarks, we show practical examples of how offline analysis can efficiently provide the necessary data to repair the damage of malicious transactions.
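
The notion of benign transactions that depend, directly or indirectly, on malicious ones can be made concrete with a small closure computation. The sketch below is not one of the repair algorithms of the paper; it assumes a hypothetical `reads_from` map recording, for each transaction, the transactions whose writes it read.

```python
from collections import deque


def affected_transactions(reads_from, malicious):
    """Return the malicious transactions plus every transaction that read,
    directly or transitively, data written by one of them."""
    # Invert the relation: for each transaction, which transactions read from it?
    readers = {}
    for txn, sources in reads_from.items():
        for src in sources:
            readers.setdefault(src, set()).add(txn)

    affected = set(malicious)
    queue = deque(malicious)
    while queue:  # breadth-first traversal of the dependency edges
        txn = queue.popleft()
        for reader in readers.get(txn, ()):
            if reader not in affected:
                affected.add(reader)
                queue.append(reader)
    return affected


# Hypothetical history: T2 read from T1, T3 read from T2, T5 read from T4.
history = {"T2": {"T1"}, "T3": {"T2"}, "T4": set(), "T5": {"T4"}}
damaged = affected_transactions(history, {"T1"})
```

In this example only T1, T2, and T3 would need repair, while T4 and T5 are untouched, which is the advantage over a complete rollback.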

6.
This paper presents a simulation-based performance analysis of a concurrent file reorganization algorithm. We examine the effect on throughput of (a) buffer size, (b) degree of reorganization, (c) write probability of transactions, (d) multiprogramming level, and (e) degree of clustered transactions. The file reorganization problem that we consider involves altering the placement of records on pages of a secondary storage device. In addition, we want this reorganization to be done in place, i.e., using the file's original storage space for the newly reorganized file. Our approach is appropriate for non-in-place reorganization as well. The motivation for such a physical change (record clustering) is to improve the database system's performance by minimizing the number of page accesses made in answering a set of queries. There are numerous record clustering algorithms, but they usually do not solve the entire problem: they do not specify how to efficiently reorganize the file to reflect the clustering assignment that they determine. In previous work, we presented an algorithm that is a companion to general record clustering algorithms in that it actually transforms the file. In this work we show through simulation that our algorithm, when run concurrently with user transactions, provides an acceptable level of overall database system performance.

7.
《Information Systems》, 2000, 25(4): 309-322
Many real-time applications have very tight time constraints that cannot be met by disk-resident databases. For those applications, a main-memory database, in which the entire database is stored in main memory, is the proper choice. It has been shown that coarse-granule locking is better than fine-granule locking for main-memory databases. Coarse-granule locking makes it easy to extract data access patterns correctly from the canned transactions of main-memory real-time database systems. In this paper, we propose two real-time transaction scheduling algorithms for main-memory databases, CCA-ALF (Cost Conscious Approach with Average Load Factor) and EDF-CR-ALF (Earliest Deadline First-Conditional Restart with ALF), which use both static information (e.g., deadline) and dynamic information (e.g., system load) by utilizing the data access patterns of transactions. We compare the performance of those algorithms with CCA and EDF-HP, which do not use system load information at all. Our simulations on main-memory databases indicate that (i) CCA-ALF is better than EDF-HP, CCA, and EDF-CR-ALF in terms of miss percent and mean lateness, and (ii) CCA-ALF adapts well to changes in the system load.

8.
A concept lattice is an ordered structure between concepts. It is particularly effective in mining association rules. However, a concept lattice is not efficient for large databases because the lattice size increases with the number of transactions. Finding an efficient strategy for dynamically updating the lattice is an important issue for real-world applications, where new transactions are constantly inserted into databases. To build an efficient storage structure for mining association rules, this study proposes a method for building the initial frequent closed itemset lattice from the original database. The lattice is updated when new transactions are inserted, which reduces the number of rescans of the entire database in the maintenance process. The proposed algorithm is compared with building a lattice in batch mode to demonstrate its effectiveness.

9.
The security of computers and their networks is of crucial concern in the world today. One mechanism to safeguard information stored in database systems is an Intrusion Detection System (IDS). The purpose of intrusion detection in database systems is to detect malicious transactions that corrupt data. Researchers have recently been working on data mining techniques for detecting such malicious transactions in database systems. Their approach concentrates on mining data dependencies among data items; transactions that do not comply with these data dependencies are identified as malicious. The algorithms that these approaches use for designing their data dependency miner have limitations. For instance, they need to determine experimentally appropriate settings for minimum support and related constraints, which does not necessarily lead to strong data dependencies. In this paper we propose a new data mining algorithm, called Optimal Data Access Dependency Rule Mining (ODADRM), for designing a data dependency miner for our database IDS. ODADRM is an extension of the k-optimal rule discovery algorithm, adapted to the database intrusion detection domain. ODADRM avoids many limitations of previous data dependency miner algorithms. As a result, our approach is able to track normal transactions and detect malicious ones more effectively than existing approaches.

10.
Most approaches for discovering frequent itemsets derive association rules from a binary database. Profit, cost, and quantity are not considered in traditional association-rule mining. Utility mining was proposed to measure the utilities of purchased products and derive high-utility itemsets (HUIs). Many algorithms have been proposed to efficiently find HUIs in a static database. In real-world applications, however, transactions are inserted, deleted, or modified dynamically. Existing batch approaches have to reprocess the updated database because previously discovered HUIs are not maintained. In this paper, a Fast UPdated (FUP) strategy with a utility measure and a maintenance algorithm, called FUP-HUI-MOD, are developed to efficiently maintain and update discovered HUIs. When transactions are modified, the proposed algorithm partitions the transactions before and after the modification into two parts, creating four cases. Each case is handled by a specific procedure to update the discovered HUIs. With the FUP-HUI-MOD algorithm, the original database does not need to be rescanned each time, unlike state-of-the-art batch-mode high-utility itemset mining algorithms. Experiments show that the proposed algorithm outperforms batch algorithms in maintaining HUIs.

11.
Concurrency control (CC) algorithms guarantee the correctness and consistency criteria for the concurrent execution of a set of transactions in a database. A precondition of many CC algorithms is that the writeset (WS) and readset (RS) of transactions are known before the transactions execute. In real operational environments, however, the WS and RS are known before execution for only a fraction of the transactions. One advantage of the CC algorithm proposed in this paper is that knowledge of the WS and RS is optional: if they are known before the transaction executes, the algorithm uses them to improve concurrency and performance. In addition, concurrency control algorithms often use a specific static or dynamic equation to decide whether to grant a lock or which transaction wins. The proposed algorithm instead uses an adaptive resonance theory (ART)-based neural network for this decision. A parameter called the health factor (HF) is defined for transactions and is used to compare transactions and determine the winner in accessing database objects. The HF is calculated using an ART2 neural network. Experimental results show that the proposed neural-based CC (NCC) algorithm increases the level of concurrency by decreasing the number of aborts. The performance of the proposed algorithm is compared with the strict two-phase locking (S2PL) algorithm, which has been used in most commercial database systems. Simulation results show that the proposed NCC algorithm performs better than S2PL, in terms of number of aborts, at different transaction rates.

12.
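Several mining algorithms for association rules and their comparative analysis (cited by: 14; self-citations: 0; citations by others: 14)
The discovery of association rules is an important aspect of data mining, and many researchers are currently working on fast algorithms for mining association rules. This paper introduces several algorithms for mining all association rules in large transaction databases and compares their efficiency.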

13.
Incrementally mining high utility patterns based on pre-large concept (cited by: 1; self-citations: 1; citations by others: 0)
In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was proposed to measure the utility values of purchased items and reveal high utility itemsets in a quantitative database. A two-phase high utility mining algorithm was later proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted into, deleted from, or modified in a database. In this case, a batch mining procedure must rescan the whole updated database to keep the information up to date. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in the inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.
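
The three-way partition by transaction-weighted utilization (TWU) can be sketched for single items as follows. This is a simplified illustration, not the incremental algorithm of the paper: it handles one database only, and the function name, the `unit_profit` table, and the two threshold ratios are assumptions introduced here.

```python
def classify_by_twu(transactions, unit_profit, upper_ratio, lower_ratio):
    """Label each item 'large', 'pre-large', or 'small' by its TWU.

    transactions: list of {item: quantity} dicts
    unit_profit:  {item: profit per unit}
    upper_ratio, lower_ratio: assumed thresholds, as fractions of total utility
    """
    # Transaction utility: sum of quantity * unit profit over the transaction.
    tu = [sum(q * unit_profit[i] for i, q in t.items()) for t in transactions]
    total = sum(tu)

    # TWU of an item: sum of the utilities of the transactions containing it.
    twu = {}
    for t, u in zip(transactions, tu):
        for item in t:
            twu[item] = twu.get(item, 0) + u

    labels = {}
    for item, value in twu.items():
        if value >= upper_ratio * total:
            labels[item] = "large"
        elif value >= lower_ratio * total:
            labels[item] = "pre-large"
        else:
            labels[item] = "small"
    return twu, labels


# Hypothetical quantitative database and profit table.
db = [{"a": 1, "b": 2}, {"a": 2}, {"c": 1}]
profit = {"a": 5, "b": 1, "c": 3}
twu, labels = classify_by_twu(db, profit, upper_ratio=0.5, lower_ratio=0.3)
```

Items in the pre-large band act as a buffer: they are not yet high-utility candidates, but tracking them lets small batches of inserted transactions be absorbed without rescanning the original database.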

14.
In this paper, we study the issues of mining and maintaining association rules in a large database of customer transactions. The problem of mining association rules can be mapped onto the problem of finding large itemsets, which are sets of items bought together in a sufficient number of transactions. We revise a graph-based algorithm to further speed up the process of itemset generation. In addition, we extend our revised algorithm to maintain discovered association rules when incremental or decremental updates are made to the database. Experimental results show the efficiency of our algorithms. The revised algorithm is a significant improvement over the original one for mining association rules. The algorithms for maintaining association rules are more efficient than re-running the mining algorithms on the whole updated database, and they outperform previously proposed algorithms that need multiple passes over the database. Received 4 August 1999 / Revised 18 March 2000 / Accepted in revised form 18 October 2000

15.
Epidemic algorithms for replicated databases (cited by: 1; self-citations: 0; citations by others: 1)
We present a family of epidemic algorithms for maintaining replicated database systems. The algorithms are based on the causal delivery of log records where each record corresponds to one transaction instead of one operation. The first algorithm in this family is a pessimistic protocol that ensures serializability and guarantees strict executions. Since we expect the epidemic algorithms to be used in environments with low probability of conflicts among transactions, we develop a variant of the pessimistic algorithm which is optimistic in that transactions commit as soon as they terminate locally and inconsistencies are detected asynchronously as the effects of committed transactions propagate through the system. The last member of the family of epidemic algorithms is pessimistic and uses voting with quorums to resolve conflicts and improve transaction response time. A simulation study evaluates the performance of the protocols.

16.
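Correctness of real-time database transactions and an implementation algorithm (cited by: 2; self-citations: 2; citations by others: 0)
Transactions in real-time database systems can have timing constraints (typically deadlines), and a transaction that misses its deadline may have catastrophic consequences for the system. Transactions must satisfy not only database integrity and consistency but also temporal correctness and structural correctness among transactions. Traditional transaction processing methods focus only on the correctness of database accesses and cannot handle temporal or structural correctness. The paper discusses in detail the correctness of real-time transactions, covering result correctness, temporal correctness, behavioral correctness, and structural correctness. Most existing work uses different algorithms and strategies to guarantee different correctness requirements; the paper gives a unified graph-theoretic algorithm that guarantees the correctness of real-time transactions.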

17.
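Conflict detection and resolution strategies for mobile replicated database systems (cited by: 9; self-citations: 0; citations by others: 9)
Replication is a key technique for improving the performance of mobile database systems. This paper proposes a new model for mobile replicated database systems, the Transaction-Level Result-Set Propagation (TLRSP) mobile replication model, focuses on the model's conflict detection and resolution strategies, and gives concrete implementation algorithms. The TLRSP model allows mobile users to access the local replica of the database and commit transactions while disconnected. On reconnection, conflicts are detected and resolved and transaction result sets are merged; synchronization is then performed through incremental refresh, so that the system eventually converges to a consistent state. In addition, by introducing simplified transaction logs, data version numbers, and permission control, the TLRSP model effectively reduces the resource consumption of mobile database systems and ensures database consistency, providing a feasible solution for replication in mobile database systems.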

18.
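Wu Jun (吴钧), 《计算机工程》 (Computer Engineering), 1992, 18(6): 42-48
Concurrency control algorithms ensure database consistency under concurrent execution by multiple users. Many concurrency control algorithms exist; this paper outlines the most common ones and proposes an integrated algorithm that combines several of them.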

19.
The performance of electronic commerce systems has a major impact on their acceptability to users. Different users also demand different levels of performance from the system, that is, they will have different Quality of Service (QoS) requirements. Electronic commerce systems are the integration of several different types of servers and each server must contribute to meeting the QoS demands of the users. In this paper we focus on the role, and the performance, of a database server within an electronic commerce system. We examine the characteristics of the workload placed on a database server by an electronic commerce system and suggest a range of QoS requirements for the database server based on this analysis of the workload. We argue that a database server must be able to dynamically reallocate its resources in order to meet the QoS requirements of different transactions as the workload changes. We describe Quartermaster, which is a system to support dynamic goal-oriented resource management in database management systems, and discuss how it can be used to help meet the QoS requirements of the electronic commerce database server. We provide an example of the use of Quartermaster that illustrates how the dynamic reallocation of memory resources can be used to meet the QoS requirements of a set of transactions similar to transactions found in an electronic commerce workload. We briefly describe the memory reallocation algorithms used by Quartermaster and present experiments to show the impact of the reallocations on the performance of the transactions. Published online: 22 August 2001

20.
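In wireless data broadcast environments, the uplink bandwidth from mobile clients to the server is very limited, so traditional concurrency control protocols are unsuitable for this asymmetric communication environment. This paper proposes a variant optimistic concurrency control protocol. The server periodically broadcasts data objects to mobile clients and divides each broadcast cycle into several sub-cycles. Between two consecutive sub-cycles, a space is reserved to hold all data objects modified by server update transactions since the start of the first sub-cycle. A mobile read-only transaction autonomously validates consistency by comparing the write sets committed by server update transactions with its own read set. If a read-only transaction fails this partial validation, it is not hastily aborted and restarted; instead, an improved forward validation strategy is applied to give it more chances to commit. Finally, extensive experiments evaluate the performance of the proposed algorithm.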
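
The client-side check described in this abstract, comparing committed write sets with a transaction's own read set, can be sketched as a set-intersection test. This is only a minimal illustration under the assumption that data objects are identified by hashable keys; the function name is hypothetical, and the improved forward validation step mentioned in the abstract is not modeled.

```python
def passes_partial_validation(read_set, committed_write_sets):
    """Return True if no committed update transaction wrote an object
    that the read-only transaction has already read."""
    read = set(read_set)
    return all(read.isdisjoint(ws) for ws in committed_write_sets)


# Hypothetical write sets broadcast by the server between two sub-cycles.
broadcast_write_sets = [{"z"}, {"w", "x"}]

ok = passes_partial_validation({"y"}, broadcast_write_sets)
conflict = passes_partial_validation({"x", "y"}, broadcast_write_sets)
```

Because the check needs only information that the server already broadcasts, the client can run it without using the scarce uplink channel.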
