Similar literature
1.
The science of bioinformatics has been advancing rapidly, introducing more features and handling larger data volumes. However, these rapid changes have also posed challenges to data mining applications, in particular efficient association rule mining. Many data mining algorithms for high-dimensional datasets have been proposed, but the sheer number of these algorithms, with their varying features and application scenarios, makes it hard to choose a suitable one. We therefore present a general survey of association rule mining algorithms applicable to high-dimensional datasets. We explain the main characteristics and relative merits of these algorithms and, drawing on previous studies, point out areas for improvement and optimization strategies that might be better adapted to high-dimensional datasets. Generally speaking, association rule mining algorithms that combine diverse optimization methods with advanced computing techniques can better balance scalability and interpretability.

2.
In this paper, we develop an association rule miner based on binary particle swarm optimization (BPSO). The miner generates association rules from a transactional database by formulating a combinatorial global optimization problem, without requiring the minimum support and minimum confidence to be specified, unlike the Apriori algorithm. The algorithm generates the best M rules from the given database, where M is a given number. The quality of a rule is measured by a fitness function defined as the product of support and confidence. The effectiveness of the algorithm is tested on a real-life bank dataset from a commercial bank in India and on three transactional datasets taken from the literature: a books database, a food items dataset, and a general store dataset. Based on the results, we infer that the algorithm can be used as an alternative to the Apriori algorithm and the FP-growth algorithm.
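
The fitness measure mentioned in this abstract can be illustrated with a small sketch. The abstract only says that fitness is the product of support and confidence; the sketch assumes the usual definitions (support of the whole rule, confidence as rule support divided by antecedent support). The function names and the toy transactions are hypothetical and are not taken from the paper, and the BPSO search itself is not shown.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: Sequence[Transaction]) -> float:
    """support(A ∪ C) / support(A), or 0.0 when A never occurs."""
    sup_a = support(antecedent, transactions)
    if sup_a == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / sup_a


def fitness(antecedent: FrozenSet[str], consequent: FrozenSet[str],
            transactions: Sequence[Transaction]) -> float:
    """Assumed fitness of the rule A -> C: rule support times rule confidence."""
    return (support(antecedent | consequent, transactions)
            * confidence(antecedent, consequent, transactions))


# Toy transactions, invented for illustration only.
TOY = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Fitness of the hypothetical rule {bread} -> {milk}.
rule_fitness = fitness(frozenset({"bread"}), frozenset({"milk"}), TOY)
```

A search procedure such as BPSO would evaluate a score of this kind for each candidate rule and keep the M highest-scoring rules, which is why no minimum-support or minimum-confidence threshold has to be supplied.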

3.
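Log mining provides a data basis for operating WAP value-added services and adjusting their strategy. This paper describes log preprocessing for WAP value-added services, introduces the concept of association rules into the mining of WAP value-added service logs, and analyzes the classic Apriori data mining algorithm. The algorithm is improved in two respects: a pruning technique generates the candidate 2-itemsets from the frequent 1-itemsets, which eliminates a large number of candidate 2-itemsets, and scanning memory replaces scanning the database, which greatly reduces scan time. Experiments show that these two improvements allow associations among content materials in WAP value-added services to be mined quickly.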
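
The two improvements described in this abstract (building candidate 2-itemsets only from frequent 1-itemsets, and counting over transactions held in memory rather than rescanning a database) can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation; the function name `frequent_pairs`, the absolute `min_count` threshold, and the toy log records are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def frequent_pairs(transactions: List[Iterable[str]],
                   min_count: int) -> Dict[Tuple[str, str], int]:
    """Return the frequent 2-itemsets of an in-memory transaction list."""
    # Pass 1: count single items over the in-memory list.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}

    # Pass 2: only pairs made of frequent items can be frequent, so items
    # that failed pass 1 are dropped before any pair is generated.
    pair_counts: Counter = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c for pair, c in pair_counts.items() if c >= min_count}


# Toy log records, invented for illustration only.
TOY_LOG = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
pairs = frequent_pairs(TOY_LOG, min_count=2)
```

In the toy data, item `d` occurs only once, so no pair containing `d` is ever generated or counted; this is the saving that the pruning step is meant to provide.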

4.
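Mining association rules between lithology and vegetation in the Three Gorges Reservoir area   Total citations: 1, self-citations: 0, citations by others: 1
The Three Gorges Reservoir area is a densely vegetated region of southern China, where thick soil and lush vegetation cover the rock, so lithological analysis is difficult and no mature method is yet available. The key to remote-sensing lithological analysis in this area is to analyze the relationship between surface vegetation and lithology and to find a way to extract lithological information directly while removing the influence of the surface vegetation. For this area, which has complex terrain, frequent geological hazards, and well-developed soil and vegetation, association rules between lithology and vegetation are analyzed and mined. The remote-sensing imagery is overlaid on the geological map, an NDVI (vegetation index) image is computed, points are selected at random within each stratum, and the relationship between lithology and NDVI value at each point is analyzed. Using a concept lattice algorithm and rule extraction, association rules between lithology and vegetation are mined for strata including the second member of the Jialingjiang Formation (T1j2), the third member of the Jialingjiang Formation (T1j3), the first member of the Badong Formation (T2b1), the second member of the Badong Formation (T2b2), and the Daye Formation (T1d).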
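
The NDVI step in this abstract can be illustrated with a short sketch. NDVI is conventionally computed as (NIR - Red) / (NIR + Red) from near-infrared and red reflectance. The per-stratum averaging, the function names, the sample values, and the choice to return 0.0 when both bands are zero are assumptions made for illustration; the concept lattice and rule extraction steps are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one sample point."""
    total = nir + red
    if total == 0:
        # Assumed convention for points with no signal in either band.
        return 0.0
    return (nir - red) / total


def mean_ndvi_by_stratum(
    samples: Iterable[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Average NDVI of the sampled points within each stratum."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for stratum, nir, red in samples:
        grouped[stratum].append(ndvi(nir, red))
    return {s: sum(values) / len(values) for s, values in grouped.items()}


# (stratum code, NIR reflectance, red reflectance); values invented for illustration.
SAMPLES = [("T1j2", 0.6, 0.2), ("T1j2", 0.5, 0.5), ("T2b1", 0.9, 0.1)]
means = mean_ndvi_by_stratum(SAMPLES)
```

Per-point pairs of stratum and NDVI value like these are the kind of input from which lithology and vegetation rules could then be extracted.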

5.
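An association rule mining algorithm based on an artificial immune system   Total citations: 4, self-citations: 0, citations by others: 4
An association rule mining algorithm based on an artificial immune system is presented. Training data are treated as antigens and candidate patterns as artificial recognition balls (ARBs); frequent patterns are generated through immune learning and stored as immune memory, and association rules are finally generated from them. The application example given shows that the algorithm is feasible and effective.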

6.
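Association rule mining based on a simulated annealing genetic algorithm   Total citations: 10, self-citations: 0, citations by others: 10
The simulated annealing genetic algorithm is improved and applied to association rule mining, yielding a new association rule mining algorithm based on the improved simulated annealing genetic algorithm. The algorithm selects the crossover and mutation probabilities dynamically in an adaptive manner, which effectively suppresses premature convergence. Experimental results show that the method solves the association rule mining problem efficiently.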

7.
Server-centric architectures such as the Web's suffer from well-known problems related to application size and increasing user requests. Peer-to-peer systems can help address some of the key challenges, but this survey of several current P2P systems shows that dependability remains an open issue. To perform in Internet-scale applications, P2P systems must address the four major properties of dependable systems: scalability, fault-tolerance, security, and anonymity. One outcome of the comparison is an attempt to move toward common terms and definitions. Because the models underlying current P2P systems must be understood to support a thorough investigation of dependability properties, we briefly examine the most popular P2P systems and then compare how these systems address dependability.

8.
Peer-to-peer (P2P) offers good solutions for many applications such as large-scale data sharing and collaboration in social networks. It thus appears to be a powerful paradigm for developing scalable distributed applications, as reflected by the increasing number of emerging projects based on this technology. However, building trustworthy P2P applications is difficult because they must be deployed on a large number of autonomous nodes, which may refuse to answer some requests and even leave the system unexpectedly. This volatility of nodes is common behavior in P2P systems and may be interpreted as a fault during tests (i.e., a failed node). In this work, we present a framework and a methodology for testing P2P applications. The framework is based on the individual control of nodes, allowing test cases to precisely control the volatility of nodes during their execution. We validated this framework through implementation and experimentation on an open-source P2P system. The experimentation tests the behavior of the system under different conditions of volatility and shows how the tests were able to detect complex implementation problems.

9.
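Research progress in techniques for mining dynamic intervention rules in sub-complex systems   Total citations: 3, self-citations: 1, citations by others: 2
Tang Changjie, Zhang Yue, Tang Liang, Li Chuan, Chen Yu. Journal of Computer Applications (《计算机应用》), 2008, 28(11): 2732-2736
Intervention rule mining in sub-complex systems is a new topic in data mining. This paper surveys the research background and typical problems of intervention rules in sub-complex systems and uses examples to describe basic concepts and terms of the field, such as intervention correlation degree, transitive correlation degree, intervention typing, and intervention algebra. It then presents preliminary explorations and results in intervention rule mining for sub-complex systems, including algorithms for mining naive intervention rules and numerical intervention rules, as well as a density-based intervention analysis model for data streams and related results.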

10.
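The explosive growth of information makes data mining and analysis more difficult. Ordinary association rule mining algorithms struggle to evaluate and discover relationships among variables in large databases within a short running time and under a low degree of association. To address this, a large-database association rule mining algorithm is proposed that uses reinforcement learning to improve the treap. The proposed algorithm first computes the priority of each variable in the database; then, within the priority model, it builds the treap data structure with a build-treap procedure improved by reinforcement learning; finally, it finds the required relationships in the database through a traversal procedure and the generateRule procedure. After a stability analysis of the proposed algorithm, simulation experiments were carried out. The results show that, in its worst-case and best-case analyses, the algorithm completes mining in O(n log n) and O(n^2) operations, respectively, that it can mine variable relationships in large, low-association databases in a short time, and that it improves considerably on an improved Apriori algorithm and an improved FP-growth algorithm.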

11.
Location awareness in unstructured peer-to-peer systems   Total citations: 7, self-citations: 0, citations by others: 7
Peer-to-peer (P2P) computing has emerged as a popular model that aims to make further use of Internet information and resources. However, because peers choose logical neighbors at random, without any knowledge of the underlying physical topology, a serious topology mismatch can arise between the P2P overlay network and the underlying physical network. This topology mismatch places great stress on the Internet infrastructure and greatly limits the performance gain from various search and routing techniques. Moreover, because the overlay topology is inefficient, flooding-based search mechanisms generate a large volume of unnecessary traffic. To alleviate the mismatch problem and reduce the unnecessary traffic, we propose a location-aware topology matching (LTM) technique. LTM builds an efficient overlay by disconnecting slow connections and choosing physically closer nodes as logical neighbors, while still retaining the search scope and reducing the response time for queries. LTM is scalable and completely distributed in the sense that it does not require any global knowledge of the whole overlay network. The effectiveness of LTM is demonstrated through simulation studies.

12.
Peer-to-peer systems are prone to faults; therefore, it is extremely important to design peer-to-peer systems that automatically regain consistency or, in other words, are self-stabilizing. To achieve this, we present a deterministic structure that defines the entire (IP) pointer structure among the machines, for every n machines; i.e., it defines the next hop for the insert, delete, and search procedures of the peer-to-peer system. Thus, the consistency of the system is easily defined, monitored, verified, and repaired. We present the HyperTree (distributed) structure, which supports the peer-to-peer procedures while ensuring that the out-degree and the in-degree (the number of outgoing/incoming pointers) are b log_b n, where n is the actual number of machines and b is an integer parameter greater than 1. Moreover, the HyperTree ensures that the maximal number of hops involved in each procedure is bounded by log_b n. A self-stabilizing peer-to-peer distributed algorithm based on the HyperTree is presented. This work was partially supported by IBM Faculty Award, NSF Grant 0098305, the Israeli Ministry of Trade and Industry, the Rita Altura Trust Chair in Computer Sciences and the Lynne and William Frankel Center for Computer Sciences. The work was done while Ronen I. Kat was a PhD student at Ben-Gurion University of the Negev. A preliminary version was published in the proceedings of the third IEEE International Symposium on Network Computing and Applications (NCA’04).
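
To give a feel for the bounds quoted in this abstract, the sketch below evaluates b log_b n (pointer degree) and log_b n (hops) for a given number of machines. It is only arithmetic on the stated expressions; any constant factors or rounding in the actual HyperTree construction are not given in the abstract and are ignored here. The function name `hypertree_bounds` is hypothetical.

```python
import math
from typing import Tuple


def hypertree_bounds(n: int, b: int) -> Tuple[float, float]:
    """Return (b * log_b(n), log_b(n)) for n machines and base b.

    The first value is the degree expression from the abstract and the
    second is the hop bound; both are returned unrounded.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if b < 2:
        raise ValueError("b must be an integer greater than 1")
    log_b_n = math.log(n, b)
    return b * log_b_n, log_b_n


# Compare two choices of b for the same network size.
degree_b2, hops_b2 = hypertree_bounds(1024, 2)
degree_b4, hops_b4 = hypertree_bounds(1024, 4)
```

For 1024 machines, b = 2 gives about 10 hops with a degree expression of about 20, while b = 4 gives about 5 hops with the same degree expression of about 20. In general, a larger b shortens the search path, and b log_b n grows once b increases further (for example b = 32 gives 2 hops and a degree expression of 64).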

13.
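Support-based association rule mining algorithms cannot find itemsets that are infrequent but have high utility, while utility-based association rule mining misses itemsets whose utility is modest but which occur fairly frequently, so that the product of their support and utility (the incentive) is large. This paper poses the problem of incentive-based association rule mining and proposes a bottom-up mining algorithm, HM-miner. Incentive combines the advantages of support and utility and measures both the statistical and the semantic importance of an itemset. HM-miner prunes using the upper-bound property of incentive and can mine high-incentive itemsets effectively.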
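
The incentive measure defined in this abstract (support multiplied by utility) can be sketched as below. The abstract does not say how utility is computed, so the sketch assumes a common convention: the utility of an itemset is the sum, over the transactions that contain it, of quantity times unit profit for its items. The function name, data layout, and toy values are hypothetical, and HM-miner's upper-bound pruning is not shown.

```python
from typing import Dict, Iterable, List


def incentive(itemset: Iterable[str],
              transactions: List[Dict[str, int]],
              unit_profit: Dict[str, float]) -> float:
    """Support times utility of `itemset` (utility definition is an assumption).

    Each transaction maps an item to the quantity purchased.
    """
    items = set(itemset)
    if not transactions or not items:
        return 0.0
    # Transactions that contain every item of the itemset.
    covering = [t for t in transactions if items.issubset(t)]
    support = len(covering) / len(transactions)
    utility = sum(t[i] * unit_profit[i] for t in covering for i in items)
    return support * utility


# Toy data, invented for illustration only.
TOY_TX = [{"a": 1, "b": 2}, {"a": 2}, {"a": 1, "b": 1, "c": 3}]
PROFIT = {"a": 5.0, "b": 2.0, "c": 1.0}
score_ab = incentive({"a", "b"}, TOY_TX, PROFIT)
```

Under these assumptions, an itemset that is frequent with modest utility and one that is rare with high utility can both receive a large score, which is the motivation the abstract gives for combining the two measures.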

14.
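Based on a modular decomposition of the storage process in real VoD/P2P systems, this work proposes, on the client side, a general storage algorithm model, VSVR, which can summarize most current storage strategies and may lead to new designs; on the server side, it gives the main objectives and basic principles of storage scheduling. The work can serve as an important reference for storage design in VoD/P2P systems.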

15.
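To address the inaccurate mining results of the Hopfield neural network based maximum frequent itemset mining (HNNMFI) algorithm, an improved association rule mining algorithm is proposed that uses a Hopfield neural network built on the current-threshold adaptive memristor (TEAM) model. First, synapses are designed and implemented with the TEAM model: the ability of a threshold memristor's memristance to change continuously with a square-wave voltage is used to set and update the synaptic weights, adapting them to the input of the association rule mining algorithm. Second, the energy function of the original algorithm is modified to align with the standard energy function, the weights are represented by memristance values, and the weights and biases are amplified. Finally, an algorithm is designed to generate association rules from the maximum frequent itemsets. In 1000 simulation experiments on 10 groups of random transaction sets of size at most 30, the proposed algorithm improved the accuracy of the association mining results by more than 33.9 percentage points over HNNMFI, indicating that memristors can effectively improve the accuracy of Hopfield neural networks in association rule mining.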

16.
Parallel Computing, 2014, 40(10): 768-785
Association rule mining (ARM) is an important task in data mining with many practical applications. Current methods for association rule mining have shown unstable performance across database types and under-utilize the benefits of multi-core shared memory machines. In this paper, we address these issues by presenting a novel parallel method for finding frequent patterns, the most computationally intensive phase of ARM. Our proposed method, named ShaFEM, combines two mining strategies and applies the most appropriate one to each data subset of the database, so that it adapts efficiently to the data characteristics and runs fast on both sparse and dense databases. In addition, our new lock-free design minimizes synchronization needs and maximizes data independence to enhance scalability. The new structure lends itself well to dynamic job scheduling, resulting in a well-balanced load on multi-core shared memory architectures. We have evaluated ShaFEM on 12-core multi-socket servers and found that our method runs up to 5.8 times faster and uses up to 7.1 times less memory than the state-of-the-art parallel method. For some test cases, ShaFEM can save up to 4.9 days of execution time over the compared method.

17.
Until now, the analysis of fault tolerance of peer-to-peer systems usually only covers random faults of some kind. Contrary to traditional algorithmic research, faults as well as joins and leaves occurring in a worst-case manner are hardly considered. In this article, we devise techniques to build dynamic peer-to-peer systems which remain fully functional in spite of an adversary who continuously adds and removes peers. We exemplify our algorithms on hypercube and pancake topologies and present a system which maintains small peer degree and network diameter.

18.
This paper investigates the sick and healthy factors which contribute to heart disease for males and females. Association rule mining, a computational intelligence approach, is used to identify these factors, and the UCI Cleveland dataset, a biological database, is considered along with three rule generation algorithms: Apriori, Predictive Apriori and Tertius. Analyzing the information available on sick and healthy individuals and taking confidence as an indicator, females are seen to have less chance of coronary heart disease than males. Also, the attributes indicating healthy and sick conditions were identified. It is seen that factors such as chest pain being asymptomatic and the presence of exercise-induced angina indicate the likely existence of heart disease for both men and women. However, resting ECG being either normal or hyper and slope being flat are potential high risk factors for women only. For men, on the other hand, only a single rule expressing resting ECG being hyper was shown to be a significant factor. This means that, for women, resting ECG status is a key distinct factor for heart disease prediction. Comparing the healthy status of men and women, slope being up, number of coloured vessels being zero, and oldpeak being less than or equal to 0.56 indicate a healthy status for both genders.

19.
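Badong County, where geological hazards are well developed, was selected as the study area, and its hazard sites were used as data samples. GIS was used to overlay the hazard sites with six influencing factors: stratum lithology, elevation, slope, aspect, drainage (water system) combination, and land-use classification from remote-sensing imagery. Five attributes of the hazard sites (hazard type, hazard scale, material type of the hazard body, elevation difference, and river bank side) were then combined with the overlay results and mined for association rules with the Apriori algorithm. The mining yielded single-factor associations, such as the relationship between hazard scale and drainage combination, and multi-factor associations, such as the relationships between different hazard attributes and the individual factors. Comparison with earlier research shows that the rules obtained are reasonable and consistent with actual conditions, and they can provide prior knowledge for geological hazard analysis and decision-making.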

20.
Association rule hiding   Total citations: 9, self-citations: 0, citations by others: 9
Large repositories of data contain sensitive information that must be protected against unauthorized access. Protecting the confidentiality of this information has been a long-term goal for the database security research community and for government statistical agencies. Recent advances in data mining and machine learning algorithms have increased the disclosure risks that one may encounter when releasing data to outside parties. A key problem, still not sufficiently investigated, is the need to balance the confidentiality of the disclosed data with the legitimate needs of the data users. Every disclosure limitation method affects, and in some way modifies, true data values and relationships. We investigate confidentiality issues of a broad category of rules, the association rules. In particular, we present three strategies and five algorithms for hiding a group of association rules that is characterized as sensitive. A rule is characterized as sensitive if its disclosure risk is above a certain privacy threshold. Sometimes, sensitive rules should not be disclosed to the public since, among other things, they may be used for inferring sensitive data, or they may provide business competitors with an advantage. We also perform an evaluation study of the hiding algorithms in order to analyze their time complexity and the impact that they have on the original database.
