Found 20 similar documents (search time: 15 ms)
1.
Dongmei Ai Hongfei Pan Xiaoxin Li Yingxin Gao Di He 《Artificial Life and Robotics》2018,23(3):420-427
Bioinformatics has been advancing at a fast pace, introducing more features and handling larger data volumes. However, these swift changes have also posed challenges to data mining applications, in particular to efficient association rule mining. Many data mining algorithms for high-dimensional datasets have been put forward, but the sheer number of these algorithms, with their varying features and application scenarios, has complicated the choice of a suitable one. We therefore present a general survey of association rule mining algorithms applicable to high-dimensional datasets. The main characteristics and relative merits of these algorithms are explained, and areas for improvement and optimization strategies that might be better adapted to high-dimensional datasets are pointed out on the basis of previous studies. Generally speaking, association rule mining algorithms that combine diverse optimization methods with advanced computing techniques can better balance scalability and interpretability.
2.
K.N.V.D. Sarath Vadlamani Ravi 《Engineering Applications of Artificial Intelligence》2013,26(8):1832-1840
In this paper, we developed a binary particle swarm optimization (BPSO) based association rule miner. Our BPSO-based association rule miner generates association rules from a transactional database by formulating a combinatorial global optimization problem, without requiring the minimum support and minimum confidence to be specified, unlike the Apriori algorithm. Our algorithm generates the best M rules from the given database, where M is a given number. The quality of a rule is measured by a fitness function defined as the product of support and confidence. The effectiveness of our algorithm is tested on a real-life bank dataset from a commercial bank in India and on three transactional datasets taken from the literature, viz. a books database, a food items dataset and a general store dataset. Based on the results, we infer that our algorithm can be used as an alternative to the Apriori algorithm and the FP-growth algorithm.
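The fitness measure described in this abstract (support multiplied by confidence) can be illustrated with a minimal sketch. The function names, the toy transactions, and the choice to measure a rule's support on the union of antecedent and consequent are assumptions made for illustration; the paper's own encoding and BPSO search loop are not shown here.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0  # rule never fires; treat confidence as zero
    return support(antecedent | consequent, transactions) / base


def fitness(antecedent: FrozenSet[str], consequent: FrozenSet[str],
            transactions: List[Transaction]) -> float:
    """Rule quality as the product of support and confidence."""
    rule_support = support(antecedent | consequent, transactions)
    return rule_support * confidence(antecedent, consequent, transactions)


# Hypothetical toy data, not taken from the paper.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

print(fitness(frozenset({"bread"}), frozenset({"milk"}), transactions))
```

An optimizer that ranks candidate rules by this score and keeps the top M needs no minimum support or confidence threshold, which is the property the abstract highlights.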
3.
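Log mining provides a data basis for the operation and strategy adjustment of WAP value-added services. This paper introduces log preprocessing for WAP value-added services, brings the concept of association rules into WAP value-added service log mining, and analyzes the classic Apriori data mining algorithm. The algorithm is improved in two respects: a pruning technique is applied when generating candidate 2-itemsets from the frequent 1-itemsets, which eliminates a large number of candidate 2-itemsets; and scanning memory replaces scanning the database, which saves a great deal of scan time. Experiments show that these two improvements allow associations among content items in WAP value-added services to be mined quickly.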
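The two improvements named in this abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the function name, the absolute `min_count` threshold, and the sample log sessions are all assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def frequent_pairs(transactions: List[Iterable[str]],
                   min_count: int) -> Dict[Tuple[str, str], int]:
    """Find frequent 2-itemsets, building candidates only from frequent items.

    `transactions` is held in memory, so both passes scan a Python list
    instead of re-reading a database.
    """
    # Pass 1: count single items (each item counted once per transaction).
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}

    # Pass 2: pruning step. Infrequent items are dropped before pairing,
    # so no candidate pair containing an infrequent item is ever generated.
    pair_counts: Counter = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c for pair, c in pair_counts.items() if c >= min_count}


# Hypothetical preprocessed log sessions (content items requested together).
sessions = [
    ["news", "ringtone", "game"],
    ["news", "ringtone"],
    ["news", "game"],
    ["wallpaper"],
]
print(frequent_pairs(sessions, min_count=2))
```

Sorting the kept items gives every pair a canonical order, so `("game", "news")` and `("news", "game")` are counted as the same candidate.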
4.
5.
Server-centric architectures such as the Web's suffer from well-known problems related to application size and increasing user requests. Peer-to-peer systems can help address some of the key challenges, but this survey of several current P2P systems shows that dependability remains an open issue. To perform in Internet-scale applications, P2P systems must address the four major properties of dependable systems: scalability, fault-tolerance, security, and anonymity. An output of the comparison provided is an attempt to move toward common terms and definitions. Because the models underlying current P2P systems must be understood to support a thorough investigation of dependability properties, we briefly examine the most popular P2P systems and then compare how these systems address dependability.
6.
Eduardo Cunha de Almeida Gerson Sunyé Yves Le Traon Patrick Valduriez 《Empirical Software Engineering》2010,15(4):346-379
Peer-to-peer (P2P) offers good solutions for many applications such as large data sharing and collaboration in social networks.
Thus, it appears as a powerful paradigm to develop scalable distributed applications, as reflected by the increasing number
of emerging projects based on this technology. However, building trustworthy P2P applications is difficult because they must
be deployed on a large number of autonomous nodes, which may refuse to answer some requests and even leave the system unexpectedly.
This volatility of nodes is a common behavior in P2P systems and may be interpreted as a fault during tests (i.e., a failed
node). In this work, we present a framework and a methodology for testing P2P applications. The framework is based on the
individual control of nodes, allowing test cases to precisely control the volatility of nodes during their execution. We validated
this framework through implementation and experimentation on an open-source P2P system. The experimentation tests the behavior
of the system under different conditions of volatility and shows how the tests were able to detect complex implementation problems.
7.
Location awareness in unstructured peer-to-peer systems (total citations: 7; self-citations: 0; citations by others: 7)
Yunhao Liu Li Xiao Xiaomei Liu Ni L.M. Xiaodong Zhang 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(2):163-174
Peer-to-peer (P2P) computing has emerged as a popular model aiming at further utilizing Internet information and resources. However, the mechanism of peers randomly choosing logical neighbors without any knowledge of the underlying physical topology can cause a serious topology mismatch between the P2P overlay network and the underlying physical network. The topology mismatch problem places great stress on the Internet infrastructure and greatly limits the performance gain from various search or routing techniques. Meanwhile, due to the inefficient overlay topology, flooding-based search mechanisms cause a large volume of unnecessary traffic. Aiming at alleviating the mismatch problem and reducing the unnecessary traffic, we propose a location-aware topology matching (LTM) technique. LTM builds an efficient overlay by disconnecting slow connections and choosing physically closer nodes as logical neighbors while still retaining the search scope and reducing response time for queries. LTM is scalable and completely distributed in the sense that it does not require any global knowledge of the whole overlay network. The effectiveness of LTM is demonstrated through simulation studies.
8.
Peer-to-peer systems are prone to faults; therefore, it is extremely important to design peer-to-peer systems that automatically regain consistency or, in other words, are self-stabilizing. To achieve this, we present a deterministic structure that defines the entire (IP) pointer structure among the machines, for every n machines; i.e., it defines the next hop for the insert, delete, and search procedures of the peer-to-peer system. Thus, the consistency of the system is easily defined, monitored, verified, and repaired. We present the HyperTree (distributed) structure, which supports the peer-to-peer procedures while ensuring that the out-degree and the in-degree (the number of outgoing/incoming pointers) are b log_b n, where n is the actual number of machines and b is an integer parameter greater than 1. Moreover, the HyperTree ensures that the maximal number of hops involved in each procedure is bounded by log_b n. A self-stabilizing peer-to-peer distributed algorithm based on the HyperTree is presented. This work was partially supported by IBM Faculty Award, NSF Grant 0098305, the Israeli Ministry of Trade and Industry, the Rita Altura Trust Chair in Computer Sciences and the Lynne and William Frankel Center for Computer Sciences. The work was done while Ronen I. Kat was a PhD student at Ben-Gurion University of the Negev. A preliminary version was published in the proceedings of the third IEEE International Symposium on Network Computing and Applications (NCA’04).
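To give a feel for the bounds quoted in this abstract, the sketch below evaluates log_b n and b log_b n for sample values. Rounding the logarithm up to an integer is an assumption made here, as are the function names; the abstract states the bounds only in the form given above.

```python
def ceil_log(n: int, b: int) -> int:
    """Smallest integer k with b**k >= n, using integer arithmetic only."""
    if n < 1 or b < 2:
        raise ValueError("need n >= 1 and b >= 2")
    k, power = 0, 1
    while power < n:
        power *= b
        k += 1
    return k


def hypertree_bounds(n: int, b: int) -> dict:
    """Hop and degree bounds for n machines and parameter b (rounded up)."""
    hops = ceil_log(n, b)
    return {"max_hops": hops, "max_degree": b * hops}


for n, b in [(1000, 10), (1024, 2), (1_000_000, 16)]:
    print(n, b, hypertree_bounds(n, b))
```

The trade-off is visible in the numbers: a larger b shortens the routing path but raises the number of pointers each machine maintains.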
9.
Until now, the analysis of fault tolerance of peer-to-peer systems usually only covers random faults of some kind. Contrary
to traditional algorithmic research, faults as well as joins and leaves occurring in a worst-case manner are hardly considered.
In this article, we devise techniques to build dynamic peer-to-peer systems which remain fully functional in spite of an adversary
who continuously adds and removes peers. We exemplify our algorithms on hypercube and pancake topologies and present a system
which maintains small peer degree and network diameter.
10.
Jesmin Nahar Tasadduq Imam Kevin S. Tickle Yi-Ping Phoebe Chen 《Expert systems with applications》2013,40(4):1086-1093
This paper investigates the sick and healthy factors which contribute to heart disease for males and females. Association rule mining, a computational intelligence approach, is used to identify these factors, and the UCI Cleveland dataset, a biological database, is considered along with three rule generation algorithms – Apriori, Predictive Apriori and Tertius. Analyzing the information available on sick and healthy individuals and taking confidence as an indicator, females are seen to have less chance of coronary heart disease than males. Also, the attributes indicating healthy and sick conditions were identified. It is seen that factors such as chest pain being asymptomatic and the presence of exercise-induced angina indicate the likely existence of heart disease for both men and women. However, resting ECG being either normal or hyper and slope being flat are potential high risk factors for women only. For men, on the other hand, only a single rule expressing resting ECG being hyper was shown to be a significant factor. This means that, for women, resting ECG status is a key distinct factor for heart disease prediction. Comparing the healthy status of men and women, slope being up, number of coloured vessels being zero, and oldpeak being less than or equal to 0.56 indicate a healthy status for both genders.
11.
Association rule hiding (total citations: 9; self-citations: 0; citations by others: 9)
Verykios V.S. Elmagarmid A.K. Bertino E. Saygin Y. Dasseni E. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(4):434-447
Large repositories of data contain sensitive information that must be protected against unauthorized access. The protection of the confidentiality of this information has been a long-term goal for the database security research community and for the government statistical agencies. Recent advances in data mining and machine learning algorithms have increased the disclosure risks that one may encounter when releasing data to outside parties. A key problem, still not sufficiently investigated, is the need to balance the confidentiality of the disclosed data with the legitimate needs of the data users. Every disclosure limitation method affects and modifies, in some way, true data values and relationships. We investigate confidentiality issues of a broad category of rules, the association rules. In particular, we present three strategies and five algorithms for hiding a group of association rules that is characterized as sensitive. A rule is characterized as sensitive if its disclosure risk is above a certain privacy threshold. Sometimes, sensitive rules should not be disclosed to the public since, among other things, they may be used for inferring sensitive data, or they may provide business competitors with an advantage. We also perform an evaluation study of the hiding algorithms in order to analyze their time complexity and the impact that they have on the original database.
12.
《Journal of Network and Computer Applications》2007,30(3):1216-1227
This paper introduces a new approach to a problem of data sharing among multiple parties, without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or any other parties. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.
13.
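An associative classification method based on attribute importance is proposed. Unlike traditional algorithms, it generates class association rules according to the degree of attribute importance; in addition, when constructing the classifier, it removes the randomness with which the CBA algorithm chooses among rules that have the same support and confidence. Experimental results show that, compared with traditional associative classification algorithms, the classification rules obtained with this method have lower complexity and effectively improve classification performance.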
14.
In the field of data mining, an important issue for association rules generation is frequent itemset discovery, which is the key factor in implementing association rule mining. Therefore, this study considers the user’s assigned constraints in the mining process. Constraint-based mining enables users to concentrate on mining the itemsets that interest them, which improves the efficiency of mining tasks. In addition, in the real world, users may prefer recording more than one attribute and setting multi-dimensional constraints. Thus, this study intends to solve the multi-dimensional constraints problem for association rules generation. The ant colony system (ACS) is one of the newest meta-heuristics for combinatorial optimization problems, and this study uses the ant colony system to mine a large database to find the association rules effectively. If this system can consider multi-dimensional constraints, the association rules will be generated more effectively. Therefore, this study proposes a novel approach of applying the ant colony system for extracting the association rules from the database. In addition, the multi-dimensional constraints are taken into account. The results using a real case, the National Health Insurance Research Database, show that the proposed method is able to provide more condensed rules than the Apriori method. The computational time is also reduced.
15.
To improve the efficiency of peer-to-peer (P2P) systems while adapting to changing environmental conditions, static peer-to-peer protocols can be replaced by adaptive plans. The resulting systems are inherently complex, which makes their development and characterization a challenge for traditional methods. Here we propose the design and analysis of adaptive P2P systems using measures of complexity, emergence, self-organization, and homeostasis based on information theory. These measures allow the evaluation of adaptive P2P systems and thus can be used to guide their design. We evaluate the proposal with a P2P computing system provided with adaptation mechanisms. We show the evolution of the system with static and also changing workload, using different fitness functions. When the adaptive plan forces the system to converge to a predefined performance level, the nodes may result in highly unstable configurations, which correspond to a high variance in time of the measured complexity. Conversely, if the adaptive plan is less “aggressive”, the system may be more stable, but the optimal performance may not be achieved.
16.
A survey on peer-to-peer video streaming systems (total citations: 2; self-citations: 1; citations by others: 2)
Video-over-IP applications have recently attracted a large number of users on the Internet. Traditional client-server based
video streaming solutions incur expensive bandwidth provision cost on the server. Peer-to-Peer (P2P) networking is a new paradigm
to build distributed network applications. Recently, several P2P streaming systems have been deployed to provide live and
on-demand video streaming services on the Internet at low server cost. In this paper, we provide a survey on the existing
P2P solutions for live and on-demand video streaming. Representative P2P streaming systems, including tree, multi-tree and
mesh based systems are introduced. We describe the challenges and solutions of providing live and on-demand video streaming
in a P2P environment. Open research issues on P2P video streaming are also discussed.
Contact: Chao Liang
17.
We consider the problem of designing an efficient and robust distributed random number generator for peer-to-peer systems that is easy to implement and works even if all communication channels are public. A robust random number generator is crucial for avoiding adversarial join–leave attacks on peer-to-peer overlay networks. We show that our new generator together with a light-weight rule recently proposed in [B. Awerbuch, C. Scheideler, Towards a scalable and robust DHT, in: Proc. of the 18th ACM Symp. on Parallel Algorithms and Architectures, SPAA, 2006. See also http://www14.in.tum.de/personen/scheideler] for keeping peers well distributed can keep various structured overlay networks in a robust state even under a constant fraction of adversarial peers.
18.
In real-world applications, transactions usually consist of quantitative values. Many fuzzy data mining approaches have thus been proposed for finding fuzzy association rules with a predefined minimum support from the given quantitative transactions. However, the common problems of those approaches are that an appropriate minimum support is hard to set, and that the derived rules usually expose common-sense knowledge which may not be interesting from a business point of view. In this paper, an algorithm for mining fuzzy coherent rules is proposed for overcoming those problems with the properties of propositional logic. It first transforms quantitative transactions into fuzzy sets. Then, the generated fuzzy sets are collected to generate candidate fuzzy coherent rules. Finally, contingency tables are calculated and used for checking whether each candidate fuzzy coherent rule satisfies the four criteria. If it does, it is a fuzzy coherent rule. Experiments on the foodmart dataset are also made to show the effectiveness of the proposed algorithm.
19.
Providing scalable video services in a peer-to-peer (P2P) environment is challenging. Since videos are typically large and
require high communication bandwidth for delivery, many peers may be unwilling to cache them in whole to serve others. In
this paper, we address two fundamental research problems in providing scalable P2P video services: (1) how a host can find
enough video pieces, which may be scattered across the whole system, to assemble a complete video; and (2) given a limited buffer
size, what part of a video a host should cache and what existing data should be expunged to make the necessary space. We address
these problems with two new ideas: Cell caching collaboration and Controlled Inverse Proportional (CIP) cache allocation. The Cell concept allows cost-effective caching collaboration in a fully distributed environment and
can dramatically reduce video lookup cost. On the other hand, CIP cache allocation challenges the conventional caching wisdom
by caching unpopular videos with higher priority. Our approach allows the system to retain many copies of popular videos to
avoid creating hot spots and, at the same time, prevents unpopular videos from being quickly evicted from the system. We have
implemented a Gnutella-like simulation network and used it as a testbed to evaluate the proposed technique. Our extensive study
shows convincingly the performance advantage of the new scheme.
Contact: Wallapak Tavanapong
20.
Jeng-Long Chiang Yin-Yeh Tseng Wen-Tsuen Chen 《Journal of Parallel and Distributed Computing》2011,71(6):879-888
BitTorrent is a popular peer-to-peer file sharing system. A target file shared through BitTorrent is partitioned into pieces and downloaded from multiple peers in parallel in order to shorten the download process. However, due to peer dynamics in P2P networks, rare pieces may be lost, leading to the so-called last piece problem. BitTorrent employs a rarest-first piece selection algorithm to deal with this problem, but its efficacy is limited because each peer only has a local view of piece rareness. In this paper, we propose an Interest-Intended Piece Selection (IIPS) algorithm that aims to better alleviate the last piece problem while maintaining stable cooperation between peers. IIPS is named interest-intended in that every IIPS peer favors pieces that, if downloaded, would increase the probability of being interesting to its cooperating peers. Simulation results show that IIPS achieves fewer occurrences of piece loss under tough conditions and slightly outperforms BitTorrent’s rarest-first algorithm in terms of piece diversity.
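The baseline that this abstract compares against, rarest-first selection from a peer's local view, can be sketched as follows. This is not the IIPS algorithm, whose details the abstract does not give; the function name, the data layout, and the random tie-breaking are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set


def rarest_first(have: Set[int],
                 neighbor_pieces: Dict[str, Set[int]],
                 rng: Optional[random.Random] = None) -> Optional[int]:
    """Pick a missing piece held by the fewest directly connected neighbors.

    Rarity is computed only over `neighbor_pieces`, i.e. the peer's local
    view, which is the limitation the abstract points out.
    """
    rng = rng or random.Random()

    # Count how many neighbors advertise each piece.
    availability: Counter = Counter()
    for pieces in neighbor_pieces.values():
        availability.update(pieces)

    # Only pieces we still need and that some neighbor can supply.
    candidates = {p: c for p, c in availability.items() if p not in have}
    if not candidates:
        return None

    rarest = min(candidates.values())
    # Break ties randomly so peers do not all request the same piece.
    return rng.choice(sorted(p for p, c in candidates.items() if c == rarest))


# Hypothetical local view: neighbor id -> pieces that neighbor holds.
neighbors = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
print(rarest_first(have={2}, neighbor_pieces=neighbors))
```

In this example pieces 0 and 3 are each held by a single neighbor, so either may be returned; a piece held by no neighbor never appears in the candidate set, which is how a lost piece becomes invisible to the peer.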