期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

自适应区间配置在关联规则并行采掘中的作用 总被引：1，自引：0，他引：1

胡侃张伟荦夏绍玮《软件学报》2000,11(2):159-172

现行的采掘关联规则的并行算法基于经典的层次算法.该方法在每一次重复扫描数据库时都需要一次同步,这种同步运算对于共享内存多处理器并行机来说极大地降低了采掘性能,这种低效主要源于对共享的I/O通道的竞争.该文提出了在共享内存多处理机上采掘关联规则的异步算法APM.在APM中,所有参与计算的处理器能独立地产生备选集和计算支持度.而且,APM所需的扫描数据库的次数比层次方法所需的更少.该文还提出了一种增强APM的技术,使得该算法的性能对于数据分布更具有鲁棒性.文中实现了APM的变种算法,还实现了Apriori的并行版本Count Distribution算法.在SGI Power Challenge SMP并行机上,进行了性能分析,结果表明所提出的异步算法APM具有更好的性能和可扩展性. 相似文献

2.

Efficient mining of maximal frequent itemsets from databases on a cluster of workstations 总被引：2，自引：2，他引：0

Soon M. Chung Congnan Luo 《Knowledge and Information Systems》2008,16(3):359-391

In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent k-itemsets on the kth pass over the databases, and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.

Congnan LuoEmail:

相似文献

3.

A shared memory algorithm and proof for the generalized alternative construct in CSP

Richard M. Fujimoto Hwa-chung Feng 《International journal of parallel programming》1987,16(3):215-241

Communicating Sequential Processes (CSP) is a paradigm for communication and synchronization among distributed processes. The alternative construct is a key feature of CSP that allows nondeterministic selection of one among several possible communicants. A generalized version of Hoare's original alternative construct that allows output commands to be included in guards has been proposed. Previous algorithms for this construct assume a message passing architecture and are not appropriate for multiprocessor systems that feature shared memory. This paper describes a distributed algorithm for the generalized alternative construct that exploits the capabilities of a parallel computer with shared memory. A correctness proof of the proposed algorithm is presented to show that the algorithm conforms to somesatefy andliveness criteria. Extensions to allow termination of processes and to ensure fairness in guard selection are also given.This work was supported by ONR Contract Number N00014-87-K-0184. 相似文献

4.

Using processor affinity in loop scheduling on shared-memorymultiprocessors

Markatos E.P. LeBlanc T.J. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(4):379-400

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds 相似文献

5.

Fast synchronization on shared-memory multiprocessors: An architectural approach

《Journal of Parallel and Distributed Computing》2005,65(10):1158-1170

Synchronization is a crucial operation in many parallel applications. Conventional synchronization mechanisms are failing to keep up with the increasing demand for efficient synchronization operations as systems grow larger and network latency increases.The contributions of this paper are threefold. First, we revisit some representative synchronization algorithms in light of recent architecture innovations and provide an example of how the simplifying assumptions made by typical analytical models of synchronization mechanisms can lead to significant performance estimate errors. Second, we present an architectural innovation called active memory that enables very fast atomic operations in a shared-memory multiprocessor. Third, we use execution-driven simulation to quantitatively compare the performance of a variety of synchronization mechanisms based on both existing hardware techniques and active memory operations. To the best of our knowledge, synchronization based on active memory outforms all existing spinlock and non-hardwired barrier implementations by a large margin. 相似文献

6.

Parallel Data Mining for Association Rules on Shared-Memory Systems 总被引：11，自引：1，他引：10

S. Parthasarathy M. J. Zaki M. Ogihara W. Li 《Knowledge and Information Systems》2001,3(1):1-29

In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speed-up for the parallel algorithm. A lot of data-mining tasks (e.g. association rules, sequential patterns) use complex pointer-based data structures (e.g. hash trees) that typically suffer from suboptimal data locality. In the multiprocessor case shared access to these data structures may also result in false sharing. For these tasks it is commonly observed that the recursive data structure is built once and accessed multiple times during each iteration. Furthermore, the access patterns after the build phase are highly ordered. In such cases locality and false sharing sensitive memory placement of these structures can enhance performance significantly. We evaluate a set of placement policies for parallel association discovery, and show that simple placement schemes can improve execution time by more than a factor of two. More complex schemes yield additional gains. Received 24 May 1999 / Revised 20 June 2000 / Accepted in revised form 6 July 2000 相似文献

7.

The join algorithms on a shared-memory multiprocessor databasemachine

Qadah G.Z. Irani K.B. 《IEEE transactions on pattern analysis and machine intelligence》1988,14(11):1668-1683

The authors develop and present a large set of parallel algorithms for implementing the join operation on a shared-memory multiprocessor database machine. The development of these algorithms follows a structured approach. The major steps involved in the processing of the join operation by the machine are first identified. Then, alternative join algorithms are constructed by concatenating the different ways of performing these steps. A study of the performance of the proposed algorithms is presented. This study shows, among other things, that for a given hardware configuration there is not just one overall best performing join algorithm, but rather different algorithms score the best performance, depending on the characteristics of the data participating in the join operation 相似文献

8.

Fast,contention-free combining tree barriers for shared-memory multiprocessors 总被引：1，自引：0，他引：1

Michael L. Scott John M. Mellor-Crummey 《International journal of parallel programming》1994,22(4):449-481

In a previous article,⁽¹⁾ Gupta and Hill introduced anadaptive combining tree algorithm for busy-wait barrier synchronization on shared-memory multiprocessors. The intent of the algorithm was to achieve a barrier in logarithmic time when processes arrive simultaneously, and in constant time after the last arrival when arrival times are skewed. Afuzzy ⁽²⁾ version of the algorithm allows a process to perform useful work between the point at which it notifies other processes of its arrival and the point at which it waits for all other processes to arrive. Unfortunately, adaptive combining tree barriers as originally devised perform a large amount of work at each node of the tree, including the acquisition and release of locks. They also perform an unbounded number of accesses to nonlocal locations, inducing large amounts of memory and interconnect contention. We present new adaptive combining tree barriers that eliminate these problems. We compare the performance of the new algorithms to that of other fast barriers on a 64-node BBN Butterfly 1 multiprocessor, a 35-node BBN TC2000, and a 126-node KSR 1. The results reveal scenarios in which our algorithms outperform all known alternatives, and suggest that both adaptation and the combination of fuzziness with tree-style synchronization will be of increasing importance on future generations of shared-memory multiprocessors. At the University of Rochester, this work was supported in part by NSF Institutional Infrastructure grant number CDA-8822724 and ONR research contract number N00014-92-J-1801 (in conjunction with the ARPA Research in Information Science and Technology—High Performance Computing, Software Science and Technology program, ARPA Order No. 8930). At Rice University, this work was supported in part by NSF Cooperative Agreements CCR-8809615 and CCR-912008. 相似文献

9.

Greedy partitioned algorithms for the shortest-path problem 总被引：1，自引：0，他引：1

P. Adamson E. Tick 《International journal of parallel programming》1991,20(4):271-298

A partitioned, priority-queue algorithm for solving the single-source best-path problem is defined and evaluated. Finding single-source paths for sparse graphs is notable because of its definitelack of parallelism-no known algorithms are scalable. Qualitatively, we discuss the close relationships between our algorithm and previous work by Quinn, Chikayama, and others. Performance measurements of variations of the algorithm, implemented both in concurrent and imperative programming languages on a shared-memory multiprocessor, are presented. This quantitative analysis of the algorithms provides insights into the tradeoffs between complexity and overhead in graph-searching executed in high-level parallel languages with automatic task scheduling.Presently at Intel-PCED. 相似文献

10.

Analysis of a parallel volume rendering system based on theshear-warp factorization

Lacroute P. 《IEEE transactions on visualization and computer graphics》1996,2(3):218-231

This paper presents a parallel volume rendering algorithm that can render a 256×256×225 voxel medical data set at over 15 Hz and a 512×512×334 voxel data set at over 7 Hz on a 32-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently-reported shear-warp serial volume rendering algorithm which is over five times faster than previous serial algorithms. The algorithm uses run-length encoding to exploit coherence and an efficient volume traversal to reduce overhead. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets 相似文献

11.

Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms

Dehne Dittrich Hutchinson 《Algorithmica》2008,36(2):97-122

Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACM Working Group on Storage I/ O for Large-Scale Computing. In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems. 相似文献

12.

快速挖掘频繁项集的并行算法 总被引：3，自引：0，他引：3

何波王华秋刘贞王越《计算机应用》2006,26(2):391-0392

传统的挖掘频繁项集的并行算法存在数据偏移、通信量大、同步次数较多和扫描数据库次数较多等问题。针对这些问题，提出了一种快速挖掘频繁项集的并行算法(FPMFI)。FPMFI算法让各计算机节点独立地计算局部频繁项集，然后与中心节点交互实现数据汇总，最终获得全局频繁项集。理论分析和实验结果表明FPMFI算法是有效的。 相似文献

13.

Parallel consistent labeling algorithms

Ashok Samal Tom Henderson 《International journal of parallel programming》1987,16(5):341-364

Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms.⁽¹⁾ Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms.⁽²⁾ In this paper, we give parallel algorithms for solving node and arc consistency. We show that any parallel algorithm for enforcing are consistency in the worst case must have O(na) sequential steps, wheren is number of nodes, anda is the number of labels per node. We give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented.This work was partially supported by NSF Grants MCS-8221750, DCR-8506393, and DMC-8502115. 相似文献

14.

Efficient strategies for parallel mining class association rules

《Expert systems with applications》2014,41(10):4716-4729

Mining class association rules (CARs) is an essential, but time-intensive task in Associative Classification (AC). A number of algorithms have been proposed to speed up the mining process. However, sequential algorithms are not efficient for mining CARs in large datasets while existing parallel algorithms require communication and collaboration among computing nodes which introduces the high cost of synchronization. This paper addresses these drawbacks by proposing three efficient approaches for mining CARs in large datasets relying on parallel computing. To date, this is the first study which tries to implement an algorithm for parallel mining CARs on a computer with the multi-core processor architecture. The proposed parallel algorithm is theoretically proven to be faster than existing parallel algorithms. The experimental results also show that our proposed parallel algorithm outperforms a recent sequential algorithm in mining time. 相似文献

15.

Trends in multiprocessor and distributed operating systems designs

Anand R. Tripathi Neeran M. Karnik 《The Journal of supercomputing》1995,9(1-2):23-49

This paper presents an overview of the developments in operating systems technology for distributed computing systems and multiprocessor machines. We focus on those design principles that are now widely accepted as useful design paradigms. Approaches common to distributed and multiprocessor operating systems are identified. Issues discussed include process scheduling and synchronization, load balancing, virtual and shared-memory management and parallel file systems. The task-thread model and the object model of computing are also discussed. 相似文献

16.

Synchronised execution on shared memory multiprocessors

Rhys Francis Ian Mathieson 《Parallel Computing》1988,8(1-3):165-175

Threads provides a mechanism for simulating the execution of parallel algorithms on a simplified model of a shared-memory multiprocessor. The algorithms can be expressed in a high-level block-structured language, which supports multiple threads of execution within a common body of program code. Results show an ability to achieve good speedup for small problems using algorithms derived by simple modifications of sequential algorithms. As well, a sibling thread synchronisation feature provides the basis for the synchronous execution of threads. k-parallel algorithms tailored to the machine size and implemented as synchronously executing iterations, can provide near linear speedup as the problem size is increased. The techniques described in this paper seem to promise an effective synchronous execution mode for shared-memory MIMD architectures. 相似文献

17.

Parallel mining of maximal sequential patterns using multiple samples

Congnan Luo Soon M. Chung 《The Journal of supercomputing》2012,59(2):852-881

In this paper, we propose a new parallel algorithm, named PMSPX, which mines maximal frequent sequences by using multiple samples to exclude infrequent candidates effectively. A frequent sequence is maximal if none of its supersequences is frequent. Unlike the traditional single-sample methods developed for mining frequent itemsets, PMSPX uses multiple samples. Thus, it can avoid or alleviate some problems inherent in the single-sample methods. We theoretically analyzed how to increase the minimum support level to prevent misestimating infrequent candidates as frequent in the mining of samples. PMSPX is a parallel version of our sequential MSPX algorithm, and it is developed on a cluster of workstations. In PMSPX, each processing node uses MSPX to find a candidate set of local maximal frequent sequences first, independently from other processing nodes. Then, a top-down search is performed, starting with all the candidates, in a synchronous manner to identify real maximal frequent sequences. This asynchronous local mining followed by synchronous global mining approach minimizes the synchronization and communication among the processing nodes. Three database partitioning methods are proposed to distribute the database across the processing nodes, so that their workloads are balanced and the data skewness of the whole database is preserved in the data partition of each node. A comprehensive analysis was performed on PMSPX and existing parallel sequence mining algorithms, and extensive experiments were conducted on PMSPX. PMSPX demonstrates very good speedup and scaleup properties. It also requires less communication and synchronization than other parallel algorithms. 相似文献

18.

稀疏有限元线性系统的并行算法实现

下载免费PDF全文

张哲《计算机工程与应用》2010,46(29):47-49

在对称多处理机系统上,提出了一种求解稀疏对称有限元线性系统的正规化精确并行逆算法。该算法以一种避免数据依赖的反对角运动方法为基础,使用OpenMP编译指导来实现。诸如加速比和效率等数值实验结果的推出,说明在一个对称多处理机系统上,所提出的算法求解方法能更好地提高性能,获得更大的加速。相似文献

19.

一种有效的并行频繁项集挖掘算法* 总被引：1，自引：0，他引：1

王丹阳田卫东胡学钢《计算机应用研究》2008,25(11):3332-3334

传统的挖掘频繁项集的并行算法存在各节点间负载不均衡、同步开销过大、通信量大等问题。针对这些问题,提出了一种多次传送重新分配数据的并行算法(MRPD)。MRPD算法在第l步时将数据库重新划分成若干组,并根据各节点的需要多次传送分组;各节点获得完整分组后异步地计算频繁项集;所有节点计算完成后,得到全部频繁项集。理论分析和实验结果表明MRPD算法是有效的。相似文献

20.

Two algorithms for barrier synchronization 总被引：5，自引：0，他引：5

Debra Hensgen Raphael Finkel Udi Manber 《International journal of parallel programming》1988,17(1):1-17

We describe two new algorithms for implementing barrier synchronization on a shared-memory multicomputer. Both algorithms are based on a method due to Brooks. We first improve Brooks' algorithm by introducing double buffering. Our dissemination algorithm replaces Brook's communication pattern with an information dissemination algorithm described by Han and Finkel. Our tournament algorithm uses a different communication pattern and generally requires fewer total instructions. The resulting algorithms improve Brook's original barrier by a factor of two when the number of processes is a power of two. When the number of processes is not a power of two, these algorithms improve even more upon Brooks' algorithm because absent processes need not be simulated. These algorithms share with Brooks' barrier the limitation that each of then processes meeting at the barrier must be assigned identifiersi such that 0i<n. 相似文献