期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization

Gillis N Glineur F 《Neural computation》2012,24(4):1085-1105

Nonnegative matrix factorization (NMF) is a data analysis technique used in a great variety of applications such as text mining, image processing, hyperspectral data analysis, computational biology, and clustering. In this letter, we consider two well-known algorithms designed to solve NMF problems: the multiplicative updates of Lee and Seung and the hierarchical alternating least squares of Cichocki et al. We propose a simple way to significantly accelerate these schemes, based on a careful analysis of the computational cost needed at each iteration, while preserving their convergence properties. This acceleration technique can also be applied to other algorithms, which we illustrate on the projected gradient method of Lin. The efficiency of the accelerated algorithms is empirically demonstrated on image and text data sets and compares favorably with a state-of-the-art alternating nonnegative least squares algorithm. 相似文献

2.

A Parallel Genetic Algorithm Based on Spark for Pairwise Test Suite Generation

下载免费PDF全文

Rong-Zhi Qi Zhi-Jian Wang Shui-Yan Li 《计算机科学技术学报》2016,31(2):417-427

Pairwise testing is an effective test generation technique that requires all pairs of parameter values to be covered by at least one test case. It has been proven that generating minimum test suite is an NP-complete problem. Genetic algorithms have been used for pairwise test suite generation by researchers. However, it is always a time-consuming process, which leads to significant limitations and obstacles for practical use of genetic algorithms towards large-scale test problems. Parallelism will be an effective way to not only enhance the computation performance but also improve the quality of the solutions. In this paper, we use Spark, a fast and general parallel computing platform, to parallelize the genetic algorithm to tackle the problem. We propose a two-phase parallelization algorithm including fitness evaluation parallelization and genetic operation parallelization. Experimental results show that our algorithm outperforms the sequential genetic algorithm and competes with other approaches in both test suite size and computational performance. As a result, our algorithm is a promising improvement of the genetic algorithm for pairwise test suite generation. 相似文献

3.

Parallel Algorithms for Dynamic Shortest Path Problems

Ismail Chabini & Sridevi Ganugapati 《International Transactions in Operational Research》2002,9(3):279-302

The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster‐than‐real‐time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst‐case running‐time complexity. This implies that no algorithm with a better worst‐case computational complexity can be discovered. Consequently, in order to derive algorithms to solve all‐to‐one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially‐available high‐performance computing platforms to develop parallel implementations of sequential algorithms is an example of such avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared‐memory and two message‐passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message‐passing environment based on the parallel virtual machine (PVM) library and a multi‐threading environment based on the SUN Microsystems Multi‐Threads (MT) library. We also develop a time‐based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an ‘ideal’ theoretical parallel machine. Performances of the implementations are analyzed and evaluated using large transportation networks, and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared‐memory machine containing eight processors. Satisfactory speed‐ups in the running time of sequential algorithms are achieved, in particular for shared‐memory machines. Numerical results indicate that shared‐memory computers constitute the most appropriate type of parallel computing platforms for the computation of dynamic shortest paths for real‐time ITS applications. 相似文献

4.

Parallelism and evolutionary algorithms 总被引：13，自引：0，他引：13

Alba E. Tomassini M. 《Evolutionary Computation, IEEE Transactions on》2002,6(5):443-462

This paper contains a modern vision of the parallelization techniques used for evolutionary algorithms (EAs). The work is motivated by two fundamental facts: 1) the different families of EAs have naturally converged in the last decade while parallel EAs (PEAs) are still lack of unified studies; and 2) there is a large number of improvements in these algorithms and in their parallelization that raise the need for a comprehensive survey. We stress the differences between the EA model and its parallel implementation throughout the paper. We discuss the advantages and drawbacks of PEAs. Also, successful applications are mentioned and open problems are identified. We propose potential solutions to these problems and classify the different ways in which recent results in theory and practice are helping to solve them. Finally, we provide a highly structured background relating to PEAs in order to make researchers aware of the benefits of decentralizing and parallelizing an EA 相似文献

5.

An Algorithm Template for Domain-Based Parallel Irregular Algorithms

Carlos H. González Basilio B. Fraguela 《International journal of parallel programming》2014,42(6):948-967

The parallelization of irregular algorithms has not been as widely studied as the one of regular codes. In particular, while there are many proposals of parallel skeletons and libraries very well suited to regular algorithms, this is not the case for irregular ones. This is probably due to the complexity of finding common patterns, behaviors and semantics in these algorithms. This is unfortunate, as the parallelization of irregular algorithms would benefit even more than that of regular codes from the higher degree of abstraction provided by skeletons. This work proposes to exploit the concept of domain defined on some property of the elements to process in order to enable the simple and effective parallelization of irregular applications. Namely, we propose to use such domains both to decompose the computations in parallel tasks and to detect and avoid conflicts between these tasks. A generic C++ library providing a skeleton for multicore systems built on this idea is described and evaluated. Our experimental results show that this library is a very practical tool for the parallelization of irregular algorithms with little programming effort. 相似文献

6.

Repacking helps in bounded space on-line bind-packing

G. Galambos G. J. Woeginger 《Computing》1993,49(4):329-338

We consider a version of the on-line bounded-space bin-packing problem where repacking the items within the active bins is allowed. For this problem, the 1.69103 lower bound of Lee and Lee [7] for the worst case ratios of bounded-space approximation algorithms still applies. We present a polynomial time approximation algorithm that reaches the best possible worst case ratio matching the Lee and Lee lower bound while using onlythree active bins. 相似文献

7.

Hybrid MPI + OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction 总被引：1，自引：0，他引：1

A. Gorobets F.X. Trias R. Borrell O. Lehmkuhl A. Oliva 《Computers & Fluids》2011,49(1):101-109

This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with one uniform periodic direction. It is a combination of a block preconditioned Conjugate Gradient (PCG) and an FFT diagonalization. The latter decomposes the original system into a set of mutually independent 2D systems that are solved by means of the PCG algorithm. For the most ill-conditioned systems, that correspond to the lowest Fourier frequencies, the PCG is replaced by a direct Schur-complement based solver.The previous version of the Poisson solver was conceived for single-core (also dual-core) processors and therefore, the distributed memory model with message-passing interface (MPI) was used. The irruption of multi-core architectures motivated the use of a two-level hybrid MPI + OpenMP parallelization with the shared memory model on the second level. Advantages and implementation details for the additional OpenMP parallelization are presented and discussed in this paper. Numerical experiments show that, within its range of efficient scalability, the previous MPI-only parallelization is slightly outperformed by the MPI + OpenMP approach. But more importantly, the hybrid parallelization has allowed to significantly extend the range of efficient scalability. Here, the solver has been successfully tested up to 12800 CPU cores for meshes with up to 10⁹ grid points. However, estimations based on the presented results show that this range can be potentially stretched up until 200,000 cores approximately. Finally, several examples of DNS simulations are briefly presented to illustrate some potential applications of the solver. 相似文献

8.

A Memory and Computation Efficient Sparse Level-Set Method

Wladimir J. van der Laan Andrei C. Jalba Jos B. T. M. Roerdink 《Journal of scientific computing》2011,46(2):243-264

Since its introduction, the level set method has become the favorite technique for capturing and tracking moving interfaces, and found applications in a wide variety of scientific fields. In this paper we present efficient data structures and algorithms for tracking dynamic interfaces through the level set method. Several approaches which address both computational and memory requirements have been very recently introduced. We show that our method is up to 8.5 times faster than these recent approaches. More importantly, our algorithm can greatly benefit from both fine- and coarse-grain parallelization by leveraging SIMD and/or multi-core parallel architectures. 相似文献

9.

Scheduling with Conflicts on Bipartite and Interval Graphs

Sandy Irani Vitus Leung 《Journal of Scheduling》2003,6(3):287-307

In this paper, we consider the on-line scheduling of jobs that may be competing for mutually exclusive resources. We model the conflicts between jobs with a conflict graph, so that the set of all concurrently running jobs must form an independent set in the graph. This model is natural and general enough to have applications in a variety of settings; however, we are motivated by the following two specific applications: traffic intersection control and session scheduling in high speed local area networks with spatial reuse. Our results focus on two special classes of graphs motivated by our applications: bipartite graphs and interval graphs. The cost function we use is maximum response time. In all of the upper bounds, we devise algorithms which maintain a set of invariants which bound the accumulation of jobs on cliques (in the case of bipartite graphs, edges) in the graph. The lower bounds show that the invariants maintained by the algorithms are tight to within a constant factor. For a specific graph which arises in the traffic intersection control problem, we show a simple algorithm which achieves the optimal competitive ratio. 相似文献

10.

An efficient dynamic scheduling algorithm for multiprocessorreal-time systems

Manimaran G. Murthy C.S.R. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(3):312-319

Many time-critical applications require predictable performance and tasks in these applications have deadlines to be met. In this paper, we propose an efficient algorithm for nonpreemptive scheduling of dynamically arriving real-time tasks (aperiodic tasks) in multiprocessor systems. A real-time task is characterized by its deadline, resource requirements, and worst case computation time on p processors, where p is the degree of parallelization of the task. We use this parallelism in tasks to meet their deadlines and, thus, obtain better schedulability compared to nonparallelizable task scheduling algorithms. To study the effectiveness of the proposed scheduling algorithm, we have conducted extensive simulation studies and compared its performance with the myopic scheduling algorithm. The simulation studies show that the schedulability of the proposed algorithm is always higher than that of the myopic algorithm for a wide variety of task parameters 相似文献

11.

流域变换建模及其算法研究的新进展 总被引：9，自引：0，他引：9

下载免费PDF全文

周海芳蒋艳凰杨学军《中国图象图形学报》2003,8(1):8-14

流域变换是数学形态学中用于图象分割的一种经典方法 .虽然流域变换曾因运算量大、效率低而使得其研究工作遭到冷遇 ,但也因此出现了一些新的理论和算法 ,并随着并行手段的引入 ,又使其重新成为研究的热点 ;同时就近期许多研究成果而言 ,形式化模型的多样性 ,使得流域变换的定义、算法和实现 ,尚缺乏统一的描述和全面的总结 .针对这一情况 ,首先给出了连续域流域变换的严格数学模型和两种离散情况下典型的形式化定义 ;然后分类总结了近年来 ,流域变换算法实现的新进展 ;最后提出了有待进一步研究的问题 . 相似文献

12.

Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination

《Parallel Computing》2016

We present block algorithms and their implementation for the parallelization of sub-cubic Gaussian elimination on shared memory architectures. Contrarily to the classical cubic algorithms in parallel numerical linear algebra, we focus here on recursive algorithms and coarse grain parallelization. Indeed, sub-cubic matrix arithmetic can only be achieved through recursive algorithms making coarse grain block algorithms perform more efficiently than fine grain ones. This work is motivated by the design and implementation of dense linear algebra over a finite field, where fast matrix multiplication is used extensively and where costly modular reductions also advocate for coarse grain block decomposition. We incrementally build efficient kernels, for matrix multiplication first, then triangular system solving, on top of which a recursive PLUQ decomposition algorithm is built. We study the parallelization of these kernels using several algorithmic variants: either iterative or recursive and using different splitting strategies. Experiments show that recursive adaptive methods for matrix multiplication, hybrid recursive–iterative methods for triangular system solve and tile recursive versions of the PLUQ decomposition, together with various data mapping policies, provide the best performance on a 32 cores NUMA architecture. Overall, we show that the overhead of modular reductions is more than compensated by the fast linear algebra algorithms and that exact dense linear algebra matches the performance of full rank reference numerical software even in the presence of rank deficiencies. 相似文献

13.

A Parallelization Approach for Hard Real-Time Systems and Its Application on Two Industrial Programs

Martin?Frieb Email author Ralf?Jahr Haluk?Ozaktas Andreas?Hugl Hans?Regler Theo?Ungerer 《International journal of parallel programming》2016,44(6):1296-1336

Applications in industry often have grown and improved over many years. Since their performance demands increase, they also need to benefit from the availability of multi-core processors. However, a reimplementation from scratch and even a restructuring of these industrial applications is very expensive, often due to high certification efforts. Therefore, a strategy for a systematic parallelization of legacy code is needed. We present a parallelization approach for hard real-time systems, which ensures a high reusage of legacy code and preserves timing analysability. To show its applicability, we apply it on the core algorithm of an avionics application as well as on the control program of a large construction machine. We create models of the legacy programs showing the potential of parallelism, optimize them and change the source codes accordingly. The parallelized applications are placed on a predictable multi-core processor with up to 18 cores. For evaluation, we compare the worst case execution times and their speedups. Furthermore, we analyse limitations coming up at the parallelization process. 相似文献

14.

A generalized divergence measure for nonnegative matrix factorization

Kompass R 《Neural computation》2007,19(3):780-791

This letter presents a general parametric divergence measure. The metric includes as special cases quadratic error and Kullback-Leibler divergence. A parametric generalization of the two different multiplicative update rules for nonnegative matrix factorization by Lee and Seung (2001) is shown to lead to locally optimal solutions of the nonnegative matrix factorization problem with this new cost function. Numeric simulations demonstrate that the new update rule may improve the quadratic distance convergence speed. A proof of convergence is given that, as in Lee and Seung, uses an auxiliary function known from the expectation-maximization theoretical framework. 相似文献

15.

Optimizing parcel exchange among landowners: A soft alternative to land consolidation

《Computers, Environment and Urban Systems》2020

For decades, public policy has favored the use of land consolidation to reduce the fragmentation of land ownership. Private actors, on the other hand, have focused on the purchase, rental and exchange of land plots. Plot exchange can be very useful in the restructuring of holdings, particularly when a large number of owners participate; however, the number of possible exchange combinations grows very quickly with the number of participating landowners and parcels. Finding an acceptable exchange solution can easily become challenging. In this paper we evaluate the practical use of a support system for land exchange processes. The system is based on the use of genetic algorithms, a particular kind of heuristics that loosely replicate the rules of evolution and natural selection. We assess the influence of the geometric distribution of parcels in the quality of the solution, as well as usefulness and performance of the system, via parallelization techniques. The proposed algorithm (GA-PE, Genetic Algorithm for Parcel Exchange) is tested with regards to several parameters, from several alternatives for certain steps of the algorithm to the resource distribution for the parallelizations implemented. We tested the algorithm in 6 different real and representative test cases, and provide results with different metrics. With the positive results obtained, we argue that land exchange is a process worth considering for private actors, and that genetic algorithms can be used to propose fair exchanges, even in complex scenarios, shortening in a meaningful way the time usually required to perform administrative procedures associated to land fragmentation problems. 相似文献

16.

Parallel Position Weight Matrices algorithms

Mathieu Giraud Jean-Stéphane Varré 《Parallel Computing》2011,37(8):466-478

Position Weight Matrices (PWMs) are broadly used in computational biology. The basic problems, Scan and MultipleScan, aim to find all the occurrences of a given PWM or a set of PWMs in long sequences. Some other PWM tasks share a common NP-hard subproblem, ScoreDistribution. The existing algorithms rely on the enumeration on a large set of scores or words, and they are mostly not suitable for parallelization. We propose a new algorithm, BucketScoreDistribution, that is both very efficient and suitable for parallelization. We bound the error induced by this algorithm. We realized a GPU prototype for Scan, MultipleScan and BucketScoreDistribution with the CUDA libraries, and report for the different problems speedups larger than 10× on several Nvidia cards. 相似文献

17.

On Optimization and Parallelization of the Little Algorithm for Solving the Travelling Salesman Problem

V. V. Vasilchikov 《Automatic Control and Computer Sciences》2017,51(7):551-557

The paper describes some ways to speed up solution of the NP-complete Traveling Salesman Problem. The classic Little algorithm, belonging to the class of branch-and-bound methods, can solve it both for directed and undirected graphs. For undirected graphs, however, its speed can be increased by eliminating the branches examined earlier from further consideration. We propose changes to be made in the key operations of the algorithm to speed up execution. In addition, we describe the results of an experiment with a significant increase in the speed of solving of the problem by the advanced algorithm. Another way to speed up the solution procedure is to parallelize the algorithm. For problems of this kind, it is difficult to decompose the task into a sufficient number of subtasks that have comparable complexity. Their parallelism arises dynamically during the execution. For such problems, it seems reasonable to use parallel-recursive algorithms. In our case, the RPM_ParLib library developed by the author is a good approach, enabling the development of high-performance applications for parallel computing on a local network using any.NET-compatible programming language. We selected C# as the programming language. Parallel applications were developed to implement the basic and modified algorithms, as well as to compare them in terms of speed. Experiments were performed for the graphs with up to 45 vertexes and up to 16 network computers. We also investigated the speed increase that can be achieved by parallelizing the basic Little algorithm for directed graphs. The results of these experiments are also presented in the paper. 相似文献

18.

Extracting Multilingual Lexicons from Parallel Corpora

Dan Tufi Ana Maria Barbu Radu Ion 《Computers and the Humanities》2004,38(2):163-189

The paper describes our recent developments in automatic extraction of translation equivalents from parallel corpora. We describe three increasingly complex algorithms: a simple baseline iterative method, and two non-iterative more elaborated versions. While the baseline algorithm is mainly described for illustrative purposes, the non-iterative algorithms outline the use of different working hypotheses which may be motivated by different kinds of applications and to some extent by the languages concerned. The first two algorithms rely on cross-lingual POS preservation, while with the third one POS invariance is not an extraction condition. The evaluation of the algorithms was conducted on three different corpora and several pairs of languages. 相似文献

19.

Massively Parallel Hierarchical Scene Processing with Applications in Rendering

Marek Vinkler Jiří Bittner Vlastimil Havran Michal Hapala 《Computer Graphics Forum》2013,32(8):13-25

We present a novel method for massively parallel hierarchical scene processing on the GPU, which is based on sequential decomposition of the given hierarchical algorithm into small functional blocks. The computation is fully managed by the GPU using a specialized task pool which facilitates synchronization and communication of processing units. We present two applications of the proposed approach: construction of the bounding volume hierarchies and collision detection based on divide‐and‐conquer ray tracing. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization. 相似文献

20.

Communication latency tolerant parallel algorithm for particle swarm optimization 总被引：1，自引：0，他引：1

Bo Li Koichi Wada 《Parallel Computing》2011,37(1):1-10

Particle swarm optimization (PSO) algorithm is a population-based algorithm for finding the optimal solution. Because of its simplicity in implementation and fewer adjustable parameters compared to the other global optimization algorithms, PSO is gaining attention in solving complex and large scale problems. However, PSO often requires long execution time to solve those problems. This paper proposes a parallel PSO algorithm, called delayed exchange parallelization, which improves performance of PSO on distributed environment by hiding communication latency efficiently. By overlapping communication with computation, the proposed algorithm extracts parallelism inherent in PSO. The performance of our proposed parallel PSO algorithm was evaluated using several applications. The results of evaluation showed that the proposed parallel algorithm drastically improved the performance of PSO, especially in high-latency network environment. 相似文献