首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
广义Hermitian特征问题并行求解器的性能依赖于所选择的并行算法和矩阵的分布策略等诸多方面.基于块存储和快算法策略,提出了一个新的标准化转化的并行算法,该并行算法将Cholesky分解结合到广义特征问题标准化转换中, 降低了已有并行算法的通信开销,并增加了算法的并行性.新算法可显著改善已有并行算法的性能和可扩展性.另外给出了一个有效求解具有多个右端项的三角矩阵方程AX=B的并行块算法.通过自主开发的特征问题并行软件包PSEPS的测试结果表明,并行算法比传统的并行算法快大约1倍,并具有较好的可扩展性.  相似文献   

2.
In this paper, several efficient rearrangement algorithms are proposed to find the optimal shape and topology for elliptic eigenvalue problems with inhomogeneous structures. The goal is to solve minimization and maximization of the k-th eigenvalue and maximization of spectrum ratios of the second order elliptic differential operator. Physically, these problems are motivated by the frequency control based on density distribution of vibrating membranes. The methods proposed are based on Rayleigh quotient formulation of eigenvalues and rearrangement algorithms which can handle topology changes automatically. Due to the efficient rearrangement strategy, the new proposed methods are more efficient than classical level set approaches based on shape and/or topological derivatives. Numerous numerical examples are provided to demonstrate the robustness and efficiency of new approach.  相似文献   

3.
This paper continues the 2012 STACS contribution by Diekert, Ushakov, and the author as well as the 2012 IJAC publication by the same authors. We extend the results published there in two ways. First, we show that the data structure of power circuits can be generalized to work with arbitrary bases q≥2. This results in a data structure that can hold huge integers, arising by iteratively forming powers of q. We show that the properties of power circuits known for q=2 translate to the general case. This generalization is non-trivial and additional techniques are required to preserve the time bounds of arithmetic operations that were shown for the case q=2. The extended power circuit model permits us to conduct operations in the Baumslag-Solitar group BS(1,q) as efficiently as in BS(1,2). This allows us to solve the word problem in the generalization H 4(1,q) of Higman’s group in polynomial time. The group H 4(1,q) is an amalgamated product of four copies of the Baumslag-Solitar group BS(1,q) rather than BS(1,2) in the original group H 4=H 4(1,2). As a second result, we allow arbitrary numbers f≥4 of copies of BS(1,q), leading to an even more generalized notion of Higman groups H f (1,q). We prove that the word problem of the latter can still be solved within the \({\mathcal {O}}(n^{6})\) time bound that was shown for H 4(1,2).  相似文献   

4.
Efficient Algorithms for MultiPolynomial Resultant   总被引:2,自引:0,他引:2  
Manocha  D. 《Computer Journal》1993,36(5):485-496
  相似文献   

5.
We investigate in this paper the performance of parallel algorithms for computing the controllable part of a control linear system, with application to the computation of minimal realizations. Our approach is based on a method that transforms the matrices of the system to block Hessenberg form by using rank-revealing orthogonal factorizations.The experimental analysis on a high performance architecture includes two rank-revealing numerical tools: the SVD and the rank-revealing QR factorizations. Results are also reported, using the rank-revealing QR factorizations, on a parallel distributed architecture.  相似文献   

6.
The Burrows-Wheeler transformation is used for effective data compression, e.g., in the well known program bzip2. Compression and decompression are done in a block-wise fashion; larger blocks usually result in better compression rates. With the currently used algorithms for decompression, 4n bytes of auxiliary memory for processing a block of n bytes are needed, 0<n<232. This may pose a problem in embedded systems (e.g., mobile phones), where RAM is a scarce resource.  相似文献   

7.
8.
针对不同干扰和噪声情况下的量子状态估计和滤波问题,分别提出相应的高效量子状态密度矩阵重构凸优化算法.对于稀疏状态干扰和测量噪声同时存在的情况,提出量子状态滤波算法.对分别存在稀疏状态干扰和测量噪声的情况,提出相应两种不同的量子状态估计算法.在5量子位的状态密度矩阵估计仿真实验中分析不同采样率下的3种算法性能.实验表明,3种算法均具有较低的计算复杂度、较快的收敛速度和较低的估计误差.  相似文献   

9.
Atallah  Chen  Daescu 《Algorithmica》2003,35(3):194-215
Planar st -graphs find applications in a number of areas. In this paper we present efficient parallel algorithms for solving several fundamental problems on planar st -graphs. The problems we consider include all-pairs shortest paths in weighted planar st -graphs, single-source shortest paths in weighted planar layered digraphs (which can be reduced to single-source shortest paths in certain special planar st -graphs), and depth-first search in planar st -graphs. Our parallel shortest path techniques exploit the specific geometric and graphic structures of planar st -graphs, and involve schemes for partitioning planar st -graphs into subgraphs in a way that ensures that the resulting path length matrices have a monotonicity property [1], [2]. The parallel algorithms we obtain are a considerable improvement over the previously best known solutions (when they are applied to these st -graph problems), and are in fact relatively simple. The parallel computational models we use are the CREW PRAM and EREW PRAM.  相似文献   

10.
In this paper, we present optimal O(log n) time, O(n/log n) processor EREW PRAM parallel algorithms for finding the connected components, cut vertices, and bridges of a permutation graph. We also present an O(log n) time, O(n) processor, CREW PRAM model parallel algorithm for finding a Breadth First Search (BFS) spanning tree of a permutation graph rooted at vertex 1 and use the same to derive an efficient parallel algorithm for the All Pairs Shortest Path problem on permutation graphs.  相似文献   

11.
Algorithms for determining quality/cost/price tradeoffs in saturated markets are considered. A product is modeled by d real-valued qualities whose sum determines the unit cost of producing the product. This leads to the following optimization problem: given a set of n customers, each of whom has certain minimum quality requirements and a maximum price they are willing to pay, design a new product and select a price for that product in order to maximize the resulting profit.  相似文献   

12.
RSA密码系统有效实现算法   总被引:7,自引:0,他引:7  
本文提出了实现RSA算法的一种快速、适合于硬件实现的方案,在该方案中,我们作用加法链将求幂运算转化为求平方和乘法运算并大大降低了运算的次数,使用Montgomery算法将模N乘法转化为模R(基数)的算法,模R乘积的转化,以及使用一种新的数母加法器作为运算部件的基础。  相似文献   

13.
Efficient Algorithms for Interface Timing Verification   总被引:2,自引:0,他引:2  
This paper presents algorithms for computing separations between events that are constrained to obey prespecified relationships in their relative time of occurrence. The algorithms are useful for interface timing verification, where event separations are checked against timing requirements. The first algorithm computes separations when only linear and max constraints exist. The algorithm must converge to correct maximum separation values in a finite number of steps, or report an inconsistence of the constraints, irrespective of the existence of infinite constraint bounds or infinite event separations. It is conjectured to run in time, where V is the number of events, and E is the number of relationships between them. The other algorithms extend the first, and compute event separations in the NP-complete version of the problem where min constraints exist. Experiments demonstrate the algorithms are efficient in practice.  相似文献   

14.
基于复合域上的椭圆曲线密码体制的计算算法   总被引:3,自引:0,他引:3  
基于有限域上椭圆曲经公开密钥协议的离散对数计算算法正日益成为热点,其基本的操作是标量乘法:即用一整数乘以椭圆曲线上给定的点P。协议的主要开锁在于椭圆曲线的标量乘操作上,本文给出了3个逄法进行椭圆曲线密码系统的有效计算,第一个算法采用加-减法链的方法处理标量乘法问题;第二个算法给出了正整数n的NAF形式;第三个算法采用窗口的方法处理NAF(n)从而进一步提高加-减法链的效率,这三个算法的有机结合从银大程度上提高了椭圆曲线密码体制的加/解密速度。  相似文献   

15.
This paper considers the Byzantine agreement problem in a completely connected network of anonymous processors. In this network model the processors have no identifiers and can only detect the link through which a message is delivered. We present a polynomial-time agreement algorithm that requires 3(nt)t/(n−2t)+4 rounds, where n>3t is the number of processors and t is the maximal number of faulty processors that the algorithm can tolerate. We also present an early-stopping variant of the algorithm.  相似文献   

16.
There are several reconfiguring-network models of parallel computation that are considered in the published literature, depending on their switching capabilities. Can these reconfigurable models be the basis for the design of massively parallel computers? Perhaps the most fundamental related issue is virtual parallelism, or the self-simulation problem: Given an algorithm which is designed for a large reconfigurable mesh, can it be executed efficiently on a smaller reconfigurable mesh? In this work, we give several positive answers to the self-simulation problem. We show that the simulation of a reconfiguring mesh by a smaller one can be carried optimally and using standard methods on the model in which buses are established along rows or along columns. A novel technique is shown to achieve asymptotically optimal self simulation on models which allow buses to switch column and row edges, provided that a bus is a "linear" path of connected edges. Finally, for models in which a bus is any subgraph of the underlying mesh, efficient simulations are presented, paying by an extra factor which is polylogarithmic in the size of the simulated mesh. Although the self-simulation algorithms are complex and require extensive bookkeeping operations, the required space is asymptotically optimal.  相似文献   

17.
唐勇  许金玲 《微处理机》2007,28(3):63-65
大整数模幂乘运算一直是制约RSA广泛应用的瓶颈,在对传统算法剖析的基础上,提出了一种新的快速模乘算法,借鉴生成Wallace tree的思想,结合查找表和并行乘法运算进行RSA模幂运算。理论分析和试验证明新算法时间复杂度降低到O(logn)。  相似文献   

18.
We study the problem of computing the k maximum sum subsequences. Given a sequence of real numbers and an integer parameter k, the problem involves finding the k largest values of for The problem for fixed k = 1, also known as the maximum sum subsequence problem, has received much attention in the literature and is linear-time solvable. Recently, Bae and Takaoka presented a -time algorithm for the k maximum sum subsequences problem. In this paper we design an efficient algorithm that solves the above problem in time in the worst case. Our algorithm is optimal for and improves over the previously best known result for any value of the user-defined parameter k < 1. Moreover, our results are also extended to the multi-dimensional versions of the k maximum sum subsequences problem; resulting in fast algorithms as well.  相似文献   

19.
树上推广的Multicut问题的近似算法   总被引:3,自引:0,他引:3  
给定边上有费用的树T,终端集合族X={S\\-1,S\\-2,…,S\\-l},推广的Multicut问题询问费用最小的边集,使得在树上删除边集中的边能够断开每一个终端集.推广的Multicut问题有其独立的研究意义,因为该问题分别是经典的Multicut问题和Multiway Cut问题的自然推广,同时也是推广的Steiner Forest问题的对偶问题.树上推广的Multicut问题的完全版本可以归约到树上经典的Multicut问题近似求解.对于该问题的Prize-collecting版本,给出了原始对偶的3倍近似算法.对于该问题的k版本,通过非一致的途径给出了近似比为min{2(l-k+1),k}的近似算法.以及找到了该问题的k版本与k-稠密子图问题之间的一个有趣的联系,从而证明了将k版本的树上推广的Multicut问题近似到O(n\\+\\{16-ε\\})以内是困难的(对某个小的常数ε>0).  相似文献   

20.
The block-cyclic data distribution is commonly used to organize array elements over the processors of a coarse-grained distributed memory parallel computer. In many scientific applications, the data layout must be reorganized at run-time in order to enhance locality and reduce remote memory access overheads. In this paper we present a general framework for developing array redistribution algorithms. Using this framework, we have developed efficient algorithms that redistribute an array from one block-cyclic layout to another. Block-cyclic redistribution consists of index set computation , wherein the destination locations for individual data blocks are calculated, and data communication , wherein these blocks are exchanged between processors. The framework treats both these operations in a uniform and integrated way. We have developed efficient and distributed algorithms for index set computation that do not require any interprocessor communication. To perform data communication in a conflict-free manner, we have developed direct indirect and hybrid algorithms. In the direct algorithm, a data block is transferred directly to its destination processor. In an indirect algorithm, data blocks are moved from source to destination processors through intermediate relay processors. The hybrid algorithm is a combination of the direct and indirect algorithms. Our framework is based on a generalized circulant matrix formalism of the redistribution problem and a general purpose distributed memory model of the parallel machine. Our algorithms sustain excellent performance over a wide range of problem and machine parameters. We have implemented our algorithms using MPI, to allow for easy portability across different HPC platforms. Experimental results on the IBM SP-2 and the Cray T3D show superior performance over previous approaches. When the block size of the cyclic data layout changes by a factor of K , the redistribution can be performed in O( log K) communication steps. This is true even when K is a prime number. In contrast, previous approaches take O(K) communication steps for redistribution. Our framework can be used for developing scalable redistribution libraries, for efficiently implementing parallelizing compiler directives, and for developing parallel algorithms for various applications. Redistribution algorithms are especially useful in signal processing applications, where the data access patterns change significantly between computational phases. They are also necessary in linear algebra programs, to perform matrix transpose operations. Received June 1, 1997; revised March 10, 1998.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号