首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 793 毫秒
1.
This paper deals with the problem of one-to-one mapping of 2n task modules of a parallel program to an n-dimensional hypercube multicomputer so as to minimize the total communication cost during the execution of the task. The problem of finding an optimal mapping has been proven to be NP-complete. First we show that the mapping problem in a hypercube multicomputer can be transformed into the problem of finding a set of maximum cutsets on a given task graph using a graph modification technique. Then we propose a repeated mapping scheme, using an existing graph bipartitioning algorithm, for the effective mapping of task modules onto the processors of a hypercube multicomputer. The repeated mapping scheme is shown to be highly effective on a number of test task graphs; it increasingly outperforms the greedy and recursive mapping algorithms as the number of processors increases. Our repeated mapping scheme is shown to be very effective for regular graphs, such as hypercube-isomorphic or ‘almost’ isomorphic graphs and meshes; it finds optimal mappings on almost all the regular task graphs considered.  相似文献   

2.
We systematized and developed some procedures for the modular design of externally-linear internally nonlinear (ELIN) circuits resulting in a general LIN↔ELIN transformation procedure. This one was also extended to analysis of these types of circuits. The procedure is exemplified on log-domain circuits. In the design one starts with the linear block diagram (LIN) described by transfer functions and one substitutes directly each linear building block by a corresponding nonlinear one. The parameters of each nonlinear component depend on the given parameters of its linear correspondent. Input F −1 and output F blocks are added. In the analysis one identifies the nonlinear basic circuit components and each of them is substituted by its corresponding linear building block. Input and output F −1-F cells are removed. The ideal transfer function can be calculated on the linear block diagram now. The LIN↔ELIN transformations make a direct connection between equivalent linear and ELIN circuits, simplify their design and analysis procedures and permit the development of CAD procedures.  相似文献   

3.
The multi‐carrier transmission signal in Multi‐Carrier Code Division Multiple Access (MC‐CDMA) has a high peak‐to‐average power ratio (PAPR), which results in nonlinear distortion and deteriorative system performance. An n‐tuple selective mapping method is proposed to reduce the PAPR, in this paper. This method generates 2n sequences of an original data sequence by adding n‐tuple of n PAPR control bits to it followed by an interleaver and error‐control code (ECC) to reduce its PAPR. The convolutional, Golay, and Hamming codes are used as ECCs in the proposed scheme. The proposed method uses different numbers of the n PAPR control bits to accomplish a noteworthy PAPR reduction and also avoids the need for a side‐information transmission. The simulation results authenticate the effectiveness of the proposed method.  相似文献   

4.
In this paper, a fully-pipeline linear systolic array based on adjusted Montgomery's algorithm is presented to perform modular multiplication at extremely high speed. The processing element (PE) consists of only 4 full-adders and 14 flip-flops. Three-stage internal pipelined PE results in a very short critical path with only a one-bit full-adder delay. Thus, it can run at a very high cycle rate. The total execution time for an n-bit modular multiplication is 2n + 11 cycles with only (n/2 + 2) PEs. A modular exponentiation based on it takes (3n + 16.5)n cycles in average. Compared with most published VLSI modular multipliers, the hardware complexity is greatly reduced while keeping very high throughput. Therefore it is a good candidate of the arithmetic units used in the many public-key crypto-systems, e.g. RSA, Elliptic Curve and so on, especially for the embedded applications concerning information security.  相似文献   

5.
In the area of automatic parallelization of programs, analyzing and transforming loop nests with parametric affine loop bounds requires fundamental mathematical results. The most common geometrical model of iteration spaces, called the polytope model, is based on mathematics dealing with convex and discrete geometry, linear programming, combinatorics and geometry of numbers.In this paper, we present automatic methods for computing the parametric vertices and the Ehrhart polynomial, i.e., a parametric expression of the number of integer points, of a polytope defined by a set of parametric linear constraints.These methods have many applications in analysis and transformations of nested loop programs. The paper is illustrated with exact symbolic array dataflow analysis, estimation of execution time, and with the computation of the maximum available parallelism of given loop nests.  相似文献   

6.
An architecture based on the RSA public key cryptography algorithm is presented. The circuit includes two components, one for modular squaring and one for modular multiplication. Each component is based on the Montgomery algorithm and implements the modular operations using two modified serial-parallel multipliers. A full modular exponentiation is completed every n(n + 3) clock cycles. All circuits are systolic, operate with 100% efficiency and their maximum combinational delay is equal to one gated Full-Adder. Thus, high-speed performance is achieved while the low cell hardware complexity enables an efficient VLSI implementation.  相似文献   

7.
Many important algorithms can be described by n-dimensional uniform recurrences. The computations are then indexed by integral vectors of length n and the data dependencies between computations can be described by the difference vector of the corresponding indexes which are independent of the indexes. This paper addresses the following optimization problem: Given an n-dimensional uniform recurrence whose computation indexes are mapped by a linear function onto the processors of an array processor embedded in k-space (1 k n). Find an optimal linear function for the computation indexes. We study a continuous approximation of this problem by passing from linear to quasi-linear timing functions. The resultant problem formulation is then a quadratic programming problem which can be solved by standard algorithms for quadratic or general nonlinear optimization problems. We demonstrate the effectiveness of our approach by several nontrivial test examples.  相似文献   

8.
A forward-backward training algorithm for parallel, self-organizing hierarchical neural networks (PSHNNs) is described. Using linear algebra, it is shown that the forward-backward training of ann-stage PSHNN until convergence is equivalent to the pseudo-inverse solution for a single, total network designed in the least-squares sense with the total input vector consisting of the actual input vector and its additional nonlinear transformations. These results are also valid when a single long input vector is partitioned into smaller length vectors. A number of advantages achieved are: small modules for easy and fast learning, parallel implementation of small modules during testing, faster convergence rate, better numerical error-reduction, and suitability for learning input nonlinear transformations by other neural networks. The backpropagation (BP) algorithm is proposed for learning input nonlinearitics. Better performance in terms of deeper minimum of the error function and faster convergence rate is achieved when a single BP network is replaced by a PSHNN of equal complexity in which each stage is a BP network of smaller complexity than the single BP network.  相似文献   

9.
10.
A system of design automation computer programs is described which is capable of assigning blocks of a logic design to modules so as to satisfy certain constraints specified on the assignment. System features which enable designer-computer cooperation are discussed, and quality of solutions obtained with the system are compared to manual solutions for the same tasks. Three conclusions are reached. First, these computer programs make it possible to perform partitioning and mapping experiments which were not possible before. Second, for one-level partitions (e.g., logic gates on chips), highly automatic solutions obtained by the system are at least as good as manual solutions and are less costly to obtain. Third, for multilevel partitions (e.g., logic gates on chips on cards) or for mappings, the solutions obtained with the program are again at least as good as manual solutions; furthermore, the system allows a designer to try more alternatives than he could manually, so that he can trade off the time and cost of trying additional alternatives against the value of a better solution.  相似文献   

11.
In this paper the problems involved with high-level synthesis of ASIC regular arrays for real-time signal processing systems will be outlined. It will be shown that novel nonlinear, behavior preserving, transformations and extended affine mapping techniques are of key importance in mapping nonuniform recurrence equations on regular arrays with realistic constraints on area, throughput and I/O bandwidth.  相似文献   

12.
We propose and describe new error control algorithms for redundant residue number systems (RRNSs) and residue number system product codes. These algorithms employ search techniques for obtaining error values from within a set of values (that contains all possible error values). For a given RRNS, the error control algorithms have a computational complexity of t·O(log2 n + log2 m?) comparison operations, where t denotes the error correcting capability, n denotes the number of moduli, and m? denotes the geometric average of moduli. These algorithms avoid most modular operations. We describe a refinement to the proposed algorithms that further avoids the modular operation required in their respective first steps, with an increase of ?log2 n? to their computational complexity. The new algorithms provide significant computational advantages over existing methods.  相似文献   

13.
We present a new serial-parallel concurrent modular-multiplication algorithm and architecture suitable for standard RSA encryption. In the new scheme, multiplication is performed modulo a multiple of the RSA modulus n, which has a diminished-radix form 2 k -v, where k and v are positive integers and v < n. This design is the first concurrent modular multiplier to use a diminished-radix algorithm and to pipeline concurrent modular-reduction to optimize the clock rate. For a modular multiplier of order ranging from 1 to 10 (number of multiplier bits per clock cycle), a faster clock rate and throughput is possible than with other known designs including those of Brickell, Morita, Sedlak and Golze, and Miyaguchi. Throughput estimates for 512-bit RSA decryption range from 100 kbit/s in a serial mode to 650 kbit/s with a modular multiplier of order 10, at a clock rate of 20 MHz on 1.5 m CMOS.  相似文献   

14.
Modeling of chaotic DC-DC converters by iterated nonlinear mappings   总被引:22,自引:0,他引:22  
In parameter ranges where conventional methods break down, DC-DC converters may be described by iterated mappings, a nonlinear discrete modeling technique. The underlying principles are explained and are applied to the example of a PWM-controlled buck converter. Stable behavior and bifurcations to chaos are predicted by numerical evaluation of the governing mapping and are confirmed by experiment  相似文献   

15.
In this paper a new and efficient method is presented for optimizing the mapping ofnonuniform recurrence equations on regular array architectures. The method is based on applyingnonlinear transformations on theindices of the recurrence equations by reindexing groups of operations based on a chosen group communication scheme. The main result of this paper is that the presented method provides a means to map real life high throughput algorithms onto ASIC regular array architectures under real constraints.  相似文献   

16.
A reconfigurable network termed as the reconfigurable multi-ring network (RMRN) is described. The RMRN is shown to be a truly scalable network in that each node in the network has a fixed degree of connectivity and the reconfiguration mechanism ensures a network diameter of O(log2 N) for an N-processor network. Algorithms for the two-dimensional mesh and the SIMD or SPMD n-cube are shown to map very elegantly onto the RMRN. Basic message passing and reconfiguration primitives for the SIMD/SPMD RMRN are designed for use as building blocks for more complex parallel algorithms. Elsewhere, the RMRN is shown to be a viable architecture for image processing and computer vision problems. In this paper, the RMRN is proved to be very useful for the implementation of numerical algorithms. We describe the implementation of a nontrivial numerical scheme on the RMRN. This numerical scheme is based on the inverse scattering transform and is used to study the role of nonlinear terms in Korteweg–de Vries like equations. This revised version was published online in June 2006 with corrections to the Cover Date.  相似文献   

17.
Modular inverse arithmetic plays an important role in elliptic curve cryptography. Based on the analysis of Montgomery modular inversion algorithm, this paper presents a new dual-field modular inversion algorithm, and a novel scalable and unified architecture for Montgomery inverse hardware in finite fields GF(p) and GF(2 n ) is proposed. Furthermore, this architecture based on the new modular inversion algorithm has been verified by modeling it in Verilog-HDL, and accomplished it under 0.18 μm CMOS technology. The result indicates that our work has better performance and flexibility than other works.  相似文献   

18.
In this paper, we describe a fixed‐threshold sequential minimal optimization (FSMO) for structured SVM problems. FSMO is conceptually simple, easy to implement, and faster than the standard support vector machine (SVM) training algorithms for structured SVM problems. Because FSMO uses the fact that the formulation of structured SVM has no bias (that is, the threshold b is fixed at zero), FSMO breaks down the quadratic programming (QP) problems of structured SVM into a series of smallest QP problems, each involving only one variable. By involving only one variable, FSMO is advantageous in that each QP sub‐problem does not need subset selection. For the various test sets, FSMO is as accurate as an existing structured SVM implementation (SVM‐Struct) but is much faster on large data sets. The training time of FSMO empirically scales between O ( n ) and O(n1.2), while SVM‐Struct scales between O(n1.5) and O(n1.8).  相似文献   

19.
Properties of the nonlinear transfer function are studied in this paper and then-th order stability,n-th order frequency response andn-th order sensitivity as well as a novel theory and implementation on the sensitivity of nonlinear transfer system are proposed.  相似文献   

20.
The optimum architecture design and mapping of QRD-RLS adaptive filters can be achieved through filter architecture selections, look-ahead transformations, and hierarchical pipelining/folding transformations. In this paper, a relaxed annihilation-reordering look-ahead (RARL) architecture is proposed, and shown to be more power and area efficient than pipelined processing architecture which was considered the most area efficient. The filters with this architecture are based on relaxed weight-update through filtering approximation, where a filter tap weight is updated upon arrival of every block of input data, and are speeded up with annihilation-reordering look-ahead transformation. As a result of the computational complexity reduction, this architecture does not change the iteration bound and filter clock frequency, and leads to speed up with linear increase in power consumption, while the pipelined processing architectures result in speedup with quadratic increase in power consumption. Upon hardware mapping, this architecture is also more advantageous to achieve low area designs. Two design examples are presented to illustrate mapping optimization using above transformations. These results are important for mapping designs onto ASICs, FPGAs or parallel computing machines. The results show significant improvements in throughput, power consumption and hardware requirement. It is also interesting to show through mathematics and simulations that the RARL QRD-RLS filters have no performance degradation in terms of convergence rate.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号