期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A Repeated Mapping Scheme of Task Modules with Minimum Communication Cost in Hypercube Multicomputers

Joo-Man Kim Cheol-Hoon Lee 《ETRI Journal》1998,20(4):327-345

This paper deals with the problem of one-to-one mapping of 2ⁿ task modules of a parallel program to an n-dimensional hypercube multicomputer so as to minimize the total communication cost during the execution of the task. The problem of finding an optimal mapping has been proven to be NP-complete. First we show that the mapping problem in a hypercube multicomputer can be transformed into the problem of finding a set of maximum cutsets on a given task graph using a graph modification technique. Then we propose a repeated mapping scheme, using an existing graph bipartitioning algorithm, for the effective mapping of task modules onto the processors of a hypercube multicomputer. The repeated mapping scheme is shown to be highly effective on a number of test task graphs; it increasingly outperforms the greedy and recursive mapping algorithms as the number of processors increases. Our repeated mapping scheme is shown to be very effective for regular graphs, such as hypercube-isomorphic or ‘almost’ isomorphic graphs and meshes; it finds optimal mappings on almost all the regular task graphs considered. 相似文献

2.

Modular analysis and design of log-domain circuits based on LIN↔ELIN transformations

L. Feştilă R. Groza M. Cîrlugea A. Fazakas 《Analog Integrated Circuits and Signal Processing》2007,50(3):231-249

We systematized and developed some procedures for the modular design of externally-linear internally nonlinear (ELIN) circuits resulting in a general LIN↔ELIN transformation procedure. This one was also extended to analysis of these types of circuits. The procedure is exemplified on log-domain circuits. In the design one starts with the linear block diagram (LIN) described by transfer functions and one substitutes directly each linear building block by a corresponding nonlinear one. The parameters of each nonlinear component depend on the given parameters of its linear correspondent. Input F ⁻¹ and output F blocks are added. In the analysis one identifies the nonlinear basic circuit components and each of them is substituted by its corresponding linear building block. Input and output F ⁻¹-F cells are removed. The ideal transfer function can be calculated on the linear block diagram now. The LIN↔ELIN transformations make a direct connection between equivalent linear and ELIN circuits, simplify their design and analysis procedures and permit the development of CAD procedures. 相似文献

3.

Peak‐to‐Average Power Ratio Reduction Using N‐tuple Selective Mapping Method for MC‐CDMA

下载免费PDF全文

Sajjad Ali Zhe Chen Fuliang Yin 《ETRI Journal》2015,37(2):338-347

The multi‐carrier transmission signal in Multi‐Carrier Code Division Multiple Access (MC‐CDMA) has a high peak‐to‐average power ratio (PAPR), which results in nonlinear distortion and deteriorative system performance. An n‐tuple selective mapping method is proposed to reduce the PAPR, in this paper. This method generates 2ⁿ sequences of an original data sequence by adding n‐tuple of n PAPR control bits to it followed by an interleaver and error‐control code (ECC) to reduce its PAPR. The convolutional, Golay, and Hamming codes are used as ECCs in the proposed scheme. The proposed method uses different numbers of the n PAPR control bits to accomplish a noteworthy PAPR reduction and also avoids the need for a side‐information transmission. The simulation results authenticate the effectiveness of the proposed method. 相似文献

4.

A Fully-Pipeline Linear Systolic Architecture for Modular Multiplier in Public-Key Crypto-Systems

Xingjun Wu Hongyi Chen Yihe Sun Weixin Gai 《The Journal of VLSI Signal Processing》2003,33(1-2):191-197

In this paper, a fully-pipeline linear systolic array based on adjusted Montgomery's algorithm is presented to perform modular multiplication at extremely high speed. The processing element (PE) consists of only 4 full-adders and 14 flip-flops. Three-stage internal pipelined PE results in a very short critical path with only a one-bit full-adder delay. Thus, it can run at a very high cycle rate. The total execution time for an n-bit modular multiplication is 2n + 11 cycles with only (n/2 + 2) PEs. A modular exponentiation based on it takes (3n + 16.5)n cycles in average. Compared with most published VLSI modular multipliers, the hardware complexity is greatly reduced while keeping very high throughput. Therefore it is a good candidate of the arithmetic units used in the many public-key crypto-systems, e.g. RSA, Elliptic Curve and so on, especially for the embedded applications concerning information security. 相似文献

5.

Parametric Analysis of Polyhedral Iteration Spaces

Philippe Clauss Vincent Loechner 《The Journal of VLSI Signal Processing》1998,19(2):179-194

In the area of automatic parallelization of programs, analyzing and transforming loop nests with parametric affine loop bounds requires fundamental mathematical results. The most common geometrical model of iteration spaces, called the polytope model, is based on mathematics dealing with convex and discrete geometry, linear programming, combinatorics and geometry of numbers.In this paper, we present automatic methods for computing the parametric vertices and the Ehrhart polynomial, i.e., a parametric expression of the number of integer points, of a polytope defined by a set of parametric linear constraints.These methods have many applications in analysis and transformations of nested loop programs. The paper is illustrated with exact symbolic array dataflow analysis, estimation of execution time, and with the computation of the maximum available parallelism of given loop nests. 相似文献

6.

A Systolic, High Speed Architecture for an RSA Cryptosystem

K.Z. Pekmestzi N.K. Moshopoulos 《The Journal of VLSI Signal Processing》2002,32(3):223-235

An architecture based on the RSA public key cryptography algorithm is presented. The circuit includes two components, one for modular squaring and one for modular multiplication. Each component is based on the Montgomery algorithm and implements the modular operations using two modified serial-parallel multipliers. A full modular exponentiation is completed every n(n + 3) clock cycles. All circuits are systolic, operate with 100% efficiency and their maximum combinational delay is equal to one gated Full-Adder. Thus, high-speed performance is achieved while the low cell hardware complexity enables an efficient VLSI implementation. 相似文献

7.

On Time Optimal Implementation of Uniform Recurrences onto Array Processors via Quadratic Programming

Karl-Heinz Zimmermann Wolfgang Achtziger 《The Journal of VLSI Signal Processing》1998,19(1):19-38

Many important algorithms can be described by n-dimensional uniform recurrences. The computations are then indexed by integral vectors of length n and the data dependencies between computations can be described by the difference vector of the corresponding indexes which are independent of the indexes. This paper addresses the following optimization problem: Given an n-dimensional uniform recurrence whose computation indexes are mapped by a linear function onto the processors of an array processor embedded in k-space (1 k n). Find an optimal linear function for the computation indexes. We study a continuous approximation of this problem by passing from linear to quasi-linear timing functions. The resultant problem formulation is then a quadratic programming problem which can be solved by standard algorithms for quadratic or general nonlinear optimization problems. We demonstrate the effectiveness of our approach by several nontrivial test examples. 相似文献

8.

Parallel,self-organizing,hierarchical neural networks with forward-backward training

S. -W. Deng O. K. Ersoy 《Circuits, Systems, and Signal Processing》1993,12(2):223-246

A forward-backward training algorithm for parallel, self-organizing hierarchical neural networks (PSHNNs) is described. Using linear algebra, it is shown that the forward-backward training of ann-stage PSHNN until convergence is equivalent to the pseudo-inverse solution for a single, total network designed in the least-squares sense with the total input vector consisting of the actual input vector and its additional nonlinear transformations. These results are also valid when a single long input vector is partitioned into smaller length vectors. A number of advantages achieved are: small modules for easy and fast learning, parallel implementation of small modules during testing, faster convergence rate, better numerical error-reduction, and suitability for learning input nonlinear transformations by other neural networks. The backpropagation (BP) algorithm is proposed for learning input nonlinearitics. Better performance in terms of deeper minimum of the error function and faster convergence rate is achieved when a single BP network is replaced by a PSHNN of equal complexity in which each stage is a BP network of smaller complexity than the single BP network. 相似文献

9.

An approach for fixed coefficient RNS-based FIR filter

Kotha Srinivasa Reddy Subhendu Kumar Sahoo 《International Journal of Electronics》2013,100(8):1358-1376

相似文献

10.

A computer-based-design approach to partitioning and mapping of computer logic graphs

《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1972,60(1):28-34

A system of design automation computer programs is described which is capable of assigning blocks of a logic design to modules so as to satisfy certain constraints specified on the assignment. System features which enable designer-computer cooperation are discussed, and quality of solutions obtained with the system are compared to manual solutions for the same tasks. Three conclusions are reached. First, these computer programs make it possible to perform partitioning and mapping experiments which were not possible before. Second, for one-level partitions (e.g., logic gates on chips), highly automatic solutions obtained by the system are at least as good as manual solutions and are less costly to obtain. Third, for multilevel partitions (e.g., logic gates on chips on cards) or for mappings, the solutions obtained with the program are again at least as good as manual solutions; furthermore, the system allows a designer to try more alternatives than he could manually, so that he can trade off the time and cost of trying additional alternatives against the value of a better solution. 相似文献

11.

Synthesis of ASIC regular arrays for real-time image processing systems

Micha?l Swaaij Jan Rosseel Francky Catthoor Hugo Man 《The Journal of VLSI Signal Processing》1991,3(3):183-192

In this paper the problems involved with high-level synthesis of ASIC regular arrays for real-time signal processing systems will be outlined. It will be shown that novel nonlinear, behavior preserving, transformations and extended affine mapping techniques are of key importance in mapping nonuniform recurrence equations on regular arrays with realistic constraints on area, throughput and I/O bandwidth. 相似文献

12.

New Error Control Algorithms for Residue Number System Codes

下载免费PDF全文

Hanshen Xiao Hari Krishna Garg Jianhao Hu Guoqiang Xiao 《ETRI Journal》2016,38(2):326-336

We propose and describe new error control algorithms for redundant residue number systems (RRNSs) and residue number system product codes. These algorithms employ search techniques for obtaining error values from within a set of values (that contains all possible error values). For a given RRNS, the error control algorithms have a computational complexity of t·O(log₂ n + log₂ m?) comparison operations, where t denotes the error correcting capability, n denotes the number of moduli, and m? denotes the geometric average of moduli. These algorithms avoid most modular operations. We describe a refinement to the proposed algorithms that further avoids the modular operation required in their respective first steps, with an increase of ?log₂ n? to their computational complexity. The new algorithms provide significant computational advantages over existing methods. 相似文献

13.

A design of a fast pipelined modular multiplier based on a diminished-radix algorithm

Glenn Orton Lloyd Peppard Stafford Tavares 《Journal of Cryptology》1993,6(4):183-208

We present a new serial-parallel concurrent modular-multiplication algorithm and architecture suitable for standard RSA encryption. In the new scheme, multiplication is performed modulo a multiple of the RSA modulus n, which has a diminished-radix form 2^k-v, where k and v are positive integers and v < n. This design is the first concurrent modular multiplier to use a diminished-radix algorithm and to pipeline concurrent modular-reduction to optimize the clock rate. For a modular multiplier of order ranging from 1 to 10 (number of multiplier bits per clock cycle), a faster clock rate and throughput is possible than with other known designs including those of Brickell, Morita, Sedlak and Golze, and Miyaguchi. Throughput estimates for 512-bit RSA decryption range from 100 kbit/s in a serial mode to 650 kbit/s with a modular multiplier of order 10, at a clock rate of 20 MHz on 1.5 m CMOS. 相似文献

14.

Modeling of chaotic DC-DC converters by iterated nonlinear mappings 总被引：22，自引：0，他引：22

Hamill D.C. Deane J.H.B. Jefferies D.J. 《Power Electronics, IEEE Transactions on》1992,7(1):25-36

In parameter ranges where conventional methods break down, DC-DC converters may be described by iterated mappings, a nonlinear discrete modeling technique. The underlying principles are explained and are applied to the example of a PWM-controlled buck converter. Stable behavior and bifurcations to chaos are predicted by numerical evaluation of the governing mapping and are confirmed by experiment 相似文献

15.

Nonlinear transformations for high level regular array ASIC synthesis

Michaël F. X. B. van Swaaij Francky V. M. Catthoor Hugo J. De Man 《The Journal of VLSI Signal Processing》1992,4(4):259-268

In this paper a new and efficient method is presented for optimizing the mapping ofnonuniform recurrence equations on regular array architectures. The method is based on applyingnonlinear transformations on theindices of the recurrence equations by reindexing groups of operations based on a chosen group communication scheme. The main result of this paper is that the presented method provides a means to map real life high throughput algorithms onto ASIC regular array architectures under real constraints. 相似文献

16.

A parallel numerical algorithm on a reconfigurable multi-ring network

Arabnia Hamid R. Taha Thiab R. 《Telecommunication Systems》1998,10(1-2):185-202

A reconfigurable network termed as the reconfigurable multi-ring network (RMRN) is described. The RMRN is shown to be a truly scalable network in that each node in the network has a fixed degree of connectivity and the reconfiguration mechanism ensures a network diameter of O(log₂ N) for an N-processor network. Algorithms for the two-dimensional mesh and the SIMD or SPMD n-cube are shown to map very elegantly onto the RMRN. Basic message passing and reconfiguration primitives for the SIMD/SPMD RMRN are designed for use as building blocks for more complex parallel algorithms. Elsewhere, the RMRN is shown to be a viable architecture for image processing and computer vision problems. In this paper, the RMRN is proved to be very useful for the implementation of numerical algorithms. We describe the implementation of a nontrivial numerical scheme on the RMRN. This numerical scheme is based on the inverse scattering transform and is used to study the role of nonlinear terms in Korteweg–de Vries like equations. This revised version was published online in June 2006 with corrections to the Cover Date. 相似文献

17.

Design and implementation of dual-field modular inversion algorithm

Yang Xiaohui Yang Tongjie Dai Zibin Zhang Yongfu 《电子科学学刊(英文版)》2010,27(4):531-537

Modular inverse arithmetic plays an important role in elliptic curve cryptography. Based on the analysis of Montgomery modular inversion algorithm, this paper presents a new dual-field modular inversion algorithm, and a novel scalable and unified architecture for Montgomery inverse hardware in finite fields GF(p) and GF(2ⁿ) is proposed. Furthermore, this architecture based on the new modular inversion algorithm has been verified by modeling it in Verilog-HDL, and accomplished it under 0.18 μm CMOS technology. The result indicates that our work has better performance and flexibility than other works. 相似文献

18.

Fast Training of Structured SVM Using Fixed‐Threshold Sequential Minimal Optimization

Changki Lee Myung‐Gil Jang 《ETRI Journal》2009,31(2):121-128

In this paper, we describe a fixed‐threshold sequential minimal optimization (FSMO) for structured SVM problems. FSMO is conceptually simple, easy to implement, and faster than the standard support vector machine (SVM) training algorithms for structured SVM problems. Because FSMO uses the fact that the formulation of structured SVM has no bias (that is, the threshold b is fixed at zero), FSMO breaks down the quadratic programming (QP) problems of structured SVM into a series of smallest QP problems, each involving only one variable. By involving only one variable, FSMO is advantageous in that each QP sub‐problem does not need subset selection. For the various test sets, FSMO is as accurate as an existing structured SVM implementation (SVM‐Struct) but is much faster on large data sets. The training time of FSMO empirically scales between O ( n ) and O(n^1.2), while SVM‐Struct scales between O(n^1.5) and O(n^1.8). 相似文献

19.

Sensitivity analysis of nonlinear circuits and systems—Nonlinear transfer function approach

Jiao Licheng 《电子科学学刊(英文版)》1989,6(4):306-313

Properties of the nonlinear transfer function are studied in this paper and then-th order stability,n-th order frequency response andn-th order sensitivity as well as a novel theory and implementation on the sensitivity of nonlinear transfer system are proposed. 相似文献

20.

Relaxed Annihilation-Reordering Look-Ahead QRD-RLS Adaptive Filters

Lijun Gao Keshab K. Parhi Jun Ma 《The Journal of VLSI Signal Processing》2003,35(2):119-135

The optimum architecture design and mapping of QRD-RLS adaptive filters can be achieved through filter architecture selections, look-ahead transformations, and hierarchical pipelining/folding transformations. In this paper, a relaxed annihilation-reordering look-ahead (RARL) architecture is proposed, and shown to be more power and area efficient than pipelined processing architecture which was considered the most area efficient. The filters with this architecture are based on relaxed weight-update through filtering approximation, where a filter tap weight is updated upon arrival of every block of input data, and are speeded up with annihilation-reordering look-ahead transformation. As a result of the computational complexity reduction, this architecture does not change the iteration bound and filter clock frequency, and leads to speed up with linear increase in power consumption, while the pipelined processing architectures result in speedup with quadratic increase in power consumption. Upon hardware mapping, this architecture is also more advantageous to achieve low area designs. Two design examples are presented to illustrate mapping optimization using above transformations. These results are important for mapping designs onto ASICs, FPGAs or parallel computing machines. The results show significant improvements in throughput, power consumption and hardware requirement. It is also interesting to show through mathematics and simulations that the RARL QRD-RLS filters have no performance degradation in terms of convergence rate. 相似文献