1.

Hardware Support for Interval Arithmetic (total citations: 1; self-citations: 0; citations by others: 1)

Reinhard Kirchner Ulrich W. Kulisch 《Reliable Computing》2006,12(3):225-237

A hardware unit for interval arithmetic (including division by an interval that contains zero) is described in this paper. After a brief introduction an instruction set for interval arithmetic is defined which is attractive from the mathematical point of view. These instructions consist of the basic arithmetic operations and comparisons for intervals including the relevant lattice operations. To enable high speed, the case selections for interval multiplication (9 cases) and interval division (14 cases) are done in hardware. The lower bound of the result is computed with rounding downwards and the upper bound with rounding upwards by parallel units simultaneously. The rounding mode must be an integral part of the arithmetic operation. Also the basic comparisons for intervals together with the corresponding lattice operations and the result selection in more complicated cases of multiplication and division are done in hardware. There they are executed by parallel units simultaneously. The circuits described in this paper show that with modest additional hardware costs interval arithmetic can be made almost as fast as simple floating-point arithmetic.
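
As a rough software illustration of the lower-bound-down, upper-bound-up idea in this abstract, the following Python sketch multiplies two intervals. It is only an assumption-laden stand-in: it takes the minimum and maximum of the four endpoint products instead of the nine-case selection, and it emulates directed rounding by stepping one floating-point number outward with `math.nextafter` (Python 3.9+), which gives a valid but slightly wider enclosure than true directed rounding. The function name `mul_interval` is hypothetical.

```python
import math


def mul_interval(a, b):
    """Multiply closed intervals a = (lo, hi) and b = (lo, hi).

    Assumes finite endpoints. Each bound is pushed one float outward,
    a conservative software substitute for hardware directed rounding.
    """
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    lower = math.nextafter(min(products), -math.inf)  # emulate rounding downwards
    upper = math.nextafter(max(products), math.inf)   # emulate rounding upwards
    return (lower, upper)


if __name__ == "__main__":
    print(mul_interval((-1.0, 2.0), (3.0, 4.0)))
```

A hardware unit of the kind described would pick the two relevant endpoint products from the operand signs and round each in the correct direction within the operation itself, so no widening step would be needed.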

2.

Static Testing of Illegal Computation Faults

Cao Wenjing, Gong Yunzhan 《Journal of Computer-Aided Design & Computer Graphics》2007,19(1):119-124

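For illegal computations in C/C++ programs, illegal computation faults are formally defined. An interval arithmetic model for expressions, a model for generating the value-interval sets of variables, and an illegal computation fault model are established and used as the basis for statically identifying illegal computations. An automatic testing algorithm for illegal computations is proposed. Experimental results show that the method achieves high fault detection accuracy and testing efficiency.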
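
To make the interval-based idea concrete, here is a minimal Python sketch that flags a possible division by zero when the value interval of a denominator contains zero. It is a sketch under stated assumptions, not the authors' models or algorithm: the `Interval` class, its two operations, and `may_divide_by_zero` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of values a variable may take."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result: our minimum minus their maximum, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def contains_zero(self):
        return self.lo <= 0 <= self.hi


def may_divide_by_zero(denominator):
    """Report whether a division by this expression could be illegal."""
    return denominator.contains_zero()


if __name__ == "__main__":
    x = Interval(1, 5)
    y = Interval(2, 3)
    print(may_divide_by_zero(x - y))  # denominator of z / (x - y)
    print(may_divide_by_zero(x + y))  # denominator of z / (x + y)
```

A check of this kind is conservative: an interval that contains zero signals that a fault is possible, not that it must occur.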

3.

A parallel algorithm for graph matching and its MasPar implementation

Allen R. Cinque L. Tanimoto S. Shapiro L. Yasuda D. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):490-501

Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines.

4.

Accurate arithmetic for vector processors

《Journal of Parallel and Distributed Computing》1988,5(3):250-270

In addition to the four elementary arithmetic operations, more advanced electronic computers such as vector and parallel computers often provide a number of compound operations as additional elementary operations. If pipelined, compound operations like “multiply and add,” “accumulate,” and “multiply and accumulate” contribute essentially to the high speed of the system. Accuracy requirements lead to very similar operations. We identify a set of operations which meet both requirements: high speed and accuracy. After a brief discussion of implementation techniques for the simpler of these operations we present two methods and circuits which allow a fast and correct computation of the more complicated of these operations: “accumulate” and “multiply and accumulate.” The first method computes sums and dot products by making use of a matrix-shaped and pipelined arrangement of adders which cover the full floating-point range. The second method requires some local memory on the arithmetic unit. It permits a drastic reduction in the number of adders required. Both methods can also be used to build a fast arithmetic unit for microcomputers in VLSI technology.

5.

Least significant bit evaluation of arithmetic expressions in single-precision

Dr. S. M. Rump Dipl.-Math H. Böhm 《Computing》1983,30(3):189-199

Single-precision floating-point computations may yield an arbitrary false result due to cancellation and rounding errors. This is true even for very simple, structured arithmetic expressions such as Horner's scheme for polynomial evaluation. A simple procedure will be presented for fast calculation of the value of an arithmetic expression to least significant bit accuracy in single precision computation. For this purpose in addition to the floating-point arithmetic only a precise scalar product (cf. [2]) is required. If the initial floating-point approximation is not too bad, the computing time of the new algorithm is approximately the same as for usual floating-point computation. If not, the essential progress of the presented algorithm is that the inaccurate approximation is recognized and corrected. The algorithm achieves high accuracy, i.e. between the left and the right bound of the result there is at most one more floating-point number. A rigorous estimation of all rounding errors introduced by floating-point arithmetic is given for general triangular linear systems. The theorem is applied to the evaluation of arithmetic expressions.

6.

The exact dot product as basic tool for long interval arithmetic

Ulrich Kulisch Van Snyder 《Computing》2011,91(3):307-313

Computing with guarantees is based on two arithmetical features. One is fixed (double) precision interval arithmetic. The other one is dynamic precision interval arithmetic, here also called long interval arithmetic. The basic tool to achieve high speed dynamic precision arithmetic for real and interval data is an exact multiply and accumulate operation and with it an exact dot product. Pipelining allows it to be computed at the same high speed as vector operations on conventional vector processors. Long interval arithmetic fully benefits from such high speed. Exactitude brings very high accuracy, and thereby stability into computation. This document, which has been incorporated into the draft standard for interval arithmetic being developed by IEEE P1788, specifies the implementation of an exact multiply and accumulate operation.
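
The effect of an exact multiply and accumulate can be imitated in software. The Python sketch below accumulates every product exactly as a rational number and rounds only once at the end. It is an illustration under assumptions, not the specified implementation: it uses `fractions.Fraction` in place of a long fixed-point accumulator, and `exact_dot` and `naive_dot` are hypothetical names.

```python
from fractions import Fraction


def exact_dot(xs, ys):
    """Dot product with exact accumulation and a single final rounding."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        # Fraction(float) is exact, so neither products nor sums round.
        acc += Fraction(x) * Fraction(y)
    return float(acc)  # the only rounding step


def naive_dot(xs, ys):
    """Ordinary floating-point dot product, rounding after every step."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


if __name__ == "__main__":
    xs = [1e16, 1.0, -1e16]
    ys = [1.0, 1.0, 1.0]
    print(exact_dot(xs, ys), naive_dot(xs, ys))
```

With these inputs the ordinary loop loses the middle term to cancellation, while the exactly accumulated result keeps it.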

7.

Multi-core Parallelization of Comba and Karatsuba Multiplication for Large Integers

Jiang Lijuan, Liu Fangfang, Zhao Yuwen, Yang Chao, Cai Ying 《Computer Systems & Applications》2016,25(11):232-236

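Large-integer arithmetic is widely used in public-key encryption algorithms, in high-precision floating-point computation for large-scale scientific computing, and in the construction of large eigenvalues. However, most of its algorithms are costly in both space and time. This is especially true of large-integer multiplication, one of the core operations: once the operands reach a certain size, the very long serial computation time becomes a major bottleneck for practical use. In recent years, with the rapid development of multi-core and many-core chips, a research trend has been to exploit the parallelism inherent in an algorithm so as to use the computing power of parallel processors and improve performance efficiently. Based on a general-purpose multi-core parallel computing platform, this paper studies the parallelization of the Comba and Karatsuba fast algorithms for large-integer multiplication and proposes efficient multi-core parallel algorithms. For implementation and performance optimization, a multi-level parallel technique combining OpenMP and SIMD is adopted, which yields a large performance gain. In performance tests comparing the optimized parallel algorithms with the original serial algorithms, the 8-thread parallel Comba and Karatsuba algorithms achieve speedups of 5.85× and 6.14×, respectively, over their serial counterparts.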
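
For orientation, a serial Karatsuba multiplication can be sketched in a few lines of Python. This is the textbook algorithm only, not the paper's OpenMP+SIMD implementation. The function name and the `cutoff_bits` parameter are assumptions made for illustration. The three recursive sub-products do not depend on one another, which is the kind of parallelism a multi-core version could exploit.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Multiply non-negative integers with three half-size products."""
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    # Small operands: fall back to the built-in multiplication.
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three independent sub-products instead of the schoolbook four.
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - z0 - z2

    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    a = 3 ** 500
    b = 7 ** 400
    print(karatsuba(a, b) == a * b)
```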

8.

A reliable linear algebra library for transputer networks

Christian P. Ullrich Roman Reith 《Reliable Computing》1995,1(2):173-187

This paper presents a collection of linear algebra subroutines for transputer networks. The developed pilot library is intended to form a basis of a complete parallel linear algebra library for validating computations, whose routines will deliver (as accurately as necessary) either the best possible result, or a corresponding inclusion based on controlled rounding and an optimal scalar product. So far, as a first step, we have produced code for interval arithmetic, scalar products and simple vector-matrix operations with maximum accuracy. For the solution of triangular systems and the LU decomposition of dense matrices, new versions of classical methods are implemented which allow the application of optimal scalar products as a single, indivisible operation and optimize the overlap of communication and computation. The library also contains routines for computing inclusions of unstructured, dense linear systems of equations. Network topology dependency is avoided in all numerical routines by the use of general communication routines. This way the user is able to work with different topologies like ring structures, binary trees and hypercubes.

9.

A DSP-Based High-Speed Arithmetic Coprocessor Board for Industrial Control Computers

Dai Xianzhong, Zhou Weizhong 《Industrial Control Computer》1997,(6)

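This paper presents a high-speed arithmetic coprocessor board built around the TMS320C25 single-chip DSP. The board can greatly increase the arithmetic speed of existing STD industrial control computers, offering a new practical route for applying high-performance, complex control algorithms in industrial settings.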

10.

Odd Memory Systems: A New Approach

Seznec A. Lenfant J. 《Journal of Parallel and Distributed Computing》1995,26(2)

To reject the use of a prime (or odd) number N of memory banks in a vector processor, it is generally advanced that address computation for such a memory system would require systematic Euclidean division by the number N. We first show that the Chinese Remainder Theorem allows one to define a very simple mapping of data onto the memory banks for which address computation does not require any Euclidean division. Massively parallel SIMD computers may have thousands of processors. When the memory on such a machine is globally shared, routing vectors from memory to the processors is a major difficulty; the control for the interconnection network cannot be generally computed at execution time. When the number of memory banks and processors is a product of prime numbers, the family of permutations needed for routing vectors from memory to the processors through the interconnection network has very specific properties. The Chinese Remainder Network presented in the paper is able to execute all these permutations in a single path and may be easily controlled.
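
The abstract does not spell out the mapping, so the following Python sketch shows only one mapping consistent with the Chinese Remainder Theorem, under the assumption that the bank count N and the number of words per bank are coprime (for example, an odd N and a power-of-two bank size). All names are hypothetical. `crt_map` demonstrates that the pair (address mod N, address mod bank size) identifies each address uniquely, and `strided_banks` shows how successive bank numbers of a strided access could be produced with an addition and a conditional subtraction, with no division.

```python
from math import gcd


def crt_map(address, n_banks, bank_size):
    """Map an address to (bank, offset) via two independent residues.

    Assumption: gcd(n_banks, bank_size) == 1, so the CRT makes the
    mapping one-to-one on addresses 0 .. n_banks * bank_size - 1.
    """
    if gcd(n_banks, bank_size) != 1:
        raise ValueError("bank count and bank size must be coprime")
    return address % n_banks, address % bank_size


def strided_banks(start_bank, stride_mod_n, n_banks, count):
    """Bank numbers of a strided access, computed without division.

    Assumes 0 <= start_bank < n_banks and 0 <= stride_mod_n < n_banks.
    """
    banks = []
    bank = start_bank
    for _ in range(count):
        banks.append(bank)
        bank += stride_mod_n
        if bank >= n_banks:  # one conditional subtraction replaces "mod N"
            bank -= n_banks
    return banks


if __name__ == "__main__":
    cells = {crt_map(a, 7, 8) for a in range(7 * 8)}
    print(len(cells))
    print(strided_banks(3, 5, 7, 10))
```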

11.

Arithmetic applied mathematics

Donald Greenspan 《Computers & Mathematics with Applications》1977,3(4):253-270

In this paper it will be shown how modern digital computers allow one to develop fundamental areas in applied mathematics by use only of arithmetic. Attention will be directed primarily to theoretical Newtonian mechanics and to theoretical special relativistic mechanics, two of the most substantial areas in applied mathematical study. Using only arithmetic, we will establish all the usual conservation laws in exactly the same form in which they appear in continuous mechanics. In addition, new, viable nonlinear models of complex physical phenomena will emerge and related computations will be described.

12.

Estimating interlock and improving balance for pipelined architectures

《Journal of Parallel and Distributed Computing》1988,5(4):334-358

Pipelining is now a standard technique for increasing the speed of computers, particularly for floating-point arithmetic. Single-chip, pipelined floating-point functional units are available as “off the shelf” components. Addressing arithmetic can be done concurrently with floating-point operations to construct a fast processor that can exploit fine-grain parallelism. This paper describes a metric to estimate the optimal execution time of DO loops on particular processors. This metric is parameterized by the memory bandwidth and peak floating-point rate of the processor, as well as the length of the pipelines used in the functional units. Data dependence analysis provides information about the execution order constraints of the operations in the DO loop and is used to estimate the amount of pipeline interlock required by a loop. Several transformations are investigated to determine their impact on loops under this metric.

13.

Implementation of universal computer arithmetic with optimal accuracy

Dr. K. Grüner 《Computing》1980,24(2-3):181-193

Mathematical structures such as intervals, vectors and matrices are only insufficiently considered by the usual computer arithmetic. In this paper a general and modular concept for computer arithmetic is proposed which finally covers all spaces occurring in numerical computations (see figure 1). The operations are performed following a uniform construction principle. It induces a set of mapping properties between real and rounded operations which by the way guarantees optimal accuracy.

14.

A Fast Algorithm for Polygon Fitting in Image Processing (total citations: 2; self-citations: 0; citations by others: 2)

Zhang Fan, Zhai Zhihua, Zhang Xinhong 《Computer Development & Applications》2001,14(10):4-5,8

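A simple and efficient polygon fitting algorithm is proposed. Without recursive calls, it computes all polygon fitting points in a single traversal of the boundary data of the target image, avoiding the repeated computation inherent in recursion, which benefits real-time image processing and online inspection. Practice shows that the algorithm yields satisfactory results.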

15.

Experiments on the evaluation of functional ranges using a random interval arithmetic

《Mathematics and computers in simulation》2001,56(1):17-34

A software tool using standard and special interval arithmetic operations together with an idea which is developed in the discrete stochastic arithmetic (DSA) approach for round-off error evaluation is proposed in this paper for a statistical computation of functional ranges. The CESTAC method is a Monte Carlo method which uses DSA and provides the accuracy on any computed result with a high probability. On the other hand, interval computation gives a guaranteed interval containing the result but this interval may be in some cases useless because it is much too wide. Here it is proposed to combine both approaches to obtain a smaller but only highly probable interval containing the range of a rational function for given interval data. Various numerical experiments are given.

16.

A benchmark study based on the parallel computation of the vector outer-product A = uv^T operation

Rudnei Dias Da Cunha 《Concurrency and Computation》1997,9(8):803-819

In this paper we benchmark the performance of the Cray T3D, IBM 9076 SP/1 and Intel Paragon XP/S parallel computers, using implementations of parallel algorithms for the computation of the vector outer-product A = uv^T operation. The vector outer-product operation, although very simple in nature, requires the computation of a large number of floating-point operations and its parallelization induces a great level of communication between the processors. It is thus suited to measure the relative speed of the processor, memory subsystem and network capabilities of a parallel computer. It should not be considered a ‘toy problem’, since it arises in numerical methods in the context of the solution of systems of non-linear equations – still a difficult problem to solve. We present algorithms for both the explicit shared-memory and message-passing programming models together with theoretical computation models for those algorithms. Actual experiments were run on those computers, using Fortran 77 implementations of the algorithms. The results obtained with these experiments show that due to the high degree of communication between the processors one needs a parallel computer with fast communications and carefully implemented data exchange routines. The theoretical computation model allows prediction of the speed-up to be obtained for some problem size on a given number of processors. © 1997 John Wiley & Sons, Ltd.

17.

Parallel, high-speed PC fuzzy control

Jaramillo-Botero A. Miyake Y. 《Micro, IEEE》1995,15(6):63

To overcome the inefficiencies in processing fuzzy rules on sequential digital computers and the inflexibility of purely analog processors, we introduce a parallel architecture using analog processors with a digital interface. Our architecture for fuzzy processing supports ISA-based PC platforms. Single or multiple fuzzy units with analog processing cores can operate as stand-alone MISO or MIMO fuzzy logic controllers supporting a digital interface with a master PC (for setup, monitoring, and/or relational MIMO support). The journal issue contains a concise summary of this article. The complete article is linked to Micro's home page on the World Wide Web (http://www.computer.org/pubs/micro/micro.htm)

18.

New Products

Michalopoulos D.A. 《Computer》1978,11(5):92-98

EAI's new hybrid computer system, called Hyshare, is aimed at the multi-user, multi-task application demands of larger-scale simulation and scientific computation laboratories. The system consists of an EAI 3200 digital computer and up to six EAI analog processors. Its analog/digital and digital/analog communications interface employs on-line, dynamic resource allocation techniques which allow analog processors to be assigned to separate tasks or linked together to meet specific application requirements.

19.

Feed-Forward Support Vector Machine Without Multipliers

《Neural Networks, IEEE Transactions on》2006,17(5):1328-1331

In this letter, we propose a coordinate rotation digital computer (CORDIC)-like algorithm for computing the feed-forward phase of a support vector machine (SVM) in fixed-point arithmetic, using only shift and add operations and avoiding resource-consuming multiplications. This result is obtained thanks to a hardware-friendly kernel, which greatly simplifies the SVM feed-forward phase computation and, at the same time, maintains good classification performance with respect to the conventional Gaussian kernel.

20.

Design and Implementation of a High-Performance Subword-Parallel Arithmetic Unit

Dong Mian, Wu Dan, Rao Jinli, Huang Wei, Dai Kui, Zou Xuecheng 《Computer Engineering》2012,38(16):249-252

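A high-performance subword-parallel arithmetic unit is implemented through hardware sharing. The unit is pipelined and can perform, in one cycle, one 64-bit, two 32-bit, four 16-bit, or eight 8-bit fixed-point operations, or one double-precision or two single-precision floating-point operations. It is designed in Verilog HDL, implemented in a 0.18 μm standard CMOS process library, and evaluated on real multimedia applications using the ESCA system. Experimental results show that the unit achieves a good balance between hardware cost and performance.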
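
The fixed-point part of subword parallelism can be imitated in software. The Python sketch below adds several narrow lanes packed into one 64-bit word with a single wide addition, keeping carries from crossing lane boundaries. It is a generic software technique offered for illustration, not the paper's hardware design. The helper names `pack`, `unpack`, and `packed_add` are hypothetical.

```python
def pack(lanes, lane_bits):
    """Pack a list of lane values into one integer, lane 0 lowest."""
    lane_mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & lane_mask) << (i * lane_bits)
    return word


def unpack(word, lane_bits, word_bits=64):
    """Split a packed word back into its lane values."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & lane_mask
            for i in range(word_bits // lane_bits)]


def packed_add(a, b, lane_bits, word_bits=64):
    """Add all lanes at once; each lane wraps around independently."""
    high = 0
    for i in range(word_bits // lane_bits):
        high |= 1 << (i * lane_bits + lane_bits - 1)  # top bit of each lane
    low = ((1 << word_bits) - 1) ^ high
    # Add the lower bits of every lane; carries stop at each lane's top bit.
    partial = (a & low) + (b & low)
    # Fold in the top bits by XOR so that no carry leaves the lane.
    return partial ^ ((a ^ b) & high)


if __name__ == "__main__":
    a = pack([0xFFFF, 1, 2, 3], 16)
    b = pack([1, 1, 1, 1], 16)
    print(unpack(packed_add(a, b, 16), 16))
```

Changing `lane_bits` to 8, 32, or 64 reuses the same adder for a different lane width, which mirrors the hardware-sharing idea in the abstract.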