Found 20 similar documents; search took 15 ms
1.
K. A. Batuzov 《Programming and Computer Software》2017,43(6):366-372
The complexity of software is ever increasing, and it requires more and more computational resources for its execution. One way to satisfy these requirements is the use of vector instructions that operate on fixed-length vectors of data of the same type. A method is proposed for representing the vector instructions of one processor architecture in terms of the vector instructions of another architecture during dynamic binary translation. An implementation of this method that includes the translation of vector addition and memory access increased the performance of the QEMU emulator by a factor greater than three on an artificial example and by 12% on a real-life application.
2.
The architecture of a biologically motivated visual-information processor that can perform a variety of tasks associated with the early stages of machine vision is described. The computational operations performed by the processor emulate the spatiotemporal information-processing capabilities of certain neural-activity fields found along the human visual pathway. The state-space model of the neurovision processor is a two-dimensional neural network of densely interconnected nonlinear processing elements (PEs). An individual PE represents the dynamic activity exhibited by a spatially localized population of excitatory and inhibitory nerve cells. Each PE may receive inputs from an external signal space as well as from the neighboring PEs within the network. The information embedded within the signal space is extracted by the feedforward subnet. The feedback subnet of the neurovision processor generates useful steady-state and temporal-response characteristics that can be used for spatiotemporal filtering, short-term visual memory, spatiotemporal stabilization, competitive feedback interaction, and content-addressable memory. To illustrate the versatility of the multitask processor design for machine-vision applications, a computer simulation of a simplified vision system for filtering, storing, and classifying noisy gray-level images is presented.
3.
4.
5.
6.
《International Journal of Computer Mathematics》2012,89(1-4):71-89
The preconditioned conjugate gradient method is well established for solving linear systems of equations that arise from the discretization of partial differential equations. Point and block Jacobi preconditioning are both common preconditioning techniques. Although it is reasonable to expect that block Jacobi preconditioning is more effective, block preconditioning requires the solution of triangular systems of equations that are difficult to vectorize. We present an implementation of block Jacobi for vector computers, especially for the Cray Y-MP/264, and discuss several techniques to improve vectorization. We present these in a progression to show the effect on performance. For the model problem, resulting from a self-adjoint operator, the final implementation of one block Jacobi step uses almost the same amount of time as one point Jacobi step on the Cray Y-MP/264 despite the solution of triangular systems.
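As a rough illustration of the simpler of the two preconditioners named in the abstract above, the following sketch applies point Jacobi preconditioning inside a conjugate gradient loop. It is not the paper's block Jacobi implementation for the Cray Y-MP/264; the function name `jacobi_pcg`, the dense list-of-lists matrix layout, and the tolerance defaults are assumptions made for this example.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (hypothetical helper).

    Point Jacobi preconditioning: the preconditioner is diag(A), so applying
    its inverse is an elementwise scaling by 1 / A[i][i].
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x with x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # inverse of the Jacobi preconditioner
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        # Stop once the residual norm is below the tolerance.
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

A block Jacobi variant would replace the elementwise scaling by `inv_diag` with a solve against each diagonal block, which is the step the abstract describes as hard to vectorize.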
7.
8.
9.
10.
M.J. Kascic 《Parallel Computing》1984,1(1):35-44
The raw performance of vector processors such as the CDC CYBER-205 has been well documented. The ability to apply this raw power to ever more complex algebraic algorithms has been reported in [9]. The final step in making computers of this class truly the revolutionary tools they are claimed to be is to develop whole applications that perform at a significant fraction of the raw power. This involves two distinct subclasses of problems. On the one hand, there are those pre-existing applications that must be mapped onto vector processors in such a way that not only is performance maintained, but also a (sometimes vague) set of computational boundary conditions of the user community is satisfied. On the other hand, there are those models which are developed ab initio with machines such as the CYBER-205 in mind. The development of solutions to problems in the former class involves psychology and politics as well as mathematics and computer science. We limit ourselves here to reporting on an example of the latter class, viz. a model to study a particular fluid-dynamic phenomenon, that was specifically designed with the CYBER-205 in mind.
11.
Hardware and software codesign and flexibility requirements often necessitate embedded application-specific instruction-set processors in system-on-chip designs. Spaceman, a reusable stack-processor virtual component, offers a customer-configurable instruction set; parameterizable bus widths, stack depths, and stack access ranges; and selectable bus interfaces.
12.
13.
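Design of Data Communication and a Router for a Polymorphic Parallel Processor  Total citations: 2, self-citations: 1, citations by others: 2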
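With the development of multi-core technology, inter-core communication faces new challenges, and its performance determines the performance of the whole multi-core processor. Based on an analysis of the data communication requirements of multi-core processors, a data communication structure suited to polymorphic parallel processors is proposed. The structure provides two modes of data communication: nearest-neighbor inter-core communication implemented with adjacent shared registers, and remote communication implemented with a hardware-accelerated router structure. The router for the remote communication mechanism uses input buffering, computes routes with the classic deterministic XY routing algorithm, adds multicast and fault-tolerance techniques, and uses a dedicated arbitration mechanism to reduce design complexity. These improvements reduce inter-core communication latency and power consumption and improve the performance of the polymorphic parallel processor.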
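The XY routing algorithm named in the abstract above can be sketched as follows, assuming a two-dimensional mesh in which each core is addressed by integer `(x, y)` coordinates. The function name `xy_route` and the coordinate convention are assumptions for illustration; the sketch does not model the paper's input buffering, multicast, fault tolerance, or arbitration.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst (hypothetical helper).

    Deterministic XY routing: a packet first travels along the X dimension
    until its column matches the destination, then along the Y dimension.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path
```

Because every packet between a given pair of nodes takes the same path, the route computation needs no state beyond the current and destination coordinates, which is one reason the algorithm is simple to implement in hardware.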
14.
15.
Zahari Zlatev 《Parallel Computing》1988,8(1-3):301-312
Computations involving symmetric, positive definite and band matrices are kernel operations in the numerical treatment of many models arising in science and engineering. It is desirable to achieve a high level of performance when such operations are to be carried out on a vector processor. If the operations are performed by rows or columns (as in the EXTENDED BLAS subroutines), then the loops are vectorized but the speed of computations, measured in Mflops, is not very high, because the arrays involved are normally short. Therefore the computations should be organized by diagonals. Furthermore, some special devices are to be applied in order to unroll the loops. Finally, one should be careful with the storage scheme. It is demonstrated that if (i) the computations are organized by diagonals, (ii) the main loops are unrolled and (iii) the storage scheme is such that the work with some zero-elements is avoided, then the speed of computations is nearly the same as that obtained in the computations with dense matrices. If a particular vector machine is in use (in our case a CRAY X-MP computer), then the speed can be increased further by (iv) coding some basic operations in machine language and (v) using the different processors of the vector computer in parallel. The efficiency of the exploitation of the special features of the particular computer that is to be used is also illustrated by numerical examples.
Kernel subroutines performing matrix-vector multiplications are described. Representative tests are used to demonstrate the efficiency of these kernels.
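The idea of organizing a symmetric band matrix-vector product by diagonals, as described in the abstract above, can be sketched as follows. The storage layout is an assumption for this example: `diags[k]` holds the `k`-th superdiagonal, with `diags[k][i]` equal to the matrix entry at row `i`, column `i + k`. The function name `sym_band_matvec` is hypothetical, and the sketch omits the loop unrolling, machine-language kernels, and multiprocessing that the abstract mentions.

```python
def sym_band_matvec(diags, x):
    """Compute y = A x for a symmetric band matrix stored by diagonals.

    diags[0] is the main diagonal (length n); diags[k] is the k-th
    superdiagonal (length n - k). Only the upper half of A is stored.
    """
    n = len(x)
    y = [0.0] * n
    for k, d in enumerate(diags):
        # The inner loop runs along a whole diagonal (length n - k), so on a
        # vector machine it is a long vector operation, unlike a row-wise
        # loop whose length is limited by the bandwidth.
        for i in range(n - k):
            y[i] += d[i] * x[i + k]        # contribution of A[i][i + k]
            if k > 0:
                y[i + k] += d[i] * x[i]    # symmetric entry A[i + k][i]
    return y
```

For the tridiagonal matrix with 2 on the main diagonal and 1 on the off-diagonals, `sym_band_matvec([[2, 2, 2], [1, 1]], [1, 2, 3])` evaluates to `[4.0, 8.0, 8.0]`.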
16.
17.
18.