20 similar articles found.
1.
K. A. Batuzov 《Programming and Computer Software》2017,43(6):366-372
The complexity of software is ever increasing, and it requires more and more computational resources for its execution. One way to satisfy these requirements is to use vector instructions, which operate on fixed-length vectors of data of the same type. A method is proposed for representing the vector instructions of one processor architecture in terms of the vector instructions of another architecture during dynamic binary translation. An implementation of this method that covers the translation of vector addition and memory access increased the performance of the QEMU emulator by more than a factor of three on an artificial example and by 12% on a real-life application.
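As a rough illustration of the general idea (not the method from the paper), the sketch below assumes a hypothetical guest with 256-bit vector registers (8 x 32-bit lanes) and a host with 128-bit vectors (4 x 32-bit lanes). One guest lane-wise add is rewritten as host vector adds over consecutive lane slices, and the result is checked against the guest semantics. The register widths, the `host_vadd` op name, and the tuple encoding are all assumptions.

```python
from typing import List, Tuple

LANE_BITS = 32
MASK = (1 << LANE_BITS) - 1

def vadd_lanes(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise 32-bit add with wraparound (semantics shared by guest and host adds)."""
    return [(x + y) & MASK for x, y in zip(a, b)]

def translate_vadd(guest_lanes: int, host_lanes: int) -> List[Tuple[str, int, int]]:
    """Split one guest vector add into host vector adds over consecutive lane slices.

    Each emitted host op is ("host_vadd", start_lane, end_lane); the op name is hypothetical.
    """
    if guest_lanes % host_lanes != 0:
        raise ValueError("guest width must be a multiple of host width in this sketch")
    return [("host_vadd", start, start + host_lanes)
            for start in range(0, guest_lanes, host_lanes)]

def run_host(ops: List[Tuple[str, int, int]], a: List[int], b: List[int]) -> List[int]:
    """Execute the translated host ops on a flat image of the guest registers."""
    out = [0] * len(a)
    for _name, lo, hi in ops:
        out[lo:hi] = vadd_lanes(a[lo:hi], b[lo:hi])
    return out

# Guest: 8 x 32-bit lanes (256 bits); host: 4 x 32-bit lanes (128 bits).
a = [1, 2, 3, 4, 5, 6, 7, MASK]
b = [10, 20, 30, 40, 50, 60, 70, 1]
ops = translate_vadd(guest_lanes=8, host_lanes=4)
assert run_host(ops, a, b) == vadd_lanes(a, b)
print(ops)  # [('host_vadd', 0, 4), ('host_vadd', 4, 8)]
```

A real translator would emit host machine code or IR rather than Python tuples. Splitting by lane slices is just the simplest mapping when the host vectors are narrower than the guest's.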
2.
《Computers & Structures》1986,24(4):625-635
Linear and nonlinear finite element software development considerations for vector processors are presented. Areas of discussion include performance measurement, data management, element-level calculations, and nonlinear problem solution. An example problem that demonstrates software performance is also presented. Incorporating the methods presented in this paper can lead to finite element software that requires approximately one-tenth the CPU time and as little as one-hundredth the I/O effort of conventional software.
3.
The architecture of a biologically motivated visual-information processor that can perform a variety of tasks associated with the early stages of machine vision is described. The computational operations performed by the processor emulate the spatiotemporal information-processing capabilities of certain neural-activity fields found along the human visual pathway. The state-space model of the neurovision processor is a two-dimensional neural network of densely interconnected nonlinear processing elements (PEs). An individual PE represents the dynamic activity exhibited by a spatially localized population of excitatory and inhibitory nerve cells. Each PE may receive inputs from an external signal space as well as from the neighboring PEs within the network. The information embedded within the signal space is extracted by the feedforward subnet. The feedback subnet of the neurovision processor generates useful steady-state and temporal-response characteristics that can be used for spatiotemporal filtering, short-term visual memory, spatiotemporal stabilization, competitive feedback interaction, and content-addressable memory. To illustrate the versatility of the multitask processor design for machine-vision applications, a computer simulation of a simplified vision system for filtering, storing, and classifying noisy gray-level images is presented.
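The abstract describes a 2-D grid of PEs, each modeling an excitatory and an inhibitory population driven by an external input and by its neighbors. The minimal sketch below assumes Wilson-Cowan-style rate dynamics integrated with an Euler step. The paper's actual state equations are not given in the abstract, so the 4-neighbor coupling, all weights, and the time step `dt` are illustrative assumptions.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]

def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))

def neighbor_sum(g: Grid, i: int, j: int) -> float:
    """Sum of the 4-connected neighbors of cell (i, j); cells off the grid contribute 0."""
    rows, cols = len(g), len(g[0])
    total = 0.0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = i + di, j + dj
        if 0 <= r < rows and 0 <= c < cols:
            total += g[r][c]
    return total

def step(E: Grid, I: Grid, S: Grid, dt: float = 0.1,
         w_ee: float = 1.5, w_ei: float = 1.2, w_ie: float = 1.0,
         w_ii: float = 0.5, w_nb: float = 0.2) -> Tuple[Grid, Grid]:
    """One Euler step of hypothetical excitatory (E) / inhibitory (I) PE dynamics.

    S is the external input field (feedforward); the E/I couplings and the
    neighbor term supply the feedback. All weights are illustrative, not from the paper.
    """
    rows, cols = len(E), len(E[0])
    new_e = [[0.0] * cols for _ in range(rows)]
    new_i = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            drive_e = (w_ee * E[i][j] - w_ei * I[i][j]
                       + w_nb * neighbor_sum(E, i, j) + S[i][j])
            drive_i = w_ie * E[i][j] - w_ii * I[i][j]
            new_e[i][j] = E[i][j] + dt * (-E[i][j] + sigmoid(drive_e))
            new_i[i][j] = I[i][j] + dt * (-I[i][j] + sigmoid(drive_i))
    return new_e, new_i

# A 5x5 network driven by a single bright pixel at its center.
n = 5
S = [[0.0] * n for _ in range(n)]
S[2][2] = 3.0
E = [[0.0] * n for _ in range(n)]
I = [[0.0] * n for _ in range(n)]
for _ in range(200):
    E, I = step(E, I, S)
```

With `dt < 1`, each update is a convex combination of the previous activity and a sigmoid output, so activities stay in (0, 1). The driven center PE settles at a higher activity than the distant corners.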
7.
《International Journal of Computer Mathematics》2012,89(1-4):71-89
The preconditioned conjugate gradient method is well established for solving linear systems of equations that arise from the discretization of partial differential equations. Point and block Jacobi preconditioning are both common preconditioning techniques. Although it is reasonable to expect that block Jacobi preconditioning is more effective, block preconditioning requires the solution of triangular systems of equations that are difficult to vectorize. We present an implementation of block Jacobi for vector computers, especially for the Cray Y-MP/264, and discuss several techniques to improve vectorization. We present these in a progression to show the effect on performance. For the model problem, resulting from a self-adjoint operator, the final implementation of one block Jacobi step uses almost the same amount of time as one point Jacobi step on the Cray Y-MP/264 despite the solution of triangular systems.
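To make the point-versus-block comparison concrete, here is a small preconditioned conjugate gradient sketch. It assumes a 1-D Poisson model problem (A = tridiag(-1, 2, -1)), which is not necessarily the paper's model problem. Point Jacobi divides by the diagonal. Block Jacobi solves each tridiagonal diagonal block by forward elimination and back substitution (the Thomas algorithm), which is the kind of recurrence the abstract calls hard to vectorize. The Cray-specific vectorization techniques are not reproduced.

```python
from typing import Callable, List, Tuple

Vec = List[float]

def matvec(x: Vec) -> Vec:
    """y = A x for the 1-D Poisson matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def thomas(sub: float, diag: float, sup: float, r: Vec) -> Vec:
    """Solve a constant-coefficient tridiagonal block (forward elimination + back substitution)."""
    m = len(r)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0], d[0] = sup / diag, r[0] / diag
    for i in range(1, m):
        denom = diag - sub * c[i - 1]
        c[i] = sup / denom
        d[i] = (r[i] - sub * d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def block_jacobi(block: int) -> Callable[[Vec], Vec]:
    """z = M^{-1} r, where M is the block-diagonal part of A with blocks of size `block`."""
    def apply(r: Vec) -> Vec:
        z: Vec = []
        for start in range(0, len(r), block):  # blocks are independent of each other
            z.extend(thomas(-1.0, 2.0, -1.0, r[start:start + block]))
        return z
    return apply

def point_jacobi(r: Vec) -> Vec:
    return [ri / 2.0 for ri in r]  # divide by the diagonal entry 2

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def pcg(b: Vec, precond: Callable[[Vec], Vec],
        tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Vec, int]:
    """Preconditioned conjugate gradients starting from x = 0; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = b[:]
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

rhs = [1.0] * 64
_, it_point = pcg(rhs, point_jacobi)
_, it_block = pcg(rhs, block_jacobi(8))
print(it_point, it_block)
```

On this model problem, block Jacobi should need noticeably fewer iterations than point Jacobi. The blocks are independent, so on a vector machine the triangular solves of all blocks can proceed in lockstep, with one vector lane per block.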
11.
Xia Peisu, Fang Xinwo, Wang Yuxiang, Yan Kaiming, Zhang Tingjun, Liu Yulan, Zhao Chunying, Sun Jizhong 《Journal of Computer Science and Technology》1987,2(3):163-173
A new type of high-performance array processor system is presented in this paper. Unlike conventional host-peripheral array processor systems, this system is designed with a functionally distributed approach. The design philosophy is described first. Then the hardware organizations of two concrete systems, namely the 150-AP and the GF-10/12, including the communication between processors, are shown. Some attractive performance results for user programs are also given.
12.
M.J. Kascic 《Parallel Computing》1984,1(1):35-44
The raw performance of vector processors such as the CDC CYBER-205 has been well documented. The ability to apply this raw power to ever more complex algebraic algorithms has been reported in [9]. The final step in making computers of this class truly the revolutionary tools they are claimed to be is to develop whole applications that perform at a significant fraction of the raw power. This involves two distinct subclasses of problems. On the one hand, there are those pre-existing applications that must be mapped onto vector processors in such a way that not only is performance maintained, but also a (sometimes vague) set of computational boundary conditions of the user community is satisfied. On the other hand, there are those models which are developed ab initio with machines such as the CYBER-205 in mind. The development of solutions to problems in the former class involves psychology and politics as well as mathematics and computer science. We limit ourselves here to reporting on an example of the latter class, viz. a model to study a particular fluid-dynamic phenomenon, that was specifically designed with the CYBER-205 in mind.
13.
Hardware/software codesign and flexibility requirements often necessitate embedded application-specific instruction-set processors in system-on-chip designs. Spaceman, a reusable stack-processor virtual component, offers a customer-configurable instruction set; parameterizable bus widths, stack depths, and stack access ranges; and selectable bus interfaces.
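The configurability the abstract lists (instruction set, bus width, stack depth) can be pictured with a small behavioral model. This is a hypothetical sketch, not Spaceman's actual design. The `StackCoreConfig` and `StackCore` names, the opcode set, and the overflow and underflow behavior are all assumptions. Data are truncated to the configured width, and instructions outside the configured set are rejected.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StackCoreConfig:
    """Hypothetical build-time parameters, loosely mirroring those the abstract lists."""
    data_width: int = 16   # data/bus width in bits
    stack_depth: int = 8   # maximum number of entries on the data stack
    enabled_ops: frozenset = frozenset({"push", "add", "sub", "dup", "drop", "swap"})

@dataclass
class StackCore:
    cfg: StackCoreConfig
    stack: List[int] = field(default_factory=list)

    def _mask(self, v: int) -> int:
        # Truncate to the configured data width (two's-complement wraparound).
        return v & ((1 << self.cfg.data_width) - 1)

    def _push(self, v: int) -> None:
        if len(self.stack) >= self.cfg.stack_depth:
            raise OverflowError("data stack overflow")
        self.stack.append(self._mask(v))

    def _pop(self) -> int:
        if not self.stack:
            raise IndexError("data stack underflow")
        return self.stack.pop()

    def execute(self, program: List[Tuple]) -> List[int]:
        for op, *args in program:
            if op not in self.cfg.enabled_ops:
                raise ValueError(f"instruction {op!r} not configured in this core")
            if op == "push":
                self._push(args[0])
            elif op in ("add", "sub"):
                b, a = self._pop(), self._pop()
                self._push(a + b if op == "add" else a - b)
            elif op == "dup":
                v = self._pop()
                self._push(v)
                self._push(v)
            elif op == "drop":
                self._pop()
            elif op == "swap":
                b, a = self._pop(), self._pop()
                self._push(b)
                self._push(a)
        return self.stack

core = StackCore(StackCoreConfig(data_width=8, stack_depth=4))
print(core.execute([("push", 200), ("push", 100), ("add",)]))  # [44]
```

With an 8-bit data width, 200 + 100 = 300 wraps to 44. In hardware these parameters would be generics fixed at synthesis time rather than runtime fields.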
15.
《Computer Networks》2003,41(5):623-640
We present a design methodology for a modular network processor architecture that leads to a balanced, service-defined mix between programmable processor cores, configurable hardware assists, and specialized coprocessors. Whereas the processor cores address the flexibility and extendibility needs of the networking market, the hardware components offload the processors, or even allow them to be bypassed for certain network processor-typical tasks to optimize chip area, performance, and power efficiency. We describe the rationale behind the selected functional partitioning in hardware and software components and discuss the challenges of designing the hardware components, and of organizing and integrating the programmable cores. We quantify our approach with a performance evaluation of the overall system.
18.
Zahari Zlatev 《Parallel Computing》1988,8(1-3):301-312
Computations involving symmetric, positive definite and band matrices are kernel operations in the numerical treatment of many models arising in science and engineering. It is desirable to achieve a high level of performance when such operations are to be carried out on a vector processor. If the operations are performed by rows or columns (as in the EXTENDED BLAS subroutines), then the loops are vectorized but the speed of computations, measured in Mflops, is not very high, because the arrays involved are normally short. Therefore the computations should be organized by diagonals. Furthermore, some special devices should be applied in order to unroll the loops. Finally, one should be careful with the storage scheme. It is demonstrated that if (i) the computations are organized by diagonals, (ii) the main loops are unrolled and (iii) the storage scheme is such that the work with some zero elements is avoided, then the speed of computations is nearly the same as that obtained in computations with dense matrices. If a particular vector machine is in use (in our case a CRAY X-MP computer), then the speed can be increased further by (iv) coding some basic operations in machine language and (v) using the different processors of the vector computer in parallel. The efficiency of exploiting the special features of the particular computer in use is also illustrated by numerical examples.
Kernel subroutines performing matrix-vector multiplications are described. Representative tests are used to demonstrate the efficiency of these kernels.
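The diagonal-oriented organization argued for above can be sketched for a symmetric band matrix-vector product. This sketch assumes a layout that stores only the main diagonal and the p super-diagonals (`diag[k][i] = A[i][i+k]`), so zeros outside the band are never stored or touched. The paper's exact storage scheme, loop unrolling, and machine-language kernels are not reproduced. Each inner loop runs over a whole diagonal of length n - k, instead of a short row segment of at most 2p + 1 elements.

```python
from typing import List

def to_diagonals(A: List[List[float]], p: int) -> List[List[float]]:
    """Store the upper half of a symmetric band matrix by diagonals: diag[k][i] = A[i][i+k]."""
    n = len(A)
    return [[A[i][i + k] for i in range(n - k)] for k in range(p + 1)]

def band_matvec(diag: List[List[float]], x: List[float]) -> List[float]:
    """y = A x using diagonal storage; each loop below is one long vector operation."""
    n = len(x)
    y = [d0 * xi for d0, xi in zip(diag[0], x)]      # main diagonal
    for k in range(1, len(diag)):
        dk = diag[k]
        # y[0:n-k] += dk * x[k:n]   (k-th super-diagonal)
        for i in range(n - k):
            y[i] += dk[i] * x[i + k]
        # y[k:n] += dk * x[0:n-k]   (mirrored sub-diagonal, by symmetry)
        for i in range(n - k):
            y[i + k] += dk[i] * x[i]
    return y

n, p = 6, 2
A = [[float(min(i, j) + 1) if abs(i - j) <= p else 0.0 for j in range(n)] for i in range(n)]
x = [float(i + 1) for i in range(n)]
print(band_matvec(to_diagonals(A, p), x))
```

Symmetry means each stored super-diagonal is used twice, so the sub-diagonals need no storage.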
19.
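Design of Data Communication and Router for a Polymorphic Parallel Processor (Total citations: 2; self-citations: 1; citations by others: 2)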
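With the development of multi-core technology, inter-core communication faces new challenges, and inter-core communication performance determines the performance of the entire multi-core processor. Based on an analysis of the data communication requirements of multi-core processors, a data communication structure suited to polymorphic parallel processors is proposed. The structure uses two data communication modes: nearest-neighbor inter-core communication implemented with adjacent shared registers, and remote communication implemented with a hardware-accelerated router. The router for remote communication uses input buffering and implements route computation with the classic deterministic XY routing algorithm. Multicast and fault-tolerance techniques are added, and a dedicated arbitration mechanism simplifies the design. These improvements reduce the processor's inter-core communication latency and power consumption and improve the performance of the polymorphic parallel processor.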
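The abstract names XY routing as the route-computation algorithm. The sketch below shows that algorithm on a 2-D mesh: a packet first moves along X until its column matches the destination, then along Y. Because it never turns from Y back to X, XY routing is deadlock-free on a mesh. The coordinate convention, the port names (+y taken as NORTH), and the helper names are assumptions. The paper's multicast, fault-tolerance, and arbitration logic are not shown.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Deterministic dimension-order (XY) routing on a 2-D mesh.

    Returns every router visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # first resolve the X offset
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then resolve the Y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def next_port(cur: Coord, dst: Coord) -> str:
    """Output-port decision a single router makes under XY routing (port names assumed)."""
    if dst[0] > cur[0]:
        return "EAST"
    if dst[0] < cur[0]:
        return "WEST"
    if dst[1] > cur[1]:
        return "NORTH"
    if dst[1] < cur[1]:
        return "SOUTH"
    return "LOCAL"

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

In hardware, only `next_port` is needed per router: it is a pair of comparisons on the destination coordinates carried in the packet header.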
20.
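To address the large volumes of raw data that must be processed in real time during enterprise production, a local processor based on a custom architecture was designed and implemented. The design used Hadoop's parallel architecture as a reference: the working principle and caching approach of MapReduce were analyzed, and on that basis a "multi-class thread cooperative processing" program architecture was designed for the actual production environment. It is supported by two custom data caching methods, which ensure concurrency and correctness at the receive, compute, and upload stages of the local processors in the distributed system. The system has been in continuous production use for more than a year and has met its goal of processing, in real time, the raw data generated by multiple workshops of the enterprise, with good stability, effectiveness, and scalability. Practical results show that the custom program architecture and effective caching methods enable large volumes of data to be processed and analyzed synchronously.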
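The abstract does not detail its architecture. As one plausible reading, the sketch below assumes one thread class per stage (receive, compute, upload) connected by bounded FIFO buffers. Python's standard `queue.Queue` stands in for the paper's two custom cache types, which are not reproduced. `run_pipeline` and the `_STOP` sentinel are hypothetical names. A single compute thread keeps records in arrival order, and bounded buffers make a fast receiver wait for a slow uploader instead of exhausting memory.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel telling the next stage to shut down

def run_pipeline(records: Iterable[float],
                 compute: Callable[[float], float],
                 upload: Callable[[float], None],
                 buffer_size: int = 16) -> None:
    """Receive -> compute -> upload, one thread per stage, bounded queues as buffers."""
    inbox: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)
    outbox: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

    def receiver() -> None:
        for rec in records:
            inbox.put(rec)             # blocks while the buffer is full (back-pressure)
        inbox.put(_STOP)

    def worker() -> None:
        while (item := inbox.get()) is not _STOP:
            outbox.put(compute(item))
        outbox.put(_STOP)              # propagate shutdown downstream

    def uploader() -> None:
        while (item := outbox.get()) is not _STOP:
            upload(item)

    threads = [threading.Thread(target=f) for f in (receiver, worker, uploader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

results: List[float] = []
run_pipeline(range(5), lambda v: v * 2.0, results.append)
print(results)  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

Scaling the compute stage to several worker threads would raise throughput, but then each worker must forward its own stop signal and output order is no longer guaranteed.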