1.

KAIST image computing system (KICS): A parallel architecture for real-time multimedia data processing

JaeHo Hyung-Sun GeonYoung HyunWook 《Journal of Systems Architecture》2000,46(15):1403-1418

An efficient parallel architecture is proposed for high-performance multimedia data processing using multiple multimedia video processors (MVP; TMS320C80), which are fully programmable general-purpose digital signal processors (DSPs). This paper describes several requirements for a multimedia data processing system and the system architecture of an image computing system called the KAIST Image Computing System (KICS). The performance of the KICS is evaluated in terms of its I/O bandwidth and the execution time for some image processing functions. An application of the KICS to a real-time Moving Picture Experts Group 2 (MPEG-2) encoder is introduced. The programmability and the high-speed data-access capability of the KICS are its most important features as a high-performance system for real-time multimedia data processing.

2.

MVP: A Programmable Multimedia-Specific Microprocessor

Zheng Fei 《Microprocessors》1995,(3):12-17

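Programmable multimedia microprocessors are the core of today's popular multimedia processing systems. The MVP microprocessor recently released by TI is exactly such a fully programmable, multimedia-specific microprocessor: it integrates multiple processors operating in parallel, a large amount of on-chip memory, and functional units such as a video controller and an intelligent DMA controller. This paper first introduces the overall structure of the MVP in light of its design philosophy, then discusses in detail the organization of each on-chip processor integrated in the MVP, the interconnection structure between the processors and its implementation, and the transfer-control and video-control units, and finally briefly outlines the application prospects of the MVP microprocessor.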

3.

A single-chip multiprocessor

Nayfeh B.A. Olukotun K. 《Computer》1997,30(9):79-85

Presents the case for billion-transistor processor architectures that will consist of chip multiprocessors (CMPs): multiple (four to 16) simple, fast processors on one chip. In their proposal, each processor is tightly coupled to a small, fast, level-one cache, and all processors share a larger level-two cache. The processors may collaborate on a parallel job or run independent tasks (as in the SMT proposal). The CMP architecture lends itself to simpler design, faster validation, cleaner functional partitioning, and higher theoretical peak performance. However, for this architecture to realize its performance potential, either programmers or compilers will have to make code explicitly parallel. Old ISAs will be incompatible with this architecture (although they could run slowly on one of the small processors).

4.

A consistency-free memory architecture for sort-last parallel rendering processors

《Journal of Systems Architecture》2007,53(5-6):272-284

Current rendering processors aim to process triangles as fast as possible, and they tend to be equipped with multiple rasterizers so that they can handle a number of triangles in parallel and increase polygon rendering performance. However, such parallel architectures may suffer from a consistency problem when more than one rasterizer tries to access data at the same address. This paper proposes a consistency-free memory architecture for sort-last parallel rendering processors, in which a consistency-free pixel cache architecture is devised and effectively associated with three different memory systems consisting of a single frame buffer, a memory interface unit, and consistency-test units. Furthermore, the proposed architecture can reduce the latency caused by pixel cache misses, because the rasterizer does not wait for cache miss handling to complete when a pixel cache miss occurs. The experimental results show that the proposed architecture can achieve almost linear speedup up to four rasterizers with a single frame buffer.

5.

Integrated performance models for SPMD applications and MIMD architectures

Cremonesi P. Gennaro C. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(7):745-757

Introduces queuing network models for the performance analysis of SPMD (single-program, multiple-data) applications executed on general-purpose parallel architectures such as MIMD (multiple-instruction, multiple-data) machines and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processors and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., the number of processors, number of disks, I/O topology, etc.).
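
The notion of a speedup surface, speedup as a function of both the processor count p and the disk count d, can be illustrated with a toy model. This is a minimal sketch under a hypothetical additive cost model (computation divided evenly over p processors, I/O divided over d disks, communication overhead growing linearly with p); it is not the queuing network model the paper develops, and the names `run_time`, `speedup`, and `speedup_surface` and all parameter values are illustrative assumptions.

```python
# Toy speedup surface S(p, d): p processors, d disks.
# The additive cost model below is a hypothetical stand-in, not the
# queuing network model from the paper.

def run_time(p: int, d: int, compute: float = 100.0, io: float = 40.0,
             comm: float = 0.5) -> float:
    """Time for one SPMD job under a simple additive model.

    Computation splits evenly over p processors, I/O splits evenly over
    d disks, and communication overhead grows linearly with p.
    """
    return compute / p + io / d + comm * (p - 1)


def speedup(p: int, d: int, **params: float) -> float:
    """Speedup relative to one processor and one disk."""
    return run_time(1, 1, **params) / run_time(p, d, **params)


def speedup_surface(procs, disks, **params):
    """Tabulate S(p, d) over a grid of processor and disk counts."""
    return {(p, d): speedup(p, d, **params) for p in procs for d in disks}


if __name__ == "__main__":
    procs, disks = [1, 2, 4, 8, 16], [1, 2, 4]
    surface = speedup_surface(procs, disks)
    for d in disks:
        row = "  ".join(f"{surface[(p, d)]:5.2f}" for p in procs)
        print(f"d={d}: {row}")
```

With these illustrative numbers and a single disk, the I/O term caps 16 processors at a speedup of about 2.6, while four disks lift it to about 5.9; reading the table along both axes is what a speedup surface is meant to expose.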

6.

Research on Heterogeneous Multi-Core Processor Architecture Design (total citations: 2; self-citations: 0; citations by others: 2)

Chen Fangyuan, Zhang Dongsong, Wang Zhiying 《Computer Engineering & Science》2011,33(12):27-36

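Multi-core technology has become an important direction in processor development. Heterogeneous multi-core processors can assign different types of computing tasks to different types of processor cores for parallel processing, thereby providing more flexible and efficient processing mechanisms for applications with different requirements, and have therefore become a current research hotspot. From an architectural perspective, this paper discusses the key points in heterogeneous multi-core processor design, analyzing aspects such as core structure, interconnection, memory system, operating system support, testing and verification, and dynamic voltage scaling...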

7.

A methodology for digital real time simulation of dynamic systems using modern DSPs

《Simulation Practice and Theory》1997,5(2):137-151

The speedup factor in real-time simulation of dynamic systems using multiprocessor resources depends on the architecture of the multiprocessor system, the type of interconnection between parallel processors, the numerical methods and techniques used for discretization, and the task assignment and scheduling policy. Minimizing the number of processors needed for real-time simulation requires minimizing the processor time spent on interprocessor communication, as well as an efficient scheduling policy. Therefore, this article presents a methodology for the real-time simulation of dynamic systems that includes a new pre-emptive static assignment and scheduling policy. The advantages of applying a digital signal processor with a parallel architecture, for example the TMS320C40, to real-time simulation are described. Several features of real-time architectures that are necessary for efficient multiprocessor real-time simulation, such as multiple I/O channels, concurrent I/O and CPU processing, direct high-speed interprocessor communication, fast context switching, multiple buses, multiple memories, and powerful arithmetic units, are inherent to this processor. These features minimize interprocessor communication time and maximize sustained CPU performance.

8.

Analyzing Scalability of Parallel Algorithms and Architectures

《Journal of Parallel and Distributed Computing》1994,22(3):379-391

The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objectives of this paper are to critically assess the state of the art in the theory of scalability analysis, and to motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms that have been developed for studying scalability issues, and discuss their interrelationships. For example, we derive an important relationship between time-constrained scaling and the isoefficiency function. We point out some of the weaknesses of the existing schemes for measuring scalability, and discuss possible ways of extending them.
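
The isoefficiency function mentioned above can be illustrated with the standard textbook example of summing n numbers on p processors; this example is not taken from this paper. It is a minimal sketch assuming unit cost for one addition and for one communication step, so that the parallel time is n/p + 2 log2 p and the efficiency is E = n / (n + 2 p log2 p). Holding E fixed then requires n to grow as Θ(p log p). The function names are illustrative.

```python
import math

# Textbook isoefficiency example: summing n numbers on p processors.
# Assumes unit cost for one addition and for one communication step.

def parallel_time(n: float, p: int) -> float:
    """Local sums of n/p numbers, then a log2(p)-step tree reduction
    where each step costs one communication plus one addition."""
    return n / p + 2 * math.log2(p)


def efficiency(n: float, p: int) -> float:
    """E = T_serial / (p * T_parallel), taking T_serial = n."""
    return n / (p * parallel_time(n, p))


def isoefficiency_size(p: int, target: float) -> float:
    """Problem size n that keeps efficiency at `target` on p processors.

    From E = n / (n + 2 p log2 p):  n = E / (1 - E) * 2 p log2 p,
    i.e. n must grow as Theta(p log p).
    """
    return target / (1.0 - target) * 2 * p * math.log2(p)


if __name__ == "__main__":
    for p in (4, 16, 64):
        n = isoefficiency_size(p, 0.8)
        print(f"p={p:3d}  n={n:7.0f}  E(n)={efficiency(n, p):.2f}  "
              f"E(64)={efficiency(64, p):.2f}")
```

For E = 0.8 the required problem size grows from 64 numbers at p = 4 to 3072 at p = 64, while a fixed n = 64 drops to an efficiency of about 0.08 on 64 processors.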

9.

SCMP: A Single-Chip Message-Passing Parallel Computer

Baker James M. Gold Brian Bucciero Mark Bennett Sidney Mahajan Rajneesh Ramachandran Priyadarshini Shah Jignesh 《The Journal of Supercomputing》2004,30(2):133-149

As technology improves and transistor feature sizes continue to shrink, the effects of on-chip interconnect wire latencies on processor clock speeds will become more important. In addition, as we reach the limits of instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that can efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest neighbor connections. Memory is included on-chip with the processors and the architecture includes hardware support for communication and the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires will be determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.
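
The 2-D mesh with nearest-neighbor connections described above can be sketched as follows. This is a minimal illustration, assuming the 64 processors are laid out as an 8 x 8 grid numbered row by row and that messages use dimension-order (XY) routing; neither the layout nor the routing scheme is stated in the abstract, and the helper names are hypothetical. Every hop crosses only one short link between adjacent processors, which is why wire length is bounded by processor size rather than chip size.

```python
from typing import List, Tuple

# Hypothetical SCMP-style layout: 64 processors as an 8 x 8 mesh,
# numbered row by row; links exist only between grid neighbours.
WIDTH, HEIGHT = 8, 8


def coord(node: int) -> Tuple[int, int]:
    return node % WIDTH, node // WIDTH


def node_id(x: int, y: int) -> int:
    return y * WIDTH + x


def neighbors(node: int) -> List[int]:
    """The up-to-four nearest-neighbour links of a node (no wraparound)."""
    x, y = coord(node)
    out = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
            out.append(node_id(nx, ny))
    return out


def xy_route(src: int, dst: int) -> List[int]:
    """Dimension-order (XY) routing: travel along x first, then along y.
    Every step crosses exactly one short neighbour-to-neighbour link."""
    x, y = coord(src)
    tx, ty = coord(dst)
    path = [src]
    while x != tx:
        x += 1 if tx > x else -1
        path.append(node_id(x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append(node_id(x, y))
    return path


if __name__ == "__main__":
    print(neighbors(0))              # corner node: [1, 8]
    print(len(xy_route(0, 63)) - 1)  # hops corner to corner: 14
```

The trade-off is visible in the second line: short wires come at the cost of multi-hop latency between distant processors, up to 14 hops on an 8 x 8 mesh.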

10.

A transputer radar ESM data processor

Richard Beton James Kingdon Colin Upstill 《Concurrency and Computation》1991,3(4):303-313

A parallel radar ESM data processing system using occam and transputers is presented. High-level software solutions for radar ESM to run on multiple transputer systems were investigated, to achieve high throughput cost-effectively. The system is primarily a vehicle to support further algorithm research and to permit benchmarking; it is expandable and flexible, and can be modified to run on different numbers of processors with almost linear improvement in performance. The architecture has a hybrid pipelined structure; the topology within each stage of a concurrent processor pipeline is chosen to optimize the processing and communications for the required function. A novel technique is described for improving the load balancing of such algorithmically decomposed parallel systems.

11.

Research on Hybrid Parallel Computing for the AREM Model in Multi-Core Environments

Zhao Jun, Wu Jianping, Song Junqiang, Gu Xuzan 《Computer Engineering and Applications》2011,47(21):61-63

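Using multi-core processors has become the mainstream approach to building high-performance computer systems. Taking into account the architectural characteristics of multi-core high-performance computer systems, which combine shared-memory and distributed-memory structures in one system, this paper studies and implements MPI/OpenMP hybrid parallel computing for the AREM model. Performance test results show that MPI/OpenMP hybrid parallel computing can scale the parallel application to a larger number of processors and shorten computation time, achieving fairly good parallel performance incrementally, without major changes to the original program structure and at a small parallelization cost.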

12.

Web search for a planet: The Google cluster architecture (total citations: 11; self-citations: 0; citations by others: 11)

Barroso L.A. Dean J. Holzle U. 《Micro, IEEE》2003,23(2):22-28

Amenable to extensive parallelization, Google's web search application lets different queries run on different processors and, by partitioning the overall index, also lets a single query use multiple processors. To handle this workload, Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software. This architecture achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.
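
Partitioning the index so that a single query uses multiple processors can be sketched as a fan-out and merge over index shards. This is a minimal single-machine sketch, not Google's implementation: it assumes a simple inverted index, scoring by summed term frequency, documents assigned to shards by `doc_id % num_shards`, and threads standing in for separate machines. The class and function names (`Shard`, `build_shards`, `search`) are hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (score, doc_id)


class Shard:
    """One index partition: an inverted index over a subset of documents."""

    def __init__(self, docs: Dict[int, str]) -> None:
        self.postings: Dict[str, Counter] = defaultdict(Counter)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings[term][doc_id] += 1

    def search(self, query: str, k: int) -> List[Hit]:
        """Score local documents by summed term frequency; return local top k."""
        scores: Counter = Counter()
        for term in query.lower().split():
            scores.update(self.postings.get(term, {}))
        return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def build_shards(docs: Dict[int, str], num_shards: int) -> List[Shard]:
    """Partition documents across shards by doc_id modulo num_shards."""
    parts: List[Dict[int, str]] = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        parts[doc_id % num_shards][doc_id] = text
    return [Shard(p) for p in parts]


def search(shards: List[Shard], query: str, k: int = 3) -> List[Hit]:
    """Fan the query out to every shard in parallel, then merge partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.search(query, k), shards))
    return heapq.nlargest(k, (hit for hits in partials for hit in hits))


if __name__ == "__main__":
    docs = {0: "parallel web search", 1: "cluster of commodity PCs",
            2: "web search cluster", 3: "fault tolerant software",
            4: "web index partition search"}
    print(search(build_shards(docs, 2), "web search"))  # [(2, 4), (2, 2), (2, 0)]
```

Because each shard returns only its local top k, the merge step touches at most k results per shard, so adding shards spreads the scoring work of one query without growing the merge much. Independent queries can likewise be served by replicated copies of the shards.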

13.

An optimum parallel architecture for high-speed real-time digital signal processing

Lang G.R. Dharssi M. Longstaff F.M. Longstaff P.S. Metford P.A.S. Rimmer M.T. 《Computer》1988,21(2):47-57

The authors describe a parallel processing architecture for real-time digital signal processing that has demonstrated virtually 100% data processing efficiency in a number of areas. The Teamed-Architecture Signal Processor (T-ASP) is a field-proven, commercially available optimal system solution to the extremely high computational and I/O rates encountered in modern digital-signal-processing environments. The design of T-ASP involves the consideration and implementation of many architectural concepts used to enhance the performance of a computer, including programmability, parallel processing, vector processing and pipelining, memory interleaving, double cache memories, multiple high-speed I/O interfaces, and segmentation of the processors for elimination of both CPU and data-handling overhead. The authors discuss hardware architecture design and implementation; hardware management; and software architecture design and implementation.

14.

Integrated performance models for SPMD applications and MIMD architectures

Cremonesi P. Gennaro C. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1320-1332

This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processors and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., number of processors, number of disks, I/O topology, etc.).

15.

Analysis of the Google Architecture

Hu Bo 《Modern Computer》2010,(4):107-109

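To support large-scale, wide-ranging, high-density search, Google's web search application allows different queries to run on different processors; at the same time, by partitioning the global index, Google also allows a single query to run on multiple processors. Google's guiding principle is to use general-purpose commodity PCs to achieve the high performance of mainframes. Based on a collection of existing material, this paper analyzes Google's architecture and performance, with the aim of drawing lessons from Google for grid construction.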

16.

A Parallel Computing Task Allocation Method for Multi-Core Systems (total citations: 2; self-citations: 0; citations by others: 2)

Lu Yutong, Yang Xuejun, Suo Guang 《Journal of Computer Research and Development》2009,46(Z1)

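With the spread of multi-core processors, today's large-scale parallel processing systems commonly adopt them, which places higher demands on resource management and scheduling. This paper proposes a method based on partitioning shared cache resources, builds a process scheduling model for multi-core processors that supports cache resource allocation, and designs and implements an algorithm for mapping parallel tasks onto multi-core processors. The approach better solves the problem of allocating tasks to multi-core processors in large-scale resource management systems, reduces mutual interference among processes that share a cache, and improves application performance.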
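
The abstract does not describe how shared cache resources are partitioned, so the following is only an illustration of the general idea using a swapped-in technique: greedy way allocation by marginal utility, in the spirit of utility-based cache partitioning. It assumes each process has a known miss curve (misses as a function of the number of cache ways it owns); the curves, process names, and function names are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical miss curves: curve[w] = misses per interval when the process
# owns w ways of the shared cache (index 0 is unused because every process
# gets at least one way).
MissCurves = Dict[str, List[int]]


def marginal_gain(curve: List[int], ways: int) -> float:
    """Misses saved by granting one more way; -inf once the curve runs out."""
    if ways + 1 < len(curve):
        return curve[ways] - curve[ways + 1]
    return float("-inf")


def partition_ways(curves: MissCurves, total_ways: int) -> Dict[str, int]:
    """Greedy allocation: repeatedly give the next way to the process
    whose miss count drops the most."""
    alloc = {name: 1 for name in curves}
    spare = total_ways - len(alloc)
    if spare < 0:
        raise ValueError("need at least one way per process")
    for _ in range(spare):
        best = max(alloc, key=lambda n: marginal_gain(curves[n], alloc[n]))
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    curves = {
        "streaming": [1000, 900, 890, 885, 882, 880, 879, 878, 878],
        "cache_friendly": [1000, 600, 350, 200, 120, 80, 60, 48, 45],
    }
    print(partition_ways(curves, 8))  # {'streaming': 1, 'cache_friendly': 7}
```

In this made-up example the streaming process gains almost nothing from extra ways, so confining it to one way keeps it from evicting the data of the cache-friendly process, which is the kind of interference the paper aims to reduce.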

17.

Design and Implementation of the SPP Peripheral Subsystem

Wu Zhonghai, Ye Chengqing 《Journal of Chinese Computer Systems》1995,16(7):15-20

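The design and implementation techniques of a parallel processor's peripheral subsystem directly affect the price/performance ratio of the whole system. Based on the characteristics of the SPP architecture and practical application needs, this paper designs a dedicated I/O processor between the front-end server and the SM/SSM, so that data can be transferred at high speed directly between the system's I/O devices and the SM/SSM, greatly improving system I/O performance. The I/O processor uses an overall i860 + 82380 + SRAM structure, which allows the processor's accesses to main memory to proceed in parallel with the DMA controller's accesses to the SRAM.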

18.

SMA: A Speculative Multithreaded Architecture (total citations: 4; self-citations: 1; citations by others: 3)

Xiao Gang, Zhou Xingming, Xu Ming, Deng Kun 《Chinese Journal of Computers》1999,22(6):582-590

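This paper proposes a new ILP processor architecture, the speculative multithreaded architecture (SMA). It combines speculative execution with multithreaded execution: speculation proceeds in large steps of whole threads, and multiple threads execute in parallel while sharing the processor's hardware resources. In this way, the processor both forms a large dynamic instruction window by combining the instruction windows of the individual threads, exposing more of the program's ILP, and uses multithreaded execution to hide various long-latency operations, achieving high resource utilization. The paper introduces the SMA execution model and discusses the implementation of the SMA processor and its key techniques.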

19.

A Survey of Optimization Strategies for Data Processing Algorithms on Multi-Core Architectures

Chen Wei, Du Lingxia, Chen Hong 《Journal of Frontiers of Computer Science and Technology》2011,5(12):1057-1075

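Multi-core processors, especially chip multiprocessors (CMPs), can provide powerful shared-memory parallel resources. However, programs and algorithms written for single-core processors cannot fully exploit the parallel computing resources that multi-core architectures provide, so algorithms must be improved and optimized for the characteristics of multi-core architectures to raise their performance. Starting from optimizing program locality, reducing cache access conflicts, increasing thread-level parallelism, fully exploiting single instruction multiple data (SIMD) parallelism, and optimizing bandwidth, this paper summarizes and analyzes optimization strategies for data processing algorithms on multi-core processors and reviews multi-core algorithms. Finally, it sets out the many open problems in the field and outlines future research directions.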

20.

Massively parallel architectures for large scale neural network simulations

Fujimoto Y. Fukuda N. Akabane T. 《Neural Networks, IEEE Transactions on》1992,3(6):876-888

A toroidal lattice architecture (TLA) and a planar lattice architecture (PLA) are proposed as massively parallel neurocomputer architectures for large-scale simulations. The performance of these architectures is almost proportional to the number of node processors, and they adopt the most efficient two-dimensional processor connections for WSI implementation. They also give a solution to the connectivity problem, the performance degradation caused by the data transmission bottleneck, and the load balancing problem for efficient parallel processing in large-scale neural network simulations. The general neuron model is defined. Implementation of the TLA with transputers is described. A Hopfield neural network and a multilayer perceptron have been implemented and applied to the traveling salesman problem and to identity mapping, respectively. Proof that the performance increases almost in proportion to the number of node processors is given.
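
The difference between a toroidal and a planar lattice of processors can be sketched by comparing their wiring. This is a minimal illustration assuming an 8 x 8 processor grid (the size is an assumption); it models only the interconnect topology, not the paper's mapping of neurons and connection weights onto node processors, and all function names are hypothetical. In the torus, edge processors link around to the opposite edge, which halves the worst-case hop distance.

```python
from itertools import product
from typing import Callable, List, Tuple

Node = Tuple[int, int]  # (column, row)


def torus_neighbors(node: Node, rows: int, cols: int) -> List[Node]:
    """Four links per node; edge nodes wrap around to the opposite edge."""
    c, r = node
    return [((c + 1) % cols, r), ((c - 1) % cols, r),
            (c, (r + 1) % rows), (c, (r - 1) % rows)]


def ring_distance(a: int, b: int, n: int) -> int:
    """Shortest way around a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def torus_hops(a: Node, b: Node, rows: int, cols: int) -> int:
    return ring_distance(a[0], b[0], cols) + ring_distance(a[1], b[1], rows)


def lattice_hops(a: Node, b: Node) -> int:
    """Planar lattice: no wraparound, so distance is plain Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def diameter(rows: int, cols: int, hops: Callable[[Node, Node], int]) -> int:
    """Worst-case hop count over all pairs of nodes."""
    nodes = list(product(range(cols), range(rows)))
    return max(hops(a, b) for a in nodes for b in nodes)


if __name__ == "__main__":
    print(torus_neighbors((0, 0), 8, 8))  # [(1, 0), (7, 0), (0, 1), (0, 7)]
    print(diameter(8, 8, lambda a, b: torus_hops(a, b, 8, 8)))  # 8
    print(diameter(8, 8, lattice_hops))  # 14
```

Both topologies use only short two-dimensional links, which suits wafer-scale layout; the wraparound links add one long connection per row and column in exchange for a smaller worst-case distance.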