1.

Cao Jijun Su Jinshu 《计算机工程与设计》(Computer Engineering and Design) 2008,29(10):2431-2435

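The rapid growth of new applications such as network streaming media has created an urgent need for routers to support multicast. This paper studies how to implement multicast protocols efficiently in a router based on a network processor. Extending the router's standard functional software, it proposes a software structure for implementing multicast protocols such as IGMP and PIM, and designs and implements the key techniques by exploiting the flexible programmability and high performance of the network processor. Protocol test results show that the router's multicast protocol system runs well. Finally, the paper looks ahead to the development of IP multicast technology.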

2.

A high-performance application data environment for large-scale scientific computations

Shen X. Liao W.-K. Alok Choudhary Memik G. Kandemir M. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(12):1262-1274

Effective high-level data management is becoming an important issue with more and more scientific applications manipulating huge amounts of secondary-storage and tertiary-storage data using parallel processors. A major problem facing the current solutions to this data management problem is that these solutions either require a deep understanding of specific data storage architectures and file layouts to obtain the best performance (as in high-performance storage management systems and parallel file systems), or they sacrifice significant performance in exchange for ease-of-use and portability (as in traditional database management systems). We discuss the design, implementation, and evaluation of a novel application development environment for scientific computations. This environment includes a number of components that make it easy for the programmers to code and run their applications without much programming effort and, at the same time, to harness the available computational and storage power on parallel architectures.

3.

Fei Teng 64 Stream Processing System: Architecture, Compiler, and Programming

Yang Xuejun Yan Xiaobo Xing Zuocheng Deng Yu Jiang Jiang Du Jing Zhang Ying 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(8):1142-1157

The stream architecture is a novel microprocessor architecture with wide application potential. It is critical to study how to use the stream architecture to accelerate scientific computing programs. However, existing stream processors and stream programming languages are not designed for scientific computing. To address this issue, we design and implement a 64-bit stream processor, Fei Teng 64 (FT64), which has a peak performance of 16 Gflops. FT64 supports two kinds of communications, message passing and stream communications, based on which an interconnection architecture is designed for an FT64-based high-performance computer. This high-performance computer contains multiple modules, with each module containing eight FT64s. We also design a novel stream programming language, Stream Fortran 95 (SF95), together with the compiler SF95Compiler, so as to facilitate the development of scientific applications. We test nine typical scientific application kernels on our FT64 platform to evaluate this design. The results demonstrate the effectiveness and efficiency of FT64 and its compiler for scientific computing.

4.

A highly flexible, distributed multiprocessor architecture for network processing

《Computer Networks》2003,41(5):563-586

Network processors (NPs) are an emerging field of programmable processors that are optimized to implement data plane packet processing networking functions. Unlike the general-purpose CPUs that rely heavily on caching for improving performance, the lack of locality in packet processing and need for high-performance I/O have forced designers to come up with innovative architectures that can hide memory latency while still processing packets at high data rates. Most of these NPs use some type of multiprocessing in combination with a hierarchy of memory types to achieve high performance. In addition, to keep up with packets arriving at high data rates over multiple incoming media interfaces, an NP must perform fast I/O and memory operations such as packet storage, table lookup, and extraction of fields in packet headers. We describe an architecture that uses a combination of distributed memory architecture and one or more multithreaded processors to achieve the necessary performance. We describe the challenges in programming such a processor including the issues related to consistency and maintaining packet ordering. We also present a programming model for generic network applications that uses software pipelines. We then demonstrate the use of the programming model in implementing two applications, namely, mapping traffic management algorithms onto a multithreaded architecture and an implementation of a media gateway based on voice-over-AAL2.

5.

Programming Model and Compiler Architecture for Master-Slave, One-Sided Heterogeneous Multi-core Processors

Li Chunjiang Yang Xuejun 《计算机工程与科学》(Computer Engineering & Science) 2009,31(8)

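Heterogeneous multi-core processors with a master-slave, one-sided heterogeneous architecture are widely used to accelerate computation in specialized application domains, for example heterogeneous multi-core embedded processors, DSPs, and SoCs; high-performance processors of this class can also be used for some large-scale scientific and engineering computing problems. Such processors pose many challenging problems for programming models and compiler technology, including the choice of programming model, the design of the programming language, the design of the compiler architecture, and the design of the runtime library. This paper analyzes the structural characteristics and execution model of this class of processors and argues that the function offload model is the programming model best suited to this architecture. It also analyzes the key issues in designing a programming language for the function offload model, proposes an architecture for the compilation system, and discusses the design of the corresponding runtime library.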

6.

An Introduction to GPU Parallel Computing Programming Techniques

Wang Zehuan Wang Peng 《数据与计算发展前沿》(Frontiers of Data and Computing) 2013,4(1):81-87

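GPU general-purpose computing has flourished in recent years, and the numbers of developers and of general-purpose GPU applications have grown quickly. To meet the requirements of different applications and the differing habits of developers, NVIDIA and its partners have jointly developed many different programming techniques around GPUs with the CUDA architecture. This paper describes their characteristics and intended users in detail, in the hope of helping developers choose the programming technique that best fits their own programming habits and application requirements.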

7.

A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming

Steffen Ernsting Herbert Kuchen 《International journal of parallel programming》2014,42(6):968-987

Multi-core processors and clusters of multi-core processors are ubiquitous. They provide scalable performance yet introduce complex and low-level programming models for shared and distributed memory programming. Thus, fully exploiting the potential of shared and distributed memory parallelization can be a tedious and error-prone task: programmers must take care of low-level threading and communication (e.g. message passing) details. In order to assist programmers in developing performant and reliable parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel and distributed programming patterns, thus shielding programmers from low-level aspects of parallel and distributed programming. In this paper we take on the design and implementation of the well-known Farm skeleton. In order to address the hybrid architecture of multi-core clusters we present a two-tier implementation built on top of MPI and OpenMP. On the basis of three benchmark applications, including a simple ray tracer, an interacting particles system, and an application for calculating the Mandelbrot set, we illustrate the advantages of both skeletal programming in general and this two-tier approach in particular.
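
As a rough illustration of the Farm pattern named in this abstract, the sketch below applies one worker function to a collection of independent tasks and gathers the results in task order. It is not the paper's two-tier MPI/OpenMP implementation: it is a single-process stand-in built on Python's standard thread pool, and the names `farm` and `mandelbrot_iterations` are hypothetical, chosen only to echo the Mandelbrot benchmark mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def farm(worker: Callable[[T], R], tasks: Iterable[T], n_workers: int = 4) -> List[R]:
    """Apply `worker` to every task independently; return results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map hands tasks to idle workers and yields results in input order.
        return list(pool.map(worker, tasks))


def mandelbrot_iterations(c: complex, max_iter: int = 50) -> int:
    """Count iterations of z -> z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


if __name__ == "__main__":
    # Hypothetical workload: sample points along the real axis.
    points = [complex(x / 10.0, 0.0) for x in range(-20, 11)]
    print(farm(mandelbrot_iterations, points))
```

The caller supplies only the per-task function and the task list; thread management stays inside `farm`, which is the kind of shielding from low-level detail that the abstract attributes to skeletons.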

8.

Designing an API at an appropriate abstraction level for programming social robot applications

《Journal of Visual Languages and Computing》2017

Whilst robots are increasingly being deployed as social agents, it is still difficult to program them to interact socially. To create usable tools for programming these robots, tool developers need to know what abstraction levels are appropriate for programming social robot applications. We explore this through the iterative design and evaluation of an API for programming social robots. The results show that high level primitives, with a close mapping to social interaction, are suitable for programming social robot applications. However, the abstraction level should not be so high that it takes away too much control from programmers. This has the potential to enable programmers to produce high quality social robot applications with less programming effort.

9.

The Alpha 21364 network architecture

《Micro, IEEE》2002,22(1):26-35

The Alpha 21364 processor provides a high-performance, scalable, and reliable network architecture with a router that runs at 1.2 GHz and has a peak bandwidth of 22.4 Gbytes/s. Supporting configurations of up to 128 processors, this network architecture is well suited for communication-intensive server applications.

10.

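Research and Implementation of the MPLS VPN Protocol Based on a Network Processor (total citations: 1; self-citations: 0; citations by others: 1)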

Wang Yongjun Huang Qingyuan 《计算机工程与科学》(Computer Engineering & Science) 2006,28(6):9-12

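MPLS VPN is one of the mainstream security protocols of the next-generation Internet. This paper studies how to implement the MPLS VPN protocol efficiently in a high-performance router based on a network processor. Extending the software that implements the router's standard functions, it proposes a software structure for implementing MPLS VPN on a network processor, and designs and implements the key techniques by exploiting the network processor's flexible programmability and high performance. The work takes full advantage of the network processor's strength in rapid protocol extension and also explores methods for upgrading network processor software.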

11.

Exploring high-performance processor architecture beyond the exascale

Xiang-Hui Xie Xun Jia 《浙江大学学报:C卷英文版》(Journal of Zhejiang University, Series C, English edition) 2018,19(10):1224-1229

The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As an integral part of a supercomputing system, high-performance processors and their architecture designs are crucial in improving system performance. In this paper, three architecture design goals for high-performance processors beyond the exascale are introduced, including effective performance scaling, efficient resource utilization, and adaptation to diverse applications. Then a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa) is proposed, which aims to achieve the above three goals by employing the techniques of distributed computational resources and application-customized hardware. Finally, some future research directions regarding the Massa architecture are discussed.

12.

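Research on a Genetic-Algorithm-Based Method for Mapping Network Applications onto Heterogeneous Network Processor Resources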

Zhang Xiaoming Zhao Ke Zhang Minxuan 《小型微型计算机系统》(Journal of Chinese Computer Systems) 2007,28(2):341-345

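With the rapid development of deep-submicron process technology, modern network processor chips are widely implemented with an MPSoC (Multi-Processor System on Chip) architecture, which calls for a new design method to guide network processor architecture design. This paper studies design methods for network processors and proposes a genetic-algorithm-based method for mapping network applications onto the heterogeneous hardware resources of a network processor. The method first analyzes the problem space of network processor design, describes network applications with weighted dataflow process networks, and parameterizes the various hardware resources. It then constructs a genetic algorithm that maps the network application onto the heterogeneous hardware resources, yielding a network processor architecture design.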

13.

An accelerator design for speedup of Java execution in consumer mobile devices

Lu Yan Zheng Liang 《Computers & Electrical Engineering》2009,35(6):904-919

In today’s consumer electronics market, Java has become one of the most important programming languages for the rapid development of mobile applications – spanning from home appliances/controllers, mobile and communication devices, to network-centric applets. However, the demand for high-performance low-power Java-based consumer mobile applications puts forward new challenges to the system design and implementation. This paper analyzes the energy consumption, execution efficiency, and speed issues of Java applications in a typical consumer mobile device environment. By adopting a hardware-assisted approach, we introduce a Java accelerator with a companion Java virtual machine. The accelerator is designed in an asynchronous style, and can be integrated with most existing processors and operating systems. The core architecture, design philosophy, and implementation considerations are presented in detail in this paper.

14.

How multimedia workloads will change processor design (total citations: 1; self-citations: 0; citations by others: 1)

Diefendorff K. Dubey P.K. 《Computer》1997,30(9):43-45

Workloads drive architecture design and will change in the next two decades. For high-performance, general-purpose processors, there is a consensus that multimedia will continue to grow in importance. The authors predict these processors will incorporate more media processing capabilities, eventually bringing about the demise of specialized media processors, except perhaps in embedded applications. These enhanced general-purpose processor capabilities will arise from multimedia applications that require real-time response, continuous-media data types and significant fine-grained data parallelism.

15.

Design of a High-Throughput Many-Core Processor

Ye Xiaochun Li Wenming Zhang Yang Zhang Hao Wang Da Fan Dongrui 《数据与计算发展前沿》(Frontiers of Data and Computing) 2020,2(1):70-84

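[Objective] With the rapid rise of new high-throughput applications such as cloud computing, the Internet of Things, and artificial intelligence, the main applications of high-performance computing have gradually shifted from traditional scientific and engineering computing toward emerging data processing. This poses great challenges for traditional processors, and high-throughput many-core processors, a new processor architecture aimed at such applications, have become an important research direction. [Methods] To address these problems, this paper analyzes the characteristics of typical high-throughput applications and discusses the design of key techniques for high-throughput many-core processors across three core stages (data processing, transmission, and storage), including dynamic scheduling of real-time tasks, high-density network-on-chip design, and optimization of the on-chip memory hierarchy. [Results] Experimental results show that these mechanisms effectively guarantee quality of service for tasks, improve the network's data throughput, and simplify the on-chip memory hierarchy. [Conclusions] Given the urgent demand for highly concurrent, strongly real-time processing in the era of the Internet of Everything, high-throughput many-core processors are expected to become the core processing engines of future data centers.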

16.

Scalable mpNoC for massively parallel systems – Design and implementation on FPGA

M. Baklouti Y. Aydi Ph. Marquet J.L. Dekeyser M. Abid 《Journal of Systems Architecture》2010,56(7):278-292

The high chip-level integration enables the implementation of large-scale parallel processing architectures with 64 and more processing nodes on a single chip or on an FPGA device. These parallel systems require a cost-effective yet high-performance interconnection scheme to provide the needed communications between processors. The massively parallel Network on Chip (mpNoC) was proposed to address the demand for parallel irregular communications for massively parallel processing System on Chip (mppSoC). Targeting FPGA-based design, an efficient mpNoC low level RTL implementation is proposed taking into account design constraints. The proposed network is designed as an FPGA based Intellectual Property (IP) able to be configured in different communication modes. It can communicate between processors and also perform parallel I/O data transfer, which is clearly a key issue in an SIMD system. The mpNoC RTL implementation presents good performance in terms of area, throughput and power consumption, which are important metrics targeting an on chip implementation. mpNoC is a flexible architecture that is suitable for use in FPGA-based parallel systems. This paper introduces the basic mppSoC architecture. It mainly focuses on the mpNoC flexible IP based design and its implementation on FPGA. The integration of mpNoC in mppSoC is also described. Implementation results on a Stratix II FPGA device are given for three data-parallel applications run on mppSoC. The good performance obtained justifies the effectiveness of the proposed parallel network. It is shown that the mpNoC is a lightweight parallel network, making it suitable for both small and large FPGA-based parallel systems.

17.

A single-chip multiprocessor

Nayfeh B.A. Olukotun K. 《Computer》1997,30(9):79-85

Presents the case for billion-transistor processor architectures that will consist of chip multiprocessors (CMPs): multiple (four to 16) simple, fast processors on one chip. In their proposal, each processor is tightly coupled to a small, fast, level-one cache, and all processors share a larger level-two cache. The processors may collaborate on a parallel job or run independent tasks (as in the SMT proposal). The CMP architecture lends itself to simpler design, faster validation, cleaner functional partitioning, and higher theoretical peak performance. However, for this architecture to realize its performance potential, either programmers or compilers will have to make code explicitly parallel. Old ISAs will be incompatible with this architecture (although they could run slowly on one of the small processors).

18.

Study of OpenMP applications on the InfiniBand-based software distributed shared-memory system

Inho Park Seon Wook Kim 《Parallel Computing》2005,31(10-12):1099

For the past decades computer engineers have focused on building high-performance and large-scale computer systems at low cost. One example is a distributed-memory computer system like a cluster, where fast processing nodes built from commodity processors are connected through a high speed network. But it is not easy to develop applications on this system, because a programmer must consider all data and control dependences between processes and program them explicitly. To alleviate this problem, the distributed virtual shared-memory (DVSM) system has been proposed. It is well known that the performance of the DVSM system highly depends on the network’s performance and programming semantics, and currently its performance is very limited on a conventional network. Recently many advanced hardware-based interconnection technologies have been introduced, and one of them is the InfiniBand Architecture (IBA), which supports shared-memory programming semantics by means of remote direct-memory access (RDMA) and atomic operations. In this paper, we present the implementation of our InfiniBand-based DVSM system and analyze the performance of SPEC OMP benchmarks in detail by comparing with the DVSM based on the traditional network architecture and the hardware shared-memory multiprocessor (SMP) system. Our experimental results show that our DVSM system, which uses the full features of the IBA, can improve performance significantly over the IPoIB-based traditional system on the IBA, and furthermore that the performance of one application on the IBA-based DVSM system is better than on the hardware SMP.

19.

The basics of performance-monitoring hardware

Sprunt B. 《Micro, IEEE》2002,22(4):64-71

Most modern, high-performance processors have special, on-chip hardware that monitors processor performance. Data collected by this hardware provides performance information on applications, the operating system, and the processor. These data can guide performance improvement efforts by providing information that helps programmers tune the algorithms used by the applications and operating system, and the code sequences that implement those algorithms.

20.

Software Design for Systems Based on the IXP2400 Network Processor (total citations: 1; self-citations: 0; citations by others: 1)

Ge Jingguo 《计算机科学》(Computer Science) 2006,33(2):269-273

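The high-performance packet processing and flexible programmability of network processors meet the needs of current network development, and network processors are widely used in high-end routers, edge multi-service broadband access, media gateways, security, and other areas. The key to building a successful network system on a network processor lies in designing and developing its software system. The core problem is for the software to exploit fully the network processor's flexibility and high performance: to program against the network processor's hardware architecture, make sensible use of the hardware resources it provides for optimizing packet processing, and design efficient multiprocessor, multithreaded parallel mechanisms. Taking the implementation of high-speed network applications on the IXP2400 network processor as an example, this paper introduces the software development process and design methods for network-processor-based systems and discusses strategies and techniques for developing high-performance microcode. It first presents the hardware architecture configuration and software development framework of a network-processor-based system, together with the system analysis and overall design of the application software. It then focuses on the parallel processing mechanism across multiple microengines and multiple threads and on solutions to the mutual exclusion and packet ordering problems. Finally, it discusses methods for evaluating system performance.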
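
The packet ordering problem mentioned in this abstract arises because packets handled by parallel threads can finish out of order. One generic remedy is a reorder buffer keyed by sequence number, sketched below. This is an assumption-laden illustration of the general technique, not the paper's solution and not an IXP2400 API; the class `ReorderBuffer` and its method `complete` are hypothetical names.

```python
from typing import Dict, List


class ReorderBuffer:
    """Release packets strictly in sequence-number order."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq             # next sequence number allowed to leave
        self.pending: Dict[int, bytes] = {}   # finished packets waiting for their turn

    def complete(self, seq: int, packet: bytes) -> List[bytes]:
        """Record a finished packet; return every packet that can now be sent."""
        self.pending[seq] = packet
        released: List[bytes] = []
        # Drain the buffer for as long as the next expected packet is present.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released
```

In a real multithreaded setting, calls to `complete` would themselves need mutual exclusion (for example a lock around the method), which is the other problem the abstract names.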