1.

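Liu Zhen, Liu Bin, Zheng Kai 《Journal of Software》2007,18(12):3115-3123

Routers must implement route lookup, their most basic function, flexibly and at high speed with low cost. This paper designs a software-based route lookup cache algorithm for network processors. Part of the network processor's high-speed on-chip memory is set aside, and instruction code maintains a table that caches route lookup results. By choosing a suitable hash function that balances collisions between table entries against refresh complexity, the algorithm shortens route lookup latency, reduces contention among the processing units for the memory bus, and frees more processing time for other network applications. Experiments with real network traffic show that network processor throughput still improves effectively even when each processing unit holds only a few cache entries.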
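A minimal Python sketch of the idea, assuming a direct-mapped cache indexed by a hash of the destination address, with a slow-path longest-prefix match on a miss. The XOR-fold hash, the class name `RouteCache`, and the linear `lpm_lookup` are hypothetical choices made for illustration; they are not the hash function or table layout from the paper, and on a real network processor the table would sit in on-chip scratch memory managed by microcode.

```python
import ipaddress


def lpm_lookup(table, addr):
    """Slow path: linear longest-prefix match over (network, next_hop) pairs."""
    best = None
    for net, hop in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


class RouteCache:
    """Direct-mapped cache of route lookup results (hypothetical layout)."""

    def __init__(self, n_entries, table):
        self.n = n_entries
        self.slots = [None] * n_entries  # each slot holds (dest_as_int, next_hop)
        self.table = table
        self.hits = 0
        self.misses = 0

    def _index(self, dest):
        # Hypothetical hash: XOR-fold the two 16-bit halves, then reduce modulo the size.
        return (((dest >> 16) ^ dest) & 0xFFFF) % self.n

    def lookup(self, addr):
        dest = int(addr)
        i = self._index(dest)
        slot = self.slots[i]
        if slot is not None and slot[0] == dest:
            self.hits += 1
            return slot[1]
        self.misses += 1
        hop = lpm_lookup(self.table, addr)  # fall back to the full routing table
        self.slots[i] = (dest, hop)         # overwrite the slot (no associativity)
        return hop

    def flush(self):
        """Invalidate every entry, e.g. after the routing table changes."""
        self.slots = [None] * self.n


table = [
    (ipaddress.ip_network("0.0.0.0/0"), "default"),
    (ipaddress.ip_network("10.0.0.0/8"), "A"),
    (ipaddress.ip_network("10.1.0.0/16"), "B"),
]
cache = RouteCache(8, table)
for a in ["10.1.2.3", "10.1.2.3", "10.2.0.1", "192.168.0.1"]:
    print(a, "->", cache.lookup(ipaddress.ip_address(a)))
print("hits:", cache.hits, "misses:", cache.misses)
```

Flushing the whole table on a route update is the simplest refresh policy: it gives up some hit rate right after an update in exchange for very low refresh complexity, which is the kind of trade-off the abstract describes.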

2.

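Research on a Multimedia Coprocessing Unit Supporting AltiVec Technology

Huang Xiaoping, Fan Xiaoya, Zhang Shengbing 《Application Research of Computers》2008,25(10):3161-3164

Extending an embedded processor with multimedia capabilities strengthens its ability to process multimedia data. Building on the 32-bit LongTeng (龙腾) embedded processor, this work studies AltiVec technology and superscalar techniques and designs a multimedia coprocessing unit that supports AltiVec for this processor. The unit uses a five-stage pipeline and dynamically schedules instructions into different pipelines, raising processing performance while maintaining the design frequency. In multimedia benchmark tests the unit reaches an IPC of 1.2 and runs at 350 MHz with the SMIC 0.18 μm process library; the coprocessing unit improves the performance of the LongTeng processor.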

3.

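An Architecture for a Neural Network Parallel Processor

Qian Yi, Li Zhancai, Li Ang, Wang Qin 《Journal of Chinese Computer Systems》2007,28(10):1902-1906

The more kinds of neural network models a neural network processing system can implement, the more general-purpose it is and the wider its range of applications. This paper proposes an architecture for a neural network parallel processor that executes, with a high degree of parallelism, the algorithms of the typical feedforward network (the BP network) and the typical feedback network (the Hopfield network). The processor uses SIMD (Single Instruction Multiple Data) as its main computing structure and, tailored to the characteristics of these two network algorithms, adds a one-dimensional systolic array and a fully connected interconnection network so that processing units can share data conveniently and flexibly. Experimental results show that the architecture effectively increases the execution speed of neural networks.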
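Both a BP forward pass and a Hopfield update reduce to matrix-vector products, which a one-dimensional systolic array pipelines well. The following is a generic cycle-by-cycle Python simulation of a 1-D systolic matrix-vector multiply, assuming PE j holds x[j] and each row's partial sum flows left to right through the array; it illustrates the technique only and is not the paper's processor design.

```python
def systolic_matvec(W, x):
    """Simulate y = W @ x on a linear array of len(x) PEs; PE j holds x[j]."""
    m, n = len(W), len(x)
    pes = [None] * n          # pes[j] = (row index, partial sum) currently in PE j
    y = [0.0] * m
    for t in range(m + n):
        # The partial sum leaving the last PE is complete.
        if pes[-1] is not None:
            row, acc = pes[-1]
            y[row] = acc
        # Shift every partial sum one PE to the right.
        for j in range(n - 1, 0, -1):
            pes[j] = pes[j - 1]
        # Row t enters PE 0 with a zero partial sum while rows remain.
        pes[0] = (t, 0.0) if t < m else None
        # Each occupied PE performs one multiply-accumulate this cycle.
        for j in range(n):
            if pes[j] is not None:
                row, acc = pes[j]
                pes[j] = (row, acc + W[row][j] * x[j])
    return y


W = [[1, 2], [3, 4], [5, 6]]
print(systolic_matvec(W, [1, 1]))
```

With n PEs, m rows finish within m + n cycles because a new row enters every cycle; a Hopfield step would then threshold each output (for example, take its sign).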

4.

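Design of a High-Speed Forwarding Module Based on a Network Processor

Jia Yujun, Meng Fang 《Computer & Network》2009,(13):47-50

As network application services keep growing, traditional approaches to network service processing can no longer meet the design requirements of next-generation intelligent network equipment. Network processors, with strong protocol processing capability and flexible programmability, are one of the core technologies of next-generation networks. After analyzing the features of the NP-2 network processor, this paper presents a design for a high-speed forwarding module based on a network processor, covering the hardware design and the packet forwarding flow, and reports forwarding performance test data. Testing shows that the design achieves high packet processing and forwarding efficiency.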

5.

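Research and Implementation of a High-Speed Firewall

Li Xiaoming 《Microcomputer Development》2004,14(6):104-105,108

With the spread of broadband networks, network security has become an important topic in information technology, and existing firewall architectures can no longer meet the demands of high-speed network environments. A network processor is a chip designed specifically for IP packet processing and can process network traffic at line rate. This paper discusses the architecture and functions of network processors and, after analyzing several different firewall architectures, presents an implementation based on a network processor that provides routing, filtering, network address translation, and other security protections for gigabit networks. Finally, it outlines the application prospects of the scheme in network security and the direction in which network security devices are developing.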

6.

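Research on a Super-Node Controller for High-Performance Computers  Total citations: 1, self-citations: 0, citations by others: 1

Wang Kai, Chen Fei, Li Qiang, Li Xiaomin, An Xuejun, Sun Ninghui 《Journal of Computer Research and Development》2011,48(1):1-8

In a traditional high-performance computer, each node consists of one processing unit and one node controller. To maintain cache coherence efficiently, the number of processors in a processing unit must be kept very small. As a result, a petaflops-class high-performance computer would have tens of thousands of nodes, which places very high demands on the latency and bandwidth of the interconnection network. A super-node controller can connect several processing units at once to form a super node, which reduces the size of the interconnection network and thereby lowers the interconnect...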

7.

Research and Implementation of Display-and-Control Technology Based on a Distributed Architecture

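Cai Weizhe, Yang Donghua, Qiu Han, Pan Qi 《Computer Measurement & Control》2024,32(1):79-84

To meet the growing functional and performance requirements of display-and-control terminals and the requirement to use domestically produced components, this work studies display-and-control terminal architectures and proposes a distributed display-and-control architecture. The architecture is centered on a computing processing unit, with the other hardware modules acting as coprocessing units, and integrates the capabilities of those coprocessing units to carry out complex computing functions. On top of this distributed processing architecture, an integrated display-and-control software architecture based on microservices handles task allocation, command dispatch, data collection, and function integration across the modules, supporting terminal applications such as multi-touch, speech processing, face authentication, and health monitoring. Compared with a traditional centralized design, the distributed architecture addresses the low efficiency of frequent multitask switching, communication bottlenecks, and poor network scalability, offering high performance, high reliability, and scalability. The technology has been fully applied and validated, and the validation results show that the distributed display-and-control terminal meets its functional requirements and effectively improves overall device performance.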

8.

Research and Implementation of a Coprocessor Design Method for Network Processors

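Zhang Xiaoming, Wang Yongjun, Zhang Minxuan 《Computer Engineering & Science》2007,29(3):80-83

With the rapid development of deep-submicron processes, modern network processor chips are widely implemented as MPSoC architectures. Targeting the characteristics of coprocessors in network processors, this paper studies how to design them and proposes three mechanisms for sharing a coprocessor among multiple processing units. It then implements several coprocessor structures in a network processor built on the Nios II soft core to support different design requirements.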

9.

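Design of an Inter-Cluster Interconnect Communication Unit

Li Bin, Xie Jing, Mao Zhigang 《Computer Engineering》2013,(9)

In the design of high-performance parallel processors, balancing communication efficiency against hardware cost is a key issue. To address it, for a linear-array processor architecture built from clustered processing units, this paper proposes an inter-cluster interconnect communication design for a multi-cluster processor, covering the communication unit structure and an application case study of typical digital signal processing data transfers. Experimental results show that, compared with a conventional linear-array processor structure, the scheme improves the corresponding performance of the interconnect communication unit by more than 30%.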

10.

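Research on Processing Resource Scheduling for Network Processors

Zhang Chao, Yu Zhiyong, Zhang Fan 《Microprocessors》2009,30(2)

Scheduling processing resources under multiple constraints and multiple optimization objectives is one of the challenges in building high-performance application systems on network processors. We first survey existing processing resource scheduling algorithms for network processor platforms, then propose a scheduling model for DAG-structured network processing tasks on heterogeneous, fully connected, hardware-multithreaded chip multiprocessor architectures. The model removes assumptions and simplifications that earlier models made about processing unit structure, task decomposition, and scheduling timing, and accurately reflects the structure and processing characteristics of modern advanced network processors. Finally, a simulation of the model is presented.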

11.

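The Memory Bottleneck Problem in Network Processor Design

Ma Siyao, Yin Jiabin, Sun Zhigang 《Journal of Computer Research and Development》2009,46(Z2)

The memory bottleneck in network processor design is the mismatch between network processing performance and the latency of accesses to external control memory made by operations such as FIB (forwarding information base) lookup, QoS scheduling, and counter management. Current network processor designs hide memory access latency through parallel processing, but because of design complexity and power consumption, massive parallelism is difficult to keep applying to network processing above 40 Gbps. This paper studies the memory bottleneck in current network processors and the existing solutions, points out their limitations, and proposes a new network processing model for future higher-performance network processing, such as designs with 100 Gbps interfaces.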
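Latency hiding through parallelism can be sized with Little's law: the number of memory requests that must be in flight equals the request arrival rate times the memory latency. The sketch below assumes worst-case minimum-size Ethernet frames (64-byte frame plus 8-byte preamble and 12-byte inter-frame gap, 84 bytes on the wire), a hypothetical 100 ns external-memory latency, and a hypothetical 4 control-memory accesses per packet; none of these numbers come from the paper.

```python
import math

# Assumed worst case: 64 B minimum frame + 8 B preamble + 12 B inter-frame gap.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12  # 84 bytes on the wire


def packet_rate(line_rate_bps):
    """Worst-case packets per second at the given line rate (minimum-size frames)."""
    return line_rate_bps / (WIRE_BYTES_PER_MIN_FRAME * 8)


def outstanding_requests(line_rate_bps, mem_latency_s, accesses_per_packet):
    """Little's law: requests in flight = request arrival rate x latency."""
    request_rate = packet_rate(line_rate_bps) * accesses_per_packet
    return math.ceil(request_rate * mem_latency_s)


for rate in (10e9, 40e9, 100e9):
    n = outstanding_requests(rate, mem_latency_s=100e-9, accesses_per_packet=4)
    print(f"{rate / 1e9:>5.0f} Gbps: {packet_rate(rate) / 1e6:6.1f} Mpps, "
          f"~{n} memory requests in flight")
```

Under these assumptions, 40 Gbps already needs about 24 requests in flight and 100 Gbps about 60, which shows why simply adding threads or memory channels scales poorly as line rates rise.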

12.

Synchronous dataflow architecture for network processors  Total citations: 1, self-citations: 0, citations by others: 1

Carlstrom J., Boden T. 《Micro, IEEE》2004,24(5):10-18

Network processors are programmable, highly integrated communications circuits optimized to provide processing at high data and packet rates. The packet instruction set computer (PISC) architecture is a synchronous dataflow architecture developed for network processors. It uses a deep pipeline that contains two types of processing elements: PISC processors, which perform programmable data manipulation, and I/O processors, which provide access to shared resources such as look-up table memory, hardware accelerators, or coprocessors.

13.

Efficient nonblocking switching networks for interprocessor communications in multiprocessor systems

Fong-Chih Shao, Yavuz Oruc A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):132-141

The performance of a multiprocessor system depends heavily on its ability to provide conflict-free paths among its processors. In this paper, we explore the possibility of using a nonblocking network with O(N log N) edges (crosspoints) to interconnect the processors of an N-processor system. We combine Bassalygo and Pinsker's implicit design of strictly nonblocking networks with an explicit construction of expanders to obtain a strictly nonblocking network with 352.8N log N − 765.18N edges and depth 2 + log(N/5). We present an efficient parallel algorithm for routing connection requests on this network and implement it on three parallel processor topologies. The implementation on a parallel processor whose processing elements are interconnected as in the Bassalygo-Pinsker network requires O(N log N) processing elements and O(N log N) interprocessor links, and it takes O(log N) steps to route any single connection request, where each step involves a small number (≈72) of bit-level operations. A contracted or folded version of the same implementation reduces the processing element count to O(N) without increasing the link count or the routing time. Finally, we establish that the same algorithm takes O(log³ N) steps on a perfect shuffle processor with O(N) processing elements. These results improve the crosspoint, depth, and routing time complexities of the previously reported strictly nonblocking networks.
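To see where an O(N log N) network actually beats an N × N crossbar (N² crosspoints), one can evaluate the crosspoint count quoted in the abstract. This Python sketch assumes the logarithm is base 2, which the abstract does not state. Because the formula goes negative for N ≤ 4, it is only meaningful for large N, so the crossover is found by scanning downward from a large N.

```python
import math


def bp_crosspoints(n):
    """Crosspoint count quoted in the abstract, assuming log base 2."""
    return 352.8 * n * math.log2(n) - 765.18 * n


def crossbar_crosspoints(n):
    """A full N x N crossbar needs N^2 crosspoints."""
    return n * n


def crossover(max_exp=30):
    """Smallest power of two N from which the network stays cheaper than a crossbar
    for every power of two up to 2**max_exp (scans downward from the top)."""
    best = None
    for k in range(max_exp, 0, -1):
        n = 2 ** k
        if bp_crosspoints(n) < crossbar_crosspoints(n):
            best = n
        else:
            break
    return best


for k in (8, 10, 12, 16, 20):
    n = 2 ** k
    ratio = bp_crosspoints(n) / crossbar_crosspoints(n)
    print(f"N = 2^{k}: {ratio:.3f} x crossbar crosspoints")
print("crossover:", crossover())
```

Under the base-2 assumption, the large constant 352.8 means the network uses fewer crosspoints than a crossbar only from N = 4096 upward among powers of two; the asymptotic advantage matters for very large systems.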

14.

Improving IPS by network processors

Pablo Cascón, Julio Ortega, Yan Luo, Eric Murray, Antonio Díaz, Ignacio Rojas 《The Journal of supercomputing》2011,57(1):99-108

Many current applications require high communication throughput. Multiprocessor nodes and multicore architectures, as well as programmable NICs (Network Interface Cards), provide new opportunities to exploit the multigigabit-per-second link bandwidths now available. Nevertheless, achieving adequate communication performance requires efficient parallel processing of network tasks and interfaces. In this paper, we use network processors, heterogeneous microarchitectures with several multithreaded cores suited to packet processing, to investigate the use of parallel processing to accelerate the network interface and thus the network applications built on top of it. More specifically, we have implemented an intrusion prevention system (IPS) on such a network processor. We describe the IPS we developed, whose offloaded implementation allows faster processing of both normal and corrupted traffic. Placing the IPS close to the network on specialized network processors yields latency many times lower and more bandwidth available to legitimate traffic.

15.

A Packet-Scheduling-Based Optimization Strategy for Protocol Processing

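Zhang Zhibin, Guo Li, Fang Binxing 《Computer Engineering》2007,33(17):20-22,2

The growth of network bandwidth places higher demands on the performance of protocol processing programs, and a program's cache behavior is currently a major factor in its performance. Through a formal analysis of instruction-cache behavior in protocol processing, this paper proves that obtaining optimal cache behavior in batch processing is NP-hard, proposes an instruction-cache optimization strategy based on offline packet scheduling, and analyzes the strategy's possible impact on processing performance.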
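Since the optimal schedule is NP-hard, practical schemes rely on heuristics. Below is a simple greedy sketch in Python, assuming each packet is tagged with the protocol whose handler code must run, and that the number of handler switches between consecutive packets is a rough proxy for instruction-cache refills. Both the grouping rule and the proxy metric are illustrative assumptions, not the strategy from the paper.

```python
def handler_switches(batch):
    """Count adjacent packets needing different handlers (a rough proxy for I-cache misses)."""
    return sum(1 for a, b in zip(batch, batch[1:]) if a[0] != b[0])


def schedule_batch(batch):
    """Greedy heuristic: group the packets of one batch by protocol, keeping the
    first-appearance order of protocols and arrival order within each protocol
    (Python's sorted() is stable)."""
    first_seen = {}
    for proto, _ in batch:
        first_seen.setdefault(proto, len(first_seen))
    return sorted(batch, key=lambda pkt: first_seen[pkt[0]])


batch = [("tcp", 1), ("udp", 2), ("tcp", 3), ("icmp", 4), ("udp", 5), ("tcp", 6)]
grouped = schedule_batch(batch)
print(grouped)
print(handler_switches(batch), "->", handler_switches(grouped))
```

Grouping preserves per-protocol arrival order, but it delays packets of later-grouped protocols within a batch, so the batch size bounds the extra latency this kind of reordering adds.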

16.

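Analysis and Research of Network Processors  Total citations: 54, self-citations: 0, citations by others: 54

Tan Zhangxi, Lin Chuang, Ren Fengyuan, Zhou Wenjiang 《Journal of Software》2003,14(2):253-267

Networks today are raising link rates while also introducing many new protocols and services, whereas traditional network equipment relies either on dedicated hardware chips or on purely software solutions and struggles to meet both performance and flexibility requirements. To address this, parallel programmable network processors have been introduced into the processing layer of routers (switches). Based on ASIP technology, they are optimized for processing network programs and combine features of both hardware and software solutions. Network processors turn the classic "store-and-forward" structure into "store-process-forward", which makes complex QoS control and payload processing possible. From the perspectives of network processors themselves and of their applications, this paper introduces related research, analyzes system characteristics and open challenges, and discusses future directions.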

17.

Performance models for network processor design

Wolf T., Franklin M.A. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(6):548-561

To provide a variety of new and advanced communications services, computer networks are required to perform increasingly complex packet processing. This processing typically takes place on network routers and their associated components. An increasingly central component in router design is a chip multiprocessor (CMP) referred to as a "network processor" or NP. In addition to multiple processors, NPs have multiple forms of on-chip memory, various network and off-chip memory interfaces, and other specialized logic components such as CAMs (content addressable memories). The design space for NPs (e.g., number of processors, caches, cache sizes) is large due to the diverse workload, application requirements, and system characteristics. System design constraints relate to the maximum chip area and the power consumption that are permissible while achieving defined line rates and executing required packet functions. In this paper, an analytic performance model that captures the processing performance, chip area, and power consumption of a prototypical NP is developed and used to provide quantitative insights into system design trade-offs. The model, parameterized with a networking application benchmark, provides the basis for the design of a scalable, high-performance network processor and presents insights into how best to configure the numerous design elements associated with NPs.

18.

Predicting communication protocol performance on superscalar architectures using instruction dependency

《Performance Evaluation》2006,63(9-10):939-955

Increasing diversity in telecommunication workloads leads to greater complexity in communication protocols, at the same time as channel bandwidth rapidly increases. These factors result in larger computational loads for network processors, which are increasingly turning to high-performance microprocessor designs. This paper presents an analytical method for estimating the performance of instruction-level parallel (ILP) processors executing network protocol processing applications. Instruction dependency information extracted while executing an application is used to calculate upper and lower bounds for throughput, measured in instructions per cycle (IPC). Results using UDP/TCP/IP applications show that the simulated IPC values fall between the analytically derived upper and lower bounds, validating the model. The analytical method is much less expensive than cycle-accurate simulation, yet yields similar throughput predictions. This allows the architectural design space for network superscalar processors to be explored more rapidly and comprehensively, revealing the maximum IPC that is possible for a given application workload and the available hardware resources.
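The general idea of bounding IPC from instruction dependencies can be illustrated with two textbook bounds. The Python sketch below computes an upper bound from the dataflow critical path (capped by issue width) and a lower bound from fully serial execution. The latencies, the dependency lists, and the function name `ipc_bounds` are assumptions for illustration; this is not the specific analytical model from the paper.

```python
def ipc_bounds(latencies, deps, issue_width):
    """Return (lower, upper) IPC bounds for a straight-line instruction trace.

    latencies[i]: assumed execution latency of instruction i in cycles (>= 1)
    deps[i]:      indices j < i whose results instruction i reads (true dependences)
    """
    n = len(latencies)
    finish = [0] * n
    for i in range(n):
        # Earliest start = when the last producer finishes (unlimited resources).
        ready = max((finish[j] for j in deps[i]), default=0)
        finish[i] = ready + latencies[i]
    critical_path = max(finish)
    # Upper bound: dataflow limit, capped by the machine's issue width.
    upper = min(issue_width, n / critical_path)
    # Lower bound: no overlap at all; every instruction runs to completion in turn.
    lower = n / sum(latencies)
    return lower, upper


latencies = [1, 1, 3, 1, 1, 1]
deps = [[], [], [0, 1], [2], [], [4]]
print(ipc_bounds(latencies, deps, issue_width=2))
```

Here the chain 0 → 2 → 3 forms a 5-cycle critical path, so six instructions cannot sustain more than 6/5 IPC even on a wider machine, while fully serial execution would take 8 cycles.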

19.

A single-chip multiprocessor for multimedia: the MVP  Total citations: 2, self-citations: 0, citations by others: 2

Guttag K., Gove R.J., Van Aken J.R. 《Computer Graphics and Applications, IEEE》1992,12(6):53-64

The multimedia video processor (MVP) architecture, which incorporates a variety of parallel processing techniques to deliver very high performance to a wide range of imaging and graphics applications, is described. The MVP combines, on a single semiconductor chip, multiple fully programmable processors with multiple data streams connected to shared RAMs through a crossbar network. Each of the independent processors can execute many operations in parallel every cycle. The architecture is scalable and supports different numbers of processors to meet the cost and performance requirements of different markets. MVP's target environment and the development of MVP are outlined.

20.

A sparse matrix algorithm on the Boolean vector machine

Robert A. Wagner, Merrell L. Patrick 《Parallel Computing》1988,6(3):359-371

The Boolean Vector Machine (BVM) is a large network of extremely small processors with very small memories operating in SIMD mode using bit serial arithmetic. Individual processors communicate via a hardware implementation of the Cube Connected Cycles (CCC) network. A prototype BVM with 2048 processing elements, each with 200 binary bits of memory, is currently being built using VLSI technology.

The BVM's bit-serial arithmetic and the small memories of individual processors are apparently a drawback to its effectiveness when applied to large numerical problems. In this paper we analyze an implementation of a basic matrix-vector iteration algorithm for sparse matrices on the BVM. We show that a 2²⁰-PE BVM can deliver over 1 billion (10⁹) useful floating-point operations per second for this problem. The algorithm is expressed in a new language (BVL) which has been defined for programming the BVM.
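The BVM mapping itself (bit-serial SIMD on a CCC network, programmed in BVL) is not reproduced here. As an assumed example of the kind of sparse matrix-vector iteration being parallelized, the following serial Python sketch runs Jacobi iteration on a matrix whose off-diagonal part is stored in compressed sparse row (CSR) form; the choice of Jacobi and of CSR storage is illustrative, not taken from the paper.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A @ x for a matrix A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def jacobi(indptr, indices, data, diag, b, iters):
    """Jacobi iteration x <- D^-1 (b - R x); R (off-diagonal part) is given in CSR."""
    x = [0.0] * len(b)
    for _ in range(iters):
        rx = csr_matvec(indptr, indices, data, x)
        x = [(bi - ri) / di for bi, ri, di in zip(b, rx, diag)]
    return x


# A = [[4,1,0],[1,4,1],[0,1,4]] split into diag = [4,4,4] and off-diagonal R in CSR.
indptr, indices, data = [0, 1, 3, 4], [1, 0, 2, 1], [1.0, 1.0, 1.0, 1.0]
b = [5.0, 6.0, 5.0]  # chosen so that the exact solution is [1, 1, 1]
print(jacobi(indptr, indices, data, [4.0, 4.0, 4.0], b, iters=50))
```

Each iteration costs one sparse matrix-vector product plus an elementwise update, and the rows are independent, which is what makes this class of iteration attractive for a massively parallel SIMD machine.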