Similar documents
 20 similar documents found; search time: 218 ms
1.
Research on the IXP2400 Network Processor and the Implementation of Multithreading in Its Microengines   Cited by: 2 (self-citations: 0, others: 2)
Network processors combine the high performance of ASICs with the programmable flexibility of RISC chips, making them well suited to the demands of rapidly growing data communications, with broad application prospects in future network equipment. The IXP2400 is Intel's second-generation network processor. It adopts a high-performance parallel architecture to handle complex algorithms, packet content inspection, traffic management, and wire-speed forwarding. Multithreading is the key technique by which the IXP2400 achieves high-speed data processing. This paper introduces the IXP2400's hardware architecture and software development, and analyzes the techniques involved in implementing multithreading in its microengines.

2.
Design of a 32-bit Multithreaded Packet-Processing Microengine   Cited by: 1 (self-citations: 0, others: 1)
Hardware multithreading is a core technique in network processors. This paper presents the design of NRS05, a hardware-multithreaded packet-processing microengine dedicated to network protocol processing, and describes the overall structure of its pipeline in detail. A dynamic scheduling policy based on hybrid multithreading is proposed to hide long-latency operations, ensuring that single-thread performance meets application requirements while guaranteeing fairness among the threads running on the execution core. By combining multithreading with pipelining, the design also eliminates the pipeline stalls that control dependences between instructions cause in conventional processors. Finally, synthesis results and packet-processing performance figures are given.
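The latency-hiding idea behind such microengines can be illustrated with a toy cycle-count model (a sketch, not the NRS05 design itself; all cycle numbers are hypothetical): once enough hardware threads are available, a long memory stall on one thread is overlapped by compute work from the other threads.

```python
def total_cycles(packets, compute_cycles, mem_latency, threads):
    """Toy latency-hiding model: each packet needs compute_cycles of ALU
    work plus one memory access of mem_latency cycles. With t hardware
    threads, a stall on one thread can be covered by up to (t - 1) other
    threads' compute work."""
    per_packet_serial = compute_cycles + mem_latency
    # The stall is hidden up to the amount of work other threads can supply.
    hidden = min(mem_latency, (threads - 1) * compute_cycles)
    return packets * (per_packet_serial - hidden)

# 1000 packets, 20 compute cycles each, 100-cycle memory accesses:
single = total_cycles(1000, 20, 100, threads=1)  # 120000 cycles, fully stalled
eight = total_cycles(1000, 20, 100, threads=8)   # 20000 cycles, stalls hidden
```

With a single thread every memory access stalls the pipeline; with eight threads in this model the 100-cycle accesses are fully covered and only the compute cycles remain.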

3.
The high-speed processing and flexible programmability of network processors make them an effective solution for data processing in today's networks. This paper explores software development models for network processors in depth. It first introduces the hardware architecture and software development platform of the Intel IXP2400 network processor, then presents a design example of a network-processor-based routing and forwarding system, explaining the key steps in network processor development, and finally discusses the main problems and challenges facing network processor software development.

4.
The IXP2400 is Intel's second-generation network processor, mainly used to build high-performance, scalable network equipment. Based on the IXP2400, this paper analyzes two key problems that packet processing faces in a multithreaded environment: synchronization and packet ordering.

5.
A network processor is a programmable communications integrated circuit that supports high-speed packet processing and forwarding. As a key component of routers, a network processor design must emphasize not only high performance but also enough flexibility to support future network protocols. To address the fixed topology of control-flow network processors and their limited exploitation of instruction-level parallelism, this paper adopts a coarse-grained dataflow design and proposes DynaNP, a coarse-grained dataflow network processor architecture and prototype. DynaNP achieves high programmability through control-flow execution within each processing engine, while dataflow execution between processing engines effectively exploits the task-level parallelism of packet processing. In addition, DynaNP provides a dynamic processing-path configuration mechanism that effectively improves system throughput. The DynaNP prototype was implemented using SoPC technology: multiple PEs and functional modules are connected by a high-speed on-chip communication network, the core processing engines are implemented with the embedded RISC processor core LEON3, and instruction-set extensions are used to optimize network protocol processing. The prototype effectively validates the functionality and key techniques of the coarse-grained dataflow network processor.

6.
Chen Jun, 《福建电脑》 (Fujian Computer), 2005, (3): 20-21, 23
MPLS is one of the core technologies of next-generation broadband networks. It introduces a connection-oriented mechanism into the connectionless IP network to provide IP QoS guarantees, and supports "multiprotocol" operation at layers 2 and 3. Network processors combine the high performance of ASICs with the programmable flexibility of RISC chips, making them well suited to the demands of rapidly growing data communications. The IXP2400 is Intel's second-generation network processor; it adopts a high-performance parallel architecture to achieve wire-speed packet forwarding. This paper describes the techniques involved in implementing MPLS forwarding on the IXP2400.

7.
This paper studies and analyzes the IXP1200, a network processing chip for gigabit communications, focusing on its advanced multi-level parallel design. From the perspectives of architecture and parallel design techniques, it introduces the IXP1200's design mechanisms, such as data/control layering and multi-level parallelism, highlighting the design philosophy and implementation of using multithreading and multiple processors to optimize the design and increase processing speed. Finally, it discusses in detail how specific microcode instructions can be used to implement program scheduling and design techniques for instruction-level and thread-level parallelism on the IXP1200.

8.
As the volume of user data and network bandwidth grow, so do the demands on data-processing capability, which requires not only high-performance application platforms but also matching, cost-effective security platforms. NP-based network processing platforms, with their high performance and flexibility, effectively solve this problem. Network processors deliver hardware-level processing performance through a flexible software architecture. Their characteristics are: NPs target packet processing, using optimized architectures, dedicated instruction sets, and hardware units to satisfy wire-speed packet-processing requirements; they are software-programmable, so new standards, services, and applications can be implemented quickly to meet diverse and complex networking needs; and devices can be upgraded in software, protecting users' hardware investment. The information security field is…

9.
This paper presents a firewall design based on the IXP2400 network processor. It first describes the working principles of an IXP2400-based firewall, then proposes an architecture and a concrete implementation of a secure-forwarding firewall based on layer-3 forwarding. The design introduces multi-stage processing and multithreading to ensure the stability of the whole system and the independence and security of each implementation layer.

10.
Implementation of an Edge Router Based on the IXP1200 Network Processor   Cited by: 1 (self-citations: 0, others: 1)
Router development based on network processors is an active topic. This paper introduces the architecture of network-processor-based routers and analyzes the hardware architecture of the IXP1200 network processor. Finally, it presents an edge-router implementation scheme based on the IXP1200.

11.
An Analysis of Network Processor Architectures   Cited by: 5 (self-citations: 0, others: 5)
This paper analyzes how network processor architectures can satisfy high-performance and flexibility requirements at the same time, something traditional network equipment, built purely on dedicated chips or on RISC-based general-purpose processors (GPPs), struggles to do. Based on the network processor's processing space, the paper maps it onto five logical modules, which are implemented by the functional units of the network processor. It then analyzes the two parallel structures used in network processors, SMP and pipeline, and further examines acceleration techniques such as latency hiding. Finally, it analyzes the challenges that evolving network applications pose to network processor architecture design and proposes solutions.

12.
Analysis and Research of Network Processors   Cited by: 54 (self-citations: 0, others: 54)
As link speeds rise, networks are also seeing large numbers of new protocols and services, and traditional network equipment, typically built on dedicated hardware chips or on purely software-based solutions, struggles to satisfy both performance and flexibility requirements. To address this, parallel, programmable network processors have been introduced into the processing plane of routers and switches. Based on ASIP technology, they optimize network program processing while combining the characteristics of both hardware and software approaches. The advent of network processors turns the classic "store-and-forward" structure into "store-process-forward", which makes complex QoS control and load processing possible. Starting from both the network processor itself and its applications, this paper surveys related research, analyzes system characteristics and open challenges, and looks ahead to future directions.

13.
In this paper, we propose a new load distribution strategy called ‘send-and-receive’ for scheduling divisible loads in a linear network of processors with communication delay. The strategy is designed to optimally utilize the network resources and thereby minimize the processing time of the entire load. Closed-form expressions for the optimal load fractions and the processing time are derived for loads originating at a boundary processor and at an interior processor of the network. A condition on processor and link speeds is also derived to ensure that the processors are continuously engaged in load distribution. The paper also presents a parallel implementation of the digital watermarking problem on a personal-computer-based Pentium Linear Network (PLN) topology. Experiments are carried out to study the performance of the proposed strategy, and the results are compared with other strategies found in the literature.

14.
The HOG feature is a simple and efficient feature descriptor commonly used for object detection and widely applied in fields such as pedestrian detection; it faces severe performance challenges, however, when processing massive numbers of images. One solution is to accelerate a pedestrian-detection algorithm over massive image sets using the processor nodes of the Sunway TaihuLight supercomputer. Two parallel schemes are used: in the first, one processor handles 4 images at a time; in the second, 256 images at a time. Extensive serial and parallel experiments show that the first scheme suits parallel processing of multiple high-resolution images, with a speedup of up to 83x, while the second suits low-resolution images, with a speedup of up to 95x. Both parallel designs scale well across multiple processor nodes of Sunway TaihuLight.

15.
This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors.
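Several of the measures named in this abstract have standard closed-form definitions. A minimal sketch (assuming the serial time, parallel time, processor count, and operation counts are known; the function and argument names are ours):

```python
def parallel_metrics(t_serial, t_parallel, p, ops_parallel=None, ops_serial=None):
    """Standard parallel-performance measures (illustrative definitions)."""
    speedup = t_serial / t_parallel                     # S = T1 / Tp
    efficiency = speedup / p                            # E = S / p, in (0, 1]
    overhead = (p * t_parallel - t_serial) / t_serial   # extra processor-time ratio
    metrics = {"speedup": speedup, "efficiency": efficiency,
               "overhead_ratio": overhead}
    if ops_parallel is not None and ops_serial is not None:
        # Redundancy: total operations performed in parallel vs. serially (>= 1).
        metrics["redundancy"] = ops_parallel / ops_serial
    return metrics

# Example: a task taking 64 s serially runs in 4 s on 32 processors.
m = parallel_metrics(64.0, 4.0, 32)
# speedup = 16.0, efficiency = 0.5, overhead_ratio = 1.0
```

An efficiency of 0.5 here says half the aggregate processor time is spent on communication, idling, or redundant work rather than useful computation.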

16.
For scheduling packet-processing task graphs on network processors with a pipelined structure, this paper proposes EFT-LB, an earliest-finish-time scheduling algorithm with load balancing. It effectively balances the load across the processors in the pipeline and increases the throughput of the pipelined system. Comparative experiments show a clear improvement in system performance.

17.
Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically for protocol processing. This convention reduces communication latency and increases effective bandwidth, but also reduces peak performance, since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor; and the Floating policy, where all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols; (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node; (iii) multiprocessor node performance is not as sensitive to interrupt overhead as uniprocessor node performance, because a message arrival is likely to find an idle processor on a multiprocessor node, thereby eliminating interrupts; (iv) the system with the lowest cost-performance ratio will include a dedicated protocol processor when interrupt overheads are much higher than protocol weight, as in light-weight protocols.

18.
Network processors are designed to handle the inherently parallel nature of network processing applications. However, partitioning and scheduling of application tasks and data allocation to reduce memory contention remain major challenges in realizing the full performance potential of a given network processor. The large variety of processor architectures in use and the increasing complexity of network applications further aggravate the problem. This work proposes a novel framework, called FEADS, for automating the task of application partitioning and scheduling for network processors. FEADS uses the simulated annealing approach to perform design space exploration of application mapping onto processor resources. Further, it uses cyclic and r-periodic scheduling to achieve higher-throughput schedules. To evaluate dynamic performance metrics such as throughput and resource utilization under realistic workloads, FEADS automatically generates a Petri net (PN) which models the application, architectural resources, mapping, the constructed schedule, and their interaction. The throughput obtained by schedules constructed by FEADS is comparable to that obtained by manual scheduling for linear task flow graphs; for more complicated task graphs, FEADS’ schedules have a throughput up to 2.5 times higher than the manual schedules. Further, static scheduling of tasks increases throughput by up to 30% compared to an implementation of the same mapping without task scheduling.
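As a rough illustration of the simulated-annealing idea described here (a generic sketch, not the FEADS framework; it ignores communication and memory contention and treats the most loaded engine as the throughput bottleneck):

```python
import math
import random

def anneal_mapping(task_costs, n_engines, steps=20000, t0=10.0, seed=0):
    """Generic simulated-annealing sketch: map tasks onto processing
    engines so the most loaded engine (the bottleneck that bounds
    pipeline throughput) is as lightly loaded as possible."""
    rng = random.Random(seed)
    mapping = [rng.randrange(n_engines) for _ in task_costs]

    def bottleneck(m):
        loads = [0.0] * n_engines
        for task, engine in enumerate(m):
            loads[engine] += task_costs[task]
        return max(loads)

    cost = bottleneck(mapping)
    best, best_cost = mapping[:], cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        task = rng.randrange(len(task_costs))
        old_engine = mapping[task]
        mapping[task] = rng.randrange(n_engines)  # random move
        new_cost = bottleneck(mapping)
        # Accept improvements always; accept worsening moves with a
        # temperature-dependent probability to escape local minima.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = mapping[:], cost
        else:
            mapping[task] = old_engine  # undo rejected move
    return best, best_cost
```

For instance, mapping 8 unit-cost tasks onto 4 engines should drive the bottleneck load toward the optimum of 2.0.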

19.
In cluster systems, the node allocation policy, which selects the nodes a job will run on according to certain principles, is key to improving system performance. In studying cluster node allocation policies, the authors found that current adaptive load-balancing policies assign parallel jobs to the most lightly loaded nodes, which prevents the system from realizing its full performance. The authors propose a new adaptive load-balancing node allocation algorithm: bounded load-balancing node allocation.

20.
CP-PACS: A massively parallel processor at the University of Tsukuba   Cited by: 1 (self-citations: 0, others: 1)
Computational Physics by Parallel Array Computer System (CP-PACS) is a massively parallel processor developed and in full operation at the Center for Computational Physics at the University of Tsukuba. It is an MIMD machine with distributed memory, equipped with 2048 processing units and 128 GB of main memory. The theoretical peak performance of CP-PACS is 614.4 Gflops. CP-PACS achieved 368.2 Gflops on the Linpack benchmark in 1996, which at that time was the fastest rating in the world. CP-PACS has two remarkable features: a Pseudo Vector Processing feature (PVP-SW) on each node processor, which performs high-speed vector processing on a single-chip superscalar microprocessor, and a 3-dimensional Hyper-Crossbar (3-D HXB) interconnection network, which provides high-speed and flexible communication among node processors. In this article, we present an overview of CP-PACS, its architectural highlights, some details of the hardware and support software, and several performance results.
