1.

李金库张德运高磊《微电子学与计算机》2005,22(12):28-32

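Addressing the core problem of network processors, namely the parallel scheduling of microprocessors, and taking the Intel IXP2400 network processor as an example, this paper proposes and implements TS-MPSBPT, a two-stage microprocessor scheduling algorithm based on thread pools. The algorithm divides the IPv4 forwarding system into two stages. Within each stage the microprocessors work in thread-pool mode, which both solves the shortage of microinstruction space and makes full use of the system's parallel processing capability. To realize its load-balancing principle, the algorithm checks each microprocessor's number of idle threads and the number of packet bytes it has already processed, and assigns each packet to the least-loaded microprocessor in the thread pool. To realize its locality principle, it assigns IP packets of the same type or of the same flow to the same microprocessor, which improves the cache hit rate and local memory utilization. Experimental results show that the IPv4 forwarding system using TS-MPSBPT is load balanced and that, compared with the original IPv4 forwarding program from Radisys, the new system achieves a considerably higher packet forwarding rate, with the gain especially pronounced for small packets.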
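The two principles named in the abstract above (load balancing by idle threads and bytes processed, and locality by keeping a flow on one microprocessor) can be illustrated with a minimal sketch. This is not the TS-MPSBPT implementation: the abstract does not say how the two load measures are combined, so the ordering used here (most idle threads first, then fewest bytes processed) is an assumption, and the names `Microprocessor`, `Dispatcher`, and `flow_key` are hypothetical. Thread acquisition and release are omitted.

```python
from dataclasses import dataclass


@dataclass
class Microprocessor:
    """One processing engine in the thread pool (hypothetical model)."""
    ident: int
    idle_threads: int
    bytes_processed: int = 0


def pick_least_loaded(pool):
    # Assumption: "least loaded" means most idle threads,
    # with ties broken by the fewest bytes already processed.
    return max(pool, key=lambda mp: (mp.idle_threads, -mp.bytes_processed))


class Dispatcher:
    """Assigns packets to microprocessors with flow affinity."""

    def __init__(self, pool):
        self.pool = pool
        self.flow_table = {}  # flow_key -> Microprocessor

    def dispatch(self, flow_key, length):
        # Locality: a flow that was seen before stays on its microprocessor.
        mp = self.flow_table.get(flow_key)
        if mp is None:
            # Load balancing: a new flow goes to the least-loaded one.
            mp = pick_least_loaded(self.pool)
            self.flow_table[flow_key] = mp
        mp.bytes_processed += length
        return mp.ident
```

Keeping the affinity table separate from the load check means a new flow is balanced once, on arrival, and every later packet of that flow reuses the same engine's cache and local memory.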

2.

Dynamic Multithreading Microarchitecture Technology Based on EPIC

成玉马卓张民选《微电子学与计算机》2004,21(12):4-8

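This paper studies how to integrate multithreading techniques that exploit thread-level parallelism on top of EPIC technology, as basic research toward the microarchitecture of next-generation processors.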

3.

Efficient Debugging Methods for Multicore Environments

韩青《今日电子》2007,(7):70-72

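Multicore and multithreading are undoubtedly the future direction of processor development. Looking back over the history of processors, parallel techniques have progressed from instruction-level superscalar execution, to thread-level hyper-threading or concurrent multithreading, to today's processor-level multicore, and the overall trend has not changed.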

4.

Designing a Java Processor Using Hardware Abstract Machine Simulated Execution

王海晨赵祥模《微电子学与计算机》2011,28(1):104-107

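A design framework for pipelined microprocessors based on a hardware abstract machine is proposed, which makes novel use of a tag-based simulated-execution technique. Based on this framework, the working principle of a stack abstract machine is described and a Java instruction-level parallel processor is implemented. Combining the stack hardware abstract machine with stack instruction folding resolves the stack-dependency bottleneck in Java processors. Software simulation shows that the processor can exploit the instruction-level parallelism in Java programs to the greatest extent and offers higher processing capability.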

5.

Multicore DSPs Boost RNC Packet Processing Capability

姚钢《电子设计技术》2008,15(12):46-46

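As the power and heat-dissipation problems associated with higher clock frequencies become increasingly prominent, and as instruction-level parallelism (ILP) architectures and memory capability approach their limits, silicon can no longer sustain large gains in processor performance. Integrating multiple cores on a single chip, with each core handling multiple threads simultaneously, rather than continually raising the processor clock speed has become the industry consensus. TI believes that improving the packet processing function of the radio network controller (RNC) is a feasible way to meet the rapid growth in wireless data and voice traffic and the demand for diverse applications.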

6.

Applications of the Intel IXP1200 Network Processor

张钢钢白英杰《电子产品世界》2001,(9):64-65

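The network processor is a class of microprocessor chip that has emerged in recent years specifically for network communication equipment; it combines the advantages of RISC chips and ASICs. This article introduces network processor technology and the architecture and application characteristics of Intel's IXP1200 chip, and analyzes the keys to improving processing performance.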

7.

Applications of the Intel IXP1200 Network Processor (total citations: 3; self-citations: 0; citations by others: 3)

张钢钢白英杰徐媛《电子产品世界》2001,(17):64-65

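The network processor is a class of microprocessor chip that has emerged in recent years specifically for network communication equipment; it combines the advantages of RISC chips and ASICs. This article introduces network processor technology and the architecture and application characteristics of Intel's IXP1200 chip, and analyzes the keys to improving processing performance.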

8.

Design and Implementation of a Parallel FFT Algorithm on a NUMA MPSoC

张冰杜高明李丽杨盛光《微电子学与计算机》2007,24(12):109-112

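How to exploit the task-level or thread-level parallelism of multiple processors to improve performance has become one of the key issues in MPSoC design. Building on an MPSoC platform with a Non-Uniform Memory Access (NUMA) architecture and taking the fast Fourier transform as an example, this paper proposes a parallelization method that follows the principles of reducing inter-core communication and distributing the workload evenly, designs the corresponding parallel program and low-level drivers, and analyzes system performance on an FPGA prototype chip. Experimental results show that the maximum speedup on a 4-core MPSoC FPGA prototype system reaches 2.65, indicating good parallel execution efficiency.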
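For context, the reported speedup can be converted into a parallel efficiency using the standard definition of efficiency as speedup $S$ divided by core count $p$. The efficiency figure is not stated in the abstract; it is computed here only as an illustration from the two numbers the abstract does give (a maximum speedup of 2.65 on 4 cores).

```latex
E = \frac{S}{p} = \frac{2.65}{4} \approx 0.66
```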

9.

Research and Implementation of ARM-μClinux Interrupt Handling on the SEP3203 Processor

邹志烽王学香张宇《电子器件》2007,30(2):654-657

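In modern embedded operating systems, interrupt handling is an important technique. Interrupts allow the processor to work in parallel with peripherals, which improves CPU efficiency. This paper first analyzes interrupt handling in ARM-μClinux from both the software and the hardware side, then describes the software implementation of the interrupt vector table on the SEP3203 microprocessor, and finally explains how user interrupt routines are installed, using the registration of a network device interrupt routine as an example.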

10.

Router Architecture Based on Network Processors (total citations: 2; self-citations: 0; citations by others: 2)

张途赫志勇《电信交换》2003,(1):12-17

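Many semiconductor vendors have begun selling a type of chip called a network processor. Network processors closely resemble general-purpose microprocessors but are optimized for packet processing, which makes them particularly suitable for network communication equipment. Starting from router architecture, this article compares how traditional general-purpose processors and network processors implement packet processing, and then analyzes the network processor forwarding engine in detail.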

11.

Analysis of Queue Management and Queue Scheduling Based on Network Processors

杨中亮游军玲钱华林葛敬国《微电子学与计算机》2008,25(3):21-24

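This paper studies the implementation of queue operations on network processors. Implementing queue management and queue scheduling on the Intel IXP2805 network processor verifies the feasibility of quality-of-service mechanisms on a network processor platform. In practice, the queue operations largely meet the requirement for line-rate packet processing, and utilization of the network processor's hardware resources is high.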

12.

Task-Level Multithreaded Scheduling for Coarse-Grained Multicore Systems

张多利陈楠汪杨宋宇鲲《微电子学与计算机》2020,(1):46-52

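Multicore systems are the main direction of today's processor development, and scheduling tasks sensibly and efficiently so that all processing cores are kept doing useful work is an important research topic. The key difficulty in multicore task scheduling is uncovering task parallelism. To address this, the paper borrows the idea of instruction-level multithreading and, exploiting the coarse-grained nature of tasks in multicore systems, proposes a new coarse-grained multithreaded multicore architecture. It establishes a multithreaded fetch policy, a resource allocation policy, and a thread-switching mechanism, and completes the circuit design of a multithreaded scheduler for this architecture. A coarse-grained multicore computing platform was built around the scheduler and implemented in hardware on an FPGA. Experimental results show that, relative to single-threaded operation, the design increases the task parallelism of the multicore computing platform by about 34.29% on average.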

13.

Multi-DSP Parallel System Based on the ADSP-TS201 (total citations: 1; self-citations: 1; citations by others: 0)

黄瑞皮兴宇《现代电子技术》2006,29(21):37-39

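This paper describes the design and implementation of a multi-DSP parallel system based on the ADSP-TS201 and its practical use as a software radio platform. It focuses on the key techniques in implementing the system: the design of the multi-DSP parallel network structure, the implementation of the control and management software for the parallel system, and task allocation for the parallel demodulation modules. A concrete application shows that the system fully embodies the basic ideas of software radio (modularity, extensibility, and software loadability) and that, as a general-purpose hardware processing platform for software radio, it is easy to modify and maintain.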

14.

Load Balancing for Parallel Forwarding (total citations: 1; self-citations: 0; citations by others: 1)

《Networking, IEEE/ACM Transactions on》2005,13(4):790-801

Workload distribution is critical to the performance of network processor based parallel forwarding systems. Scheduling schemes that operate at the packet level, e.g., round-robin, cannot preserve packet-ordering within individual TCP connections. Moreover, these schemes create duplicate information in processor caches and therefore are inefficient in resource utilization. Hashing operates at the flow level and is naturally able to maintain per-connection packet ordering; besides, it does not pollute caches. A pure hash-based system, however, cannot balance processor load in the face of highly skewed flow-size distributions in the Internet; usually, adaptive methods are needed. In this paper, based on measurements of Internet traffic, we examine the sources of load imbalance in hash-based scheduling schemes. We prove that under certain Zipf-like flow-size distributions, hashing alone is not able to balance workload. We introduce a new metric to quantify the effects of adaptive load balancing on overall forwarding performance. To achieve both load balancing and efficient system resource utilization, we propose a scheduling scheme that classifies Internet flows into two categories: the aggressive and the normal, and applies different scheduling policies to the two classes of flows. Compared with most state-of-the-art parallel forwarding schemes, our work exploits flow-level Internet traffic characteristics.
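The idea of treating aggressive and normal flows differently can be sketched as follows. This is only an illustration under stated assumptions, not the scheme from the paper: the abstract does not say how flows are classified or which policy each class receives. Here a flow is called aggressive once its byte count exceeds a hypothetical threshold `aggressive_bytes`; normal flows are placed by a CRC-32 hash of the flow key, and an aggressive flow is moved once to the processor with the least accumulated load. The class name `FlowScheduler` and all of its fields are invented for the example.

```python
import zlib


class FlowScheduler:
    """Hash-based flow scheduling with special handling of heavy flows."""

    def __init__(self, num_procs, aggressive_bytes=10_000):
        self.num_procs = num_procs
        self.aggressive_bytes = aggressive_bytes  # hypothetical threshold
        self.flow_bytes = {}   # flow_key -> bytes seen so far
        self.pinned = {}       # aggressive flow_key -> processor index
        self.load = [0] * num_procs

    def _hash(self, flow_key):
        # Deterministic hash, so every packet of a flow maps to one processor.
        return zlib.crc32(repr(flow_key).encode()) % self.num_procs

    def schedule(self, flow_key, length):
        total = self.flow_bytes.get(flow_key, 0) + length
        self.flow_bytes[flow_key] = total
        if flow_key in self.pinned:
            proc = self.pinned[flow_key]
        elif total > self.aggressive_bytes:
            # Assumed policy: move a newly aggressive flow, once,
            # to the processor with the least accumulated load.
            proc = min(range(self.num_procs), key=self.load.__getitem__)
            self.pinned[flow_key] = proc
        else:
            proc = self._hash(flow_key)
        self.load[proc] += length
        return proc
```

Because normal flows never leave their hashed processor, their packet order is preserved. In this sketch the single migration of an aggressive flow is the only point at which reordering could occur.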

15.

System design for pixel-parallel image processing

Gealow J.C., Herrmann F.P., Hsu L.T., Sodini C.G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1996,4(1):32-41

A system design for performing low-level image processing tasks in real time is presented. The design is based on large processor-per-pixel arrays implemented using integrated circuit technology. Two integrated circuit architectures are summarized: an associative parallel processor and a parallel processor employing DRAM cells. In both architectures, the layout pitch of one-bit-wide logic is matched to the pitch of memory cells to form high-density processing element arrays. The system design features an efficient control path implementation, providing high processing element array utilization without demanding complex controller hardware. Sequences of array instructions are generated by a host computer before processing begins, then stored in a simple controller. Once processing begins, the host computer initiates stored sequences to perform pixel-parallel operations. A programming framework implemented using the C++ programming language supports application development. A prototype system employs associative parallel processor devices, a controller, and the programming framework. Three sample applications, smoothing and segmentation, median filtering, and optical flow, establish the suitability of the system for real-time image processing.

16.

Runtime Support for Multicore Packet Processing Systems

Wolf T., Ning Weng 《IEEE network》2007,21(4):29-37

Network processors promise a flexible, programmable packet processing infrastructure for network systems. To make full use of the capabilities of network processors, it is imperative to provide the ability to dynamically adapt to changing traffic patterns in the form of a network processor runtime system. The differences from existing operating systems and the main challenges lie in the multiprocessor nature of NPs, their on-chip resource constraints, and real-time processing requirements. In this article we explore the key design trade-offs that need to be considered when designing a network processor operating system. In particular, we explore the performance impact of application analysis on partitioning, traffic characterization, workload mapping, and runtime adaptation. We present and discuss qualitative and quantitative results in the context of a particular application analysis and mapping framework. The observations and conclusions are generally applicable to any runtime environment for network processors.

17.

A Comparison of Several Parallel Computing Techniques in Image Processing Applications

杨伟健姚庆栋《信号处理》2000,16(4):367-371

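This paper discusses the performance and resource utilization of three representative processor chips, each using a different parallel technique, on low-level image processing algorithms. It gives comparison results both vertically within a chip family and horizontally across families. The results show the potential of array parallel processing for low-level image processing and indicate how to combine parallel processing techniques when developing multimedia chips.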

18.

Parallel Design of Multicore DSP Signal Processing

夏际金常越梁之勇宋皓《雷达科学与技术》2013,11(6):617-620

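Parallel computing is an important direction for achieving high-performance computing. As demand for processing power grows in signal processing, communications, and other fields, parallel development techniques for DSPs have also advanced rapidly. Both multi-device parallelism and on-chip multicore can effectively improve processing performance. Compared with a traditional single-core DSP, multicore parallel processing requires multitask parallel design, which makes system design more complex. After examining the key techniques for developing signal processing on an 8-core processor, the paper designs a multicore parallel signal processing mode based on round-robin scheduling and discusses multicore synchronization, cache coherence, and parallel task allocation.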

19.

Optimal Periodic Memory Allocation for Image Processing With Multiple Windows

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(3):403-416

One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. This paper presents an efficient memory allocation to minimize the number of memory modules and processing elements with a parallel access capability when multiple windows with arbitrary shapes are specified. This paper also presents an efficient search method based on regularity of window-type image processing. We give some practical examples including a stereo-matching processor for acquiring 3-D information, and an optical-flow processor for motion estimation. These examples show that the numbers of memory modules are reduced to 2.7% and 10%, respectively, in comparison with a basic approach. It is also shown that the search time is less than 1 ms for practical image sizes and window sizes.

20.

A Parallel Architecture for Network Control and Mobility Tracking in Wireless Systems

Asthana Abhaya, Krzyzanowski Paul 《Wireless Personal Communications》1997,4(2):237-256

In a wireless system the network logically rearranges itself rapidly whenever terminals move from cell to cell. This ability to adapt itself to changing locations of its terminals adds a new layer of complexity to wireless control software. With ever increasing demand for more capacity and the addition of new service features, many limitations and bottlenecks in the underlying network infrastructure are uncovered. While distributed architectures provide a method for increasing processing capacity, they also introduce concerns regarding reliability, communication latency and cost. In this paper we have attempted to combine the significant characteristics of both fixed and distributed architectures in a single system. Specifically, we present the design of a wireless hub processor, based on a communications oriented active memory technology, and illustrate how the procedures for mobility management, resource management and call processing map on to such a parallel architecture. A key attribute of the architecture is that it scales in processing capacity and size, while maintaining a common locus of control for administration, maintenance and reliability. Finally, we present an example of a navigation application to validate the architecture. This example shows how a roving computer or PDA connected to a global positioning system and having a wireless communication link can deal with a low bandwidth link to a server. The server provides user-tailored map data. An active memory architecture not only provides the server with a scalable architecture but also aids the client.