Similar literature
1.
In this paper, we propose a novel reconfigurable processor for multimedia data that uses dynamically partitioned single-instruction multiple-data (DP-SIMD) control. SIMD processors and parallel SIMD (P-SIMD) processors, which are composed of a number of SIMD processors, are commonly used today. These processors are inefficient, however, because all processing units (PUs) must execute the same operation at all times, and the PUs can execute different operations only when every SIMD group operation is predefined. We propose a processor control method that dynamically partitions the parallel processors into multiple SIMD-based processors to improve efficiency. To evaluate the proposed method, we ran the inverse transform, inverse quantization, and motion compensation operations of H.264 on processors based on SIMD, P-SIMD, and DP-SIMD. Experimental results show that the DP-SIMD control method is more efficient than the SIMD and P-SIMD control methods by about 15% and 14%, respectively.

2.
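To address the poor compatibility and low performance of reconfigurable cryptographic processors for stream cipher algorithms over different fields, this paper analyzes the multi-level parallelism of stream cipher algorithms and proposes a pre-extraction update model for feedback shift registers (FSRs). Based on this model, a reconfigurable feedback shift register arithmetic unit (RFAU) is designed for cipher array architectures. While remaining compatible with stream ciphers over different finite fields, the unit uses parallel extraction and pipelined processing to exploit FSR-level parallelism, which effectively improves stream cipher performance on coarse-grained reconfigurable array (CGRA) platforms. Experimental results show that, compared with other reconfigurable processors, RFAU improves performance by 23% to 186% for stream ciphers over the finite field GF(2); for stream ciphers over GF(2^u), performance improves by about 66% to 79% and area efficiency by about 64% to 91%.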
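
For readers unfamiliar with the feedback shift register (FSR) that these stream ciphers are built on, the following is a minimal sketch of a linear FSR over GF(2), producing one bit per step. The function name `lfsr_bits`, the 4-bit width, and the tap positions are illustrative assumptions; this is not the RFAU design, and the multi-bit pre-extraction update described in the abstract is not reproduced here.

```python
def lfsr_bits(state, taps, width, count):
    """Return `count` output bits from a Fibonacci LFSR over GF(2).

    state: initial register contents as an int (should be nonzero)
    taps:  bit positions (0 = least significant) XORed into the feedback bit
    width: register length in bits
    """
    mask = (1 << width) - 1
    out = []
    for _ in range(count):
        out.append(state & 1)            # emit the least significant bit
        fb = 0
        for t in taps:                   # feedback bit = XOR of the tapped bits
            fb ^= (state >> t) & 1
        # Shift right by one and feed the new bit in at the top.
        state = ((state >> 1) | (fb << (width - 1))) & mask
    return out


# Illustrative 4-bit register; with these taps the sequence repeats every 15 steps.
sequence = lfsr_bits(0b0001, [0, 1], 4, 15)
```

Each output bit here depends on one register update, which is the serial dependency that parallel extraction and pipelining aim to relax.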

3.
Reconfigurable array processors have emerged as a powerful solution for speeding up computationally intensive applications. However, they may suffer from a data access bottleneck as the frequency of memory access rises. At present, the distributed cache design in reconfigurable array processors has a high cache miss rate, and frequent access to external memory leads to long memory access delays. To mitigate this problem, we present a Runtime Dynamically Migration Mechanism (RDMM) for the distributed cache of a reconfigurable array processor, based on the strong locality and high parallelism of its data accesses. The mechanism dynamically schedules temporary and static data, migrating data with a high access frequency from the remote cache to the processor's local migration storage table according to how often the reconfigurable array processor accesses the remote cache. A data search strategy based on the migration storage tables then retrieves the data over the shortest path, which reduces the access delay of the entire system and increases the memory bandwidth of the reconfigurable array processor. We test the proposed mechanism on a reconfigurable array processor hardware platform. The experimental results show that RDMM reduces access delay by up to 35.24% compared with the traditional distributed cache at the highest conflict rate. Compared with Ref.[19], Ref.[20], Ref.[21] and Ref.[23], the working frequency can be increased by 15%, the hit rate by 6.1%, and the peak bandwidth by about 3×.

4.
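This work studies efficient parallel methods for large-scale CFD flow-field simulation on multi-block structured grids. Taking the homogeneous CPU and heterogeneous CPU+MIC computing environments of the Tianhe supercomputer platform as examples, it focuses on performance optimization and improvement strategies that match the characteristics of CFD applications to the supercomputer's runtime environment, and develops a series of multi-level parallelization and performance optimization methods. Numerical simulations of several test cases on the Tianhe-2 high-performance computing platform verify the parallel effectiveness of these optimizations. The largest CFD problem simulated on the heterogeneous CPU+MIC platform reaches 680 billion grid cells and uses 1.376 million CPU+MIC processor cores in total. Test results show that the ported and optimized program runs about 2.6 times faster on the heterogeneous CPU+MIC platform and scales well.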

5.
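System-level modeling of reconfigurable processor arrays   Total citations: 1 (self-citations: 1, citations by others: 0)
Because the design space of coarse-grained reconfigurable architectures is complex, designing a CGRA that meets application requirements calls for a system-level simulation model for performance evaluation. This paper proposes a system-level model of a reconfigurable processor array, implemented in SystemC at the transaction level. The model uses a multi-layer interconnection network to support communication between any two processors, and processor resources can be configured quickly through parameters. Simulation experiments show that the model is suitable for simulating the mapping of application algorithms onto coarse-grained reconfigurable architectures.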

6.
Adding a reconfigurable processing unit (RPU) to a general-purpose processor is a promising way to improve performance significantly. In this paper, a Reconfigurable Multi-Core (RMC) architecture combining general-purpose multi-core processing and reconfigurable logic is proposed. The reconfigurable logic is logically separated into RPUs, which are coupled with the general-purpose cores as co-processors via a full crossbar switch. An RPU Manager (RPU-M) is also designed to manage the RPUs. To verify RMC, a simulation method based on Simics and a Virtex 5 FPGA is adopted, which simplifies the simulation and ensures the evaluation accuracy of the hardware function cores. Five workloads are selected to test RMC: 3-DES, AES, SHA2, IDCT and JPEG_ENC. The experimental results show an average speedup of 3.10 times over the software implementation on the original multi-core, and the data and control communication overhead on RMC is acceptable.

7.
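A high-performance infrared image guidance processor is implemented using a dynamically reconfigurable array as the overall structure, with all interface and control circuits integrated into a few field programmable gate array (FPGA) chips. The system framework hosts several dedicated processors for specific algorithm modules, such as a single-frame preprocessor that performs noise whitening and removes area targets, and a single-frame processor for coarse point-target detection. Each of these processors can be integrated into a single FPGA. The guidance processor is modular, has standardized interfaces, and is flexible in structure and easy to extend.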

8.
Emerging hybrid reconfigurable platforms tightly couple capable processors with high-performance reconfigurable fabrics. This promises to move the focus of reconfigurable computing systems from static accelerators to a more software-oriented view, where reconfiguration is a key enabler for exploiting the available resources. This requires a revised look at how to manage the execution of such hardware tasks within a processor-based system and, in doing so, how to virtualize the resources to ensure isolation and predictability. This view is further supported by trends towards amalgamation of computation in the automotive and avionics domains, where such properties are essential to overall system reliability. We present the virtualized execution and management of software and hardware tasks using a microkernel-based hypervisor running on a commercial hybrid computing platform (the Xilinx Zynq). The CODEZERO hypervisor has been modified to leverage the capabilities of the FPGA fabric, with support for discrete hardware accelerators, dynamically reconfigurable regions, and regions of virtual fabric. We characterise the communication overheads in such a hybrid system to motivate the importance of lean management, before quantifying the context switch overhead of the hypervisor approach. We then compare the resulting idle time for a standard Linux implementation and the proposed hypervisor method, showing an improvement of two orders of magnitude with the hypervisor.

9.
Even in the face of increasing network bandwidth, there is a desire among service providers to improve network security, availability, and performance. These improvements require increasingly complex computations on network packets. Current networking platforms cannot keep up, leading to less than desired throughput or functionality. Network processors deliver high networking throughput, but not the complex processing capabilities required. High-performance general-purpose processors deliver the complex processing needed, but not the network throughput. Combination platforms that include high-performance general-purpose CPUs and network processors hold the promise of greatly increasing platform performance, enabling desired edge application improvements. This article presents Twin Cities, a heterogeneous multiprocessor research platform we have constructed from a standard IXP1240 platform, a high-volume Intel® Pentium® III processor platform, and custom hardware. This platform provides a high-performance path (high throughput, low latency) between the two processors and presents a shared memory model to the programmer. We motivate and describe the Twin Cities platform, discuss the applications it targets, and present performance measurements.

10.
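高远, 何赞园, 李静岩. 《电讯技术》, 2023, 63(5): 688-694
As domestically produced processors mature and national requirements for independent, controllable information infrastructure rise, product platforms that combine the Advanced Telecommunications Compute Architecture (ATCA) with domestic processors have become a trend. The FT-2000/4 domestic processor is widely used in such independently controllable designs because of its stable performance. On ATCA platforms, however, the high overhead of traditional network I/O often prevents the FT-2000/4 from meeting high-speed line access requirements, which severely limits its application scenarios. To address this problem, a data access module based on DPDK software packet acceleration is designed, the data distribution strategy is improved, and an adaptive flow control algorithm is proposed. Experiments show that the module greatly improves the data access capability of domestic ATCA boards, and that the adaptive flow control algorithm, by dynamically adjusting the ports to which matched data is distributed, effectively relieves the service-node overload caused by an unsuitable flow distribution strategy.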

11.
This paper presents performance improvements and energy savings from mapping real-world benchmarks on an embedded single-chip platform that combines coarse-grained reconfigurable logic with a microprocessor. The reconfigurable hardware is a 2-D array of processing elements connected by a mesh-like network. Analytical results are presented from mapping seven real-life digital signal processing applications, with the aid of an automated design flow, on six different instances of the system architecture. Overall application speedups relative to an all-software solution range from 1.81 to 3.99, close to the theoretical speedup bounds. Additionally, the energy savings range from 43% to 71%. Finally, a comparison with a system coupling a microprocessor with a very long instruction word core shows that the microprocessor/coarse-grained reconfigurable array platform is more efficient in terms of performance and energy consumption.
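
The abstract does not say how its theoretical speedup bounds are derived. One common way to bound the speedup of a system that accelerates only part of an application is Amdahl's law, sketched below under that assumption. The function names and the example fraction are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when a fraction of the run time is accelerated.

    kernel_fraction: share of all-software run time spent in accelerated kernels (0..1)
    kernel_speedup:  speedup of those kernels on the reconfigurable array (> 0)
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_bound(kernel_fraction):
    """Upper bound on overall speedup: the kernels take zero time."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


# Hypothetical example: 75% of the run time is in kernels that run 10x faster.
achieved = amdahl_speedup(0.75, 10.0)
bound = amdahl_bound(0.75)
```

Under this model, an application that spends 75% of its time in accelerated kernels can never exceed a 4x overall speedup, however fast the array is.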

12.
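DReAC: a novel dynamically reconfigurable coprocessor   Total citations: 1 (self-citations: 1, citations by others: 0)
This paper proposes DReAC, a novel dynamically reconfigurable coprocessor for data-parallel and computation-intensive tasks. DReAC can independently reconfigure its internal data paths in either parallel or pipelined mode to carry out the tasks assigned by the host processor. It consists of three parts: a global controller, a computing array, and an array data buffer. The paper briefly introduces the DReAC system model and uses the model to simulate the implementation of several typical algorithms on DReAC. Simulation results show that, in typical multimedia and signal processing applications, DReAC runs more than 10 times faster than a general-purpose processor, and in some applications it even far outperforms other reconfigurable processors.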

13.
The B3G concept can be realized in two complementary ways. The first solution is the integration of the diverse radio access technologies into one composite radio environment. The alternative solution is provided by the concept of reconfigurable (adaptive) networks. Composite radio networks, sometimes also referred to as cooperative networks, jointly handle a difficult condition. Reconfigurable networks, on the other hand, support B3G systems by providing technologies that enable network elements and terminals to adapt dynamically to the environment requirements and conditions, in principle by means of self-management. This paper provides evidence of the business advantages of reconfigurable networks. In this context, the paper evaluates the investment in both composite radio and reconfigurable networks, presenting a methodology that can be used for the financial assessment of such networks by applying investment appraisal techniques. Concrete results for both cases are presented and analyzed. The analysis indicates that reconfigurable networks can provide significant business benefits for network operators.

14.
Network processors (NPs) are emerging as very promising platforms for developing reconfigurable and high-performance network devices, because they combine the flexibility of general-purpose processors with the high-performance features of hardware-based systems. They represent the most suitable solutions for implementing complex and dynamic tasks, such as packet classification and scheduling, which are key operations, for example, in DS networks. Programmability and reconfigurability allow NP-based devices to be continuously adapted to new network requirements, which extends their time in market. This paper illustrates the compound process that leads to the implementation of reconfigurable multidimensional packet filtering on the Intel® IXP2400 NP. The multidimensional multibit trie is chosen as the best algorithm to implement, and it is modified to exploit the specific features of the NP. The different tasks are mapped onto the NP's computational resources and an optimized implementation is carried out, with subsequent experimental validation. Copyright © 2008 John Wiley & Sons, Ltd.
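
To make the multibit trie concrete, here is a minimal sketch of a fixed-stride multibit trie that does longest-prefix matching on a single header field. It is a simplification under stated assumptions: the class name `MultibitTrie`, the dictionary-based nodes, and the 8-bit example field are hypothetical, the multidimensional combination of several fields is omitted, and none of the IXP2400-specific modifications described in the abstract are reproduced.

```python
class MultibitTrie:
    """Fixed-stride multibit trie for longest-prefix matching on one field."""

    def __init__(self, stride=4, width=32):
        if width % stride:
            raise ValueError("width must be a multiple of stride")
        self.stride = stride
        self.width = width
        self.root = self._new_node()

    def _new_node(self):
        size = 1 << self.stride
        # best[i] holds (prefix_length, rule) or None; child[i] holds a node or None.
        return {"best": [None] * size, "child": [None] * size}

    def _chunk(self, value, depth):
        """Extract the `stride` bits of `value` that start `depth` bits from the top."""
        shift = self.width - depth - self.stride
        return (value >> shift) & ((1 << self.stride) - 1)

    def insert(self, prefix, length, rule):
        """Insert a prefix given as a left-aligned `width`-bit int plus its length."""
        node, depth = self.root, 0
        while length - depth > self.stride:      # consume whole strides first
            idx = self._chunk(prefix, depth)
            if node["child"][idx] is None:
                node["child"][idx] = self._new_node()
            node = node["child"][idx]
            depth += self.stride
        rem = length - depth                     # 0..stride bits remain
        free = self.stride - rem
        base = (self._chunk(prefix, depth) >> free) << free
        # Prefix expansion: fill every slot whose top `rem` bits match.
        for idx in range(base, base + (1 << free)):
            cur = node["best"][idx]
            if cur is None or cur[0] <= length:  # a longer prefix keeps its slot
                node["best"][idx] = (length, rule)

    def lookup(self, key):
        """Return the rule of the longest matching prefix, or None."""
        node, depth, best = self.root, 0, None
        while node is not None and depth < self.width:
            idx = self._chunk(key, depth)
            if node["best"][idx] is not None:
                best = node["best"][idx][1]      # deeper nodes hold longer prefixes
            node = node["child"][idx]
            depth += self.stride
        return best


# Illustrative 8-bit field with hypothetical rules.
trie = MultibitTrie(stride=4, width=8)
trie.insert(0b10110000, 4, "B")
trie.insert(0b10100000, 3, "A")
trie.insert(0b10110110, 8, "C")
trie.insert(0b00000000, 0, "default")
```

A lookup touches at most `width / stride` nodes, which is why a larger stride trades memory for fewer memory accesses per packet.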

15.
A heterogeneous reconfigurable platform enables the flexible implementation of baseband wireless functions at energy levels between 10 and 100 MOPS/mW, six times higher than traditional digital signal processors. A 5.2 mm × 6.7 mm prototype processor, targeted for voice compression, is implemented in a 0.25-μm 6-metal CMOS process and consumes 1.8 mW at an average operation rate of 40 MHz. It combines an embedded microprocessor with an array of computational units of different granularities, connected by a hierarchical reconfigurable interconnect network.

16.
We define portable reconfigurable computing platforms as those which have some form of configurable logic coupled with other on-chip or off-chip processing units such as soft processors, embedded processors, and voltage-scalable processors. In the first part of this paper, we present and test a unique methodology that dynamically changes the active area of a field programmable gate array (FPGA) to vary the battery usage and lifetime of the system. Running it on several different task-graph structures, we report that it uses 14% less battery capacity on average, and as much as 21% less, compared with nonoptimal execution. In the second part of this paper, we integrate the above methodology with more traditional voltage and frequency scaling techniques for portable systems and present a heuristic iterative algorithm for single and multiple processing units. The algorithm finds a sequence of tasks, along with an appropriate design point (implementation option) for each task, such that a deadline is met and the amount of battery energy used is as small as possible. We test the effectiveness of this methodology on several real-world benchmarks and present the results.

17.
In this paper, we propose a configuration-aware data-partitioning approach for reconfigurable computing. We show how the reconfiguration overhead impacts the data-partitioning process. Moreover, we explore the system-level power-performance tradeoffs available when implementing streaming embedded applications on fine-grained reconfigurable architectures. For a certain group of streaming applications, we show that an efficient hardware/software partitioning algorithm is required when targeting low power. If the application objective is performance, however, we propose the use of dynamically reconfigurable architectures. We propose a design methodology that adapts the architecture and algorithms to the application requirements. The methodology has been validated on a real research platform based on Xilinx devices. Finally, we have applied our methodology and algorithms to the case study of image sharpening, which is now required in digital cameras and mobile phones.

18.
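This paper introduces a reconfiguration loading method based on a software radio platform. By studying the hardware architecture of reconfigurable software radio, the principles and protocols of reconfiguration loading for FPGA executable devices, and the high-speed parallel external memory interface (EMIF) of Davinci-series processors, a reconfiguration loading scheme based on DSP+FPGA is proposed, and the FPGA device driver and the reconfiguration loading software are designed and implemented. Experimental results show that the scheme completes waveform file reconfiguration loading and seamless switching between communication modes quickly, accurately, and reliably.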

19.
The virtual path (VP) concept has been gaining attention in recent years as a means of effectively deploying asynchronous transfer mode (ATM) networks. In a recent paper, we outlined a framework and models for the design and management of dynamically reconfigurable ATM networks based on the virtual path concept, from a network planning and management perspective. Our approach is based on statistical multiplexing of traffic within a traffic class by using a virtual path for the class, on deterministic multiplexing of different virtual paths, and on providing dynamic bandwidth and reconfigurability through the virtual path concept depending on traffic load during the course of the day. In this paper, we discuss in detail a multi-hour, multi-traffic-class network (capacity) design model for providing a specified quality of service in such dynamically reconfigurable networks. The model rests on the observation that statistical multiplexing of virtual circuits for a traffic class in a virtual path, together with deterministic multiplexing of different virtual paths, decouples the network dimensioning problem into a bandwidth estimation problem and a combined virtual path routing and capacity design problem. We discuss how bandwidth estimation can be done, and then how the design problem can be solved by a decomposition algorithm that works on the dual problem using subgradient optimization. We provide computational results for realistic network traffic data to show the effectiveness of our approach. For the test problems considered, our approach performs between 6% and 20% better than a local shortest-path heuristic. We also show that accounting for network dynamism, through dynamic bandwidth and virtual path reconfiguration as traffic varies during the course of a day, can save between 10% and 14% in network design costs compared to a static network based on maximum busy hour traffic.

20.
We propose an extension to the Energy-Delay-Product (EDP) metric to compare different processors considering not only their energy consumption and execution time but also their reliability. The Energy-Delay-FIT Product (EDFP) allows a pragmatic evaluation of the most suitable device to run an application. We consider three representative benchmarks and apply EDFP to compare Intel Xeon-Phi co-processors, NVIDIA K40 Graphics Processing Units (GPUs), and AMD Kaveri Accelerated Processing Units (APUs). Our results show that HPC processors have higher power consumption and are more prone to corruption than APUs. However, the overall trade-off is attenuated by the efficiency of HPC processors, which makes them the most suitable candidates for the great majority of the considered applications. Additionally, we use EDFP to compare optimized and naive implementations of three benchmarks as executed on NVIDIA GPUs. Our results show that the naive implementation generally has better EDFP only for small input sizes, while the optimized implementations are more efficient and reliable once the GPU resources are saturated.
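
The abstract names the metric but does not give its formula. The sketch below assumes, from the name alone, that EDFP is the plain product of energy, execution time, and failure rate in FIT (failures per 10^9 device-hours), with lower values being better. The function names and all numbers are hypothetical and are not results from the paper.

```python
def edp(energy_j, delay_s):
    """Classic Energy-Delay Product; lower is better."""
    return energy_j * delay_s


def edfp(energy_j, delay_s, fit):
    """Energy-Delay-FIT Product, assumed here to be a plain product.

    fit: failure rate in FIT (failures per 10^9 device-hours).
    """
    return energy_j * delay_s * fit


def best_device(measurements, metric):
    """Return the device name that minimises `metric`.

    measurements: dict mapping name -> tuple of arguments for `metric`
    """
    return min(measurements, key=lambda name: metric(*measurements[name]))


# Hypothetical numbers for illustration only: (energy in J, delay in s, FIT).
example = {
    "device_a": (120.0, 2.0, 50.0),   # fast, power-hungry, less reliable
    "device_b": (60.0, 6.0, 20.0),    # slower, frugal, more reliable
}

by_edp = best_device({k: v[:2] for k, v in example.items()}, edp)
by_edfp = best_device(example, edfp)
```

With these made-up numbers the two metrics disagree: EDP favours `device_a`, while EDFP favours `device_b` once its lower failure rate is taken into account.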
