Similar Literature
 20 similar documents found (search time: 15 ms)
1.
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators   Total citations: 4, self-citations: 0, citations by others: 4
The PICO-NPA system automatically synthesizes nonprogrammable accelerators (NPAs) to be used as co-processors for functions expressed as loop nests in C. The NPAs it generates consist of a synchronous array of one or more customized processor datapaths, their controller, local memory, and interfaces. The user, or a design space exploration tool that is part of the full PICO system, identifies within the application a loop nest to be implemented as an NPA, and indicates the performance required of the NPA by specifying the number of processors and the number of machine cycles that each processor uses per iteration of the inner loop. PICO-NPA emits synthesizable HDL that defines the accelerator at the register transfer level (RTL). The system also modifies the user's application software to make use of the generated accelerator. The main objective of PICO-NPA is to reduce design cost and time without significantly reducing design quality. Design of an NPA and its support software typically requires one or two weeks using PICO-NPA, which is a many-fold improvement over the industry norm. In addition, PICO-NPA can readily generate a wide range of implementations with scalable performance from a single specification. In experimental comparisons of NPAs of equivalent throughput, PICO-NPA designs are slightly more costly than hand-designed accelerators. Logic synthesis and place-and-route have been performed successfully on PICO-NPA designs, which have achieved high clock rates.

2.
Journal of Signal Processing Systems - The open-source hardware/software framework TaPaSCo aims to make reconfigurable computing on FPGAs more accessible to non-experts. To this end, it provides an...

3.
The explosive growth of the mobile multimedia industry has accentuated the need for efficient VLSI implementations of the associated computationally demanding signal processing algorithms. In particular, the short battery life caused by excessive power consumption of mobile devices has become the biggest obstacle facing truly mobile multimedia. We propose novel hardware accelerator architectures for two of the most computationally demanding algorithms of the MPEG-4 video compression standard––the forward and inverse shape adaptive discrete cosine transforms (SA-DCT/IDCT). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art.
Noel O’Connor

4.
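Yang Lian, Peng Tao. 《电视技术》 (Video Engineering), 2012, 36(23): 87-90, 115
To give users a differentiated service experience and to better guarantee end-to-end QoS, the EPS system transports packet data over EPS bearers, which provides an effective mapping between QoS parameters. Based on a study of the traffic flow template (TFT), this paper addresses a drawback of implementing the binding of uplink IP packets to data radio bearers (DRBs) in mobile-terminal software: it limits the terminal's data rate and consumes system resources. A scheme that implements the TFT function in hardware is proposed. It raises the rate at which IP packets are matched against UL_PF, shortens the time needed to bind IP packets to EPS bearers and process them, and ultimately allows IP data to be transmitted over EPS bearers.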
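
The matching step that this abstract proposes to move into hardware can be pictured with a small software sketch. It is only an illustration under stated assumptions, not the authors' design: the `PacketFilter` fields, the rule that a lower precedence value is evaluated first, and the fallback to a default bearer are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PacketFilter:
    """One uplink packet filter; a field left as None acts as a wildcard."""
    precedence: int                      # assumption: lower value is evaluated first
    bearer_id: int                       # EPS bearer this filter is bound to
    protocol: Optional[int] = None       # IP protocol number, e.g. 6 = TCP, 17 = UDP
    remote_ports: Optional[Tuple[int, int]] = None  # inclusive remote port range

    def matches(self, protocol: int, remote_port: int) -> bool:
        if self.protocol is not None and protocol != self.protocol:
            return False
        if self.remote_ports is not None:
            low, high = self.remote_ports
            if not low <= remote_port <= high:
                return False
        return True


def select_bearer(filters: List[PacketFilter], protocol: int,
                  remote_port: int, default_bearer: int) -> int:
    """Return the bearer of the first matching filter, else the default bearer."""
    for flt in sorted(filters, key=lambda f: f.precedence):
        if flt.matches(protocol, remote_port):
            return flt.bearer_id
    return default_bearer  # assumption: unmatched traffic uses a default bearer


# Hypothetical filter set: a UDP media flow on bearer 6, all TCP on bearer 7.
filters = [
    PacketFilter(precedence=10, bearer_id=6, protocol=17, remote_ports=(5000, 5100)),
    PacketFilter(precedence=20, bearer_id=7, protocol=6),
]
print(select_bearer(filters, protocol=17, remote_port=5004, default_bearer=5))
print(select_bearer(filters, protocol=1, remote_port=0, default_bearer=5))
```

In software this loop runs once per uplink packet, which is the per-packet cost the abstract says a hardware implementation is meant to reduce.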

5.
Modular arithmetic is a building block for a variety of applications potentially supported on embedded systems. One approach to making modular arithmetic more efficient is to identify algorithmic modifications that enhance the parallelization of the target arithmetic, in order to exploit the properties of parallel devices and platforms. The Residue Number System (RNS) introduces data-level parallelism, enabling parallelization even for algorithms based on modular arithmetic with several data dependencies. However, mapping generic algorithms to full RNS-based implementations can be complex, and suitable hardware architectures that are scalable and adaptable to different demands are required. This paper proposes and discusses an architecture with scalability features for the parallel implementation of algorithms relying on modular arithmetic fully supported by the RNS. The systematic mapping of a generic modular arithmetic algorithm to the architecture is presented. It can be applied as a high-level synthesis step in an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA) design flow targeting modular arithmetic algorithms. An implementation on Xilinx Virtex 4 and Altera Stratix II FPGA technologies of modular exponentiation and Elliptic Curve (EC) point multiplication, used in the Rivest-Shamir-Adleman (RSA) and EC cryptographic algorithms, suggests latency results of the same order of magnitude as the fastest hardware implementations of these operations known to date.
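
The data-level parallelism that the RNS provides can be illustrated with a minimal sketch. The moduli below and the function names are assumptions chosen for the example, not taken from the paper; each residue channel is computed independently, and the result is recovered with the Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; the dynamic range is their product, 15015.
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Represent x by its residue in each channel."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carries propagate between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel could run on its own unit."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer in [0, M) with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse needs Python 3.8+
    return total % M


a, b = to_rns(123), to_rns(45)
print(from_rns(rns_add(a, b)))  # sum, valid while the true result is below 15015
print(from_rns(rns_mul(a, b)))  # product, valid under the same condition
```

Results are only meaningful while the true value stays below the product of the moduli, which is one reason a scalable architecture would need to adapt the number of channels to the operand size.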

6.
ARISE introduces a systematic approach for extending an embedded processor once so that it can thereafter be coupled with an arbitrary number of custom computing units (CCUs). A CCU can be a hardwired or a reconfigurable unit, which can be utilized following a tight and/or loose model of computation. By selecting the appropriate model of computation for each part of the application, the complete application space is considered for acceleration, resulting in significant performance improvements. ARISE also offers modularity and scalability and is not restricted by the opcode-space and operand limitations that exist in this type of machine. To support these features, we introduce a machine organization that allows the cooperation of a processor and a set of CCUs. To control the CCUs, we extend the instruction set of the processor once with eight instructions. To efficiently incorporate these features into an embedded processor, we propose a micro-architecture implementation that minimizes the control and communication overhead between the processor and the CCUs. To evaluate our proposal, we extended a MIPS processor with the ARISE infrastructure and implemented it on a Xilinx field-programmable gate array (FPGA). Implementation results demonstrate that the timing model of the processor is not affected. We also implemented a set of benchmarks on the ARISE evaluation machine. Performance results show significant improvements and reduced communication overhead compared to a typical coprocessor approach.

7.
A Cost-Efficient Scheduling Algorithm of On-Demand Broadcasts   Total citations: 3, self-citations: 0, citations by others: 3
Sun Weiwei, Shi Weibin, Shi Bole, Yu Yijun. Wireless Networks, 2003, 9(3): 239-247
In mobile wireless systems, data on air can be accessed by a large number of mobile users. Many of these applications, including the wireless Internet and traffic information systems, are pull-based; that is, they respond to on-demand user requests. In this paper, we study the scheduling problems of on-demand broadcast environments. Traditionally, the response time of requests has been used as the performance measure. Here we instead measure performance as the average cost of a request, composed of three kinds of cost: access time cost, tuning time cost, and the cost of handling failed requests. Our main contribution is a self-adaptive scheduling algorithm named LDFC, which uses the computed delay cost of a data item as its broadcast priority. It costs less than some previous algorithms in this context, and adapts well even to pure push-based broadcasts.

8.
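From Lan Feilin (蓝菲琳), the fresh-faced campus beauty in 《十八岁的天空》, to Lü Su (吕素), who gives her life for love in 《神话》, Jin Sha (金莎) has become a thoroughly assured actress; from the widely circulated song 《被风吹过的夏天》 to the fan-favorite original single 《星月神话》, she has worked tirelessly toward becoming a "talented songwriter".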

9.
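From Lan Feilin (蓝菲琳), the fresh-faced campus beauty in 《十八岁的天空》, to Lü Su (吕素), who gives her life for love in 《神话》, Jin Sha (金莎) has become a thoroughly assured actress; from the widely circulated song 《被风吹过的夏天》 to the fan-favorite original single 《星月神话》, she has worked tirelessly toward becoming a "talented songwriter".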

10.
In future broadband fixed wireless access systems, the overall design procedure is critical for successful commercial deployment as well as for efficient operation and management. The problem addressed in this paper is twofold. In the first phase, the radio access network planning problem is addressed, which aims at finding the minimum-cost configuration of Access Point Transceivers (APTs) given the geographical layout of the area to be covered. In the second phase, the interconnection planning problem is addressed, which aims at finding the minimum-cost configuration of the Access Point Controllers (APCs) and Inter-Working Units (IWUs) given the layout of the Access Point Transceivers. Both problems are formally defined, formulated as optimization problems, and solved by computationally efficient heuristics. Finally, results are provided and conclusions are drawn.

11.
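DS1963S and Its Application in SHA   Total citations: 1, self-citations: 0, citations by others: 1
After briefly introducing the SHA algorithm (described as a data encryption algorithm), the paper presents the features and operating principle of the 1-Wire device DS1963S and gives a method for applying the DS1963S in SHA.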

12.
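Design and Speed Optimization of a SHA1 IP   Total citations: 1, self-citations: 0, citations by others: 1
This paper briefly introduces the basic flow of the SHA1 algorithm and presents a hardware implementation scheme. It focuses on three speed optimization schemes adopted to raise the operating speed of the IP, and closes with a comparison of the optimization results, which shows that the optimizations significantly increase the operating speed of the IP.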

13.
Journal of Signal Processing Systems - The wide landscape of memory-hungry and compute-intensive Convolutional Neural Networks (CNNs) is quickly changing. CNNs are continuously evolving by...

14.
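Deflecting cavities operate under ultra-high vacuum. The time-varying fields in the cavity deflect the direction of particle motion, and such cavities are widely used in the accelerator field. Depending on their operating state, deflecting cavities have either a room-temperature or a superconducting structure. This paper mainly introduces the main types of existing room-temperature and superconducting deflecting cavities, the historical development of deflecting cavities, and their applications in various fields. Finally, a laboratory at the Institute of High Energy Physics has developed a deflecting cavity operating in the TM210 mode for bunch length measurement; the operating frequency of this deflecting cavity...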

15.
16.
Spectrum, IEEE, 2003, 40(1): 40-43
For corporations the world over, the tech bubble of the late 1990s was an orgy of excess, which, like all parties that go on too long and involve far too much consumption, ended in a brutal hangover. Information technology (IT) departments simply bought too many servers, storage devices, and PCs in preparation for Y2K, the introduction of the euro, and an e-commerce bonanza that, like an absinthe-induced hallucination, seemed very real at the time, but vanished following the dot-com crash. Overall, the IT market is maturing its way to sustainable, albeit unspectacular, growth. The paper considers how system complexity is driving customers and vendors to seek solace and solutions in software.

17.
We consider the problem of establishing a route and sending packets between a source/destination pair in ad hoc networks composed of rational selfish nodes whose purpose is to maximize their own utility. In order to motivate nodes to follow the protocol specification, we use side payments that are made to the forwarding nodes. Our goal is to design a fully distributed algorithm such that (1) a node is always better off participating in the protocol execution (individual rationality), (2) a node is always better off behaving according to the protocol specification (truthfulness), (3) messages are routed along the most energy-efficient (least cost) path, and (4) the message complexity is reasonably low. We introduce the COMMIT protocol for individually rational, truthful, and energy-efficient routing in ad hoc networks. To the best of our knowledge, this is the first ad hoc routing protocol with these features. COMMIT is based on the VCG payment scheme in conjunction with a novel game-theoretic technique to achieve truthfulness for the sender node. By means of simulation, we show that the inevitable economic inefficiency is small. As an aside, our work demonstrates the advantage of using a cross-layer approach to solving problems: leveraging the existence of an underlying topology control protocol, we are able to simplify the design and analysis of our routing protocol and reduce its message complexity. On the other hand, our investigation of the routing problem in the presence of selfish nodes disclosed a new metric under which topology control protocols can be evaluated: the cost of cooperation.
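
To make the VCG payment idea concrete, here is a centralized sketch of plain VCG payments on a least-cost path. It is not the COMMIT protocol, which is distributed and adds a separate technique for the sender; the graph, the per-node relay costs, and the function names are hypothetical. Each relay on the chosen path is paid the cost of the best path that avoids it, minus the cost the other relays bear on the chosen path.

```python
import heapq


def least_cost_path(adj, cost, src, dst, banned=frozenset()):
    """Dijkstra over relay costs; returns (total cost, path) or (inf, None)."""
    best = {src: 0}
    heap = [(0, src, [src])]
    while heap:
        d, u, path = heapq.heappop(heap)
        if u == dst:
            return d, path
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            if v in banned:
                continue
            # Only relays charge a cost; the destination forwards nothing.
            nd = d + (0 if v == dst else cost.get(v, 0))
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v, path + [v]))
    return float("inf"), None


def vcg_payments(adj, cost, src, dst):
    """Pay each relay: (best cost without it) - (cost borne by the other relays)."""
    total, path = least_cost_path(adj, cost, src, dst)
    if path is None:
        return None, {}
    payments = {}
    for relay in path[1:-1]:
        without, _ = least_cost_path(adj, cost, src, dst, banned={relay})
        payments[relay] = without - (total - cost[relay])
    return path, payments


# Hypothetical four-node network: S reaches D through relay A (cost 2) or B (cost 5).
adj = {"S": ["A", "B"], "A": ["S", "D"], "B": ["S", "D"], "D": ["A", "B"]}
cost = {"A": 2, "B": 5}
path, payments = vcg_payments(adj, cost, "S", "D")
print(path, payments)
```

In this toy network relay A is paid more than its declared cost, and that overpayment is the kind of economic inefficiency the abstract reports to be small in simulation.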

18.
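Direct RF sampling is a new trend in the development of digital receivers. Because of the limits of current ADC devices, its application in receivers has been severely restricted. Using an SHA (sample-and-hold) + ADC system architecture, a digital receiver that supports ultra-wideband signal input is designed, achieving direct sampling of RF signals. The operating principle of the sample-and-hold is briefly described, the system composition of the direct-RF-sampling digital receiver is introduced, and the design of the data acquisition daughterboard is described in detail. The digital receiver was tested and its performance analyzed using the FPGA analysis tool CHIPSCOPE together with MATLAB. The results show that, within the bandwidth of the sample-and-hold, the receiver meets conventional specifications, simplifies system design, reduces cost, and has practical application value.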

19.
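A home gateway device is designed that provides PON (passive optical network) access together with wired Internet access, wireless coverage (WiFi at 2.4 GHz and 5.8 GHz), telephony, and video services. A single device handles networking for multiple household devices at relatively high speed, and offers a good user experience even when there are many users nearby. Because the device supports high-power WiFi 11ac with an output power of 20 dBm, it can effectively avoid interference in areas crowded with 2.4 GHz signals while providing connection rates of up to 1.3 Gbit/s, meeting the requirements of multi-user, high-volume data transmission.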

20.