首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.
Joseph LancasterEmail:
  相似文献   

2.
Heterogeneous reconfigurable systems provide drastically higher performance and lower power consumption than traditional CPU-centric systems. Moreover, they do it at much lower costs and shorter times to market than non-reconfigurable hardware solutions. They also provide the flexibility that is often required for the engineering of modern robust and adaptive systems. Due to their heterogeneity, flexibility and potential for highly optimized application-specific instantiation, reconfigurable systems are adequate for a very broad class of applications across different industry sectors. What prevents the reconfigurable system paradigm from a broad proliferation is the lack of adequate development methodologies and electronics design tools for this kind of systems. The ideal would be a seamless compilation of a high-level computation process specification into an optimized mixture of machine code executed on traditional CPU-centric processors and on the application-specific decentralized parallel data-flow-dominated reconfigurable processors and hardware accelerators. Although much research and development in this direction was recently performed, the adequate methodologies and tools necessary to implement this compilation process as an effective and efficient hardware/software co-synthesis flow are unfortunately not yet in place. This paper focuses on the recent developments and development trends in the design methods and synthesis tools for reconfigurable systems. Reconfigurable system synthesis performs two basic tasks: system structure construction and application process mapping on the structure. It is thus more complex than standard (multi-)processor-based system synthesis for software-programmable systems that only involves application mapping. The system structure construction may involve the macro-architecture synthesis, the micro-architecture synthesis, and the actual hardware synthesis. Also, the application process mapping can be more complicated and dynamic in reconfigurable systems. This paper reviews the recent methods and tools for the macro- and micro-architecture synthesis, and for the application mapping of reconfigurable systems. It puts much attention to the relevant and currently hot topic of (re-)configurable application-specific instruction set processors (ASIP) synthesis, and specifically, ASIP instruction set extension. It also discusses the methods and tools for reconfigurable systems involving CPU-centric processors collaborating with reconfigurable hardware sub-systems, for which the main problem is to decide which computation processes should be implemented in software and which in hardware, but the hardware/software partitioning has to account for the hardware sharing by different computation processes and for the reconfiguration processes. The reconfigurable system area is a very promising, but quite a new field, with many open research and development topics. The paper reviews some of the future trends in the reconfigurable system development methods and tools. Finally, the discussion of the paper is summarized and concluded.  相似文献   

3.
In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are speeded-up on the coarse-grain reconfigurable hardware for meeting the timing requirements of application code mapped on the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture which can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware by our developed high-performance data-path. The methodology mainly consists of three stages; the analysis, the mapping of the application parts onto fine and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.  相似文献   

4.
Dynamic and Partial FPGA Exploitation   总被引:1,自引:0,他引:1  
Today's field programmable gate array (FPGA) architectures, like Xilinx's Virtex-II series, enable partial and dynamic run-time self-reconfiguration. This feature allows the substitution of parts of a hardware design implemented on this reconfigurable hardware, and therefore, a system can be adapted to the actual demands of applications running on the chip. Exploiting this possibility enables the development of adaptive hardware for a huge variety of applications. A novel method for communication interfaces using look up table (LUT)-based communication primitives enables an exact separation of reconfigurable parts and a fast and intelligent bus-system. A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results  相似文献   

5.

In order to improve the search performance of rich text content, a cloud search engine system based on rich text content is designed. On the basis of traditional search engine hardware system, several hardware devices such as Solr index server, collector, Chinese word segmentation device and searcher are installed, and the data interface is adjusted. On the basis of hardware equipment and database support, this paper uses the open source Apache Tika framework to obtain the metadata of rich text documents, implements word segmentation according to the rich text content and semantics, and calculates the weight of each keyword. Input search keywords, establish a text index, use BM25 algorithm to calculate the similarity between keywords and text, and output the search results of rich text according to the similarity calculation results. The experimental results show that the design system has high recall rate, high throughput, and the construction time of each data item index in different files is short, which improves the search efficiency and search accuracy.

  相似文献   

6.
In this paper, we propose a method for speeding-up Digital Signal Processing applications by partitioning them between the reconfigurable hardware blocks of different granularity and mapping critical parts of applications on coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware our developed high-performance coarse-grain data-path is used. The design flow mainly consists of three steps; the analysis procedure, the mapping onto coarse-grain blocks, and the mapping onto the fine-grain hardware. In this work, the methodology is validated using five real-life applications; an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.  相似文献   

7.
针对目前PC算法无法实现图像实时处理以及固定硬件平台很难实现算法修改或者升级的问题,设计一种基于SOPC可重构的图像采集与处理系统,实现了图像数据的片上实时处理以及在不改变硬件电路结构而完成算法修改或者升级的功能。此系统围绕两块Xilinx FPGA芯片进行设计,通过FPGA以及其Microblaze 32 bit软核处理器和相关接口模块实现硬件电路设计,结合FPGA开发环境ISE工具和EDK工具协作完成软件设计。由于采用SOPC技术和可重构技术,此设计具有设计灵活、处理速度快和算法可灵活升级等特点。  相似文献   

8.
The process of DNA sequence matching and database search is one of the major problems of the bioinformatics community. Major scientific efforts to address this problem have provided algorithms and software tools for molecular biologists since the early 1970s. At the algorithmic and software level BLAST is by far the most popular tool. It has been developed and continues to be maintained and distributed by the NCBI organization. The BLAST algorithm and software is computationally very intensive and as a result several computer vendors use it as a benchmark. On the other hand no systematic approach for hardware speedup of BLAST and its variants for different query and database size has been reported to date. In this paper we present our architecture that implements the BLAST algorithm for all of its major versions, and for any size of database and query. The system has been fully designed and partially implemented with reconfigurable logic. It consists of software and hardware parts and achieves a speedup of several times up to thousands of times vs general purpose computers.
Apostolos DollasEmail:
  相似文献   

9.
Many radar sensor systems demand high performance front-end signal processing. The high processing throughput is driven by the fast analog-to-digital conversion sampling rate, the large number of sensor channels, and stringent requirements on the filter design leading to a large number of filter taps. The computational demands range from tens to hundreds of billion operations per second (GOPS). Fortunately, this processing is very regular, highly parallel, and well suited to VLSI hardware. We recently fielded a system consisting of 100 GOPS designed using custom VLSI chips. The system can adapt to different filter coefficients as a function of changes in the transmitted radar pulse. Although the computation is performed on custom VLSI chips, there are important reasons to attempt to solve this problem using adaptive computing devices. As feature size shrinks and field programmable gate arrays become more capable, the same filtering operation will be feasible using reconfigurable electronics. In this paper we describe the hardware architecture of this high performance radar signal processor, technology trends in reconfigurable computing, and present an alternate implementation using emerging reconfigurable technologies. We investigate the suitability of a Xilinx Virtex chip (XCV1000) to this application. Results of simulating and implementing the application on the Xilinx chip is also discussed.  相似文献   

10.
This paper describes the acceleration of an infrared automatic target recognition (IR ATR) application with a co-processor board that contains multiple field programmable gate array (FPGA) chips. Template and pixel level parallelism is exploited in an FPGA design for the bottleneck portion of the application. The implementation of this design achieved a speedup of 21 compared to running on the host processor. The paper then describes an FPGA resource manager (RM) developed to support concurrent applications sharing the FPGA board. With the RM, the system is dynamically reconfigurable. That is, while part of the co-processor board is busy computing, another part can be reconfigured for other purposes. The IR ATR application was ported to work with the RM and has been shown to adapt to the amount of reconfigurable hardware that is available at the time the application is executed.  相似文献   

11.
Scientific application kernels mapped to reconfigurable hardware have been reported to have 10times to 100times speedup over equivalent software. These promising results suggest that reconfigurable logic might offer significant speedup on applications in science and engineering. To accurately assess the benefit of hardware acceleration on scientific applications, however, it is necessary to consider the entire application including software components as well as the accelerated kernels. Aspects to be considered include alternative methods of hardware/software partitioning, communications costs, and opportunities for concurrent computation between software and hardware. Analysis of these factors is beyond the scope of current automatic parallelizing compilers. In this paper, a case study is presented in which a simulation of metropolitan road traffic networks is mapped onto a reconfigurable supercomputer, the Cray XD1. Five different methods are presented for mapping the application onto the combined hardware/software system. An approach for approximating the performance of each method is derived through analytic equations. Our results, both analytically and empirically, show that key predictors of performance (which are often not considered in reported speedup of kernel operations) are not necessarily maximum parallelism, but must account for the fraction of the problem that runs on the reconfigurable logic and the amount data flow between software and hardware.  相似文献   

12.
This paper presents performance improvements and energy savings from mapping real-world benchmarks on an embedded single-chip platform that includes coarse-grained reconfigurable logic with a microprocessor. The reconfigurable hardware is a 2-D array of processing elements connected with a mesh-like network. Analytical results derived from mapping seven real-life digital signal processing applications, with the aid of an automated design flow, on six different instances of the system architecture are presented. Significant overall application speedups relative to an all-software solution, ranging from 1.81 to 3.99 are reported being close to theoretical speedup bounds. Additionally, the energy savings range from 43% to 71%. Finally, a comparison with a system coupling a microprocessor with a very long instruction word core shows that the microprocessor/coarse-grained reconfigurable array platform is more efficient in terms of performance and energy consumption.  相似文献   

13.
14.
着眼于我国航天测控系统安全可控的发展新要求,在充分论证研究的基础上提出基于国产软硬件平台,采用构件和服务总线技术设计开发灵活便捷、自主可控的航天测量船测控仿真系统。文中描述了系统功能需求,提出了可重构的系统体系结构,重点论述了服务总线技术实现和船载测控设备外测数据仿真模型设计。应用结果证明该系统设计合理、灵活通用,能够满足航天测量船测控系统全面验证需求。  相似文献   

15.
吴将  朱志宇  沈舒 《电视技术》2014,38(7):50-53,44
针对现有可重构计算硬件平台配置时间长、灵活性受限的缺陷问题,介绍了一种基于PC机的FPGA可重构硬件平台结构的设计方法,该结构允许PCI总线快速重构,整个系统的硬件设计可以按以下两个部分进行设计:固定部分和可重构部分。最后在FPGA资源上的验证结果表明该设计能够有效实现FPGA的硬件重构,而且其物理硬件设计简单。  相似文献   

16.
Day after day, embedded systems add more compute-intensive applications inside their end products: cryptography or image and video processing are some examples found in leading markets like consumer electronics and automotive. To face up these ever-increasing computational demands, the use of hardware accelerators synthesized in field-programmable gate arrays (FPGA) lets achieve processing speedups of orders of magnitude versus their counterpart CPU-based software approaches. However, the inherent increment in physical resources penalizes in cost. To address this issue, dynamically reconfigurable hardware technology definitively reached its maturity. SRAM-based reconfigurable logic goes beyond the classical conception of static hardware resources distributed in space and held invariant for the entire application life cycle; it provides a new design abstraction featured by the temporal partitioning of such resources to promote their continuous reuse, reconfiguring them on the fly to play a different role in each instant. This new computing paradigm lets balance the design of embedded applications by partitioning their functionality in space and time—through a series of mutually-exclusive processing tasks synthesized multiplexed in time on the same set of resources—and achieving thus cost savings in both area and power metrics. However, the exploitation of this system versatility requires special attention to avoid performance degradation. Such technical aspects are addressed in this work intended to be a survey on reconfigurable hardware technology and aimed at defining an open, standard and cost-effective system architecture driven by flexible coprocessors instantiated on demand on reconfigurable resources of an FPGA. This concept fits well with the functional features demanded to many embedded applications today and its feasibility has been proved with a state-of-the-art commercial SRAM-based FPGA platform. The achieved results highlight dynamic partial reconfiguration as a potential technology to lead the next computing wave in the industry.  相似文献   

17.
针对目前国内二次电池检测设备的落后现状,提出了一种基于ARM9嵌入式微处理器和复杂可编程逻辑器件(CPLD)可重构技术的嵌入式智能检测控制系统的硬件设计方法,利用CPLD的内部可编程的特性,解决了嵌入式系统因为控制逻辑变化而在硬件电路上必须作相应变化的问题。实验结果表明,该方法一方面可以提升硬件电路系统的通用性和柔性,另一方面可以提高开发效率、缩短产品开发周期。  相似文献   

18.
Embedded systems present significant security challenges due to their limited resources and power constraints. This paper focuses on the issues of building secure embedded systems on reconfigurable hardware and proposes a security architecture for embedded systems (SAFES). SAFES leverages the capabilities of reconfigurable hardware to provide efficient and flexible architectural support for security standards and defenses against a range of hardware attacks. The SAFES architecture is based on three main ideas: (1) reconfigurable security primitives; (2) reconfigurable hardware monitors; and (3) a hierarchy of security controllers at the primitive, system and executive level. Results are presented for reconfigurable AES and RC6 security primitives and highlight the value of such an architecture. This paper also emphasizes that reconfigurable hardware is not just a technology for hardware accelerators dedicated to security primitives as has been focused on by most studies but a real solution to provide high-security and high-performance for a system.  相似文献   

19.
Reconfigurable array processors have emerged as powerful solution to speed up computationally intensive applications. However, they may suffer from a data access bottleneck as the frequency of memory access rises. At present, the distributed cache design in the reconfigurable array processor has a large cache failure rate, and the frequent access to external memory leads to a long delay in memory access. To mitigate this problem, we present a Runtime Dynamically Migration Mechanism (RDMM) of distributed cache for reconfigurable array processor based on the feature of obvious locality and high parallelism in accessing data. This mechanism allows temporary, static data to be dynamically scheduled to migrate data with a high access frequency from the remote cache to the processor's local migration storage table based on how often the reconfigurable array processors access the remote cache. We can accurately get the data on the shortest path by way of data search strategy based on migration storage tables, thereby effectively reducing the access delay of the entire system, increasing the memory bandwidth of the reconfigurable array processor. We leverage the hardware platform of reconfigurable array processor to test the proposed mechanism. The experimental results show that RDMM reduces access delay by up to 35.24% compared with the tradition distributed cache at the highest conflict rate. And compared with the Ref.[19], Ref.[20], Ref.[21] and Ref.[23], the working frequency can be increased by 15%, the hit rate can be increased by 6.1%, and the peak bandwidth can be increased by about 3×.  相似文献   

20.
文章介绍了一款新型可重构SoC电路,较详细地描述了它的内部结构和特点,并制定应用方案,分别重构SPI和DDS模块,对该电路进行验证.应用方案中,利用SPI与VS1003连接,通过该SPI接口控制并发送歌曲数据给VS1003,VS1003对数据进行解码处理,最后驱动功放播放歌曲.利用DDS模块产生信号数据,经过D/A转换...  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号