1.

Research Challenges for On-Chip Interconnection Networks (Total citations: 2; self-citations: 0; citations by others: 2)

Owens J.D. Dally W.J. Ho R. Jayasimha D.N. Keckler S.W. Li-Shiuan Peh 《Micro, IEEE》2007,27(5):96-108

On-chip interconnection networks are rapidly becoming a key enabling technology for commodity multicore processors and for the SoCs common in consumer embedded systems. The National Science Foundation initiated a workshop that addressed upcoming research issues in OCIN technology, design, and implementation and set a direction for researchers in the field.

2.

Guest Editors' Introduction: The Network-on-Chip Paradigm in Practice and Research

Ivanov A. De Micheli G. 《Design & Test of Computers, IEEE》2005,22(5):399-403

The network-on-chip paradigm is an emerging paradigm that effectively addresses and presumably can overcome the many on-chip interconnection and communication challenges that already exist in today's chips or will likely occur in future chips. Effective on-chip implementation of network-based interconnect paradigms requires developing and deploying a whole new set of infrastructure IPs and supporting tools and methodologies. This special issue illustrates how, to date, engineers have successfully deployed NoCs to meet certain very aggressive specifications. At the same time, the articles reveal many issues and challenges that require solutions if the NoC paradigm is indeed to become a panacea or quasi-panacea for tomorrow's SoCs.

3.

Design and implementation of high-speed buffered crossbars with efficient load balancing for multi-core SoCs

George Kornaros Theofanis Orphanoudakis 《Microprocessors and Microsystems》2010,34(7-8):301-315

A large increase in the number of devices integrated on a single chip, in conjunction with the significant performance demands of modern applications, has led designers to a system development methodology based on integrating multiple pre-verified intellectual property cores. Yet design productivity requirements push designers to focus on key micro-architectural solutions to manage the scaling of multi-core SoCs more efficiently and to increase the degree of design automation, particularly as rapid prototyping using reconfigurable computing is becoming mainstream. In this paper we present a novel interconnect architecture based on optimized components to efficiently manage SoCs that either follow a multi-core based approach or are built to support SIMD-style applications that can exploit the processing power of a pool of hardware resources. First, we analyze the design of a crossbar featuring shared-memory combined input-crosspoint buffering as a solution for efficient implementation of on-chip interconnection. Second, we describe the design of a load balancer featuring configurable proportional allocation of on-chip resources and in-order delivery as a solution for efficient scheduling and execution of processing tasks. The main focus of the paper is to describe and evaluate the mechanisms designed to distribute and manage data transfers so as to implement an efficient interconnection of the integrated cores and control access to available (either on-chip or off-chip) resources for the implementation of a number of embedded systems and applications. Each of these challenges is handled by the proposed architecture efficiently in terms of performance, silicon cost and flexibility.

4.

CuNoC: A dynamic scalable communication structure for dynamically reconfigurable FPGAs

S. Jovanović C. Tanougast C. Bobda S. Weber 《Microprocessors and Microsystems》2009,33(1):24-36

The growing complexity of integrated circuits forces designers to move from traditional bus-based design concepts towards NoC-based ones. Networks-on-chip (NoCs) are emerging as a viable alternative to existing interconnection architectures, and are especially characterized by a high level of parallelism, high performance and scalability. The NoC architectures already proposed in the literature are intended for system-on-chip (SoC) designs. For an FPGA-based system, the proposed NoCs are not suitable if all the benefits of this technology are to be obtained. In this paper, we present a new paradigm called CuNoC for intercommunication between modules dynamically placed on a chip in FPGA-based reconfigurable devices. The CuNoC is based on a scalable communication unit characterized by a unique architecture, an arbitration policy based on the priority-to-the-right rule and a modified XY adaptive routing algorithm. The CuNoC is primarily suited to FPGA-based reconfigurable devices, but with small modifications it can also be adapted to any other system that needs an efficient communication medium. We present the basic concept of this communication approach and its main advantages and drawbacks with regard to the other main NoC approaches already proposed, and we demonstrate its feasibility on examples through simulation. Performance evaluation and implementation results are also given.
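As a point of reference for the routing scheme named in the abstract above, here is a minimal sketch of plain deterministic XY (dimension-ordered) routing on a 2-D mesh. This is the textbook baseline only; it is not CuNoC's modified adaptive variant, whose details the abstract does not give. The function name `xy_route` and the coordinate convention are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst on a 2-D mesh.

    Plain XY routing: travel along the X dimension until the column
    matches, then along the Y dimension. Coordinates are (x, y) tuples.
    Hypothetical helper, not taken from the CuNoC paper.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet finishes its X movement before it starts moving in Y, the route is fully determined by source and destination; adaptive variants relax this ordering to steer around busy or absent routers.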

5.

Managing Security in FPGA-Based Embedded Systems

Huffmire Ted Brotherton Brett Sherwood Timothy Kastner Ryan Levin Timothy Nguyen Thuy D. Irvine Cynthia 《Design & Test of Computers, IEEE》2008,25(6):590-598

FPGAs combine the programmability of processors with the performance of custom hardware. As they become more common in critical embedded systems, new techniques are necessary to manage security in FPGA designs. This article discusses FPGA security problems and current research on reconfigurable devices and security, and presents security primitives and a component architecture for building highly secure systems on FPGAs.

6.

How to make your own processor architecture (review of Processor Design: System-on-Chip Computing for ASICs and FPGAs by Nurmi, J., Ed.; 2007) [Book reviews]

Davidson Scott 《Design & Test of Computers, IEEE》2008,25(1):96-98

This is a review of Processor Design: System-on-Chip Computing for ASICs and FPGAs (edited by Jari Nurmi). Because processors are now embedded in SoCs and programmable devices, a system designer is not limited to chips available from major manufacturers. The theory is that a system built of specialized processors will be more efficient, and this book covers a wide range of such customized computer architectures. The book includes three main types of chapters. The first group consists of background information; the second focuses on stages of the processor design process; and the third includes examples of architecture types and experimental architectures, mostly from universities.

7.

A Self Distributing Virtual Machine for Adaptive Multicore Environments

Jan Haase Andreas Hofmann Klaus Waldschmidt 《International journal of parallel programming》2010,38(1):19-37

The use of parallel systems is no longer limited to dedicated clusters, as multicore chips increasingly appear in embedded applications. To meet power, performance and cost targets, these systems need to be adaptive. The reconfiguration features of recent FPGAs make new approaches to this type of parallel computing possible: dynamic reconfiguration at runtime is an important step towards adaptive behavior of systems-on-chip (SoCs). This article analyzes the challenges of such an adaptive SoC. It is shown that many of the requirements for an adaptive FPGA realization are met by the SDVM, the scalable dataflow-driven virtual machine, which has been successfully implemented and tested on a cluster of workstations. The SDVM has evolved into a virtualization layer for multicore FPGAs, now called SDVM^R. This virtualization layer allows transparent runtime reconfiguration of the underlying hardware to adapt to the changing system environment. Results for a basic application on both systems are presented.

8.

A scalable and fault-tolerant network routing scheme for many-core and multi-chip systems

Wen-Chung Tsai Kuo-Chih Chu Yu-Hen Hu Sao-Jie Chen 《Journal of Parallel and Distributed Computing》2012

Current on-chip network and inter-chip interconnection are designed separately. However, this traditional design methodology faces a great challenge: in a multi-chip system, each many-core chip contains hundreds or thousands of processors. The increasing number of on-chip processors must share one input/output unit to interface with the inter-chip interconnection. The increased network usage at the chip interface may create an uneven traffic load in the on-chip network. That is, traffic jams could occur in the chip area around the input/output unit. New technologies, such as through silicon via and silicon interposer, can directly connect networks on chips. These technologies can improve communication performance and reduce power consumption by omitting the input/output unit. This paper proposes a novel routing scheme to deal with the network scalability issues related to the many-core and multi-chip system-in-package paradigm. The proposed scheme can also enhance the fault-tolerance of nano-scale communication in deep-submicron designs.

9.

Novel interconnection technology for heterogeneous integration of MEMS-LSI multi-chip module

Kang-Wook Lee Mitsumasa Koyanagi 《Microsystem Technologies》2010,16(3):441-447

We developed a novel interconnection technology for heterogeneous integration of a MEMS and LSI multi-chip module, in which MEMS and LSI chips are horizontally integrated on a substrate and vertically stacked on each other. A cavity chip composed of deep Cu TSV-beam lead interconnections was developed for interconnecting MEMS chips with a high step height of more than a few hundred micrometers without degrading the sensing elements. Fundamental characteristics were successfully obtained from a pressure-sensing MEMS chip of 360 μm thickness, which was connected to the Si substrate by the cavity chip. MEMS and LSI chips were vertically integrated using the cavity chip without any change to the chip design and without extra processes. This interconnection technology offers a strong solution for heterogeneous integration of MEMS and LSI chips in a multi-chip module.

10.

MMNNN: A tree-based Multicast Mechanism for NoC-based deep Neural Network accelerators

《Microprocessors and Microsystems》2021

Network-on-Chip (NoC) devices have been widely used in multiprocessor systems. In recent years, NoC-based Deep Neural Network (DNN) accelerators have been proposed to connect neural computing devices using NoCs. Such designs dramatically reduce off-chip memory accesses of these platforms. However, the large number of one-to-many packet transfers significantly degrades performance with traditional unicast channels. We propose a multicast mechanism for a NoC-based DNN accelerator called Multicast Mechanism for NoC-based Neural Network accelerator (MMNNN). To do so, we propose a tree-based multicast routing algorithm with excellent scalability and the ability to minimize the number of packets in the network. We also propose a router architecture for single-flit packets. Our proposed router transfers flits to multiple destinations in a single process and has no head-of-line blocking issue, offering higher throughput and lower latency than traditional wormhole router architectures. Simulation results show that our proposed multicast mechanism offers excellent performance in classification latency, average packet latency, and energy consumption.

11.

A generic FPGA prototype for on-chip systems with network-on-chip communication infrastructure

Mohammad Arjomand Amirali Boroumand Hamid Sarbazi-Azad 《Computers & Electrical Engineering》2014

As Systems-on-Chip (SoCs) grow in complexity and size, proposals of networks-on-chip (NoCs) as the on-chip communication infrastructure are justified by the reusability, scalability, and energy efficiency provided by the interconnection networks. Simulation and mathematical analysis offer flexibility for evaluation under various network configurations. However, the accuracy of such analysis methods largely depends on the approximations made. Prototyping, on the other hand, can improve evaluation accuracy by bringing the design closer to reality. In this paper, we propose an FPGA prototype that is general enough to model different video-processing SoCs where different cores communicate via a NoC. To model the NoC, we accurately implement a fully synthesized on-chip router supporting multiple virtual channels. For the processing nodes, we propose a general and simple traffic generator capable of modeling different synthetic functions (i.e. Poisson and self-similar). Indeed, the application traffic is modeled using 1-D hybrid cellular automata, which can effectively generate high-quality pseudorandom patterns. Finally, for energy efficiency, the proposed prototype can support multiple frequency regions. To realize the voltage–frequency island partitioned SoC, we use the utilities that the Xilinx FPGA platform offers to design Globally Asynchronous Locally Synchronous (GALS) systems via Delay-Locked Loop elements.
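The abstract above mentions 1-D hybrid cellular automata as a pseudorandom source. A minimal sketch of that general idea follows, assuming the common rule-90/rule-150 hybrid with null (zero) boundaries. The rule assignment, seed, tap position and function names are illustrative assumptions; none of them are the configuration used in the paper, and the rule vector shown is not claimed to be maximal-length.

```python
def hybrid_ca_step(state, rules):
    """Advance a 1-D hybrid cellular automaton by one clock.

    state: list of 0/1 cell values.
    rules: list of 90 or 150, one rule per cell (assumed hybrid scheme).
    Cells beyond either end are treated as 0 (null boundary).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right        # rule 90: XOR of the two neighbours
        if rules[i] == 150:
            bit ^= state[i]       # rule 150 also XORs in the cell itself
        nxt.append(bit)
    return nxt


def ca_bits(seed, rules, count, tap=0):
    """Collect `count` bits by sampling one cell (`tap`) on each clock."""
    state = list(seed)
    out = []
    for _ in range(count):
        out.append(state[tap])
        state = hybrid_ca_step(state, rules)
    return out


if __name__ == "__main__":
    rules = [90, 150, 90, 150, 90, 150, 90, 150]   # arbitrary example
    seed = [1, 0, 0, 0, 0, 0, 0, 1]                # any non-zero seed
    print(ca_bits(seed, rules, 16))
```

Both rules are linear over GF(2), so the all-zero state is a fixed point and the seed has to be non-zero. In hardware each cell costs one flip-flop and one or two XOR gates with only nearest-neighbour wiring.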

12.

PICO: automatically designing custom computers (Total citations: 1; self-citations: 0; citations by others: 1)

《Computer》2002,35(9):39-47

The paper discusses the PICO (program in, chip out) project, a long-range HP Labs research effort that aims to automate the design of optimized, application-specific computing systems - thus enabling the rapid and cost-effective design of custom chips when no adequately specialized, off-the-shelf design is available. PICO research takes a systematic approach to the hierarchical design of complex systems and advances technologies for automatically designing custom nonprogrammable accelerators and VLIW processors. While skeptics often assume that automated design must emulate human designers who invent new solutions to problems, PICO's approach is to automatically pick the most suitable designs from a well-engineered space of designs. Such automation of embedded computer design promises an era of yet more growth in the number and variety of innovative smart products by lowering the barriers of design time, designer availability, and design cost.

13.

A NoC-based simulator for design and evaluation of deep neural networks

《Microprocessors and Microsystems》2020

The astonishing development in the field of artificial neural networks (ANN) has brought significant advancement in many application domains, such as pattern recognition, image classification, and computer vision. An ANN imitates neuron behavior and makes a decision or prediction by learning patterns and features from the given data set. To reach higher accuracies, neural networks are getting deeper, and consequently the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based DNN accelerators are being designed, which usually incorporate customized processing elements, fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves a large number of memory accesses due to frequent loading and offloading of data, which significantly increases energy consumption and latency. Also, the rigid architecture and interconnection among processing elements limit the efficiency of the platform to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNN has become an emerging design paradigm because the NoC interconnection can help reduce off-chip memory accesses while offering better scalability and flexibility. To evaluate NoC-based DNNs at the early design stage, we introduce a cycle-accurate NoC-based DNN simulator called DNNoC-sim. To support the various operations in modern DNN models, such as convolution and pooling, we first propose a DNN flattening technique that converts diverse DNN operations into MAC-like operations. In addition, we propose a DNN slicing method to evaluate large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-offs between different design parameters.

14.

The route to a defect tolerant LUT through artificial evolution

Asbjoern Djupdal Pauline C. Haddow 《Genetic Programming and Evolvable Machines》2011,12(3):281-303

Evolutionary techniques may be applied to search for specific structures or functions, as specified in the fitness function. This paper addresses the challenge of finding an appropriate fitness function when searching for generic rather than specific structures which, when combined with [...] characteristic of defect tolerance on the circuit. Production defects for integrated circuits are expected to increase considerably. To avoid a corresponding drop in yield, improved defect tolerance solutions are needed. In the case of Field Programmable Gate Arrays (FPGAs), the pre-designed gate array provides a bridge between production and the application designers. Thus, introduction of defect tolerant techniques to the FPGA itself could provide a defect free gate array to the application designer, despite production defects. The search for defect tolerance presented herein is directed at finding defect tolerant structures for an important building block of FPGAs: Look-Up Tables (LUTs). Two key approaches are presented: (1) applying evolved generic building blocks to a traditional LUT design and (2) evolving the LUT design directly. The results highlight the fact that evolved generic defect tolerant structures can contribute to highly reliable circuit designs at the expense of area usage. Further, they show that applying such a technique, rather than direct evolution, has benefits with respect to evolvability of larger circuits, again at the expense of area usage.

15.

Power optimization for application-specific networks-on-chips: A topology-based approach

Haytham Elmiligi Ahmed A. Morgan M. Watheq El-Kharashi Fayez Gebali 《Microprocessors and Microsystems》2009,33(5-6):343-355

This paper analyzes the main sources of power consumption in Networks-on-Chip (NoC)-based systems. Analytical power models of global interconnection links are studied at different levels of abstraction. Additionally, power measurement experiments are performed for different types of routers. Based on this study, we propose a new topology-based methodology to optimize the power consumption of complex NoC-based systems at early design phases. The efficiency of the proposed methodology is verified through a case study of an MPEG4 video application. Experimental results show a promising improvement in power consumption (8.55%), average number of hops (10.80%), and number of global links (56.25%) compared to the best known related work.

16.

Flexible VLIW processor based on FPGA for efficient embedded real-time image processing

Vincent Brost Fan Yang Charles Meunier 《Journal of Real-Time Image Processing》2014,9(1):47-59

Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. With a VLIW architecture, processor effectiveness depends on the ability of compilers to extract sufficient ILP (instruction-level parallelism) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented in an optimal manner on an FPGA. Some common image processing algorithms were tested and validated using the proposed development cycle. We also carried out rapid prototyping of embedded contactless palmprint extraction on a Virtex-6 FPGA based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach applies several criteria for co-design tools: flexibility, modularity, performance, and reusability.

17.

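Design of a relay unit for a bridge inspection system based on a wireless communication network (Total citations: 1; self-citations: 1; citations by others: 0)

杨乐林杨承凯蒋小明陈克寒黄华 《计算机工程》(Computer Engineering) 2009,35(8):256-258

To address the problems in existing multi-point bridge strain measurement, wireless communication, embedded, and low-power techniques are introduced. A C8051 microcontroller and an nRF905 wireless chip are used to implement a relay for the wireless network, extending both networking capability and measurement distance. A complete, purpose-built wireless network communication protocol and a network debugging interface built on top of that protocol are designed, so that the limited channel is used to the fullest, the wireless data transfer rate is improved, and debugging, installation, and testing become more flexible and adaptable. The system supports not only traditional static testing but also dynamic testing and single-channel real-time display.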

18.

OCEAN, a flexible adaptive Network-On-Chip for dynamic applications

Ludovic Devaux Sebastien Pillement 《Microprocessors and Microsystems》2014

The dynamic and partial reconfiguration of FPGAs enables the dynamic placement of application tasks in reconfigurable zones. However, the dynamic management of tasks affects communications, since tasks are not present in the FPGA for the whole computation time. The task manager should therefore ensure the allocation of each new task and its interconnection, which is performed by a flexible interconnection network. In this article, various interconnection networks are studied. Each architecture is evaluated with respect to its suitability for the paradigm of dynamic and partial reconfiguration in FPGA implementations. This study leads us to propose the OCEAN network, which supports the communication constraints in the context of dynamic reconfiguration. Thanks to a generic platform allowing in situ characterization of network performance, fair comparisons of various Networks-On-Chip can be made. The FPGA and ASIC implementations of the OCEAN network are also discussed.

19.

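Serial RapidIO interconnect technology with the CPS1432 switch chip

张健林锡龙谢江波 《单片机与嵌入式系统应用》(Microcontrollers & Embedded Systems) 2014,(12):31-34

RapidIO is the world's first, and so far the only, international standard for embedded system interconnect (ISO/IEC 18372). Serial RapidIO is designed for chip-to-chip and board-to-board interconnect in high-performance embedded systems. This article describes a star-topology network built from the CPS1432 switch chip and the P2020, covering the hardware design and the key points of the software design, and offers a useful reference for high-performance embedded interconnect design.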

20.

A lightweight strong PUF design based on a linear feedback shift register

侯申郭阳李暾李少青 《图学学报》(Journal of Graphics) 2020,41(1):125-131

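A physical unclonable function (PUF) is a new type of hardware security primitive that can be implemented on FPGAs and ASICs and prevents chips from being overproduced and illegally cloned. PUFs can be used for secure key generation and chip authentication. Strong PUFs are an important class of PUF: they have a very large challenge-response pair (CRP) space and are suited to secure authentication of devices. Classic strong PUF designs, typified by the arbiter PUF, have a large area overhead and less-than-ideal uniqueness, which makes them hard to use in resource-constrained scenarios such as embedded systems and Internet of Things (IoT) devices. To reduce hardware overhead, a new lightweight strong PUF design is proposed. It uses a linear feedback shift register to obfuscate the output responses of a weak PUF and so obtain a large number of output responses; the structure is simple and easy to implement. The PUF design was implemented and evaluated on a 28 nm FPGA. Experimental results show that the PUF has a randomness of 49.8% and a uniqueness of 50.25%, with very small hardware overhead.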
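To make the idea in the abstract above concrete, the following is a rough software sketch of expanding a weak PUF's fixed response into many challenge-dependent responses with a linear feedback shift register. Everything specific here is an assumption for illustration: the 16-bit width, the tap set (a commonly cited one for the polynomial x^16 + x^14 + x^13 + x^11 + 1), the way the challenge is mixed in by XOR, and all function names. The paper's actual circuit is not described in the abstract and may differ substantially. The last helper shows the fractional Hamming distance that uniqueness figures such as 50.25% are typically based on.

```python
MASK16 = 0xFFFF


def lfsr16_step(state):
    """One clock of a 16-bit Fibonacci LFSR (assumed taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def strong_response(weak_response, challenge, n_bits=16):
    """Derive an n_bits response from a weak PUF value and a challenge.

    Hypothetical scheme: seed the LFSR with weak_response XOR challenge,
    then shift out one bit per clock.
    """
    state = (weak_response ^ challenge) & MASK16
    if state == 0:
        state = 1  # the all-zero state would never change, so avoid it
    out = 0
    for _ in range(n_bits):
        state = lfsr16_step(state)
        out = (out << 1) | (state & 1)
    return out


def hamming_fraction(a, b, n_bits=16):
    """Fraction of differing bits between two n_bits responses."""
    diff = (a ^ b) & ((1 << n_bits) - 1)
    return bin(diff).count("1") / n_bits


if __name__ == "__main__":
    chip_a, chip_b = 0xACE1, 0x1D2C   # made-up weak PUF fingerprints
    challenge = 0x0F0F
    r_a = strong_response(chip_a, challenge)
    r_b = strong_response(chip_b, challenge)
    print(hex(r_a), hex(r_b), hamming_fraction(r_a, r_b))
```

An ideal uniqueness value is 50%, meaning two different chips disagree on about half of their response bits for the same challenge. An LFSR alone is linear, so a sketch like this illustrates the response-expansion idea but says nothing about resistance to modeling attacks.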