Similar literature
1.
A heterogeneous multicore system-on-chip (SoC) has been developed for high-definition (HD) multimedia applications that require secure DRM (digital rights management). The SoC integrates three types of processors: two specific-purpose accelerators for cipher and high-resolution video decoding; one general-purpose accelerator (MX); and three CPUs. This is how our SoC achieves high performance and low power consumption with hardware customized for video processing applications that process a large amount of data. To achieve secure data control, hardware memory management and software system virtualization are adopted. The security of the system is the result of the cooperation between the hardware and software on the system. Furthermore, a highly tamper-resistant system is provided on our SiP (System in a package), through DDR2 SDRAMs and a flash memory that contain confidential information in one package. This secure multimedia processor provides a solution to protect contents and to safely deliver secure sensitive information when processing billing transactions that involve digital content delivery. The SoC was implemented in a 90 nm generic CMOS technology.

2.
In this paper, we describe the development of a platform-based SoC of a 32-bit smart card. The smart card uses a 32-bit microprocessor for high performance and two cryptographic processors for high security. It supports both contact and contactless interfaces, which comply with ISO/IEC 7816 and 14443 Type B. It has a Java Card OS to support multiple applications. We modeled smart card readers with a foreign language interface for efficient verification of the smart card SoC. The SoC was implemented using 0.25 µm technology. To reduce the power consumption of the smart card SoC, we applied power optimization techniques, including clock gating. Experimental results show that the power consumption of the RSA and ECC cryptographic processors can be reduced by 32% and 62%, respectively, without increasing the area.

3.
This paper describes a heterogeneous multi-core processor (HMCP) architecture that integrates general-purpose processors (CPUs) and accelerators (ACCs) to achieve exceptional performance as well as low power consumption for the SoCs of embedded systems. The memory architectures of CPUs and ACCs were unified to improve programming and compiling efficiency. Advanced audio codec-low complexity (AAC-LC) stereo audio encoding was parallelized on a heterogeneous multi-core having homogeneous processor cores and dynamically reconfigurable processor (DRP) ACC cores in a preliminary evaluation of the HMCP architecture. The performance evaluation revealed that 54× AAC encoding was achieved on the chip with two CPUs at 600 MHz and two DRPs at 300 MHz, which achieved encoding of an entire CD within 1-2 min.
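As a quick sanity check on the figures above, the sketch below divides an audio duration by the reported 54× speedup. The CD lengths (74 and 80 minutes) and the function name `encode_minutes` are assumptions for illustration; the abstract states only the speedup and the 1-2 min result.

```python
def encode_minutes(audio_minutes: float, speedup: float) -> float:
    """Wall-clock minutes needed to encode `audio_minutes` of audio
    when the encoder runs `speedup` times faster than real time."""
    return audio_minutes / speedup


# Assumed CD lengths (not given in the abstract): 74-minute and 80-minute discs.
for cd_minutes in (74.0, 80.0):
    wall = encode_minutes(cd_minutes, 54.0)
    print(f"{cd_minutes:.0f}-min CD at 54x real time: {wall:.2f} min")
```

Both assumed disc lengths land inside the 1-2 min window quoted in the abstract.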

4.
In this paper, a novel ultra-low-power digitally controlled oscillator (DCO) with cell-based design for system-on-chip (SoC) applications is presented. Based on the proposed segmental delay line (SDL) and hysteresis delay cell (HDC), the power consumption can be reduced by 70% and 86.2% in the coarse-tuning and fine-tuning stages, respectively, as compared with conventional approaches. In addition, the proposed DCO employs a cascade-stage structure to achieve high resolution and wide range at the same time. Measurement results show that the power consumption of the proposed DCO can be lowered to 140 μW (@200 MHz) with 1.47-ps resolution. Moreover, the proposed DCO can be implemented with standard cells, making it easily portable to different processes and very suitable for SoC applications.

5.
This paper shows how a bus topology performs as a System-on-Chip (SoC) interconnection. We measure and analyze the Heterogeneous IP Block Interconnection (HIBI) bus for a multiple clock domain Multiprocessor System-on-Chip (MPSoC) with an MPEG-4 video encoding application on FPGA. The studied MPSoC contains up to 22 IP blocks: 11 soft processors, 8 hardware accelerators and three other components. A novel approach of frequency scaling is used to isolate the impact of various architecture components. The system is benchmarked in various configurations. For example, HIBI is run at 100× speed with respect to the processors to resemble an ideal interconnection. Based on the measurements with up to 16.9 frames/s CIF (352 × 288) encoding speed, an estimation for an HDTV resolution video encoder is presented. The required optimizations are discussed. Finally, it is shown that a 25 frames/s 1280 × 720 video encoder needs a 55 MHz HIBI but 670 MHz general-purpose soft RISC processors. In practice, the processing performance has to be boosted by implementing hardware acceleration and improving the memory hierarchy. Clearly, HIBI is not the limiting factor.
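To give a feel for the gap between the measured CIF result and the HDTV target, the sketch below compares raw pixel throughput for the two cases. This is only a back-of-the-envelope ratio and not the paper's estimation method, which the abstract does not describe; the function name `pixel_rate` is hypothetical.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels that must be encoded per second at the given resolution and frame rate."""
    return width * height * fps


# Figures taken from the abstract.
cif_rate = pixel_rate(352, 288, 16.9)    # measured CIF encoding speed
hd_rate = pixel_rate(1280, 720, 25.0)    # 1280 x 720 target at 25 frames/s

# How many times more pixels per second the HD target demands.
ratio = hd_rate / cif_rate
print(f"CIF: {cif_rate:,.0f} pixels/s, HD: {hd_rate:,.0f} pixels/s, ratio: {ratio:.1f}x")
```

Under this simple pixel-count view the HD target needs roughly 13 times the measured throughput, which is consistent with the abstract's remark that processing performance, not the bus, has to be boosted.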

6.
A 32-b 500-MHz 4-1-1-1 operation 4-Mb pipeline burst cache SRAM has been developed. In order to achieve both high-bandwidth operation and short-latency operation, we developed the following technologies: 1) a prefetched pipeline-burst scheme with double late-write buffers, 2) gate size reduction and bit-line equalization by source resetting, 3) point-to-point bidirectional coding I/Os to reduce bus noise and power consumption, and 4) a three-level metal 0.25-μm CMOS process technology with six-transistor memory cells.

7.
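Non-uniform cache architecture (NUCA) has all but become the direction of development for future large-capacity on-chip caches. This paper points out that the main design issues for a homogeneous single-chip multiprocessor are data coherence in the multi-level cache design, inter-core communication, and external bus efficiency, and it describes the corresponding solutions in the multiprocessor design. Finally, it compares single-core and dual-core designs in terms of performance and power consumption and presents the floorplan of the dual-core processor. Using IP blocks such as the dual-core processor, the L2 cache controller, and the AXI bus controller, a NUCA platform for designing AXI-bus SoCs is proposed.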

8.
A 600-MHz single-chip multiprocessor, which includes two M32R 32-bit CPU cores, a 512-kB shared SRAM and an internal shared pipelined bus, was fabricated using a 0.15-μm CMOS process for embedded systems. This multiprocessor is based on symmetric multiprocessing (SMP), and supports the modified-exclusive-shared-invalid (MESI) cache coherency protocol. The multiprocessor inherits the advantages of previously reported single-chip multiprocessors, while its multiprocessor architecture is optimized for use as an embedded processor. The internal shared pipelined bus has a low latency and large bandwidth (4.8 GB/s). These features enhance the performance of the multiprocessor. In addition, the multiprocessor employs various low-power techniques. The multiprocessor dissipates 800 mW in a 1.5-V 600-MHz multiprocessor mode. Standby power dissipation is less than 1.5 mW at 1.5 V. Hence, the multiprocessor achieves higher performance and lower power consumption. This paper presents a single-chip multiprocessor architecture optimized for use as an embedded processor and its various low-power techniques.

9.
Microprocessors in today's computers continue to get faster and more powerful. From the Intel 80×86 series to today's Pentium IV, CPUs have greatly improved in performance. Accordingly, their power consumption has increased dramatically. To reduce the power loss, an evolution began when the high-performance Pentium processor was driven by a nonstandard, less-than-5-V power supply, instead of drawing its power from the 5-V plane on the system board. In order to provide the power as quickly as possible, the voltage regulator (VR), a dedicated dc-dc converter, is placed in close proximity to power the processor. In the beginning, VRs drew power from the 5-V output of the silver box. As the power delivered through the VR increased so dramatically, it became no longer efficient to use the 5-V bus. Then for desktop and workstation applications, the VR input voltage moved to the 12-V output of the silver box. This trend began when Pentium II processors emerged. Today's Pentium IV processors use 12-V-input VRs.

10.
On-chip L1 and L2 caches represent a sizeable fraction of the total power consumption of microprocessors. In nanometer-scale technology, the subthreshold leakage power is becoming one of the dominant total power consumption components of those caches. In this study, we present optimization techniques to reduce the subthreshold leakage power of on-chip caches assuming that there are multiple threshold voltages, V_T's, available. First, we show a cache leakage optimization technique that examines the tradeoff between access time and subthreshold leakage power by assigning distinct V_T's to each of the four main cache components: address bus drivers, data bus drivers, decoders, and static random access memory (SRAM) cell arrays with sense amplifiers. Second, we show optimization techniques to reduce the leakage power of L1 and L2 on-chip caches without affecting the average memory access time. The key results are: 1) two additional high V_T's are enough to minimize leakage in a single cache (three V_T's if we include a nominal low V_T for microprocessor core logic); 2) if L1 size is fixed, increasing L2 size can result in much lower leakage without reducing average memory access time; 3) if L2 size is fixed, reducing L1 size may result in lower leakage without loss of the average memory access time for the SPEC2K benchmarks; and 4) smaller L1 and larger L2 caches than are typical in today's processors result in significant leakage and dynamic power reduction without affecting the average memory access time.
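The constraint running through these results is the average memory access time (AMAT). The sketch below uses the standard textbook two-level AMAT formula, which the abstract does not spell out, to show how a smaller L1 can be offset by a larger L2. All latencies and miss rates are hypothetical values chosen for illustration, not data from the study.

```python
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_latency: float) -> float:
    """Average memory access time in cycles for a two-level cache hierarchy:
    AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory latency)."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Hypothetical baseline: larger L1 (fewer L1 misses), smaller L2 (more L2 misses).
baseline = amat(l1_hit=1.0, l1_miss_rate=0.05,
                l2_hit=10.0, l2_miss_rate=0.20, mem_latency=200.0)

# Hypothetical alternative: smaller L1 (more L1 misses), larger L2 (fewer L2 misses).
alternative = amat(l1_hit=1.0, l1_miss_rate=0.08,
                   l2_hit=10.0, l2_miss_rate=0.10, mem_latency=200.0)

print(f"baseline AMAT:    {baseline:.2f} cycles")
print(f"alternative AMAT: {alternative:.2f} cycles")
```

With these made-up numbers the alternative configuration does not lengthen AMAT, which is the kind of trade-off results 2) to 4) rely on; whether leakage actually drops depends on the V_T assignment and cache sizes studied in the paper.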

11.
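Hardware implementation of a high-performance, low-power DDR2 controller   Total citations: 1, self-citations: 0, citations by others: 1
As the demand for bus bandwidth inside SoC chips grows, the throughput of the memory controller faces many challenges. Bandwidth performance can be improved in two ways. One is to connect the memory controller directly to the few on-chip modules that consume most of the bandwidth, with intelligent arbitration among the multiple channels, so that their communication need not pass through the internal AMBA bus; designers can even use the high-performance AXI bus to speed up data transfer between SoC modules. The other is to analyze the characteristics of DDR2 SDRAM and design a controller with command-scheduling capability that reduces the number of reads and writes, which in turn lowers the power consumption of the SoC; for energy saving, an automatic power-saving mechanism should also be designed. This paper offers useful ideas for research on improving the performance of DDR2 SDRAM controllers.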

12.
A CMOS EDGE baseband and multimedia handset SoC features a dual-core (microcontroller and DSP) architecture together with all the necessary interface logic and hardware accelerators interconnected by a multi-layer bus. The DSP memory hierarchy features an instruction cache coupled to a 6-Mbit embedded DRAM instruction memory, allowing in-the-field software flexibility (for example, dynamic upgrade of DSP software) while minimizing power and area (closely matching a ROM-based solution). The chip is implemented in a 130-nm 6-metal-layer CMOS process and is packaged in a 12 × 12 ball-grid array. Full-chip standby mode current is 690 μA (with data retention), resulting in a 500-hour complete GSM/EDGE terminal autonomy.

13.
This paper presents a new data cache design, cache-processor coupling, which tightly binds an on-chip data cache with a microprocessor. Parallel architectures and high-speed circuit techniques are developed to speed up the address handling process associated with accessing the data cache. The address handling time has been reduced by 51% by these architectures and circuit techniques. In addition, newly proposed instructions increase data cache bandwidth by eight times. Excessive power consumption due to the wide-bandwidth data transfer is carefully avoided by newly developed circuit techniques, which reduce dissipation power per bit to 1/26. A simulation study of the proposed architecture and circuit techniques yields a 1.8 ns delay each for address handling, cache access, and register access for a 16-kilobyte direct-mapped cache with a 0.4 μm CMOS design rule.

14.
Multiprocessor System on Chips (MPSoCs) are quickly becoming the mainstay in embedded processing platforms due to their hardware and software design flexibility. This flexibility increases the design space for developers, introducing trade-offs between performance and resource/power consumption. This paper presents a comprehensive evaluation of memory customisations for MPSoCs. Custom arrangements of instruction and data cache are presented to optimise off-chip memory consumption and improve system performance. Off-chip memory management and threading are presented to balance the computational load on available processors and improve system performance. The proposed methods are applied to an object detection case study, where performance increases of up to 2.93x are achieved when compared to standard memory designs. Furthermore, the proposed techniques can increase the number of possible processors in an MPSoC by reducing the number of bus interconnects.

15.
This paper proposes a hardware-software (HW-SW) co-simulation framework that provides a unified system-level power estimation platform for efficiently analyzing both the total power consumption of the target SoC and the power profiles of its individual components. The proposed approach employs a trace-based technique that reflects the real-time behavior of the target SoC by applying various operation scenarios to the high-level model of the target SoC. The trace data, together with the corresponding look-up table (LUT), is utilized for the power analysis. The trace data is also used to reduce the number of input vectors required to analyze the power consumption of large H/W designs through the trade-offs between the signal probability in the trace results and its effect on the power consumption. The effect of cache misses on power, occurring in the S/W program execution, is also considered in the proposed framework. The performance of the proposed approach was evaluated through a case study using the SoC design example of an IEEE 802.11a wireless LAN modem. The case study illustrated that, by providing fast and accurate power analysis results, the proposed approach can enable SoC designers to manage the power consumption effectively through the reconstruction of the target SoC. The proposed framework maps all hardware IPs into an FPGA. The trace-based approach captures input vectors at the transactor of each IP and obtains the power consumption by indexing a LUT. This hardware-oriented technique reports the power estimation result faster than conventional techniques that do so at the S/W level.

16.
Power consumption is an increasingly pressing problem in modern processor design. Since the on-chip caches usually consume a significant amount of power, it is one of the most attractive targets for power reduction. This paper presents a two-level filter scheme, which consists of the L1 and L2 filters, to reduce the power consumption of the on-chip cache. The main idea of the proposed scheme is motivated by the substantial unnecessary activities in conventional cache architecture. We use a single block buffer as the L1 filter to eliminate the unnecessary cache accesses. In the L2 filter, we then propose a new sentry-tag architecture to further filter out the unnecessary way activities in case of the L1 filter miss. We use SimpleScalar to simulate the SPEC2000 benchmarks and perform the HSPICE simulations to evaluate the proposed architecture. Experimental results show that the two-level filter scheme can effectively reduce the cache power consumption by eliminating most unnecessary cache activities, while the compromise of system performance is negligible. Compared to a conventional instruction cache (32 kB, two-way) implemented with only the L1 filter, the use of a two-level filter can result in roughly 30% reduction in total cache power consumption. Similarly, compared to a conventional data cache (32 kB, four-way) implemented with only the L1 filter, the total cache power reduction is approximately 46%.
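The L1 filter idea, a single block buffer that absorbs repeated accesses to the most recently used cache block, can be sketched as a small trace counter. This is an illustrative model only: the function name `count_filtered`, the 32-byte block size, and the address trace are assumptions, and the sentry-tag L2 filter is not modeled because the abstract does not describe its mechanism.

```python
from typing import Iterable, Tuple


def count_filtered(addresses: Iterable[int], block_size: int = 32) -> Tuple[int, int]:
    """Count accesses that reach the cache versus accesses served by a
    single block buffer holding the most recently accessed block.

    Returns (cache_accesses, filtered_accesses)."""
    last_block = None      # block number currently held in the buffer
    cache_accesses = 0     # accesses that miss the buffer and go to the cache
    filtered = 0           # accesses served by the buffer (cache stays idle)
    for addr in addresses:
        block = addr // block_size
        if block == last_block:
            filtered += 1
        else:
            cache_accesses += 1
            last_block = block   # refill the buffer with the new block
    return cache_accesses, filtered


# Assumed trace: sequential 4-byte instruction fetches over 128 bytes.
trace = range(0, 128, 4)
to_cache, served_by_buffer = count_filtered(trace, block_size=32)
print(f"cache accesses: {to_cache}, filtered by block buffer: {served_by_buffer}")
```

Sequential fetches are the best case for such a buffer; a trace that alternates between blocks would defeat it, which is presumably where the second-level filter in the paper comes in.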

17.
Dynamic voltage scaling has been widely acknowledged as a powerful technique for trading off power consumption and delay for processors. Recently, variable-frequency (and variable-voltage) parallel and serial links have also been proposed, which can save link power consumption by exploiting variations in the bandwidth requirement. This provides a new dimension for power optimization in a distributed embedded system connected by a voltage-scalable interconnection network. At the same time, it imposes new challenges for variable-voltage scheduling as well as flow control. First, the variable-voltage scheduling algorithm should be able to trade off the power consumption and delay jointly for both processors and links. Second, for the variable-frequency network, the scheduling algorithm should not only consider the real-time constraints, but should also be consistent with the underlying flow control techniques. In this paper, we address joint dynamic voltage scaling for variable-voltage processors and communication links in such systems. We propose a scheduling algorithm for real-time applications that captures both data flow and control flow information. It performs efficient routing of communication events through multiple hops, as well as efficient slack allocation among heterogeneous processors and communication links to maximize energy savings, while meeting all real-time constraints. Our experimental study shows that, on average, joint voltage scaling on processors and links can achieve 32% less power compared with voltage scaling on processors alone.
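The power/delay trade-off that voltage scaling exploits can be illustrated with the usual first-order CMOS model, in which dynamic energy per task scales with the square of the supply voltage. This model, the function names, and every numeric value below are assumptions for illustration; the abstract does not give the paper's energy model or operating points.

```python
def dynamic_energy(c_eff: float, voltage: float, cycles: float) -> float:
    """First-order dynamic energy in joules: E = C_eff * V^2 * cycles."""
    return c_eff * voltage ** 2 * cycles


def run_time(cycles: float, freq_hz: float) -> float:
    """Execution time in seconds for a task of `cycles` clock cycles."""
    return cycles / freq_hz


# Hypothetical task and operating points (not from the paper).
CYCLES = 1e6          # cycles needed by the task
C_EFF = 1e-9          # effective switched capacitance in farads

fast = (1.8, 600e6)   # (volts, hertz): full speed
slow = (1.2, 300e6)   # (volts, hertz): scaled down to use available slack

e_fast = dynamic_energy(C_EFF, fast[0], CYCLES)
e_slow = dynamic_energy(C_EFF, slow[0], CYCLES)
t_fast = run_time(CYCLES, fast[1])
t_slow = run_time(CYCLES, slow[1])

print(f"fast: {e_fast * 1e3:.2f} mJ in {t_fast * 1e3:.2f} ms")
print(f"slow: {e_slow * 1e3:.2f} mJ in {t_slow * 1e3:.2f} ms")
```

The slower operating point spends less energy but takes longer, so a scheduler can only choose it where the deadline leaves slack; distributing that slack across processors and links is the problem the paper addresses.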

18.
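As the bridge between the processor and the system bus, the cache is a major source of chip power consumption, so low-power cache design is of great importance in embedded chip design. Traditional cache designs generally depend on a specific architecture, are difficult to integrate into different systems, and have poor generality. This paper proposes a low-power, high-efficiency unified cache IP design with a dual AHB-AXI bus structure. Experimental results show that the design can significantly reduce cache power consumption and improve system performance.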

19.
This work presents a reconfigurable mixed-signal system-on-chip (SoC), which integrates switched-capacitor-based field programmable analog arrays (FPAA), an analog-to-digital converter (ADC), a digital-to-analog converter (DAC), a digital down converter, a digital up converter, a 32-bit reduced instruction-set computer central processing unit (CPU) and other digital IPs on a single chip in 0.18 μm CMOS technology. The FPAA intellectual property can be reconfigured as different function circuits, such as a gain amplifier, divider, sine generator, and so on. This single-chip integrated mixed-signal system is a complete modern signal processing system, occupying a die area of 7 × 8 mm² and consuming 719 mW with a clock frequency of 150 MHz for the CPU and 200 MHz for the ADC/DAC. This SoC can help customers shorten design cycles, save board area, reduce system power consumption and lower system integration risk, which gives it broad application prospects in wireless communication.

20.
This superscalar microprocessor is the first implementation of a 32-bit RISC architecture specification incorporating a single-instruction, multiple-data vector processing engine. Two instructions per cycle plus a branch can be dispatched to two of seven execution units in this microarchitecture designed for high execution performance, high memory bandwidth, and low power for desktop, embedded, and multiprocessing systems. The processor features an enhanced memory subsystem, 128-bit internal data buses for improved bandwidth, and 32-KB eight-way instruction/data caches. The integrated L2 tag and cache controller with a dedicated L2 bus interface supports L2 cache sizes of 512 KB, 1 MB, or 2 MB with two-way set associativity. At 450 MHz, and with a 2-MB L2 cache, this processor is estimated to have a floating-point and integer performance metric of 20 while dissipating only 7 W at 1.8 V. The 10.5 million transistor, 83-mm² die is fabricated in a 1.8-V, 0.20-μm CMOS process with six layers of copper interconnect.
